JP2019109615A

JP2019109615A - Classification device, learning device, classification method, learning method, and computer program

Info

Publication number: JP2019109615A
Application number: JP2017240937A
Authority: JP
Inventors: 盛朗佐々木; Morio Sasaki
Original assignee: LAWSON Inc
Current assignee: LAWSON Inc
Priority date: 2017-12-15
Filing date: 2017-12-15
Publication date: 2019-07-04
Anticipated expiration: 2037-12-15
Also published as: JP6642858B2

Abstract

To provide a classification device, a learning device, a classification method, a learning method, and a computer program capable of classifying information more easily. SOLUTION: A classification device includes: a morphological analysis unit that divides a character string into morphemes; and a first classification information determination unit that acquires first learning result data from a learning result data storage unit and determines first classification information of the character string on the basis of the first learning result data and the morphemes. The first learning result data is obtained by performing prescribed machine learning on a plurality of pieces of training data, each including a character string and its classification result, and includes the frequency with which the first classification information, which represents a feature of a character string, has been given to character strings. SELECTED DRAWING: Figure 1

Description

The present invention relates to a classification device, a learning device, a classification method, a learning method, and a computer program.

A receipt issued when a product is bought or sold lists product names and sales amounts. By acquiring product information such as product names and sales amounts from collected receipts and classifying it, one can grasp how well competitors' products are selling and how much revenue they generate. For this purpose, masters that store the results of classifying product information have been built and maintained.

Japanese Unexamined Patent Application Publication No. 2011-103038

However, with the above method the master must be updated every time a new product is released, so building and maintaining the master can require considerable effort. In addition, orthographic variation between receipts, such as katakana versus hiragana notation, can cause the same product to be mistakenly classified as different products. In such cases, product information sometimes cannot be classified correctly unless the master is updated or the misclassifications are corrected individually.

In view of the above circumstances, an object of the present invention is to provide a classification device, a learning device, a classification method, a learning method, and a computer program that can classify information more easily.

One aspect of the present invention is a classification device comprising: a morphological analysis unit that divides a character string into morphemes; and a first classification information determination unit that acquires first learning result data from a learning result data storage unit and determines first classification information of the character string based on the first learning result data and the morphemes. The learning result data storage unit records the first learning result data, which is obtained by performing predetermined machine learning based on a plurality of training data including character strings and classification results of the character strings, and which includes the frequency with which the first classification information, representing a feature of a character string, has been assigned to character strings so far.

One aspect of the present invention is the above classification device, further comprising a morpheme pair determination unit that determines, based on a predetermined condition, a morpheme pair including any two of the morphemes. The first classification information determination unit acquires second learning result data from the learning result data storage unit and determines the first classification information of the character string based on the second learning result data and the morpheme pair. The second learning result data is obtained by performing predetermined machine learning based on the training data, and includes the number of times the first classification information has been assigned to character strings so far, counted based on the morphemes included in the morpheme pairs of the training data.

One aspect of the present invention is the above classification device, further comprising a second classification information determination unit that acquires third learning result data from the learning result data storage unit and determines second classification information based on the third learning result data and the morphemes. The third learning result data is obtained by performing predetermined machine learning based on the training data, and is used to determine, from among a plurality of pieces of second classification information associated with the determined first classification information, the second classification information that represents a feature of the character string.

One aspect of the present invention is the above classification device, further comprising a morpheme replacement unit that replaces a morpheme with another morpheme based on a predetermined condition.

One aspect of the present invention is the above classification device, further comprising a morpheme pair determination unit that determines, among the morphemes, the morpheme closest in meaning to the character string, and determines a morpheme pair including the identified morpheme and another morpheme close to it in meaning. The first classification information determination unit acquires second learning result data from the learning result data storage unit and determines the first classification information of the character string based on the second learning result data and the morpheme pair. The second learning result data is obtained by performing predetermined machine learning based on the training data, and includes the number of times the first classification information has been assigned to character strings so far, counted based on the morphemes included in the morpheme pairs of the training data.

One aspect of the present invention is a learning device comprising: a morphological analysis unit that divides a character string into morphemes; and a learning result data generation unit that acquires a plurality of training data including character strings and classification results of the character strings, and performs predetermined machine learning to generate first learning result data including the frequency with which first classification information, representing a feature of a character string, has been assigned to character strings so far.

One aspect of the present invention is a classification method comprising: a morphological analysis step in which a classification device divides a character string into morphemes; and a first classification information determination step in which the classification device acquires first learning result data from a learning result data storage unit and determines first classification information of the character string based on the first learning result data and the morphemes. The learning result data storage unit records the first learning result data, which is obtained by performing predetermined machine learning based on a plurality of training data including character strings and classification results of the character strings, and which includes the frequency with which the first classification information, representing a feature of a character string, has been assigned to character strings so far.

One aspect of the present invention is a learning method comprising: a morphological analysis step in which a learning device divides a character string into morphemes; and a learning result data generation step in which the learning device acquires a plurality of training data including character strings and classification results of the character strings, and performs predetermined machine learning to generate first learning result data including the frequency with which first classification information, representing a feature of a character string, has been assigned to character strings so far.

One aspect of the present invention is a computer program for causing a computer to function as the above classification device.

One aspect of the present invention is a computer program for causing a computer to function as the above learning device.

The present invention makes it possible to classify information more easily.

FIG. 1 is a functional block diagram showing the functional configuration of the information processing apparatus 100 according to the first embodiment.
FIG. 2 is a diagram showing a specific example of the training data table.
FIG. 3 is a diagram showing a specific example of the replacement data table.
FIG. 4 is a diagram showing a specific example of the classification information table.
FIG. 5 is a diagram showing a specific example of the master information table.
FIG. 6 is a diagram showing a specific example of morphological analysis.
FIG. 7 is a diagram showing a specific example of the first category matrix generated based on morphemes.
FIG. 8 is a diagram showing a specific example of the first category matrix generated based on morpheme pairs.
FIG. 9 is a diagram showing a specific example of the second category matrix generated based on morphemes.
FIG. 10 is a diagram showing a specific example of the second category matrix generated based on morpheme pairs.
FIG. 11 is a flowchart showing the flow of category matrix generation processing in the information processing apparatus 100 according to the first embodiment.
FIG. 12 is a flowchart showing the flow of classification processing in the information processing apparatus 100 according to the first embodiment.
FIG. 13 is a flowchart showing the flow of classification information determination processing in the information processing apparatus 100 according to the first embodiment.
FIG. 14 is a flowchart showing the flow of classification information determination processing in the information processing apparatus 100 according to the first embodiment.
FIG. 15 is a functional block diagram showing the functional configuration of the information processing apparatus 100a according to the second embodiment.

FIG. 1 is a functional block diagram showing the functional configuration of the information processing apparatus 100 according to the first embodiment. The information processing apparatus 100 is a device such as a personal computer, a server, a smartphone, or a tablet computer. The information processing apparatus 100 assigns, to input character string information, information indicating a feature of that character string information. The character string information may be any information made up of characters, such as sentences, words, or numbers. In this embodiment, the character string information is assumed to be, for example, information read by OCR (Optical Character Recognition) from receipts issued for purchases. The character string information is described here as including at least the product name, the number of units sold, and the sales amount of a sale, but it is not limited to this. The character string information may be, for example, a character string entered by a user or a character string acquired from the Internet or the like.

The information processing apparatus 100 includes a processor, a memory, an auxiliary storage device, and the like connected by a bus. By executing a classification program or a learning program, it functions as a device including a communication unit 101, an input unit 102, a display unit 103, a training data storage unit 104, a replacement data storage unit 105, a classification information storage unit 106, a category matrix storage unit 107, a master information storage unit 108, and a control unit 109. All or some of the functions of the information processing apparatus 100 may be implemented using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The classification program or the learning program may be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The classification program or the learning program may also be transmitted via a telecommunication line.

The communication unit 101 is a network interface. The communication unit 101 communicates with external devices via a network. An external device may be, for example, a personal computer, a server, a smartphone, or a tablet computer, or it may be an image forming apparatus, an image processing apparatus, or the like. The communication unit 101 may communicate using a communication method such as wireless LAN (Local Area Network), wired LAN, Bluetooth (registered trademark), or LTE (Long Term Evolution) (registered trademark).

The input unit 102 is configured using input devices such as a touch panel, a mouse, and a keyboard. The input unit 102 may instead be an interface for connecting an input device to the information processing apparatus 100. In this case, the input unit 102 generates input data (for example, instruction information indicating an instruction to the information processing apparatus 100) from the input signal entered on the input device, and inputs that data to the information processing apparatus 100.

The display unit 103 is an output device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, or an organic EL (Electro Luminescence) display. The display unit 103 may instead be an interface for connecting an output device to the information processing apparatus 100. In this case, the display unit 103 generates a video signal from video data and outputs the video signal to the video output device connected to it.

The training data storage unit 104 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The training data storage unit 104 stores a training data table. The training data table records the character string information used to generate learning result data, together with the first classification information and second classification information assigned to that input character string information.

FIG. 2 is a diagram showing a specific example of the training data table. The training data table has training data records. Each training data record has values for year-month, store name, product name, first classification information, second classification information, sales amount, and number sold. The year-month is included in the character string information and represents the year and month in which the sale took place. The store name is included in the character string information and represents the name of the store where the sale took place. The product name is included in the character string information and represents the name of the product sold. The product name may include characters misrecognized by OCR. The first classification information indicates the category to which the product with that product name belongs. The second classification information describes the product more specifically within the first classification information. The first classification information and the second classification information are associated in a 1:n relationship. They may be specified by the user via the input unit 102. The combination of first classification information and second classification information is uniquely determined. The sales amount represents the amount of money for which the product indicated by the product name was sold. The number sold represents the number of units of that product that were sold.

In the example shown in FIG. 2, the top training data record of the training data table has a year-month value of "201706", a store name value of "Store 200", a product name value of "d手巷おにぎり日高昆布" (an OCR misreading of "手巻おにぎり日高昆布", a hand-rolled rice ball with Hidaka kelp), a first classification information value of "food", a second classification information value of "rice", a sales amount value of "330", and a number-sold value of "3". Thus, according to this record, when the year-month of the input character string information is June 2017, the store name is "Store 200", and the product name is "d手巷おにぎり日高昆布", "food" is assigned as the first classification information and "rice" as the second classification information. The record also shows that the sales amount for this product name is "330" and the number sold is "3". The training data table shown in FIG. 2 is only one specific example, so the table may be configured differently from FIG. 2. For example, the training data table need not have second classification information, or it may be configured to have third classification information.
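As a minimal sketch of how one training data record from FIG. 2 could be represented, the following uses a Python dataclass. The class name `TrainingRecord` and its field names are hypothetical and chosen only for illustration; the values are those of the top record described above, with the second classification left optional because the table may be configured without it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingRecord:
    """One row of the training data table (field names are hypothetical)."""
    year_month: str                   # e.g. "201706" for June 2017
    store_name: str
    product_name: str                 # may contain OCR misreadings
    first_class: str                  # first classification information
    second_class: Optional[str]       # optional: the table may omit it
    sales_amount: int
    sales_count: int


# The top record of the example table in FIG. 2.
top_record = TrainingRecord(
    year_month="201706",
    store_name="Store 200",
    product_name="d手巷おにぎり日高昆布",
    first_class="food",
    second_class="rice",
    sales_amount=330,
    sales_count=3,
)
```

Making the record immutable (`frozen=True`) is a design choice of this sketch, not something the source specifies; it simply keeps labelled training rows from being altered by accident.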

Returning to FIG. 1, the description of the information processing apparatus 100 continues. The replacement data storage unit 105 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The replacement data storage unit 105 stores a replacement data table. The replacement data table records rules for replacing characters with different characters.

FIG. 3 is a diagram showing a specific example of the replacement data table. The replacement data table has replacement data records. Each replacement data record has a pre-replacement value and a post-replacement value. The pre-replacement value represents the characters before replacement. The post-replacement value represents the characters that replace them.

In the example shown in FIG. 3, the top replacement data record of the replacement data table has a pre-replacement value of "手巷" and a post-replacement value of "手巻" ("hand-rolled"). Thus, according to this record, when the given characters are "手巷", they are replaced with "手巻". The replacement data table shown in FIG. 3 is only one specific example, so the table may be configured differently from FIG. 3.

Returning to FIG. 1, the description of the information processing apparatus 100 continues. The classification information storage unit 106 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The classification information storage unit 106 stores a classification information table. The classification information table records the correspondence between first classification information and second classification information.

FIG. 4 is a diagram showing a specific example of the classification information table. The classification information table has classification information records. Each classification information record has values for first classification information and second classification information. In the example shown in FIG. 4, the top classification information record of the classification information table has a first classification information value of "food" and second classification information values of "rice", "noodles", "bread", "meat", "vegetables", and so on. Thus, according to this record, when "food" is assigned as the first classification information, one of "rice", "noodles", "bread", "meat", "vegetables", and so on is assigned as the second classification information. The classification information table shown in FIG. 4 is only one specific example, so the table may be configured differently from FIG. 4.
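The 1:n relationship between first and second classification information can be sketched as a mapping from each first classification to its list of permitted second classifications. The names `CLASSIFICATION_TABLE`, `second_candidates`, and `is_valid_pair` are hypothetical, and only the values named for the "food" record are included; the source list continues beyond them.

```python
# Hypothetical in-memory form of the classification information table.
# Only the entries named in the FIG. 4 example are listed; the source
# indicates that the real list continues.
CLASSIFICATION_TABLE = {
    "food": ["rice", "noodles", "bread", "meat", "vegetables"],
}


def second_candidates(first_class):
    """Return the second classifications associated with a first classification."""
    return CLASSIFICATION_TABLE.get(first_class, [])


def is_valid_pair(first_class, second_class):
    """Check that a (first, second) combination appears in the table."""
    return second_class in second_candidates(first_class)
```

With this layout, `second_candidates("food")` yields the candidate list from which a second classification would be chosen once "food" has been assigned.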

Returning to FIG. 1, the description of the information processing apparatus 100 continues. The category matrix storage unit 107 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The category matrix storage unit 107 stores a first category matrix or a second category matrix. The first category matrix is a matrix that holds values related to first determination information, which is used to determine the first classification information assigned to character string information. The first determination information represents how frequently each first classification result was assigned to the character string information included in the training data. The second category matrix is a matrix that holds values related to second determination information, which is used to determine the second classification information assigned to character string information. The second determination information represents how frequently each second classification result was assigned to the character string information included in the training data. The first and second category matrices are described in detail later. The category matrix storage unit 107 is one form of the learning result data storage unit.
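Because the category matrices are only described in detail later, the following is a rough sketch under stated assumptions rather than the method of the embodiment. It assumes that the first category matrix can be pictured as a table of counts, one row per morpheme and one column per first classification, and that a new string is classified by summing the rows of its morphemes and taking the most frequent category. The function names, the summing rule, and the "daily goods" category are all hypothetical.

```python
from collections import Counter, defaultdict


def build_category_matrix(samples):
    """Count how often each category was assigned to strings containing each morpheme.

    samples: iterable of (morphemes, category) pairs taken from training data.
    Returns a mapping morpheme -> Counter of category frequencies.
    """
    matrix = defaultdict(Counter)
    for morphemes, category in samples:
        for morpheme in set(morphemes):  # count each morpheme once per string
            matrix[morpheme][category] += 1
    return matrix


def decide_category(matrix, morphemes):
    """Pick the category with the highest summed frequency (assumed rule)."""
    totals = Counter()
    for morpheme in morphemes:
        totals.update(matrix.get(morpheme, {}))
    if not totals:
        return None  # no morpheme was seen during training
    return totals.most_common(1)[0][0]


# Hypothetical training samples, already split into morphemes.
samples = [
    (["手巻", "おにぎり", "日高", "昆布"], "food"),
    (["おにぎり", "鮭"], "food"),
    (["洗剤", "詰替"], "daily goods"),
]
matrix = build_category_matrix(samples)
```

Under these assumptions, a new product name containing the morpheme "おにぎり" would be assigned "food" even if the full name never appeared in the training data, which is the kind of behavior that reduces the need to update a master for every new product.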

The master information storage unit 108 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The master information storage unit 108 stores a master information table. The master information table records the correspondence between character string information and the first classification information or second classification information assigned to it.

FIG. 5 is a diagram showing a specific example of the master information table. The master information table has master information records. Each master information record has values for product name, first classification information, and second classification information. The product name represents the name of a product and is one form of character string information. The first classification information represents the first classification information assigned to the product name. The second classification information represents the second classification information assigned to the product name.

In the example shown in FIG. 5, the top master information record of the master information table has a product name value of "手巻おにぎり日高昆布" (a hand-rolled rice ball with Hidaka kelp), a first classification information value of "food", and a second classification information value of "rice". Thus, according to this record, when the given character string information is "手巻おにぎり日高昆布", the first classification information is "food" and the second classification information is "rice". The master information table shown in FIG. 5 is only one specific example, so the table may be configured differently from FIG. 5. For example, the master information table may be configured to store the number of times each product name has been given as character string information.

Returning to FIG. 1, the description of the information processing apparatus 100 continues. The control unit 109 controls the operation of each unit of the information processing apparatus 100. The control unit 109 is implemented by, for example, a device including a processor and a memory. By executing the classification program or the learning program, the control unit 109 functions as a data acquisition unit 110, a morphological analysis unit 111, a morpheme replacement unit 112, a morpheme pair determination unit 113, a learning unit 114, and a classification unit 115.

The data acquisition unit 110 acquires character string information. The data acquisition unit 110 may receive the character string information via the input unit 102 or from the communication unit 101. For example, the data acquisition unit 110 may acquire, as character string information, information obtained by reading a sales receipt or the like with OCR. The data acquisition unit 110 outputs the acquired character string information to the classification unit 115. The data acquisition unit 110 may also record the acquired character string information in the training data storage unit 104.

The morphological analysis unit 111 performs morphological analysis on the character string information. Morphological analysis decomposes the character string information into morphemes and determines the part of speech of each resulting morpheme. A known algorithm may be used for the morphological analysis. A morpheme is obtained by dividing the character string information and is the smallest unit of text that carries some meaning.

The morpheme replacement unit 112 replaces a predetermined morpheme with a different morpheme. A predetermined morpheme is a morpheme held in the pre-replacement column of the replacement data table. The morpheme replacement unit 112 determines whether a morpheme is stored in the pre-replacement column of the replacement data table. If it is, the morpheme replacement unit 112 replaces the morpheme with the morpheme in the post-replacement column. If it is not, the morpheme replacement unit 112 leaves the morpheme unchanged.

FIG. 6 is a diagram showing a specific example of morphological analysis. FIG. 6 illustrates the case where "ｄ手巷おにぎり日高昆布" is acquired as character string information; this string contains "手巷" where the product name in FIG. 5 has "手巻" ("hand-rolled"). The morphological analysis unit 111 divides the character string information into the morphemes "ｄ", "手巷", "おにぎり" ("rice ball"), "日高" ("Hidaka"), and "昆布" ("kelp"). The morpheme replacement unit 112 determines whether each morpheme is stored in the pre-replacement column of the replacement data table. According to FIGS. 3 and 6, the morpheme replacement unit 112 replaces the morpheme "手巷" with "手巻". The morphological analysis unit 111 determines the part of speech of each morpheme. According to FIG. 6, it determines that "ｄ" is a symbol and that "手巻", "おにぎり", "日高", and "昆布" are nouns. The morphological analysis unit 111 deletes any morpheme determined to be a symbol.
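The replacement and symbol-deletion steps of FIG. 6 can be sketched as follows. This is a minimal illustration, not the patented implementation: the morphological analyzer itself is not shown, the input is assumed to arrive as `(surface, part_of_speech)` tuples, and the names `REPLACEMENTS` and `normalize` are hypothetical.

```python
# Hypothetical replacement data table: pre-replacement -> post-replacement morpheme.
REPLACEMENTS = {"手巷": "手巻"}

SYMBOL = "記号"  # part-of-speech tag for symbols


def normalize(tagged_morphemes, replacements=REPLACEMENTS):
    """Drop morphemes tagged as symbols and apply table replacements."""
    result = []
    for surface, pos in tagged_morphemes:
        if pos == SYMBOL:
            continue  # symbols are deleted
        # Replace only if the morpheme is in the pre-replacement column.
        result.append(replacements.get(surface, surface))
    return result


# Assumed analyzer output for "ｄ手巷おにぎり日高昆布".
tagged = [
    ("ｄ", "記号"),
    ("手巷", "名詞"),
    ("おにぎり", "名詞"),
    ("日高", "名詞"),
    ("昆布", "名詞"),
]
normalized = normalize(tagged)
print(normalized)
```

A dictionary lookup keeps the "is it in the pre-replacement column?" check and the replacement itself in a single step.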

Returning to FIG. 2, the description of the information processing apparatus 100 continues. The morpheme pair determination unit 113 determines a morpheme pair. A morpheme pair is the combination of the morphemes positioned at the beginning and at the end of the character string information among the decomposed morphemes. For example, when the character string information is "ｄ手巻おにぎり日高昆布" ("d hand-rolled rice ball, Hidaka kelp"), the morphemes are "ｄ", "手巻" ("hand-rolled"), "おにぎり" ("rice ball"), "日高" ("Hidaka"), and "昆布" ("kelp"). Because "ｄ" is a symbol, it is deleted. The morpheme pair determination unit 113 therefore determines "手巻" and "昆布" to be the morpheme pair. The morpheme pair determination unit 113 determines the morpheme pair based on the morphemes received from the morphological analysis unit 111.
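As a minimal sketch of the pair selection described above, the following takes the first and last morphemes that remain after symbols are removed. The function name `morpheme_pair` is hypothetical, and the handling of an empty list (returning `None`) is an assumption, since the text does not cover that case.

```python
def morpheme_pair(morphemes):
    """Return (first, last) of an already-cleaned morpheme list, or None if empty."""
    if not morphemes:
        return None  # assumption: no pair can be formed from an empty list
    return (morphemes[0], morphemes[-1])


# Morphemes of "ｄ手巻おにぎり日高昆布" after the symbol "ｄ" has been deleted.
pair = morpheme_pair(["手巻", "おにぎり", "日高", "昆布"])
print(pair)
```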

By executing the learning program, the learning unit 114 functions as a first category matrix generation unit 116 and a second category matrix generation unit 117. By executing the classification program, the classification unit 115 functions as a master information search unit 118, a first classification information determination unit 119, a second classification information determination unit 120, and a master information generation unit 121.

The first category matrix generation unit 116 generates a first category matrix by predetermined machine learning based on morphemes or morpheme pairs. The morphemes may be those decomposed by the morphological analysis unit 111. The first category matrix represents the frequency of the first classification information assigned to the morphemes. The first category matrix generated by this machine learning is one form of first learning result data. The morpheme pairs may be those determined by the morpheme pair determination unit 113. First, the machine learning performed when the first category matrix generation unit 116 generates a morpheme-based first category matrix is described.

The first category matrix generation unit 116 acquires the first classification information from the classification information table stored in the classification information storage unit 106. The first category matrix generation unit 116 acquires morphemes from the morphological analysis unit 111. The morphological analysis unit 111 generates the morphemes by performing morphological analysis on the product name column of the training data table stored in the training data storage unit 104, and outputs the generated morphemes to the first category matrix generation unit 116. The first category matrix generation unit 116 sets the first classification information as the rows of the first category matrix and the morphemes as its columns.

The first category matrix generation unit 116 counts the number of occurrences of each acquired morpheme. For example, when the morpheme "hand-rolled" has been acquired from the morphological analysis unit 111 three times, the first category matrix generation unit 116 counts the number of the morpheme "hand-rolled" as 3. For each morpheme, the first category matrix generation unit 116 calculates the proportion of that morpheme belonging to each piece of first classification information. The first classification information used here is the value associated with the product name in the training data table. Specifically, for each morpheme, the first category matrix generation unit 116 divides the number of occurrences associated with each piece of first classification information by the total counted number of that morpheme. For example, when the total number of the morpheme "hand-rolled" is 7, of which 6 are associated with the first classification information "food" and 1 with the first classification information "book", the first category matrix generation unit 116 calculates the proportion belonging to "food" as 0.86, the proportion belonging to "book" as 0.14, and the proportion belonging to every other piece of first classification information as 0.00. The first category matrix generation unit 116 sets the calculated values in the cells of the first category matrix and records the first category matrix in the category matrix storage unit 107. The first category matrix generation unit 116 may be configured to take the number of units sold into account when calculating the proportions. For example, when 3 is recorded as the number sold in the training data table, the first category matrix generation unit 116 may count the number of morphemes as 3.
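The proportion calculation can be sketched as below. This is an illustrative reading of the paragraph above, not the patent's code: the training rows are assumed to be `(morphemes, first_category, units_sold)` tuples, the function name `build_first_category_matrix` is hypothetical, and the matrix is stored as a nested dictionary indexed by morpheme and then by category.

```python
from collections import Counter, defaultdict


def build_first_category_matrix(training_rows):
    """Map each morpheme to {first_category: proportion of its occurrences}."""
    counts = defaultdict(Counter)
    for morphemes, category, units_sold in training_rows:
        for morpheme in morphemes:
            # Weighting by units sold is the optional variant from the text;
            # pass units_sold=1 to count each training row once.
            counts[morpheme][category] += units_sold
    matrix = {}
    for morpheme, per_category in counts.items():
        total = sum(per_category.values())
        matrix[morpheme] = {c: n / total for c, n in per_category.items()}
    return matrix


# Seven occurrences of "hand-rolled": six labeled "food", one labeled "book".
rows = [(["hand-rolled", "rice ball"], "food", 1)] * 6
rows.append((["hand-rolled", "guide"], "book", 1))
matrix = build_first_category_matrix(rows)
print({c: round(p, 2) for c, p in matrix["hand-rolled"].items()})
```

Categories that never co-occur with a morpheme are simply absent from its inner dictionary, which corresponds to the 0.00 cells in the text.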

FIG. 7 is a diagram showing a specific example of the first category matrix generated based on morphemes. The rows of the first category matrix represent first classification information such as "morpheme", "food", "beverage", "book", "stationery", "daily goods", and so on. The columns of the first category matrix represent morphemes such as "salt", "literature", "cutlet", "banana", "ramen", and so on.

In the example shown in FIG. 7, the top row of the first category matrix has the morpheme value "salt", the food value 0.42, the beverage value 0.33, the book value 0.01, the stationery value 0.02, the daily goods value 0.04, and so on. Accordingly, the top row of the first category matrix shows that when the morphemes include "salt", the possibility of the product being food is 0.42, a beverage 0.33, a book 0.01, stationery 0.02, daily goods 0.04, and so on.

Next, the machine learning performed when the first category matrix generation unit 116 generates a first category matrix based on morpheme pairs is described. The first category matrix generated by this machine learning is one form of second learning result data. The first category matrix generation unit 116 acquires morpheme pairs from the morpheme pair determination unit 113. The first category matrix generation unit 116 sets the first classification information as the rows of the first category matrix and the morpheme pairs as its columns.

The first category matrix generation unit 116 counts the number of occurrences of each acquired morpheme pair. For example, when the morpheme pair "hand-rolled", "kelp" has been acquired three times, the first category matrix generation unit 116 counts the number of that morpheme pair as 3. For each morpheme pair, the first category matrix generation unit 116 calculates the proportion of that pair belonging to each piece of first classification information. Specifically, for each morpheme pair, it divides the number of occurrences associated with each piece of first classification information by the total counted number of that morpheme pair. For example, when the total number of the morpheme pair "hand-rolled", "kelp" is 3 and all 3 are associated with the first classification information "food", the proportion belonging to "food" is 1.00 and the proportion belonging to every other piece of first classification information is 0.00. The first category matrix generation unit 116 sets the calculated proportions in the cells of the first category matrix and records the first category matrix in the category matrix storage unit 107. The first category matrix generation unit 116 may be configured to take the number of units sold into account when calculating the proportions. For example, when 3 is recorded as the number sold in the training data table, the first category matrix generation unit 116 may count the number of morphemes as 3.

FIG. 8 is a diagram showing a specific example of the first category matrix generated based on morpheme pairs. The rows of the first category matrix represent first classification information such as "first morpheme", "second morpheme", "food", "beverage", "book", "stationery", "daily goods", and so on. As for the columns of the first category matrix, the first morpheme column contains "Company A", "Company A", "Company A", "Company B", and so on, and the second morpheme column contains "noodle", "yakisoba", "mini", "pen", and so on. The first morpheme is the leading morpheme of the morpheme pair. The second morpheme is the trailing morpheme of the morpheme pair.

In the example shown in FIG. 8, the top row of the first category matrix has the first morpheme value "Company A", the second morpheme value "noodle", the food value 0.97, the beverage value 0.01, the book value 0.01, the stationery value 0.00, the daily goods value 0.00, and so on. Accordingly, the top row of the first category matrix shows that when the morpheme pair is "Company A", "noodle", the possibility of the product being food is 0.97, a beverage 0.01, a book 0.01, stationery 0.00, daily goods 0.00, and so on.

The second category matrix generation unit 117 generates a second category matrix by predetermined machine learning based on morphemes or morpheme pairs and on the first category matrix generated by the first category matrix generation unit 116. The morphemes may be those decomposed by the morphological analysis unit 111. The second category matrix represents the frequency of the second classification information assigned to the morphemes. The second category matrix generated by this machine learning is one form of third learning result data. The morpheme pairs may be those determined by the morpheme pair determination unit 113. First, the machine learning performed when the second category matrix generation unit 117 generates a morpheme-based second category matrix is described.

The second category matrix generation unit 117 acquires, from the classification information table stored in the classification information storage unit 106, the second classification information associated with the determined first classification information. The second category matrix generation unit 117 acquires morphemes from the morphological analysis unit 111. The second category matrix generation unit 117 sets the second classification information as the rows of the second category matrix and the morphemes as its columns.

The second category matrix generation unit 117 counts the number of occurrences of each acquired morpheme for each piece of second classification information. For example, when the morpheme "hand-rolled" has been acquired once with the second classification information "rice", the second category matrix generation unit 117 counts the number for the morpheme "hand-rolled" under "rice" as 1. Likewise, when the morpheme "hand-rolled" has been acquired zero times with the second classification information "noodles", it counts the number for "hand-rolled" under "noodles" as 0. The second category matrix generation unit 117 sets the counted values in the cells of the second category matrix and records the second category matrix in the category matrix storage unit 107. The second category matrix generation unit 117 may be configured to take the number of units sold into account when counting. For example, when the training data table records 3 as the number sold for the morpheme "hand-rolled" with the second classification information "rice", the second category matrix generation unit 117 may count the number of morphemes as 3.
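Unlike the first category matrix, the second category matrix described here holds raw counts rather than proportions. The sketch below illustrates that counting under stated assumptions: training rows are `(morphemes, first_category, second_category, units_sold)` tuples, one matrix is built per first category, and the function name `build_second_category_matrix` is hypothetical.

```python
from collections import Counter, defaultdict


def build_second_category_matrix(training_rows, first_category):
    """Count each morpheme per second category, within one first category."""
    counts = defaultdict(Counter)
    for morphemes, first, second, units_sold in training_rows:
        if first != first_category:
            continue  # only second categories tied to this first category
        for morpheme in morphemes:
            counts[morpheme][second] += units_sold
    return dict(counts)


rows = [
    (["hand-rolled", "rice ball", "kelp"], "food", "rice", 1),
    (["yakisoba", "roll"], "food", "bread", 1),
    (["hand-rolled", "guide"], "book", "hobby", 1),
]
second_matrix = build_second_category_matrix(rows, "food")
print(second_matrix["hand-rolled"]["rice"], second_matrix["hand-rolled"]["noodles"])
```

A `Counter` returns 0 for categories that were never seen, which matches the zero cells in the example above.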

FIG. 9 is a diagram showing a specific example of the second category matrix generated based on morphemes. The rows of the second category matrix represent the second classification information associated with the first classification information "food", such as "morpheme", "rice", "noodles", "bread", "meat", "vegetables", and so on. The columns of the second category matrix represent morphemes such as "yakisoba", "chashu", "tonkotsu", "nihachi", "meat", and so on.

In the example shown in FIG. 9, the top row of the second category matrix has the morpheme value "yakisoba", the rice value 0, the noodles value 12, the bread value 6, the meat value 2, the vegetables value 3, and so on. Accordingly, the top row of the second category matrix shows that when the morphemes include "yakisoba", the possibility of the product being rice is 0, noodles 12, bread 6, meat 2, vegetables 3, and so on.

Next, the machine learning performed when the second category matrix generation unit 117 generates a second category matrix based on morpheme pairs is described. The second category matrix generated by this machine learning is one form of fourth learning result data. The second category matrix generation unit 117 acquires morpheme pairs from the morpheme pair determination unit 113. The second category matrix generation unit 117 sets the second classification information as the rows of the second category matrix and the morpheme pairs as its columns.

The second category matrix generation unit 117 counts the number of occurrences of each acquired morpheme pair for each piece of second classification information. The second classification information is classification information associated with the first classification information assigned to the character string information from which the morpheme pair was obtained. For example, when the morpheme pair "hand-rolled", "kelp" has been acquired once with the second classification information "rice", the second category matrix generation unit 117 counts the number for that morpheme pair under "rice" as 1. Likewise, when the morpheme pair "hand-rolled", "kelp" has been acquired zero times with the second classification information "noodles", it counts the number for that morpheme pair under "noodles" as 0. The second category matrix generation unit 117 sets the counted values in the second category matrix and records the second category matrix in the category matrix storage unit 107. The second category matrix generation unit 117 may be configured to take the number of units sold into account when counting. For example, when the training data table records 3 as the number sold for the morpheme "hand-rolled" with the second classification information "rice", the second category matrix generation unit 117 may count the number of morphemes as 3.

FIG. 10 is a diagram showing a specific example of the second category matrix generated based on morpheme pairs. The rows of the second category matrix represent the second classification information associated with the first classification information, such as "first morpheme", "second morpheme", "rice", "noodles", "bread", "meat", "vegetables", and so on. As for the columns of the second category matrix, the first morpheme column contains "Company A", "Company A", "Company A", "Company A", "Company B", and so on, and the second morpheme column contains "noodle", "yakisoba", "mini", "soy sauce", and so on. The first morpheme is the leading morpheme of the morpheme pair. The second morpheme is the trailing morpheme of the morpheme pair.

In the example shown in FIG. 10, the top row of the second category matrix has the first morpheme value "Company A", the second morpheme value "noodle", the rice value 0, the noodles value 9, the bread value 0, the meat value 0, the vegetables value 0, and so on. Accordingly, the top row of the second category matrix shows that when the morpheme pair is "Company A", "noodle", the possibility of the product being rice is 0, noodles 9, bread 0, meat 0, vegetables 0, and so on.

The master information search unit 118 determines whether the character string information is contained in a predetermined column of the master information table. When a character string recorded in the predetermined column matches the character string information, the master information search unit 118 determines that the character string information is contained. When no character string recorded in the predetermined column matches the character string information, it determines that the character string information is not contained. The predetermined column is specified in advance by a user. The user may be, for example, a purchaser of a product or a person who operates the information processing apparatus 100. In the present embodiment, the predetermined column is the "product name" column of the training data table. There may be one predetermined column or several.
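The exact-match lookup can be illustrated with a short sketch. The record layout (a list of dictionaries), the column key `product_name`, and the function name `find_master_record` are assumptions made for this example only.

```python
def find_master_record(master_table, text, columns=("product_name",)):
    """Return the first record whose designated column equals text, else None."""
    for record in master_table:
        # A hit requires an exact match in at least one designated column.
        if any(record.get(column) == text for column in columns):
            return record
    return None


master_table = [
    {"product_name": "手巻おにぎり日高昆布", "first": "food", "second": "rice"},
]
hit = find_master_record(master_table, "手巻おにぎり日高昆布")
miss = find_master_record(master_table, "ｄ手巷おにぎり日高昆布")
print(hit, miss)
```

Because the comparison is exact, a string that differs by even one character is treated as not contained in the master information table.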

The first classification information determination unit 119 determines the first classification information for the character string information based on the morphemes or morpheme pair and on the first category matrix stored in the category matrix storage unit 107. First, the case where the first classification information determination unit 119 determines the first classification information from the morpheme-based first category matrix is described. The first classification information determination unit 119 identifies, in the first category matrix, the rows of the morphemes analyzed by the morphological analysis unit 111. For each column of the first category matrix, it adds up the values in the identified morpheme rows to calculate a total. It then determines the first classification information to be the one whose total is the largest.

The case where the character string information "salt ramen" is given is described with reference to FIG. 7. The morphological analysis unit 111 decomposes "salt ramen" into the morphemes "salt" and "ramen". The first classification information determination unit 119 identifies the rows for the morphemes "salt" and "ramen" in the morpheme-based first category matrix. For the morpheme "salt", the proportion for food is 0.42, beverage 0.33, book 0.01, stationery 0.02, daily goods 0.04, and so on. For the morpheme "ramen", the proportion for food is 0.48, beverage 0.22, book 0.08, stationery 0.05, daily goods 0.06, and so on. The first classification information determination unit 119 calculates the sum for each column. According to FIG. 7, the value for food is 0.90, which is the largest, so the first classification information determination unit 119 determines the first classification information of the character string information "salt ramen" to be "food".
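The "salt ramen" example can be reproduced with a short sketch. Only the five categories and two morpheme rows quoted from FIG. 7 are included, the nested-dictionary layout is an assumption, and the function name `classify_by_morphemes` is hypothetical.

```python
# Rows quoted from FIG. 7 (categories elided in the text are omitted here).
FIRST_CATEGORY_MATRIX = {
    "salt": {"food": 0.42, "beverage": 0.33, "book": 0.01,
             "stationery": 0.02, "daily goods": 0.04},
    "ramen": {"food": 0.48, "beverage": 0.22, "book": 0.08,
              "stationery": 0.05, "daily goods": 0.06},
}


def classify_by_morphemes(morphemes, matrix):
    """Sum each category over the morpheme rows and return (best, totals)."""
    totals = {}
    for morpheme in morphemes:
        # Morphemes missing from the matrix contribute nothing (an assumption).
        for category, value in matrix.get(morpheme, {}).items():
            totals[category] = totals.get(category, 0.0) + value
    if not totals:
        return None, totals
    return max(totals, key=totals.get), totals


best, totals = classify_by_morphemes(["salt", "ramen"], FIRST_CATEGORY_MATRIX)
print(best, {c: round(v, 2) for c, v in totals.items()})
```

For these rows the food total is 0.42 + 0.48 = 0.90, ahead of beverage at 0.55, so "food" is selected.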

Next, the case where the first classification information determination unit 119 determines the first classification information from the first category matrix based on morpheme pairs is described. The first classification information determination unit 119 identifies, in the first category matrix, the row of the morpheme pair determined by the morpheme pair determination unit 113. It then determines the first classification information to be the one whose column holds the largest value in the identified row.

The case where the character string information "Company A salt noodle" is given is described with reference to FIG. 8. The morphological analysis unit 111 decomposes "Company A salt noodle" into the morphemes "Company A", "salt", and "noodle". The morpheme pair determination unit 113 determines the morpheme pair to be "Company A" and "noodle". The first classification information determination unit 119 identifies the row for the morpheme pair "Company A", "noodle" in the first category matrix based on morpheme pairs. For this morpheme pair, the proportion for food is 0.97, beverage 0.01, book 0.01, stationery 0.00, daily goods 0.00, and so on. According to FIG. 8, the value for food, 0.97, is the largest, so the first classification information determination unit 119 determines the first classification information of the character string information "Company A salt noodle" to be "food".
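A minimal sketch of the pair-based decision for "Company A salt noodle" follows. The matrix holds only the row quoted from FIG. 8, keyed by a `(first, last)` tuple, which is an assumed layout; `classify_by_pair` is a hypothetical name, and returning `None` for an unseen pair is an assumption.

```python
# Row quoted from FIG. 8, keyed by (first morpheme, second morpheme).
PAIR_MATRIX = {
    ("Company A", "noodle"): {"food": 0.97, "beverage": 0.01, "book": 0.01,
                              "stationery": 0.00, "daily goods": 0.00},
}


def classify_by_pair(morphemes, pair_matrix):
    """Look up the (first, last) pair and return (best_category, value)."""
    if not morphemes:
        return None
    row = pair_matrix.get((morphemes[0], morphemes[-1]))
    if row is None:
        return None  # assumption: an unseen pair gives no decision
    category = max(row, key=row.get)
    return category, row[category]


result = classify_by_pair(["Company A", "salt", "noodle"], PAIR_MATRIX)
print(result)
```

Returning the winning value together with the category makes it easy to compare that value against a threshold afterwards.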

Note that, in determining the first classification information, the first classification information determination unit 119 may be configured to prioritize the result obtained from the morpheme pair, or may be configured to prioritize the result obtained from the morphemes. For example, when the largest value obtained from the first category matrix of morpheme pairs is equal to or greater than a first threshold, the first classification information determination unit 119 may adopt the first classification information determined from the morpheme pair. The first threshold may be, for example, 0.95, or any other value. When the largest value obtained from the first category matrix based on morpheme pairs is equal to or greater than a second threshold and less than the first threshold, the first classification information determination unit 119 may add a predetermined numerical value, within the results calculated from the first category matrix based on morphemes, to the first classification information determined from the morpheme pair. The predetermined numerical value may be, for example, 0.1, may be a value calculated from the first category matrix based on morpheme pairs, or may be a value obtained by applying a numerical calculation to a value calculated from the first category matrix of morpheme pairs. The second threshold may be, for example, 0.66, or any other value. The first threshold and the second threshold may be designated by the user in advance.
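The two-threshold rule described above can be sketched as follows, under stated assumptions. The function name, the dictionary-based inputs, and the fallback to the plain morpheme totals when the pair value is below the second threshold are assumptions for illustration; the defaults 0.95, 0.66, and 0.1 are the example values given in the text.

```python
def decide_first_category(pair_row, morpheme_sums, t1=0.95, t2=0.66, bonus=0.1):
    """pair_row and morpheme_sums map category -> value.

    pair_row:      the row of the pair-based first category matrix.
    morpheme_sums: per-category totals from the morpheme-based matrix.
    """
    pair_best = max(pair_row, key=pair_row.get)
    pair_value = pair_row[pair_best]
    if pair_value >= t1:
        # Strong pair evidence: adopt the pair's category directly.
        return pair_best
    scores = dict(morpheme_sums)
    if t2 <= pair_value < t1:
        # Moderate pair evidence: boost that category in the morpheme totals.
        scores[pair_best] = scores.get(pair_best, 0.0) + bonus
    # Assumption: below t2 the pair is ignored and the morpheme totals decide.
    return max(scores, key=scores.get)
```

Adding a bonus rather than overriding lets moderately confident pair evidence break near-ties without overruling strong morpheme evidence.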

The second classification information determination unit 120 determines second classification information for the character string information based on the morphemes or the morpheme pair and on the second category matrix stored in the category matrix storage unit 107. First, the case where the second classification information determination unit 120 determines the second classification information using the second category matrix based on morphemes will be described. The second classification information determination unit 120 identifies, in the second category matrix, the rows of the morphemes obtained by the morphological analysis unit 111. The second classification information determination unit 120 adds up the values of the identified morpheme rows for each column of the second category matrix to calculate column totals. The second classification information determination unit 120 determines, as the second classification information, the category with the largest total.

Referring to FIG. 9, the case where the character string information "salt ramen" is given will be described. The morphological analysis unit 111 decomposes "salt ramen" into the morphemes "salt" and "ramen". The second classification information determination unit 120 identifies, in the second category matrix based on morphemes, the rows for the morphemes "salt" and "ramen". For the morpheme "salt", the value for rice is "2", the value for noodles is "8", the value for bread is "3", the value for meat is "2", the value for vegetables is "3", and so on. For the morpheme "ramen", the value for rice is "1", the value for noodles is "13", the value for bread is "0", the value for meat is "4", the value for vegetables is "5", and so on. The second classification information determination unit 120 calculates the sum for each column. According to FIG. 9, the total for noodles, "21", is the maximum, so the second classification information determination unit 120 determines the second classification information of the character string information "salt ramen" to be "noodles".
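The column-wise summation in the FIG. 9 example can be checked with a short sketch. The data layout and function names are assumptions for illustration; the numbers are the ones quoted in the text, and the columns elided with "..." are left out.

```python
# Hypothetical morpheme-based second category matrix (values quoted for FIG. 9).
CATEGORIES = ["rice", "noodles", "bread", "meat", "vegetables"]
MORPHEME_MATRIX = {
    "salt":  [2, 8, 3, 2, 3],
    "ramen": [1, 13, 0, 4, 5],
}

def column_sums(morphemes):
    """Add the rows of the given morphemes column by column."""
    rows = [MORPHEME_MATRIX[m] for m in morphemes]  # KeyError if unseen
    return [sum(column) for column in zip(*rows)]

def classify_by_morphemes(morphemes):
    """Return (category, total) for the column with the largest total."""
    sums = column_sums(morphemes)
    best = max(range(len(sums)), key=sums.__getitem__)
    return CATEGORIES[best], sums[best]

print(column_sums(["salt", "ramen"]))            # -> [3, 21, 3, 6, 8]
print(classify_by_morphemes(["salt", "ramen"]))  # -> ('noodles', 21)
```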

Next, the case where the second classification information determination unit 120 determines the second classification information using the second category matrix based on morpheme pairs will be described. The second classification information determination unit 120 identifies, in the second category matrix, the row of the morpheme pair determined by the morpheme pair determination unit 113. The second classification information determination unit 120 determines, as the second classification information, the category whose column holds the largest value in the identified row.

Referring to FIG. 10, the case where the character string information "Company A salt noodle" is given will be described. The morpheme pair of the character string information "Company A salt noodle" is "Company A" and "noodle". The second classification information determination unit 120 identifies, in the second category matrix based on morpheme pairs, the row for the morpheme pair "Company A" and "noodle". For this morpheme pair, the value for rice is "0", the value for noodles is "9", the value for bread is "0", the value for meat is "0", the value for vegetables is "0", and so on. According to FIG. 10, the value for noodles, "9", is the maximum, so the second classification information determination unit 120 determines the second classification information of the character string information "Company A salt noodle" to be "noodles".

Note that, in determining the second classification information, the second classification information determination unit 120 may be configured to prioritize the result obtained from the morpheme pair, or may be configured to prioritize the result obtained from the morphemes. For example, when the ratio of the second classification information having the maximum value in the second category matrix of morpheme pairs is equal to or greater than a third threshold, the second classification information determination unit 120 may adopt the second classification information determined from the morpheme pair. The third threshold may be, for example, 0.95, or any other value. The ratio of the second classification information having the maximum value is the value obtained by dividing that maximum value by the sum of the values of all the second classification information. When this ratio is equal to or greater than a fourth threshold and less than the third threshold, the second classification information determination unit 120 may add a predetermined numerical value, within the results calculated from the second category matrix of morphemes, to the second classification information determined from the morpheme pair. The predetermined numerical value may be, for example, 100, may be a value calculated from the second category matrix of morpheme pairs, or may be a value obtained by applying a numerical calculation to a value calculated from the second category matrix of morpheme pairs. The fourth threshold may be, for example, 0.66, or any other value. The third threshold and the fourth threshold may be designated by the user in advance.
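Because the second category matrix holds counts rather than ratios, the largest count is first normalized by the row total before it is compared with the thresholds. The sketch below illustrates that step. The function names, the made-up example row, and the fallback to the morpheme totals for a low ratio are assumptions; 0.95, 0.66, and 100 are the example values from the text, with the lower threshold (0.66) taken as the lower bound of the bonus band.

```python
def top_share(row):
    """Return (category, share), where share = largest count / sum of counts."""
    total = sum(row.values())
    best = max(row, key=row.get)
    return best, (row[best] / total if total else 0.0)

def decide_second_category(pair_row, morpheme_sums, t3=0.95, t4=0.66, bonus=100):
    best, share = top_share(pair_row)
    if share >= t3:
        return best                       # pair evidence is decisive
    scores = dict(morpheme_sums)
    if share >= t4:
        scores[best] = scores.get(best, 0) + bonus  # boost the pair's category
    # Assumption: for a low share, only the morpheme totals decide.
    return max(scores, key=scores.get)

# Made-up pair row for illustration (not taken from any figure).
example_row = {"rice": 1, "noodles": 8, "bread": 1}
```

With `example_row`, the share of "noodles" is 8 / 10, which falls in the bonus band, so "noodles" receives the bonus in the morpheme totals.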

The master information generation unit 121 generates a master information record based on the determined first classification information and second classification information and on the acquired character string information. The master information generation unit 121 records the master information record in the master information table.

FIG. 11 is a flowchart showing the flow of the category matrix generation process of the information processing apparatus 100 according to the first embodiment. The data acquisition unit 110 acquires training data from the training data storage unit 104 (step S101). The morphological analysis unit 111 performs morphological analysis on the character string information (step S102). When a decomposed morpheme is stored in the pre-replacement column of the replacement data table, the morpheme replacement unit 112 replaces the morpheme with the morpheme in the post-replacement column (step S103). The morphological analysis unit 111 determines the part of speech of each morpheme and deletes any morpheme determined to be a "symbol" (step S104). The morpheme pair determination unit 113 determines, as the morpheme pair, the combination of the morphemes positioned at the beginning and at the end of the character string information (step S105).
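Steps S103 to S105 can be sketched as a small preprocessing function. The sketch assumes that some morphological analyzer has already produced a list of (surface, part-of-speech) tuples for step S102; the function name, the "symbol" tag string, and the sample tokens are hypothetical.

```python
def preprocess(tokens, replacements):
    """tokens: list of (surface, part_of_speech) from a morphological analyzer.
    replacements: dict mapping pre-replacement morpheme -> post-replacement morpheme.
    Returns (morphemes, pair), where pair is (first, last) or None if empty."""
    # S103: replace morphemes listed in the replacement table.
    replaced = [(replacements.get(surface, surface), pos) for surface, pos in tokens]
    # S104: drop morphemes whose part of speech is "symbol".
    morphemes = [surface for surface, pos in replaced if pos != "symbol"]
    # S105: pair the first and last remaining morphemes.
    pair = (morphemes[0], morphemes[-1]) if morphemes else None
    return morphemes, pair

# Hypothetical tokens; "nudle" stands in for a spelling variant to normalize.
tokens = [("Company A", "noun"), ("*", "symbol"), ("salt", "noun"), ("nudle", "noun")]
morphemes, pair = preprocess(tokens, {"nudle": "noodle"})
```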

The first category matrix generation unit 116 generates the first category matrix based on the morphemes and the morpheme pairs (step S106). The first category matrix generation unit 116 records the generated first category matrix in the category matrix storage unit 107 (step S107). The second category matrix generation unit 117 generates the second category matrix based on the morphemes, the morpheme pairs, and the second classification information (step S108). The second category matrix generation unit 117 records the generated second category matrix in the category matrix storage unit 107 (step S109).
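One plausible way to build such matrices is to count how often each morpheme (or morpheme pair) co-occurs with each category in the training data, and optionally convert the counts to per-row ratios. This passage does not spell out the exact computation, so the sketch below is an assumption for illustration only, with hypothetical names and made-up training records.

```python
from collections import Counter, defaultdict

def build_count_matrix(records):
    """records: iterable of (keys, category); keys are morphemes or morpheme pairs.
    Returns {key: Counter({category: count})}."""
    matrix = defaultdict(Counter)
    for keys, category in records:
        for key in keys:
            matrix[key][category] += 1
    return matrix

def to_ratios(matrix):
    """Convert each row of counts to ratios that sum to 1 (assumed normalization)."""
    return {
        key: {category: n / sum(row.values()) for category, n in row.items()}
        for key, row in matrix.items()
    }

# Made-up training records: (morphemes, category label).
records = [
    (["salt", "ramen"], "noodles"),
    (["salt", "rice ball"], "rice"),
    (["miso", "ramen"], "noodles"),
]
counts = build_count_matrix(records)
ratios = to_ratios(counts)
```

The same function can be fed morpheme pairs as keys, for example `[(("Company A", "noodle"),)]`-style tuples, to build the pair-based matrix.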

FIG. 12 is a flowchart showing the flow of the classification process of the information processing apparatus 100 according to the first embodiment. The data acquisition unit 110 acquires character string information via the communication unit 101 or the input unit 102 (step S201). The master information search unit 118 determines whether the character string information is included in a predetermined column of the master information table (step S202). The predetermined column is, for example, the product name. When the character string information is included (step S202: YES), the information processing apparatus 100 ends the process. When the character string information is not included (step S202: NO), the classification unit 115 determines the classification information (step S203).
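The flow of FIG. 12 amounts to a lookup followed by classification only on a miss. A minimal sketch, assuming the master information table is modeled as a dictionary keyed by product name and that `classify` is any callable returning a (first, second) classification tuple; both are hypothetical stand-ins.

```python
def handle_string(text, master_table, classify):
    """Return (classification, newly_classified).

    master_table: dict mapping product name -> (first, second) classification.
    classify:     callable taking the string and returning (first, second).
    """
    if text in master_table:          # S202: YES, already registered
        return master_table[text], False
    first, second = classify(text)    # S203: determine classification
    master_table[text] = (first, second)  # recorded as a new master record
    return (first, second), True

master = {"salt ramen": ("food", "noodles")}
stub = lambda s: ("food", "noodles")  # stand-in classifier for illustration
hit = handle_string("salt ramen", master, stub)
miss = handle_string("Company A salt noodle", master, stub)
```

Checking the master table first means that strings already registered are never reclassified, so any manually curated entries in the table are preserved.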

FIGS. 13 and 14 are flowcharts showing the flow of the classification information determination process of the information processing apparatus 100 according to the first embodiment. The data acquisition unit 110 acquires character string information via the communication unit 101 or the input unit 102 (step S301). The morphological analysis unit 111 performs morphological analysis on the character string information (step S302). When a decomposed morpheme is stored in the pre-replacement column of the replacement data table, the morpheme replacement unit 112 replaces the morpheme with the morpheme in the post-replacement column (step S303). The morphological analysis unit 111 determines the part of speech of each morpheme and deletes any morpheme determined to be a "symbol" (step S304). The morpheme pair determination unit 113 determines, as the morpheme pair, the combination of the morphemes positioned at the beginning and at the end of the character string information (step S305).

The first classification information determination unit 119 acquires the first category matrices based on morphemes and on morpheme pairs (step S306). The first classification information determination unit 119 identifies, in the acquired first category matrix, the rows of the morphemes obtained by the morphological analysis unit 111, and calculates, for each column, the total of the values in those rows (step S307). The first classification information determination unit 119 also identifies, in the first category matrix generated from morpheme pairs, the row of the morpheme pair determined by the morpheme pair determination unit 113, and identifies the largest value among the columns of that row.

The first classification information determination unit 119 determines whether the largest value obtained from the first category matrix of morpheme pairs is equal to or greater than the first threshold (step S308). In this flowchart, the first threshold is 0.95. When the value is 0.95 or more (step S308: YES), the first classification information determination unit 119 determines, as the first classification information of the character string information, the first classification information having the largest value in the identified morpheme pair row of the first category matrix of morpheme pairs (step S309), and the process proceeds to step S313. When the value is less than 0.95 (step S308: NO), the first classification information determination unit 119 determines whether the largest value in the identified morpheme pair row of the first category matrix of morpheme pairs is equal to or greater than the second threshold (step S310). In this flowchart, the second threshold is 0.66. When the value is 0.66 or more (step S310: YES), the first classification information determination unit 119 adds a predetermined numerical value, within the values calculated from the first category matrix of morphemes, to the first classification information determined from the morpheme pair (step S311). The first classification information determination unit 119 determines, as the first classification information, the category with the largest of the calculated totals (step S312).

The second classification information determination unit 120 determines the second classification information based on the morphemes, the morpheme pair, and the determined first classification information. The second classification information determination unit 120 acquires the second category matrices based on the morphemes, the morpheme pairs, and the second classification information (step S313). The second classification information determination unit 120 identifies, in the second category matrix generated from morphemes, the rows of the morphemes obtained by the morphological analysis unit 111, and calculates the total of the values for each column (step S314). The second classification information determination unit 120 also identifies, in the second category matrix generated from morpheme pairs, the row of the morpheme pair determined by the morpheme pair determination unit 113, and identifies the largest value among the columns of that row.

The second classification information determination unit 120 determines whether the ratio of the second classification information having the maximum value in the second category matrix of morpheme pairs is equal to or greater than the third threshold (step S315). In this flowchart, the third threshold is 0.95. When the ratio is 0.95 or more (step S315: YES), the second classification information determination unit 120 determines, as the second classification information of the character string information, the second classification information having the largest value in the second category matrix of morpheme pairs (step S316), and the process proceeds to step S320. When the ratio is less than 0.95 (step S315: NO), the second classification information determination unit 120 determines whether the ratio of the second classification information having the maximum value in the second category matrix of morpheme pairs is equal to or greater than the fourth threshold (step S317). In this flowchart, the fourth threshold is 0.66. When the ratio is 0.66 or more (step S317: YES), the second classification information determination unit 120 adds a predetermined numerical value, within the results calculated from the second category matrix of morphemes, to the second classification information determined from the morpheme pair (step S318). The second classification information determination unit 120 determines, as the second classification information of the character string information, the second classification information having the largest of the calculated totals (step S319).

The master information generation unit 121 generates a master information record based on the determined first classification information and second classification information and on the acquired character string information (step S320). The master information generation unit 121 records the master information record in the master information table (step S321).

In the information processing apparatus 100 configured as described above, the first category matrix generation unit 116 and the second category matrix generation unit 117 generate the first category matrix and the second category matrix based on the training data. The morphological analysis unit 111 decomposes the given character string information into morphemes. The morpheme pair determination unit 113 determines the first morpheme and the last morpheme of the character string information as the morpheme pair. The first classification information determination unit 119 determines the first classification information, which indicates a feature of the character string information, based on the morphemes, the morpheme pair, and the first category matrix. Furthermore, the second classification information determination unit 120 determines the second classification information associated with the first classification information based on the morphemes, the morpheme pair, the determined first classification information, and the second category matrix. Information can therefore be classified more easily.

(Second Embodiment)
Next, an information processing system 1 according to the second embodiment will be described. FIG. 15 is a system configuration diagram showing the system configuration of the information processing system 1 of the second embodiment. The information processing system 1 is provided on a network 400. The information processing system 1 assigns, to input character string information, information indicating a feature of that character string information. The information processing system 1 includes an information processing apparatus 100a and a terminal device 300 that are communicably connected to each other via the network 400. The network 400 may be any kind of network. For example, the network 400 may be the Internet.

The information processing apparatus 100a according to the second embodiment differs from the first embodiment in that it includes a control unit 109a instead of the control unit 109 and a classification unit 115a instead of the classification unit 115; the rest of the configuration is the same. The differences from the first embodiment will be described below.

The control unit 109a controls the operation of each unit of the information processing apparatus 100a. The control unit 109a is implemented by, for example, a device including a processor and a memory. By executing the classification program or the learning program, the control unit 109a functions as the data acquisition unit 110, the morphological analysis unit 111, the morpheme replacement unit 112, the morpheme pair determination unit 113, the learning unit 114, and the classification unit 115a.

The data acquisition unit 110a acquires character string information from the terminal device 300. The data acquisition unit 110a outputs the acquired character string information to the classification unit 115a. The data acquisition unit 110a may also record the acquired character string information in the training data storage unit 104.

By executing the classification program, the classification unit 115a functions as a master information search unit 118a, the first classification information determination unit 119, the second classification information determination unit 120, and the master information generation unit 121.

The master information search unit 118a determines, by the same process as the master information search unit 118 of the first embodiment, whether the character string information is included in a predetermined column of the master information table. The master information search unit 118a makes this determination based on the master information table stored in the master information storage unit 108. When the character string information is included in the master information table, the master information search unit 118a transmits the first classification information and the second classification information associated with the character string information to the terminal device 300. When the character string information is not included in the master information table, the classification unit 115a executes the classification information determination process. The master information search unit 118a then transmits the first classification information and the second classification information determined by that process to the terminal device 300.

The terminal device 300 includes a processor, a memory, an auxiliary storage device, and the like connected by a bus, and, by executing a classification request program, functions as a device including a communication unit 301, an input unit 302, a display unit 303, and a control unit 304. All or part of the functions of the terminal device 300 may be realized using hardware such as an ASIC, a PLD, or an FPGA. The classification program or the learning program may be recorded on a computer-readable recording medium. The computer-readable recording medium is, for example, a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The classification program or the learning program may be transmitted via a telecommunication line.

The communication unit 301 is a network interface. The communication unit 301 communicates with external devices via the network 400. An external device may be, for example, the information processing apparatus 100a. The communication unit 301 may communicate using a communication method such as wireless LAN, wired LAN, Bluetooth, or LTE.

The input unit 302 is configured using input devices such as a touch panel, a mouse, and a keyboard. The input unit 302 may instead be an interface for connecting an input device to the terminal device 300. In that case, the input unit 302 generates input data (for example, instruction information indicating an instruction to the terminal device 300) from the input signal entered on the input device, and supplies the input data to the terminal device 300.

The display unit 303 is an output device such as a CRT display, a liquid crystal display, or an organic EL display. The display unit 303 may instead be an interface for connecting an output device to the terminal device 300. In that case, the display unit 303 generates a video signal from video data and outputs the video signal to the video output device connected to it.

The control unit 304 controls the operation of each unit of the terminal device 300. The control unit 304 is implemented by, for example, a device including a processor and a memory. By executing the classification request program, the control unit 304 functions as a device including a classification request unit 305.

The classification request unit 305 accepts character string information from the user via the input unit 302. The classification request unit 305 transmits the accepted character string information to the information processing apparatus 100a. The classification request unit 305 receives the first classification information and the second classification information from the information processing apparatus 100a. The classification request unit 305 causes the display unit 303 to display the received first classification information and second classification information.

<Modification>
In the present embodiment, the morpheme pair has been described as the morphemes at the beginning and the end of the character string information, but the morpheme pair is not limited to these. For example, the morpheme pair determination unit 113 may determine the first and second morphemes from the beginning of the character string information as the morpheme pair.
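The two pairing rules mentioned here can be expressed as interchangeable strategies. This is only a sketch; the function names are hypothetical, and the behavior for strings with fewer than two morphemes (returning None) is an assumption.

```python
def first_last_pair(morphemes):
    """Pair the morphemes at the beginning and end (the embodiment's rule)."""
    if len(morphemes) < 2:
        return None  # assumption: no pair for fewer than two morphemes
    return (morphemes[0], morphemes[-1])

def first_two_pair(morphemes):
    """Pair the first and second morphemes (the variant in this modification)."""
    if len(morphemes) < 2:
        return None
    return (morphemes[0], morphemes[1])

morphemes = ["Company A", "salt", "noodle"]
pairs = {rule.__name__: rule(morphemes) for rule in (first_last_pair, first_two_pair)}
```

Whichever rule is chosen would need to be applied identically during learning and classification, since the pair is used as the row key of the pair-based category matrices.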

The morpheme pair determination unit 113 identifies, among the morphemes into which the character string information has been decomposed, the morpheme that best represents the feature of the character string information. The morpheme that best represents the feature of the character string information is the morpheme whose meaning, when the character string information and the morphemes are compared, is close to that of the character string information. The morpheme pair determination unit 113 may use a known method to identify the morpheme whose meaning is close to the character string information. The morpheme pair determination unit 113 then identifies another morpheme whose meaning is close to the identified morpheme, and may likewise use a known method to do so. The morpheme pair determination unit 113 determines the identified morpheme and the other morpheme whose meaning is close to it as the morpheme pair.
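The specification leaves the similarity measure to "a known method". The sketch below swaps in one possible choice, cosine similarity over word vectors, purely for illustration. The toy vectors, the use of the mean morpheme vector as the meaning of the whole string, and the restriction of the partner to morphemes of the same string are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean, used here as a stand-in for the string's meaning."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def select_semantic_pair(
    morphemes: Sequence[str], embeddings: Dict[str, Vector]
) -> Tuple[str, str]:
    """Pick the morpheme closest to the string, then its closest neighbor."""
    if len(set(morphemes)) < 2:
        raise ValueError("need at least two distinct morphemes")
    string_vec = mean_vector([embeddings[m] for m in morphemes])
    key = max(morphemes, key=lambda m: cosine(embeddings[m], string_vec))
    others = [m for m in morphemes if m != key]
    partner = max(others, key=lambda m: cosine(embeddings[m], embeddings[key]))
    return key, partner


# Hypothetical 2-dimensional vectors, chosen only to make the example concrete.
TOY_EMBEDDINGS = {"tea": [1.0, 0.0], "green": [0.8, 0.6], "500ml": [0.0, 1.0]}
semantic_pair = select_semantic_pair(["green", "tea", "500ml"], TOY_EMBEDDINGS)
```

In practice the vectors would come from whatever known similarity method is adopted; only the two-step selection (representative morpheme first, nearest morpheme second) follows the text.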

In the present embodiment, the training data is stored in the training data storage unit 104, but the training data may instead be input from the outside via the communication unit 101 as needed, or may be received from the user via the input unit 102.

In the present embodiment, the classification process and the learning process have been described as being executed within the same information processing apparatus 100 or information processing apparatus 100a, but the invention is not limited to this. For example, the processes may be split between a classification device that executes the classification process and a learning device that executes the learning process, or may be implemented on cloud computing.

The embodiment of the present invention has been described above in detail with reference to the drawings. However, the specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of the invention are also included.

The present invention is applicable to the classification of products.

100 ... information processing apparatus, 101 ... communication unit, 102 ... input unit, 103 ... display unit, 104 ... training data storage unit, 105 ... replacement data storage unit, 106 ... classification information storage unit, 107 ... category matrix storage unit, 108 ... master information storage unit, 109 ... control unit, 110 ... data acquisition unit, 111 ... morphological analysis unit, 112 ... morpheme replacement unit, 113 ... morpheme pair determination unit, 114 ... learning unit, 115 ... classification unit, 116 ... first category matrix generation unit, 117 ... second category matrix generation unit, 118 ... master information search unit, 119 ... first classification information determination unit, 120 ... second classification information determination unit, 121 ... master information generation unit, 1 ... information processing system, 100a ... information processing apparatus, 109a ... control unit, 110a ... data acquisition unit, 115a ... classification unit, 118a ... master information search unit, 300 ... terminal device, 301 ... communication unit, 302 ... input unit, 303 ... display unit, 304 ... control unit, 305 ... classification request unit, 400 ... network

One aspect of the present invention is a classification device including: a morphological analysis unit that divides a character string to be classified into morphemes; and a first classification information determination unit that acquires learning result data from a learning result data storage unit and determines the first classification information of the character string to be classified based on the learning result data and the morphemes into which the morphological analysis unit divided the character string to be classified. The learning result data is obtained by performing predetermined machine learning based on a plurality of training data, each including a character string to be learned and first classification information given to that character string as information indicating its feature, and includes, for each morpheme contained in the character strings to be learned, a proportion indicating which first classification information the character string to be classified is classified into.
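One way to read this aspect is as a lookup-and-vote over per-morpheme proportions. The sketch below assumes the learning result data is a mapping from morpheme to a table of proportions per first classification, and that the proportions of all morphemes are simply summed before taking the highest-scoring class. The summation rule, the function name, and the sample data are assumptions; this passage does not state how the per-morpheme values are combined.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence

# Hypothetical shape: morpheme -> {first classification: proportion}
LearningResult = Dict[str, Dict[str, float]]


def determine_first_classification(
    morphemes: Sequence[str], learning_result: LearningResult
) -> Optional[str]:
    """Sum the proportions of every known morpheme and pick the best class."""
    scores: Dict[str, float] = defaultdict(float)
    for morpheme in morphemes:
        # Morphemes never seen during learning contribute nothing.
        for first_class, proportion in learning_result.get(morpheme, {}).items():
            scores[first_class] += proportion
    if not scores:
        return None  # Assumption: no decision when no morpheme is known.
    return max(scores, key=lambda c: scores[c])


sample_result: LearningResult = {
    "green": {"beverage": 0.5, "vegetable": 0.5},
    "tea": {"beverage": 1.0},
}
decision = determine_first_classification(["green", "tea", "500ml"], sample_result)
```

Here "green" alone is ambiguous, and "tea" tips the total toward "beverage", which is the kind of disambiguation the per-morpheme data is meant to support.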

One aspect of the present invention is a classification device including: a morphological analysis unit that divides a name to be classified into morphemes; a first classification information determination unit that acquires learning result data from a learning result data storage unit and determines the first classification information of the name to be classified based on the learning result data and the morphemes into which the morphological analysis unit divided the name to be classified; and a morpheme pair determination unit that determines a morpheme pair including the two morphemes positioned at the beginning and the end among the plurality of morphemes obtained from the name to be classified. The learning result data is obtained by performing predetermined machine learning based on a plurality of training data including first classification information given to a name to be learned as information indicating a feature of that name and the character string to be learned, and includes, for each morpheme contained in the character string to be learned, a frequency indicating which first classification information the name to be classified is classified into. The learning result data also includes, for each morpheme pair including the two morphemes positioned at the beginning and the end among the plurality of morphemes contained in the name to be learned, a frequency indicating which first classification information the name to be classified is classified into. The first classification information determination unit determines the first classification information of the name to be classified based on the morpheme pair determined by the morpheme pair determination unit from the name to be classified, the morphemes into which the morphological analysis unit divided the name to be classified, and the learning result data.

One aspect of the present invention is a classification device including: a morphological analysis unit that divides a character string to be classified into morphemes; and a first classification information determination unit that acquires learning result data from a learning result data storage unit and determines the first classification information of the character string to be classified based on the learning result data and the morphemes into which the morphological analysis unit divided the character string to be classified. The learning result data is obtained by performing predetermined machine learning based on a plurality of training data, each including a character string to be learned and first classification information given to that character string as information indicating its feature, and includes, for each morpheme contained in the character strings to be learned, a frequency indicating which first classification information the character string to be classified is classified into. The training data includes the given first classification information and second classification information that is a subclassification of the first classification information. The learning result data further includes, for each morpheme contained in the character strings to be learned, a frequency indicating which second classification information the character string to be classified is classified into. The classification device further includes a second classification information determination unit that determines the second classification information based on the learning result data and the morphemes into which the morphological analysis unit divided the character string to be classified.
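The two-level decision can be sketched as below. Following the wording of the claims, the candidates for the second classification are restricted to those associated with the first classification already determined. The data layout, the function name, and the rule of summing per-morpheme frequencies are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def determine_second_classification(
    morphemes: Sequence[str],
    first_class: str,
    hierarchy: Dict[str, List[str]],
    second_freq: Dict[str, Dict[str, int]],
) -> Optional[str]:
    """Choose a subclassification under the already determined first class."""
    # Only subclasses registered under the first class are candidates.
    scores = {sub: 0 for sub in hierarchy.get(first_class, [])}
    for morpheme in morphemes:
        for sub, freq in second_freq.get(morpheme, {}).items():
            if sub in scores:
                scores[sub] += freq
    if not scores or max(scores.values()) == 0:
        return None  # Assumption: undecided when no candidate has support.
    return max(scores, key=lambda s: scores[s])


# Hypothetical category tree and per-morpheme frequencies.
HIERARCHY = {
    "beverage": ["tea drink", "coffee drink"],
    "vegetable": ["leaf vegetable"],
}
SECOND_FREQ = {
    "green": {"tea drink": 3, "leaf vegetable": 4},
    "tea": {"tea drink": 5},
}
sub_for_beverage = determine_second_classification(
    ["green", "tea"], "beverage", HIERARCHY, SECOND_FREQ
)
```

Filtering by the first classification keeps a frequent but unrelated subclass ("leaf vegetable" for "green") from winning once the string is known to be a beverage.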

One aspect of the present invention is a classification device including: a morphological analysis unit that divides a character string to be classified into morphemes; a first classification information determination unit that acquires learning result data from a learning result data storage unit and determines the first classification information of the character string to be classified based on the learning result data and the morphemes into which the morphological analysis unit divided the character string to be classified; and a morpheme replacement unit that replaces, based on a predetermined condition, a morpheme into which the morphological analysis unit divided the character string to be classified with another morpheme. The learning result data is obtained by performing predetermined machine learning based on a plurality of training data, each including a character string to be learned and first classification information given to that character string as information indicating its feature, and includes, for each morpheme contained in the character strings to be learned, a frequency indicating which first classification information the character string to be classified is classified into.
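The "predetermined condition" is not spelled out in this passage. The sketch below assumes the simplest case, a lookup table in which a morpheme is replaced whenever it appears as a key; the table contents and the function name are hypothetical.

```python
from typing import Dict, List, Sequence


def replace_morphemes(
    morphemes: Sequence[str], replacement_table: Dict[str, str]
) -> List[str]:
    """Replace each morpheme found in the table; leave the others unchanged."""
    return [replacement_table.get(m, m) for m in morphemes]


# Hypothetical replacement data mapping abbreviations to canonical morphemes.
REPLACEMENTS = {"choco": "chocolate", "straw": "strawberry"}
normalized = replace_morphemes(["choco", "cake"], REPLACEMENTS)
```

Normalizing variants before the lookup lets abbreviated names, such as those printed on receipts, share the learning result data of their canonical forms.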

One aspect of the present invention is a classification device including: a morphological analysis unit that divides a name to be classified into morphemes; a first classification information determination unit that acquires learning result data from a learning result data storage unit and determines the first classification information of the name to be classified based on the learning result data and the morphemes into which the morphological analysis unit divided the name to be classified; and a morpheme pair determination unit that determines, among the plurality of morphemes obtained from the name to be classified, the morpheme closest in meaning to the name to be classified, and determines a morpheme pair including the identified morpheme and another morpheme whose meaning is close to it. The learning result data is obtained by performing predetermined machine learning based on a plurality of training data including first classification information given to a name to be learned as information indicating a feature of that name and the name to be learned, and includes, for each morpheme contained in the character string to be learned, a frequency indicating which first classification information the name to be classified is classified into. The learning result data also includes, for each morpheme pair including the two morphemes positioned at the beginning and the end among the plurality of morphemes contained in the name to be learned, a frequency indicating which first classification information the name to be classified is classified into. The first classification information determination unit determines the first classification information of the name to be classified based on the learning result data and the morpheme pair determined by the morpheme pair determination unit from the name to be classified.

One aspect of the present invention is a learning device including: a morphological analysis unit that divides a character string to be learned into morphemes; and a learning result data generation unit that generates learning result data by performing predetermined machine learning based on a plurality of training data, each including a character string to be learned and first classification information given to that character string as information indicating its feature. The learning result data includes, for each morpheme contained in the character strings to be learned, a proportion indicating which first classification information a character string to be classified is classified into.
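A minimal sketch of how such per-morpheme proportions could be generated is shown below. It assumes that "predetermined machine learning" amounts to counting how often each morpheme co-occurs with each first classification and normalizing the counts. A whitespace split stands in for a real morphological analyzer. Both choices, and all names in the code, are assumptions.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def whitespace_analyzer(text: str) -> List[str]:
    """Stand-in for a morphological analyzer (assumption for the example)."""
    return text.split()


def generate_learning_result(
    training_data: Iterable[Tuple[str, str]],
    analyze: Callable[[str], List[str]] = whitespace_analyzer,
) -> Dict[str, Dict[str, float]]:
    """Return morpheme -> {first classification: proportion}."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, first_class in training_data:
        for morpheme in analyze(text):
            counts[morpheme][first_class] += 1
    result: Dict[str, Dict[str, float]] = {}
    for morpheme, counter in counts.items():
        total = sum(counter.values())
        # Normalize counts so each morpheme's proportions sum to 1.
        result[morpheme] = {c: n / total for c, n in counter.items()}
    return result


TRAINING = [
    ("green tea", "beverage"),
    ("green onion", "vegetable"),
    ("black tea", "beverage"),
]
learned = generate_learning_result(TRAINING)
```

With this toy data, "green" is split evenly between the two classes while "tea" points only to "beverage".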

One aspect of the present invention is a classification method including: a morphological analysis step in which a classification device divides a character string to be classified into morphemes; and a first classification information determination step in which the classification device acquires learning result data from a learning result data storage unit and determines the first classification information of the character string to be classified based on the learning result data and the morphemes obtained in the morphological analysis step. The learning result data is obtained by performing predetermined machine learning based on a plurality of training data, each including a character string to be learned and first classification information given to that character string as information indicating its feature, and includes, for each morpheme contained in the character strings to be learned, a proportion indicating which first classification information the character string to be classified is classified into.

One aspect of the present invention is a learning method including: a morphological analysis step in which a learning device divides a character string to be learned into morphemes; and a learning result data generation step in which the learning device generates learning result data by performing predetermined machine learning based on a plurality of training data, each including a character string to be learned and first classification information given to that character string as information indicating its feature. The learning result data includes, for each morpheme contained in the character strings to be learned, a proportion indicating which first classification information a character string to be classified is classified into.

Claims

A classification device comprising:
a morphological analysis unit that divides a character string into morphemes; and
a first classification information determination unit that acquires first learning result data from a learning result data storage unit and determines the first classification information of the character string based on the first learning result data and the morphemes, the first learning result data being obtained by performing predetermined machine learning based on a plurality of training data including character strings and classification results of the character strings, and including the frequency with which first classification information representing a feature of a character string has so far been given to the character string.

The classification device according to claim 1, further comprising a morpheme pair determination unit that determines, based on a predetermined condition, a morpheme pair including any two of the morphemes,
wherein the first classification information determination unit acquires second learning result data from the learning result data storage unit and determines the first classification information of the character string based on the second learning result data and the morpheme pair, the second learning result data being obtained by performing predetermined machine learning based on the training data and including the number of times the first classification information has been given to a character string based on the morphemes included in the morpheme pairs of the training data.

The classification device according to claim 1, further comprising a second classification information determination unit that acquires third learning result data from the learning result data storage unit and determines second classification information based on the third learning result data and the morphemes,
the third learning result data being obtained by performing predetermined machine learning based on the training data and being used to determine, from among a plurality of pieces of second classification information associated with the determined first classification information, the second classification information representing a feature of the character string.

The classification device according to claim 1, further comprising a morpheme substitution unit that substitutes the morpheme with another morpheme based on a predetermined condition.

The classification device according to claim 1, wherein the morpheme pair determination unit determines, among the morphemes, the morpheme closest in meaning to the character string, and determines a morpheme pair including the identified morpheme and another morpheme whose meaning is close to it, and
the first classification information determination unit acquires second learning result data from the learning result data storage unit and determines the first classification information of the character string based on the second learning result data and the morpheme pair, the second learning result data being obtained by performing predetermined machine learning based on the training data and including the number of times the first classification information has been given to a character string based on the morphemes included in the morpheme pairs of the training data.

A learning device comprising:
a morphological analysis unit that divides a character string into morphemes; and
a learning result data generation unit that acquires a plurality of training data including character strings and classification results of the character strings and performs predetermined machine learning, thereby generating first learning result data including the frequency with which first classification information representing a feature of a character string has so far been given to the character string.

A classification method comprising:
a morphological analysis step in which a classification device divides a character string into morphemes; and
a first classification information determination step in which the classification device acquires first learning result data from a learning result data storage unit and determines the first classification information of the character string based on the first learning result data and the morphemes, the first learning result data being obtained by performing predetermined machine learning based on a plurality of training data including character strings and classification results of the character strings, and including the frequency with which first classification information representing a feature of a character string has so far been given to the character string.

A learning method comprising:
a morphological analysis step in which a learning device divides a character string into morphemes; and
a learning result data generation step in which the learning device acquires a plurality of training data including character strings and classification results of the character strings and performs predetermined machine learning, thereby generating first learning result data including the frequency with which first classification information representing a feature of a character string has so far been given to the character string.

A computer program for causing a computer to function as the classification device according to any one of claims 1 to 5.

A computer program for causing a computer to function as the learning device according to claim 6.