JPH05314168A - Symbol string occurrence probability calculating system - Google Patents

Symbol string occurrence probability calculating system

Info

Publication number
JPH05314168A
Authority
JP
Japan
Prior art keywords
symbol string
occurrence probability
symbol
unregistered
storage means
Prior art date
Legal status
Pending
Application number
JP4114503A
Other languages
Japanese (ja)
Inventor
Akira Suzuki
章 鈴木
Sueji Miyahara
末治 宮原
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
1992-05-07
Filing date
1992-05-07
Publication date
1993-11-26
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP4114503A
Publication of JPH05314168A
Current legal status: Pending

Landscapes

  • Character Discrimination (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To provide a symbol string occurrence probability calculation system that reduces error by accurately calculating the occurrence probability of unregistered symbol strings. CONSTITUTION: An exponentiation means 7 raises the total number of symbols, stored in a total symbol number storage means 3, to the power of the symbol string length stored in a symbol length storage means 4. A subtraction means 8 subtracts from this value the number of registered symbol strings stored in a registered symbol string number storage means 2. A multiplication means 9 multiplies the conditional occurrence probability of unregistered symbol strings, stored in a conditional occurrence probability storage means 6, by the number of unregistered symbol strings stored in an unregistered symbol string number storage means 5. A division means 10 divides the product by the difference to obtain the occurrence probability of an unregistered symbol string.

Description

Detailed Description of the Invention

[0001]

Field of Industrial Application

The present invention relates to a symbol string occurrence probability calculation system for calculating the occurrence probability of a symbol string.

[0002]

Description of the Related Art

Calculating the occurrence probability of a symbol string is important in natural language processing, such as Japanese-language processing, and in pattern recognition technologies such as character recognition and speech recognition. Occurrence probabilities must be calculated with high accuracy and with as little error as possible.

[0003]

A common way to obtain the occurrence probability of a symbol string is to prepare a symbol string dictionary. Each symbol string in use is registered in it as one record, together with its occurrence probability. To find the occurrence probability of a given symbol string, the string is matched against this dictionary. If a matching entry exists, the occurrence probability recorded in that record is used. If not, the occurrence probability is set to 0. The dictionary is generally built by collecting a large number of actually used symbol strings and computing, by statistical processing, which strings appear and with what probabilities. If every symbol string that will be used can be registered in the dictionary, highly accurate occurrence probabilities are obtained.
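The conventional lookup described in [0003] can be sketched in Python as below. This is a minimal illustration with several assumptions. The dictionary is held in memory as a Python `dict` that maps each symbol string to its probability. The function name `conventional_probability` is hypothetical. The single record reuses the "ABD" value from the second embodiment described later.

```python
def conventional_probability(symbol_string: str, dictionary: dict[str, float]) -> float:
    """Conventional method: return the registered probability, or 0 if the string is absent."""
    # dict.get returns the default (0.0) when the key is not registered.
    return dictionary.get(symbol_string, 0.0)


# Hypothetical dictionary containing a single registered symbol string.
dictionary = {"ABD": 5e-4}
print(conventional_probability("ABD", dictionary))  # 0.0005
print(conventional_probability("AMK", dictionary))  # 0.0
```

Every string that is missing from the dictionary receives probability 0. The invention addresses exactly this source of error.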

[0004]

However, symbol strings such as personal names or everyday words are difficult to cover completely. Collecting all of them, computing their occurrence probabilities, and registering them in the dictionary requires a great deal of labor. In practice, therefore, some symbol strings are often actually used but not registered in the symbol string dictionary. These are hereinafter called unregistered symbol strings.

[0005]

Problems to Be Solved by the Invention

Many symbol strings are actually used, yet the symbol string dictionary has no occurrence probability registered for them. In other words, there are many unregistered symbol strings. Conventionally, the occurrence probability of such a string was treated as 0 even though it is not actually 0, and this introduced error.

[0006]

The present invention has been made in view of the above. Its object is to provide a symbol string occurrence probability calculation system that accurately calculates the occurrence probability of unregistered symbol strings and thereby reduces error.

[0007]

Means for Solving the Problems

To achieve the above object, the present invention provides a symbol string occurrence probability calculation system that calculates the occurrence probability of an unregistered symbol string. The system comprises the following means:
input means for inputting numerical values;
registered symbol string number storage means for storing the number of registered symbol strings;
total symbol number storage means for storing the number of all symbols to be processed;
symbol length storage means for storing the length of the symbol strings to be processed;
unregistered symbol string number storage means for storing the number of symbol strings that are not registered but are actually used;
conditional occurrence probability storage means for storing the conditional occurrence probability of an unregistered symbol string, given that it is a symbol string that is actually used;
exponentiation means for raising the value stored in the total symbol number storage means to the power of the value stored in the symbol length storage means;
subtraction means for subtracting the value stored in the registered symbol string number storage means from the value calculated by the exponentiation means;
multiplication means for multiplying the value stored in the conditional occurrence probability storage means by the value stored in the unregistered symbol string number storage means; and
division means for dividing the value calculated by the multiplication means by the value calculated by the subtraction means, which yields the occurrence probability of an unregistered symbol string.
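The chain of means 7 to 10 amounts to a single arithmetic expression. The Python function below is a sketch under stated assumptions. The function and parameter names are hypothetical. The values are passed in as arguments rather than read from storage means 2 to 6. The check that rejects a non-positive denominator is an addition that the patent does not describe.

```python
def unregistered_probability(
    registered_count: int,      # value held by the registered symbol string number storage means
    alphabet_size: int,         # value held by the total symbol number storage means
    string_length: int,         # value held by the symbol length storage means
    unregistered_count: float,  # value held by the unregistered symbol string number storage means
    conditional_prob: float,    # value held by the conditional occurrence probability storage means
) -> float:
    """Occurrence probability of one unregistered symbol string (exponentiation, subtraction,
    multiplication and division means in sequence)."""
    power = alphabet_size ** string_length           # exponentiation means
    difference = power - registered_count            # subtraction means
    if difference <= 0:
        # Assumption added for this sketch: the patent does not cover this case.
        raise ValueError("registered strings must be fewer than alphabet_size ** string_length")
    product = conditional_prob * unregistered_count  # multiplication means
    return product / difference                      # division means


# Example inputs from the first embodiment: 1000 registered, 26 symbols, length 3,
# 200 unregistered strings in use, conditional probability 8.0e-4.
print(f"{unregistered_probability(1000, 26, 3, 200, 8.0e-4):.1e}")  # 9.7e-06
```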

[0008]

The present invention also provides a symbol string occurrence probability calculation system that comprises the following:
unregistered symbol string occurrence probability calculation means for calculating the occurrence probability of an unregistered symbol string;
input symbol string storage means for storing an input symbol string;
a symbol string dictionary that stores each symbol string and its occurrence probability as one record;
unregistered symbol string occurrence probability storage means for storing the occurrence probability of an unregistered symbol string calculated by the unregistered symbol string occurrence probability calculation means; and
matching means that matches the symbol string stored in the input symbol string storage means against the symbol string dictionary and checks whether the input symbol string is registered as a record in the dictionary. If it is registered, the matching means outputs the occurrence probability of that record as the occurrence probability of the input symbol string. If it is not registered, the matching means outputs the occurrence probability stored in the unregistered symbol string occurrence probability storage means instead.

[0009]

Operation

In the symbol string occurrence probability calculation system of the present invention, the total number of symbols is raised to the power of the symbol string length, and the number of registered symbol strings is subtracted from the result. The conditional occurrence probability of an unregistered symbol string is then multiplied by the number of unregistered symbol strings. Dividing this product by the difference gives the occurrence probability of an unregistered symbol string.
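The same calculation can be written as a formula. The labels below are introduced here for readability and are not the patent's notation:
N is the total number of symbols;
L is the symbol string length;
R is the number of registered symbol strings;
U is the number of unregistered symbol strings that are actually used;
p_c is the conditional occurrence probability of those unregistered strings;
P_u is the resulting occurrence probability of an unregistered symbol string.

```latex
P_u = \frac{p_c \, U}{N^{L} - R}
```

One possible reading follows. It is offered as an interpretation and is not stated in the patent. N^L counts every possible string of length L, so N^L − R counts the strings that could be unregistered. The probability mass p_c U of the unregistered strings in use is then spread evenly over all of these candidates, because it is not known in advance which ones will occur.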

[0010]

In the symbol string occurrence probability calculation system of the present invention, the occurrence probability of an unregistered symbol string is also calculated in advance and stored in the unregistered symbol string occurrence probability storage means. Each symbol string and its occurrence probability are stored as one record in the symbol string dictionary. The input symbol string is matched against the symbol string dictionary to check whether it is registered as a record. If it is registered, the occurrence probability of that record is output as the occurrence probability of the input symbol string. If it is not registered, the occurrence probability stored in the unregistered symbol string occurrence probability storage means is output instead.

[0011]

Embodiments

Embodiments of the present invention are described below with reference to the drawings.

[0012]

FIG. 1 is a block diagram showing the configuration of a symbol string occurrence probability calculation system according to one embodiment of the present invention. The system shown in the figure calculates the occurrence probability of an unregistered symbol string. It comprises the following:
input means 1 for inputting numerical values;
registered symbol string number storage means 2 for storing the number of registered symbol strings;
total symbol number storage means 3 for storing the number of all symbols to be processed;
symbol length storage means 4 for storing the length of the symbol strings to be processed;
unregistered symbol string number storage means 5 for storing the number of symbol strings that are not registered but are actually used;
conditional occurrence probability storage means 6 for storing the conditional occurrence probability of an unregistered symbol string, given that it is actually used;
exponentiation means 7 for raising the value stored in the total symbol number storage means 3 to the power of the value stored in the symbol length storage means 4;
subtraction means 8 for subtracting the value stored in the registered symbol string number storage means 2 from the value calculated by the exponentiation means 7;
multiplication means 9 for multiplying the value stored in the conditional occurrence probability storage means 6 by the value stored in the unregistered symbol string number storage means 5; and
division means 10 for dividing the value calculated by the multiplication means 9 by the value calculated by the subtraction means 8, which yields the occurrence probability of an unregistered symbol string.

[0013]

The operation of the symbol string occurrence probability calculation system of this embodiment, configured as described above, is as follows.

[0014]

First, the number of registered symbol strings is entered through the input means 1 and stored in the registered symbol string number storage means 2 (example: 1000). Next, the number of all symbols to be processed is entered through the input means 1 and stored in the total symbol number storage means 3 (example: 26). Then the length of the symbol strings to be processed is entered through the input means 1 and stored in the symbol length storage means 4 (example: 3).

[0015]

Next, the number of symbol strings that are not registered but are actually used is entered through the input means 1 and stored in the unregistered symbol string number storage means 5. An exact value often cannot be obtained for this number, and in that case an estimate is entered (example: 200). Then the conditional occurrence probability of an unregistered symbol string, given that it is a symbol string that is actually used, is entered through the input means 1 and stored in the conditional occurrence probability storage means 6 (example: 8.0 × 10^-4).

[0016]

Next, the exponentiation means 7 raises the value stored in the total symbol number storage means 3 to the power of the value stored in the symbol length storage means 4 (result: 17,576). The subtraction means 8 then subtracts the value stored in the registered symbol string number storage means 2 from the value calculated by the exponentiation means 7 (result: 16,576). After that, the multiplication means 9 multiplies the value stored in the conditional occurrence probability storage means 6 by the value stored in the unregistered symbol string number storage means 5 (result: 0.16).

[0017]

Finally, the division means 10 divides the value calculated by the multiplication means 9 by the value calculated by the subtraction means 8 and outputs the quotient as the final result (result: 9.7 × 10^-6).
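The arithmetic of [0014] to [0017] can be checked step by step. The labels are introduced here for readability: N is the total number of symbols, L the string length, R the number of registered strings, U the number of unregistered strings in use, p_c their conditional occurrence probability, and P_u the result.

```latex
\begin{aligned}
N^{L} &= 26^{3} = 17\,576 \\
N^{L} - R &= 17\,576 - 1000 = 16\,576 \\
p_c \, U &= (8.0 \times 10^{-4}) \times 200 = 0.16 \\
P_u &= \frac{0.16}{16\,576} \approx 9.65 \times 10^{-6} \approx 9.7 \times 10^{-6}
\end{aligned}
```

The unrounded quotient is about 9.65 × 10^-6. The text reports it to two significant figures.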

[0018]

The operator may enter the numerical values through any of various input devices, such as a keyboard, a character recognition device, or a punched card reader.

[0019]

Next, a symbol string occurrence probability calculation system according to another embodiment of the present invention is described with reference to FIG. 2. The system shown in the figure calculates the occurrence probabilities of both registered and unregistered symbol strings. It comprises the following:
unregistered symbol string occurrence probability calculation means 21, which calculates the occurrence probability of an unregistered symbol string in the same way as the embodiment of FIG. 1 described above;
input symbol string storage means 22 for storing an input symbol string;
a symbol string dictionary 25 that stores each symbol string and its occurrence probability as one record;
unregistered symbol string occurrence probability storage means 23 for storing the occurrence probability of an unregistered symbol string calculated by the calculation means 21; and
matching means 24, which matches the symbol string stored in the input symbol string storage means 22 against the symbol string dictionary 25 and checks whether the input symbol string is registered as a record in the dictionary 25. If it is registered, the matching means 24 outputs the occurrence probability of that record as the occurrence probability of the input symbol string. If it is not registered, the matching means 24 outputs the occurrence probability stored in the unregistered symbol string occurrence probability storage means 23 instead.

[0020]

FIG. 3 shows the symbol strings and occurrence probabilities stored in the symbol string dictionary 25.

[0021]

The operation of the symbol string occurrence probability calculation system of this embodiment, configured as described above, is as follows. In this description, the symbols are assumed to be the letters A to Z, and the symbol string length is 3.

[0022]

First, the unregistered symbol string occurrence probability calculation means 21 calculates the occurrence probability of an unregistered symbol string, as described for the preceding embodiment. As an example, assume that the result is 0.00097%, that is, 9.7 × 10^-6. This value is then entered into the unregistered symbol string occurrence probability storage means 23. It may be entered electronically or by an operator using a keyboard or a similar device.

[0023]

This completes the preparation. Once it is done, the occurrence probability of a symbol string can be obtained.

[0024]

First, consider the case in which the symbol string whose occurrence probability is sought is registered in the symbol string dictionary 25. The example symbol string is "ABD".

[0025]

First, the symbol string "ABD" is input and stored in the input symbol string storage means 22.

[0026]

Next, the matching means 24 matches the symbol string "ABD" stored in the input symbol string storage means 22 against the symbol strings in the symbol string dictionary 25 and checks whether a matching symbol string exists in the dictionary. For "ABD", one does. In that case, the matching means 24 outputs the occurrence probability recorded in that record (here, 5 × 10^-4) and ends the process.

[0027]

Next, consider the case in which the symbol string whose occurrence probability is sought is not registered in the symbol string dictionary 25. The example symbol string is "AMK".

[0028]

First, the symbol string "AMK" is input and stored in the input symbol string storage means 22. Next, the matching means 24 matches the symbol string "AMK" stored in the input symbol string storage means 22 against the symbol strings in the symbol string dictionary 25 and checks whether a matching symbol string exists in the dictionary. As FIG. 3 shows, "AMK" does not exist in the symbol string dictionary. In that case, the matching means 24 outputs the occurrence probability stored in the unregistered symbol string occurrence probability storage means 23 (here, 9.7 × 10^-6) and ends the process.
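The matching step of the second embodiment can be sketched as a small Python class. This is an illustration under several assumptions. The class and method names are hypothetical. The symbol string dictionary 25 is modeled as an in-memory `dict`, and it holds only the "ABD" record quoted in [0026], because the rest of FIG. 3 is not reproduced here. The fallback value 9.7 × 10^-6 is the result given in [0022].

```python
class SymbolStringProbability:
    """Hypothetical sketch: dictionary lookup with a stored fallback for unregistered strings."""

    def __init__(self, dictionary: dict[str, float], unregistered_probability: float) -> None:
        # Plays the role of symbol string dictionary 25: one record per registered string.
        self.dictionary = dict(dictionary)
        # Plays the role of unregistered symbol string occurrence probability storage means 23.
        self.unregistered_probability = unregistered_probability

    def probability(self, symbol_string: str) -> float:
        # Plays the role of matching means 24: use the registered record if present,
        # otherwise return the precomputed unregistered-string probability.
        if symbol_string in self.dictionary:
            return self.dictionary[symbol_string]
        return self.unregistered_probability


# Preparation ([0022]): store the value computed by the first embodiment.
model = SymbolStringProbability({"ABD": 5e-4}, unregistered_probability=9.7e-6)
print(model.probability("ABD"))  # 0.0005  (registered, [0026])
print(model.probability("AMK"))  # 9.7e-06 (unregistered, [0028])
```

The only change from the conventional lookup is the fallback value. An unregistered string now receives the precomputed nonzero probability instead of 0.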

[0029]

In this way, the present invention makes it possible to obtain the occurrence probability of a symbol string whether or not the string is registered in the symbol string dictionary.

[0030]

Effects of the Invention

As described above, the present invention first computes the occurrence probability of an unregistered symbol string. The total number of symbols is raised to the power of the symbol string length, and the number of registered symbol strings is subtracted from the result. The conditional occurrence probability of an unregistered symbol string is multiplied by the number of unregistered symbol strings, and the product is divided by the difference. The occurrence probability obtained in this way is stored in the unregistered symbol string occurrence probability storage means. Each symbol string and its occurrence probability are stored as one record in the symbol string dictionary. The input symbol string is then matched against the dictionary. If it is registered as a record, that record's occurrence probability is output as the occurrence probability of the input symbol string. If it is not registered, the occurrence probability stored in the unregistered symbol string occurrence probability storage means is output instead. As a result, the occurrence probability of a symbol string can be calculated appropriately even when the string is not registered in the symbol string dictionary, and error is reduced.

Brief Description of the Drawings

FIG. 1 is a block diagram showing the configuration of a symbol string occurrence probability calculation system according to one embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a symbol string occurrence probability calculation system according to another embodiment of the present invention.

FIG. 3 is a diagram showing example contents of the symbol string dictionary in that other embodiment.

Description of Reference Numerals

1 Input means
2 Registered symbol string number storage means
3 Total symbol number storage means
4 Symbol length storage means
5 Unregistered symbol string number storage means
6 Conditional occurrence probability storage means
7 Exponentiation means
8 Subtraction means
9 Multiplication means
10 Division means
21 Unregistered symbol string occurrence probability calculation means
22 Input symbol string storage means
23 Unregistered symbol string occurrence probability storage means
24 Matching means
25 Symbol string dictionary

Claims (2)

1. A symbol string occurrence probability calculation system for calculating the occurrence probability of an unregistered symbol string, characterized by comprising: input means for inputting numerical values; registered symbol string number storage means for storing the number of registered symbol strings; total symbol number storage means for storing the number of all symbols to be processed; symbol length storage means for storing the length of the symbol strings to be processed; unregistered symbol string number storage means for storing the number of symbol strings that are not registered but are actually used; conditional occurrence probability storage means for storing the conditional occurrence probability of an unregistered symbol string, given that it is a symbol string that is actually used; exponentiation means for raising the value stored in the total symbol number storage means to the power of the value stored in the symbol length storage means; subtraction means for subtracting the value stored in the registered symbol string number storage means from the value calculated by the exponentiation means; multiplication means for multiplying the value stored in the conditional occurrence probability storage means by the value stored in the unregistered symbol string number storage means; and division means for dividing the value calculated by the multiplication means by the value calculated by the subtraction means to calculate the occurrence probability of an unregistered symbol string.
2. A symbol string occurrence probability calculation system characterized by comprising: unregistered symbol string occurrence probability calculation means for calculating the occurrence probability of an unregistered symbol string; input symbol string storage means for storing an input symbol string; a symbol string dictionary that stores each symbol string and its occurrence probability as one record; unregistered symbol string occurrence probability storage means for storing the occurrence probability of an unregistered symbol string calculated by the unregistered symbol string occurrence probability calculation means; and matching means that matches the symbol string stored in the input symbol string storage means against the symbol string dictionary to check whether the input symbol string is registered as a record in the symbol string dictionary, outputs the occurrence probability of that record as the occurrence probability of the input symbol string if it is registered, and outputs the occurrence probability stored in the unregistered symbol string occurrence probability storage means as the occurrence probability of the input symbol string if it is not registered.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4114503A JPH05314168A (en) 1992-05-07 1992-05-07 Symbol string occurrence probability calculating system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4114503A JPH05314168A (en) 1992-05-07 1992-05-07 Symbol string occurrence probability calculating system

Publications (1)

Publication Number Publication Date
JPH05314168A true JPH05314168A (en) 1993-11-26

Family

ID=14639387

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4114503A Pending JPH05314168A (en) 1992-05-07 1992-05-07 Symbol string occurrence probability calculating system

Country Status (1)

Country Link
JP (1) JPH05314168A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014117553A1 (en) * 2013-01-29 2014-08-07 Tencent Technology (Shenzhen) Company Limited Method and system of adding punctuation and establishing language model
US9779728B2 (en) 2013-05-24 2017-10-03 Tencent Technology (Shenzhen) Company Limited Systems and methods for adding punctuations by detecting silences in a voice using plurality of aggregate weights which obey a linear relationship
US9811517B2 (en) 2013-01-29 2017-11-07 Tencent Technology (Shenzhen) Company Limited Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text

Similar Documents

Publication Publication Date Title
US8666998B2 (en) Handling data sets
CN111814466A (en) Information extraction method based on machine reading understanding and related equipment thereof
US20070038447A1 (en) Pattern matching method and apparatus and speech information retrieval system
CN101458928B (en) Voice recognition apparatus
CN107229627B (en) Text processing method and device and computing equipment
CN112151014B (en) Speech recognition result evaluation method, device, equipment and storage medium
JPH0778165A (en) Method and computer system for detection of error string in text
CN111460814A (en) Sensitive information detection method, device, terminal and medium
CN111859914B (en) Sensitive information detection method, device, computer equipment and storage medium
WO2019227629A1 (en) Text information generation method and apparatus, computer device and storage medium
US20110229036A1 (en) Method and apparatus for text and error profiling of historical documents
JPH05314168A (en) Symbol string occurrence probability calculating system
US20020156628A1 (en) Speech recognition system, training arrangement and method of calculating iteration values for free parameters of a maximum-entropy speech model
CN115544214B (en) Event processing method, device and computer readable storage medium
JP2000089786A (en) Method for correcting speech recognition result and apparatus therefor
CN113808577A (en) Intelligent extraction method and device of voice abstract, electronic equipment and storage medium
US6128595A (en) Method of determining a reliability measure
JP3919968B2 (en) Document proofing device
Tran et al. Markov models for written language identification
JP2018181370A (en) Medicine name output device, medicine name output method and medicine name output program
Sigletos et al. Role identification from free text using hidden Markov models
JPS6394365A (en) Qualifying device for wrong document in japanese sentence
JP3115459B2 (en) Method of constructing and retrieving character recognition dictionary
JP6988680B2 (en) Voice dialogue device
CN114519518A (en) Enterprise organization verification method, device, equipment and medium