CN1920812A - Language processing system - Google Patents

Language processing system

Publication number
CN1920812A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006101256010A
Other languages
Chinese (zh)
Other versions
CN1920812B (en)
Inventor
濑户重宣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Publication of CN1920812A
Application granted
Publication of CN1920812B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Abstract

The present invention provides a language processing system that prevents, in advance, the generation of a word sequence containing words that the system user does not want. The language processing system includes a forbidden morpheme storage unit 202 that stores forbidden morphemes; a sequence candidate generation unit 111 that generates, from continuously written text, a plurality of word sequence candidates, each segmented into a plurality of morphemes; and an optimal sequence selection unit 112 that reads the forbidden morphemes from the forbidden morpheme storage unit 202, excludes the word sequence candidates containing a forbidden morpheme, and selects from the remaining candidates the optimal word sequence with the highest inter-morpheme connectability.

Description

Language processing system
Technical field
The present invention relates to morphological analysis technology, and in particular to a language processing system.
Background art
Systems that synthesize speech from text use the following function: a user-registered word that the system user has added to the system is given priority in speech synthesis over a system word registered in the system in advance. For example, even if the system word "神戸 (こうべ)" is registered in the system, once the system user has added the user-registered word "神戸 (かんべ)", the reading "神戸 (かんべ)" takes priority over "神戸 (こうべ)" when the speech is synthesized.
However, in a language such as Japanese that does not write words separately (that is, does not put a space between words to make the text easier to read), morphological analysis may generate a word sequence that lacks the morpheme corresponding to a user-registered word, even when the continuously written text contains that word. For example, take a text in which "神戸" directly follows the character for "slope", and suppose the system user has registered a user-registered word so that the "神戸" part is output with the reading "神戸 (かんべ)". If the system, during morphological analysis, generates a word sequence segmented as "- slope 神 - 戸 -", the text is split between "slope 神" and "戸", and the reading "神戸 (かんべ)" is not output. Conversely, the following technique has been proposed for text containing words that are undesirable for the system user, such as broadcast-prohibited terms: after the word sequence has been determined by morphological analysis, morphemes matching the broadcast-prohibited terms recorded in a list are detected, and those morphemes are then either skipped when reading aloud or replaced with other words (see, for example, Patent Document 1). However, no system prevents in advance, before the segmented word sequence is determined, the generation of a word sequence containing words that are undesirable for the system user.
The same problem remains in languages that do write words separately. Even though the word boundaries are explicit, if morphological analysis determines the word sequence by evaluating how strongly each word connects to the words before and after it, then registering a user-registered word does not guarantee that the generated word sequence contains the morpheme corresponding to it.
Patent Document 1: Japanese Unexamined Patent Application Publication No. H5-165486
Summary of the invention
The invention provides a language processing system that prevents in advance the generation of a word sequence containing words that are undesirable for the system user.
According to a first aspect of the invention, a language processing system is provided that includes: a forbidden morpheme storage unit that stores forbidden morphemes; a sequence candidate generation unit that generates, from continuously written text, a plurality of word sequence candidates, each segmented into a plurality of morphemes; and an optimal sequence selection unit that reads the forbidden morphemes from the forbidden morpheme storage unit, excludes from the plurality of word sequence candidates those containing a forbidden morpheme, and selects from the word sequence candidates the optimal word sequence with the highest connectability between morphemes.
According to a second aspect of the invention, a language processing system is provided that includes: a forbidden morpheme storage unit that stores forbidden morphemes; a sequence candidate generation unit that reads the forbidden morphemes stored in the forbidden morpheme storage unit and, without using the forbidden morphemes, generates from continuously written text a plurality of word sequence candidates, each segmented into a plurality of morphemes; and an optimal sequence selection unit that selects from the word sequence candidates the optimal word sequence with the highest connectability between morphemes.
According to the invention, a language processing system can be provided that prevents in advance the generation of a word sequence containing words that are undesirable for the system user.
Brief description of the drawings
Fig. 1 is a block diagram of the language processing system of Embodiment 1 of the invention.
Fig. 2 is a first schematic diagram of the lattice structure generated by the language processing system of Embodiment 1 for a Japanese example.
Fig. 3 is a first schematic diagram of the lattice structure generated by the language processing system of Embodiment 1 for a Chinese example.
Fig. 4 is a first schematic diagram of the lattice structure generated by the language processing system of Embodiment 1 for an English example.
Fig. 5 is a first table of the forbidden morphemes stored in the forbidden morpheme storage unit of Embodiment 1 for the Japanese example.
Fig. 6 is a first table of the forbidden morphemes stored in the forbidden morpheme storage unit of Embodiment 1 for the Chinese example.
Fig. 7 is a first table of the forbidden morphemes stored in the forbidden morpheme storage unit of Embodiment 1 for the English example.
Fig. 8 is a second schematic diagram of the lattice structure generated by the language processing system of Embodiment 1 for the Japanese example.
Fig. 9 is a second schematic diagram of the lattice structure generated by the language processing system of Embodiment 1 for the Chinese example.
Fig. 10 is a second schematic diagram of the lattice structure generated by the language processing system of Embodiment 1 for the English example.
Fig. 11 is a flowchart of the language processing method of Embodiment 1.
Fig. 12 is a second table of the forbidden morphemes stored in the forbidden morpheme storage unit of Embodiment 1.
Fig. 13 is a first schematic diagram of the lattice structure generated by the language processing system of Embodiment 1 for another English example.
Fig. 14 is a first table of the forbidden morphemes stored in the forbidden morpheme storage unit of Embodiment 1 for the other English example.
Fig. 15 is a second schematic diagram of the lattice structure generated by the language processing system of Embodiment 1 for the other English example.
Fig. 16 is a block diagram of the language processing system of Embodiment 2 of the invention.
Fig. 17 is a schematic diagram of the lattice structure generated by the language processing system of Embodiment 2 for a Japanese example.
Fig. 18 is a schematic diagram of the lattice structure generated by the language processing system of Embodiment 2 for a Chinese example.
Fig. 19 is a schematic diagram of the lattice structure generated by the language processing system of Embodiment 2 for an English example.
Fig. 20 is a schematic diagram of the lattice structure generated by the language processing system of Embodiment 2 for another English example.
Fig. 21 is a flowchart of the language processing method of Embodiment 2.
Fig. 22 is a block diagram of the language processing system of Embodiment 3 of the invention.
Fig. 23 is a flowchart of the language processing method of Embodiment 3.
Fig. 24 is a block diagram of the language processing system of Embodiment 4 of the invention.
Fig. 25 is a table of the forbidden morphemes stored in the forbidden morpheme storage unit of Embodiment 4 for a Japanese example.
Fig. 26 is a diagram illustrating, for a Chinese example, how a forbidden morpheme is additionally stored in the forbidden morpheme storage unit in Embodiment 4.
Fig. 27 is a diagram illustrating, for an English example, how a forbidden morpheme is additionally stored in the forbidden morpheme storage unit in Embodiment 4.
Fig. 28 is a flowchart of the language processing method of Embodiment 4.
Fig. 29 is a diagram illustrating, for another Chinese example, how a forbidden morpheme is additionally stored in the forbidden morpheme storage unit in Embodiment 4.
Fig. 30 is a block diagram of the language processing system of Embodiment 5 of the invention.
Fig. 31 is a diagram illustrating, for a Chinese example, how a forbidden morpheme is additionally stored in the forbidden morpheme storage unit in Embodiment 5.
Fig. 32 is a diagram illustrating, for an English example, how a forbidden morpheme is additionally stored in the forbidden morpheme storage unit in Embodiment 5.
Fig. 33 is a flowchart of the language processing method of Embodiment 5.
Fig. 34 is a diagram illustrating, for another Chinese example, how a forbidden morpheme is additionally stored in the forbidden morpheme storage unit in Embodiment 5.
Fig. 35 is a diagram illustrating, for another English example, how a forbidden morpheme is additionally stored in the forbidden morpheme storage unit in Embodiment 5.
Embodiments
Embodiments of the invention are described next with reference to the drawings. In the drawings, identical or similar parts are given identical or similar reference numerals. The embodiments below exemplify devices and methods that embody the technical idea of the invention; the technical idea does not limit the arrangement of the components and the like to what is described below. Various changes can be made to the technical idea of the invention within the scope of the claims.
(Embodiment 1)
As shown in Fig. 1, the language processing system of Embodiment 1 includes a central processing unit (CPU) 100a and a data storage device 200 connected to the CPU 100a. The data storage device 200 includes a forbidden morpheme storage unit 202 and a system dictionary storage unit 201. The forbidden morpheme storage unit 202 stores forbidden morphemes, that is, morphemes read with a forbidden reading. The system dictionary storage unit 201 stores a system dictionary that records the readings and parts of speech of a plurality of words. The CPU 100a includes a sequence candidate generation unit 111 and an optimal sequence selection unit 112. The sequence candidate generation unit 111 generates, from continuously written text, a plurality of word sequence candidates, each segmented into a plurality of morphemes. The optimal sequence selection unit 112 reads the forbidden morphemes from the forbidden morpheme storage unit 202, excludes from the word sequence candidates those containing a forbidden morpheme, and selects from the word sequence candidates the optimal word sequence with the highest connectability between morphemes.
Specifically, the sequence candidate generation unit 111 refers to the system dictionary, decomposes the continuously written input text into a plurality of morphemes, and generates a lattice structure in which the morphemes are placed at lattice points. For example, suppose the Japanese text "主記憶上空間が" is input and the system dictionary contains the following morphemes, each with its reading: "主 (ぬし)", "主 (しゅ)", "主 (あるじ)", "主 (おも)", "記憶 (きおく)", "上空 (うわそら)", "上 (うえ)", "上 (かみ)", "上 (じょう)", "空 (そら)", "空 (くう)", "空 (から)", "空間 (くうかん)", "間 (かん)", "間 (あいだ)", "間 (はざま)", and "が". The sequence candidate generation unit 111 then generates the lattice structure 50 shown in Fig. 2 as the combination of the morphemes registered in the system dictionary. The lattice structure 50 contains a plurality of word sequence candidates. Starting from "主 (ぬし)", for example, it yields candidates such as "主 (ぬし) 記憶 (きおく) 上空 (うわそら) 間 (かん) が" and "主 (ぬし) 記憶 (きおく) 上 (うえ) 空間 (くうかん) が".
Similarly, suppose a Chinese text meaning "you see that he holds train ticket" is input and the system dictionary contains the morphemes "你 (ni3)" ... "车票 (che1piao4)", each with its reading. The sequence candidate generation unit 111 then generates the lattice structure 50 shown in Fig. 3 as the combination of the morphemes registered in the system dictionary. The lattice structure 50 contains a plurality of word sequence candidates. Starting from the character "着", for example, it yields candidates such as "着 (zhe) 火车票 (huo3che1piao4)" and "着火 (zhao2huo3) 车票 (che1piao4)".
Likewise, suppose the English text "Drink much mate" is input and the system dictionary contains the morphemes "drink" ... "mate", each with its pronunciation. The sequence candidate generation unit 111 then generates the lattice structure 50 shown in Fig. 4 as the combination of the morphemes registered in the system dictionary. The lattice structure 50 contains a plurality of word sequence candidates. Starting from "much", for example, it yields candidates such as "much mate [meit]" and "much mate [ma:tei]".
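The lattice generation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Morpheme`, `build_lattice`, `enumerate_candidates`, and `SYSTEM_DICT` are hypothetical, and the dictionary holds only the four entries that the Chinese example names for the span beginning at "着".

```python
from collections import namedtuple

# Hypothetical record: surface form, reading, and character span in the text.
Morpheme = namedtuple("Morpheme", "surface reading start end")

def build_lattice(text, dictionary):
    """Return {start_index: [Morpheme, ...]} for every dictionary match."""
    lattice = {i: [] for i in range(len(text))}
    for start in range(len(text)):
        for surface, readings in dictionary.items():
            if text.startswith(surface, start):
                for reading in readings:
                    lattice[start].append(
                        Morpheme(surface, reading, start, start + len(surface)))
    return lattice

def enumerate_candidates(lattice, length, start=0):
    """Yield every morpheme path that covers the text from start to length."""
    if start == length:
        yield []
        return
    for m in lattice.get(start, []):
        for rest in enumerate_candidates(lattice, length, m.end):
            yield [m] + rest

# Assumed miniature system dictionary (surface form -> readings).
SYSTEM_DICT = {
    "着": ["zhe"],
    "着火": ["zhao2huo3"],
    "火车票": ["huo3che1piao4"],
    "车票": ["che1piao4"],
}

text = "着火车票"
lattice = build_lattice(text, SYSTEM_DICT)
candidates = [[(m.surface, m.reading) for m in path]
              for path in enumerate_candidates(lattice, len(text))]
for candidate in candidates:
    print(candidate)
```

With this dictionary the lattice holds the two word sequence candidates named in the text, one starting with "着 (zhe)" and one starting with "着火 (zhao2huo3)".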
The forbidden morpheme storage unit 202 shown in Fig. 1 stores forbidden morphemes, that is, morphemes read with a reading that the system user does not want output. For example, as shown in Fig. 5, for the character "主" it stores the forbidden morpheme "主 (おも)", which carries the unwanted reading "おも", and for the character string "上空" it stores the forbidden morpheme "上空 (うわそら)", which carries the unwanted reading "うわそら".
Similarly, as shown in Fig. 6, for the character "看" it stores the forbidden morpheme "看 (ka1)", which carries the unwanted reading "ka1", and for the character string "着火" it stores the forbidden morpheme "着火 (zhao2huo3)", which carries the unwanted reading "zhao2huo3".
Likewise, as shown in Fig. 7, for the character string "mate" it stores the forbidden morpheme "mate [ma:tei]", which carries the unwanted pronunciation "ma:tei".
The optimal sequence selection unit 112 shown in Fig. 1 includes a prohibition module 114 and a selection module 12. The prohibition module 114 searches the morphemes contained in the lattice structure 50 shown in Fig. 2 for any morpheme that matches a forbidden morpheme stored in the forbidden morpheme storage unit 202. When it finds a forbidden morpheme in the lattice structure 50, it deletes that morpheme from the lattice structure 50. For example, when the forbidden morphemes "主 (おも)" and "上空 (うわそら)" are stored in the forbidden morpheme storage unit 202 as shown in Fig. 5, the prohibition module deletes "主 (おも)" and "上空 (うわそら)" from the lattice structure 50, as shown in Fig. 8.
Similarly, as shown in Fig. 9, the forbidden morphemes "看 (ka1)" and "着火 (zhao2huo3)" are deleted from the lattice structure 50.
Likewise, as shown in Fig. 10, the forbidden morpheme "mate [ma:tei]" is deleted from the lattice structure 50.
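The deletion performed by the prohibition module can be pictured as a filter over the lattice. The sketch below is an assumption-laden illustration: `prune_lattice` is a hypothetical name, the lattice is modeled as a plain dictionary from character position to (notation, reading) pairs, and only the slots for "主" and "上" in the Japanese example are shown.

```python
def prune_lattice(lattice, forbidden):
    """Return a copy of the lattice without any forbidden (notation, reading)."""
    return {
        position: [m for m in morphemes if m not in forbidden]
        for position, morphemes in lattice.items()
    }

# Assumed fragment of the lattice for the Japanese example: position 0 holds
# the readings of "主", position 3 holds the morphemes that start at "上".
lattice = {
    0: [("主", "ぬし"), ("主", "しゅ"), ("主", "あるじ"), ("主", "おも")],
    3: [("上空", "うわそら"), ("上", "うえ"), ("上", "かみ"), ("上", "じょう")],
}

# Forbidden morphemes as stored in the forbidden morpheme storage unit (Fig. 5).
FORBIDDEN = {("主", "おも"), ("上空", "うわそら")}

pruned = prune_lattice(lattice, FORBIDDEN)
print(pruned[0])
print(pruned[3])
```

Matching on the pair of notation and reading, rather than on the notation alone, keeps the other readings of "主" available to the selection step.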
The selection module 12 shown in Fig. 1 uses a search algorithm such as depth-first search or breadth-first search to select, from the lattice structure 50 of Fig. 8 with the forbidden morphemes deleted, the optimal word sequence: the one whose connectability between morphemes is highest and whose reading is judged most likely to be correct. The selection also applies heuristics such as the longest-match method, the minimum-phrase-count method, and the minimum-cost method. Here, the selection module 12 shown in Fig. 1 selects "主 (しゅ) 記憶 (きおく) 上 (じょう) 空間 (くうかん) が" from the lattice structure 50 as the optimal word sequence, that is, the word sequence with the highest connectability between morphemes. The audio file generation unit 116 generates an audio file for outputting the reading of the optimal word sequence.
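One way to picture the selection step is a depth-first search that accumulates a cost along each path and keeps the cheapest complete path, in the spirit of the minimum-cost heuristic mentioned above. The sketch below is illustrative only: `best_path`, `cost`, and the numbers in `CONNECTION_COST` are hypothetical, since the text does not give concrete cost values, and a lower connection cost is assumed to stand for higher connectability.

```python
def best_path(lattice, length, cost):
    """Depth-first search for the lowest-cost path covering the whole text."""
    best_total, best = float("inf"), None
    stack = [(0, 0.0, [])]  # (position, accumulated cost, path so far)
    while stack:
        position, total, path = stack.pop()
        if position == length:
            if total < best_total:
                best_total, best = total, path
            continue
        for morpheme in lattice.get(position, []):
            previous = path[-1] if path else None
            surface = morpheme[0]
            stack.append((position + len(surface),
                          total + cost(previous, morpheme),
                          path + [morpheme]))
    return best

# Assumed connection costs between adjacent morphemes (invented values).
CONNECTION_COST = {
    (("着", "zhe"), ("火车票", "huo3che1piao4")): 0.2,
    (("着火", "zhao2huo3"), ("车票", "che1piao4")): 0.9,
}

def cost(previous, current):
    """One unit per morpheme plus an assumed connection penalty."""
    if previous is None:
        return 1.0
    return 1.0 + CONNECTION_COST.get((previous, current), 0.5)

lattice = {
    0: [("着", "zhe"), ("着火", "zhao2huo3")],
    1: [("火车票", "huo3che1piao4")],
    2: [("车票", "che1piao4")],
}
best = best_path(lattice, 4, cost)
print(best)
```

A production analyzer would more likely use dynamic programming over the lattice than exhaustive search; the exhaustive form is used here only because it mirrors the depth-first search named in the text.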
The data storage device 200 also includes a lattice structure storage unit 203 and an optimal sequence storage unit 204. The lattice structure storage unit 203 stores the lattice structure 50 generated by the sequence candidate generation unit 111. The optimal sequence storage unit 204 stores the optimal word sequence selected by the optimal sequence selection unit 112. The CPU 100a is further connected to a speaker 342, an input device 340, an output device 341, a program storage device 230, and a temporary storage device 231. The speaker 342 outputs as sound the reading of the optimal word sequence contained in the audio file. The input device 340 can be, for example, a keyboard or a pointing device such as a mouse. The output device 341 can be an image display device such as a liquid crystal display or monitor, a printer, or the like. The program storage device 230 stores the operating system and other software that controls the CPU 100a. The temporary storage device 231 stores the calculation results of the CPU 100a as they are produced. The program storage device 230 and the temporary storage device 231 can be, for example, recording media that hold programs, such as semiconductor memory, a magnetic disk, an optical disk, a magneto-optical disk, or magnetic tape.
Next, the language processing method of Embodiment 1 is described using the flowchart shown in Fig. 11.
(a) In step S100, continuously written text containing Chinese characters is input from the input device 340 shown in Fig. 1 to the sequence candidate generation unit 111 of the CPU 100a. As an example, assume that the text "主記憶上空間が" is input. Then, in step S101, the sequence candidate generation unit 111 refers to the system dictionary stored in the system dictionary storage unit 201, decomposes the input text "主記憶上空間が" into a plurality of morphemes, and generates the lattice structure 50 shown in Fig. 2 from those morphemes. The sequence candidate generation unit 111 saves the generated lattice structure 50 in the lattice structure storage unit 203.
(b) In step S102, the prohibition module 114 shown in Fig. 1 reads the lattice structure 50 shown in Fig. 2 from the lattice structure storage unit 203. The prohibition module 114 then searches the morphemes contained in the lattice structure 50 for any morpheme that matches a forbidden morpheme stored in the forbidden morpheme storage unit 202. Here, when the forbidden morphemes "主 (おも)" and "上空 (うわそら)" are stored in the forbidden morpheme storage unit 202 as shown in Fig. 5, the prohibition module 114 deletes "主 (おも)" and "上空 (うわそら)" from the lattice structure 50, as shown in Fig. 8. The prohibition module 114 then writes the lattice structure 50 with the forbidden morphemes deleted back to the lattice structure storage unit 203.
(c) In step S103, the selection module 12 reads the lattice structure 50 with the forbidden morphemes deleted from the lattice structure storage unit 203. Using the search algorithm and the heuristics, the selection module 12 then selects, from the lattice structure 50 shown in Fig. 8, the optimal word sequence whose reading is judged most likely to be correct. Here, the selection module 12 selects "主 (しゅ) 記憶 (きおく) 上 (じょう) 空間 (くうかん) が" as the optimal word sequence. The optimal sequence selection unit 112 then saves the selected optimal word sequence in the optimal sequence storage unit 204.
(d) In step S104, the audio file generation unit 116 reads the optimal word sequence "主 (しゅ) 記憶 (きおく) 上 (じょう) 空間 (くうかん) が" from the optimal sequence storage unit 204 and converts its reading into an audio file. The audio file generation unit 116 then outputs the reading of the optimal word sequence contained in the audio file from the speaker 342, which ends the language processing method of Embodiment 1.
As described above, with the language processing system and language processing method of Embodiment 1 shown in Figs. 1 and 11, even when the system dictionary holds a word with a reading that the user does not want output, storing the corresponding forbidden morpheme in the forbidden morpheme storage unit 202 in advance prevents the unwanted reading from being assigned to the input text. The reading the user wants can therefore be assigned to the text with higher probability. The example in Fig. 5 stores combinations of notation and reading in the forbidden morpheme storage unit 202. Alternatively, as shown in Fig. 12, combinations of notation, reading, and part of speech can be stored in the forbidden morpheme storage unit 202.
For example, suppose the English text "Colored pencil leads break easily" is input and the system dictionary contains the morphemes "colored" ... "easily", each with its pronunciation. The sequence candidate generation unit 111 then generates the lattice structure 50 shown in Fig. 13 as the combination of the morphemes registered in the system dictionary.
Here, as shown in Fig. 14, for the character string "pencil", the forbidden morpheme "pencil (v) [pensl]", which carries the part of speech v and the pronunciation "pensl" that the system user does not want output, is stored in the forbidden morpheme storage unit 202.
As a result, the prohibition module 114 deletes the forbidden morpheme "pencil (v) [pensl]" from the lattice structure 50, as shown in Fig. 15.
In this way, the system not only assigns readings to words but also handles syntax correctly, which improves the naturalness of the read-aloud speech, for example its intonation.
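A forbidden entry that also carries a part of speech can be matched field by field. The following sketch is a hypothetical illustration: the `Morpheme` type, the `matches` and `prune` helpers, the treatment of a missing part of speech as a wildcard, and the noun entry for "pencil" are all assumptions added for the example and do not come from the figures.

```python
from typing import NamedTuple, Optional

class Morpheme(NamedTuple):
    notation: str
    reading: str
    pos: Optional[str] = None  # None is assumed to mean "any part of speech"

def matches(rule, morpheme):
    """True when a forbidden rule covers the given lattice morpheme."""
    return (rule.notation == morpheme.notation
            and rule.reading == morpheme.reading
            and (rule.pos is None or rule.pos == morpheme.pos))

def prune(morphemes, rules):
    """Drop every morpheme covered by at least one forbidden rule."""
    return [m for m in morphemes if not any(matches(r, m) for r in rules)]

# Forbidden entry from the text: "pencil" as a verb with the reading "pensl".
rules = [Morpheme("pencil", "pensl", "v")]

# Assumed lattice slot for "pencil": a noun reading and a verb reading.
slot = [Morpheme("pencil", "pensl", "n"), Morpheme("pencil", "pensl", "v")]

kept = prune(slot, rules)
print(kept)
```

Because the rule names the part of speech, the noun reading with the same notation and pronunciation survives, which is what lets the later stages treat "pencil" as a noun in "Colored pencil leads".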
(Embodiment 2)
The language processing system of Embodiment 2 differs from the language processing system shown in Fig. 1 in that, as shown in Fig. 16, a prohibition unit 214 is connected to a sequence candidate generation unit 211. When the system dictionary storage unit 201 holds a morpheme that matches a forbidden morpheme stored in the forbidden morpheme storage unit 202, the prohibition unit 214 sets the sequence candidate generation unit 211 so that it is prohibited from referring to the matching morpheme registered in the system dictionary.
Therefore, when the text "主記憶上空間が" is input to the sequence candidate generation unit 211, for example, the sequence candidate generation unit 211 does not refer to the system dictionary morphemes that match the forbidden morphemes "上空 (うわそら)" and "間 (かん)", and generates the lattice structure 51 shown in Fig. 17, which contains no forbidden morphemes from the outset. The other components of the language processing system shown in Fig. 16 are the same as in Fig. 1, so their description is omitted.
Similarly, when a Chinese text meaning "you see that he holds train ticket" is input to the sequence candidate generation unit 211, the sequence candidate generation unit 211 does not refer to the system dictionary morphemes that match the forbidden morphemes "看 (ka1)" and "着火 (zhao2huo3)", and generates the lattice structure 51 shown in Fig. 18, which contains no forbidden morphemes from the outset.
Likewise, when the English text "Drink much mate" is input to the sequence candidate generation unit 211, the sequence candidate generation unit 211 does not refer to the system dictionary morpheme that matches the forbidden morpheme "mate [ma:tei]", and generates the lattice structure 51 shown in Fig. 19, which contains no forbidden morphemes from the outset.
Furthermore, when the English text "Colored pencil leads break easily" is input to the sequence candidate generation unit 211, the sequence candidate generation unit 211 does not refer to the system dictionary morpheme that matches the forbidden morpheme "pencil (v) [pensl]", and generates the lattice structure 51 shown in Fig. 20, which contains no forbidden morphemes from the outset.
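The difference from Embodiment 1 is where the filter sits: forbidden entries are hidden from dictionary lookup, so they never enter the lattice. A minimal sketch follows, assuming a dictionary that maps each notation to its list of readings; `restrict_dictionary` and the sample data are hypothetical names and values chosen for the "mate" example.

```python
def restrict_dictionary(system_dict, forbidden):
    """Return a view of the dictionary with forbidden readings hidden."""
    restricted = {}
    for notation, readings in system_dict.items():
        allowed = [r for r in readings if (notation, r) not in forbidden]
        if allowed:  # drop the notation entirely if no reading remains
            restricted[notation] = allowed
    return restricted

# Assumed system dictionary fragment for the English example.
SYSTEM_DICT = {"mate": ["meit", "ma:tei"]}

# Forbidden morpheme from the text: "mate" pronounced "ma:tei".
FORBIDDEN = {("mate", "ma:tei")}

lookup = restrict_dictionary(SYSTEM_DICT, FORBIDDEN)
print(lookup)
```

The system dictionary itself is left untouched, so removing an entry from the forbidden set later makes the reading available again without re-registering the word.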
Then, use the language processing method of flowchart text embodiment 2 shown in Figure 21.
(a) in step S200, generate the text of writing continuously " main note Yi goes up empty Inter Ga " that parts 211 inputs comprise Chinese character to the sequence candidates of CPU100b by input media shown in Figure 16 340.In step S201, forbid parts 214 in system's dictionary memory unit 201, preserve be kept at the situation of forbidding the morpheme that morpheme is consistent of forbidding in the morpheme memory unit 202 under, be provided with and forbid sequence candidates generate parts 211 with reference to and be registered in system's dictionary in forbid the morpheme that morpheme is consistent.
(b) in step S202, sequence candidates generates parts 211 with reference to the system's dictionary that is kept in system's dictionary memory unit 201, to be decomposed into a plurality of morphemes as " the main note Yi Shang Kong Inter Ga " of input text, and then generate the grid system shown in Figure 17 51 that forms with a plurality of morphemes.At this moment, owing in step S201, be provided with forbid sequence candidates generate parts 211 with reference to and be registered in system's dictionary in forbid the morpheme that morpheme is consistent, do not forbid morpheme so in the grid system 51 that is generated, do not comprise.The grid system 51 of forbidding morpheme that do not comprise that sequence candidates generation parts 211 will generate is saved in the grid system memory unit 203.
(c) in step S203, optimal sequence alternative pack 212 is read from grid system memory unit 203 and is not comprised the grid system 51 of forbidding morpheme.Then, optimal sequence alternative pack 212 uses heuristic algorithm and exploratory method, selects to be judged as the immediate optimum word sequence of pronunciation from grid system 51.Then, ground the same implementation step S204, the language processing method of end embodiment 2 with step S104.
More than, according to language processing system and the language processing method of Figure 16 and embodiment 2 shown in Figure 21, can prevent the additional undesirable pronunciation of input text.
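The effect of step S201 on lattice generation can be illustrated with a minimal Python sketch. It is not the patented implementation: the Morpheme type, the build_lattice function, the dictionary contents, and every reading other than "pensl" are hypothetical, and whitespace-separated English is assumed so that the lattice is simply one column of candidate entries per word.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    surface: str   # written form
    pos: str       # part of speech
    reading: str   # pronunciation (ASCII approximation, hypothetical)


# Hypothetical system dictionary used only for this illustration.
SYSTEM_DICTIONARY = [
    Morpheme("colored", "adj", "kalad"),
    Morpheme("pencil", "n", "pensl"),
    Morpheme("pencil", "v", "pensl"),
    Morpheme("leads", "n", "ledz"),
    Morpheme("leads", "v", "li:dz"),
    Morpheme("break", "v", "breik"),
    Morpheme("easily", "adv", "i:zili"),
]


def build_lattice(words, dictionary, forbidden):
    """Return one column of usable dictionary entries per input word.

    Entries that match a forbidden morpheme are filtered out before any
    lookup, so they can never appear in the lattice.
    """
    allowed = [m for m in dictionary if m not in forbidden]
    return [[m for m in allowed if m.surface == w.lower()] for w in words]


forbidden = {Morpheme("pencil", "v", "pensl")}
lattice = build_lattice("Colored pencil leads break easily".split(),
                        SYSTEM_DICTIONARY, forbidden)
for column in lattice:
    print([f"{m.surface} ({m.pos}) [{m.reading}]" for m in column])
```

With "pencil (v)" registered as forbidden, the column for "pencil" holds only the noun entry, so no later stage can select the verb.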
(Embodiment 3)
The language processing system of Embodiment 3 differs from the language processing system shown in Figure 1 in that, as shown in Figure 22, the prohibition unit 314 is connected to the optimal sequence selection unit 312. If the system dictionary storage unit 201 holds a morpheme that matches a forbidden morpheme held in the forbidden morpheme storage unit 202, the prohibition unit 314 configures the optimal sequence selection unit 312 so that it may not select, as the optimal word sequence, any word sequence candidate containing the forbidden morpheme. The other components of the language processing system shown in Figure 22 are the same as in Figure 1, so their description is omitted.
Next, the language processing method of Embodiment 3 is described with reference to the flowchart shown in Figure 23.
(a) In step S300, a continuously written text containing kanji, "主記憶上空間が", is input through the input device 340 shown in Figure 1 to the sequence candidate generation unit 111 of the CPU 100c. Then, in step S301, the sequence candidate generation unit 111 references the system dictionary held in the system dictionary storage unit 201, decomposes the input text "主記憶上空間が" into a plurality of morphemes, and generates the lattice structure 50 shown in Figure 2 from those morphemes. The sequence candidate generation unit 111 saves the generated lattice structure 50 in the lattice structure storage unit 203.
(b) In step S302, if the system dictionary storage unit 201 holds a morpheme that matches a forbidden morpheme held in the forbidden morpheme storage unit 202, the prohibition unit 314 configures the optimal sequence selection unit 312 so that it may not select, as the optimal word sequence, any word sequence candidate containing the forbidden morpheme. In step S303, the optimal sequence selection unit 312 reads the lattice structure 50 from the lattice structure storage unit 203. Using a heuristic search method, it then selects from the lattice structure 50 the optimal word sequence, that is, the candidate judged most plausible as the reading. Step S304 is then carried out in the same way as step S104, and the language processing method of Embodiment 3 ends.
As described above, the language processing system and language processing method of Embodiment 3, shown in Figures 22 and 23, prevent an undesired reading from being assigned to the input text.
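Embodiment 3 leaves the lattice untouched and applies the prohibition at selection time. The following Python sketch illustrates that idea under stated assumptions: the Chinese characters are a reconstruction of the example sentence from its pinyin readings, the dictionary and its cost values are hypothetical, and "lowest total cost" merely stands in for the heuristic search, which the text does not specify.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    surface: str
    reading: str
    cost: int  # hypothetical cost; a lower total is preferred


# Hypothetical dictionary for the assumed sentence "你看他拿着火车票".
DICTIONARY = [
    Morpheme("你", "ni3", 1), Morpheme("看", "kan4", 1),
    Morpheme("他", "ta1", 1), Morpheme("拿", "na2", 1),
    Morpheme("着", "zhe", 2), Morpheme("着火", "zhao2huo3", 1),
    Morpheme("火车", "huo3che1", 1), Morpheme("车票", "che1piao4", 1),
    Morpheme("票", "piao4", 2),
]


def candidates(text, dictionary):
    """Yield every word sequence whose surfaces concatenate to `text`."""
    if not text:
        yield []
        return
    for m in dictionary:
        if text.startswith(m.surface):
            for rest in candidates(text[len(m.surface):], dictionary):
                yield [m] + rest


def select_optimal(text, dictionary, forbidden):
    """Pick the lowest-cost candidate that contains no forbidden morpheme.

    `forbidden` is a set of (surface, reading) pairs. Candidates are still
    generated from the full dictionary; only the selection is restricted.
    """
    usable = [c for c in candidates(text, dictionary)
              if not any((m.surface, m.reading) in forbidden for m in c)]
    return min(usable, key=lambda c: sum(m.cost for m in c), default=None)


text = "你看他拿着火车票"
before = select_optimal(text, DICTIONARY, set())
after = select_optimal(text, DICTIONARY, {("着火", "zhao2huo3")})
print(" ".join(m.reading for m in before))
print(" ".join(m.reading for m in after))
```

Because the filter is applied to whole candidates, a forbidden morpheme can remain in the stored lattice and still never reach the output, which is the distinction from Embodiment 2.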
(Embodiment 4)
The language processing system of Embodiment 4 differs from the language processing system shown in Figure 1 in that, as shown in Figure 24, the CPU 100d further includes an error range designation unit 120 and a forbidden morpheme addition unit 121. Suppose, for example, that for the input text "主記憶上空間が" the optimal sequence selection unit 112 has erroneously selected "主(しゅ) 記憶(きおく) 上空(うわそら) 間(かん) が" as the optimal word sequence. In this case, the error range designation unit 120 accepts from the system user a designation of the misread morphemes, that is, the morphemes in the erroneously selected optimal word sequence to which an undesired reading has been assigned. For example, when the character string "上空間" is designated, the error range designation unit 120 compares that string with the lattice structure 50, divides it into the morpheme "上空(うわそら)" and the morpheme "間(かん)", and identifies each as a misread morpheme. The forbidden morpheme addition unit 121 adds the misread morphemes to the forbidden morpheme storage unit 202 as forbidden morphemes. Figure 25 shows an example of the forbidden morphemes added to the forbidden morpheme storage unit 202 at this point. The other components of the language processing system shown in Figure 24 are the same as in Figure 1, so their description is omitted.
Similarly, as shown in Figure 26, suppose that for the Chinese input text "你看他拿着火车票" ("you see him holding a train ticket") the optimal sequence selection unit 112 has erroneously selected "你 (ni3)" "看 (kan4)" "他 (ta1)" "拿 (na2)" "着火 (zhao2huo3)" "车票 (che1piao4)" as the optimal word sequence. The error range designation unit 120 accepts from the system user a designation of the misread morphemes, that is, the morphemes in the erroneously selected optimal word sequence to which an undesired reading has been assigned. For example, when the character string "着火车票" is designated, the error range designation unit 120 compares that string with the lattice structure 50, divides it into the morpheme "着火 (zhao2huo3)" ("catch fire") and the morpheme "车票 (che1piao4)" ("ticket"), and identifies each as a misread morpheme. The forbidden morpheme addition unit 121 adds the misread morphemes to the forbidden morpheme storage unit 202 as forbidden morphemes.
In addition, as shown in Figure 27, suppose that for the English input text "Drink much mate" the optimal sequence selection unit 112 has erroneously selected "drink (v)" "much (adv)" "mate (n) [ma:tei]" as the optimal word sequence. The error range designation unit 120 accepts from the system user a designation of the misread morpheme, that is, the morpheme in the erroneously selected optimal word sequence to which an undesired reading has been assigned. For example, when the character string "mate" is designated, the error range designation unit 120 compares that string with the lattice structure 50 and identifies the selected morpheme "mate (n) [ma:tei]" as the misread morpheme. The forbidden morpheme addition unit 121 adds the misread morpheme to the forbidden morpheme storage unit 202 as a forbidden morpheme.
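The forbidden morpheme storage unit 202 and the addition performed by unit 121 can be pictured as a small persistent set. The Python sketch below is only an illustration: the class name ForbiddenMorphemeStore, the JSON file format, and the choice to key entries by (surface, reading) are assumptions, and the actual layout shown in Figure 25 is not reproduced here.

```python
import json
from pathlib import Path


class ForbiddenMorphemeStore:
    """Hypothetical stand-in for the forbidden morpheme storage unit.

    Entries are (surface, reading) pairs kept in memory as a set and
    written to a JSON file so they survive until the next run.
    """

    def __init__(self, path):
        self.path = Path(path)
        self.entries = set()
        if self.path.exists():
            loaded = json.loads(self.path.read_text(encoding="utf-8"))
            self.entries = {tuple(entry) for entry in loaded}

    def add(self, surface, reading):
        """Register a misread morpheme as forbidden and persist the set."""
        self.entries.add((surface, reading))
        data = json.dumps(sorted(self.entries), ensure_ascii=False)
        self.path.write_text(data, encoding="utf-8")

    def __contains__(self, morpheme):
        return tuple(morpheme) in self.entries
```

Keying on the pair rather than the surface alone matters: the same written form with a different reading stays available to the selection stage.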
Next, the language processing method of Embodiment 4 is described with reference to the flowchart shown in Figure 28.
(a) Steps S400 and S401 shown in Figure 28 are carried out in the same way as steps S100 and S101 shown in Figure 11. In step S402, the prohibition unit 114 shown in Figure 24 reads the lattice structure from the lattice structure storage unit 203. From the morphemes contained in the lattice structure, the prohibition unit 114 then deletes those that correspond to forbidden morphemes held in the forbidden morpheme storage unit 202. Assume here that the forbidden morpheme storage unit 202 does not yet hold the morphemes "上空(うわそら)" and "間(かん)". The prohibition unit 114 then writes the lattice structure, with the forbidden morphemes deleted, back to the lattice structure storage unit 203.
(b) In step S403, the optimal sequence selection unit 112 reads the lattice structure from which the forbidden morphemes have been deleted from the lattice structure storage unit 203. Using a heuristic search method, it then selects, from the lattice structure shown in Figure 8 from which the forbidden morphemes have been deleted, the optimal word sequence judged most plausible as the reading. Here, the optimal sequence selection unit 112 selects "主(しゅ) 記憶(きおく) 上空(うわそら) 間(かん) が" as the optimal word sequence. The optimal sequence selection unit 112 then saves the erroneously selected optimal word sequence in the optimal sequence storage unit 204, and the output device 341 outputs it.
(c) In step S404, the error range designation unit 120 accepts input of an error range from the system user via the input device 340. When the system user inputs, as the error range, the character string "上空間" contained in the erroneously selected optimal word sequence "主(しゅ) 記憶(きおく) 上空(うわそら) 間(かん) が", the error range designation unit 120 compares that string with the lattice structure 50, divides it into the morpheme "上空(うわそら)" and the morpheme "間(かん)", and identifies each as a misread morpheme. The error range designation unit 120 then transfers the misread morphemes to the forbidden morpheme addition unit 121.
Similarly, for the Chinese input text "你看他拿着火车票", when the system user inputs, as the error range, the character string "着火车票" contained in the erroneously selected optimal word sequence "你 (ni3)" "看 (kan4)" "他 (ta1)" "拿 (na2)" "着火 (zhao2huo3)" "车票 (che1piao4)", the error range designation unit 120 compares that string with the lattice structure 50, divides it into the morpheme "着火 (zhao2huo3)" and the morpheme "车票 (che1piao4)", and identifies each as a misread morpheme. The error range designation unit 120 then transfers the misread morphemes to the forbidden morpheme addition unit 121.
Similarly, for the English input text "Drink much mate", when the system user inputs, as the error range, the character string "mate" contained in the erroneously selected optimal word sequence "drink (v)" "much (adv)" "mate (n) [ma:tei]", the error range designation unit 120 compares that string with the lattice structure 50 and identifies the selected morpheme "mate (n) [ma:tei]" as the misread morpheme. The error range designation unit 120 then transfers the misread morpheme to the forbidden morpheme addition unit 121.
(d) In step S405, the forbidden morpheme addition unit 121 saves the misread morpheme "上空(うわそら)" and the misread morpheme "間(かん)" in the forbidden morpheme storage unit 202 as forbidden morphemes, and the language processing method of Embodiment 4 ends.
As described above, with the language processing system and language processing method of Embodiment 4, shown in Figures 24 and 28, from the next run onward no word sequence candidate containing the forbidden morpheme "上空(うわそら)" or the forbidden morpheme "間(かん)" is selected as the optimal word sequence.
The error range designated in step S404 need not coincide with morpheme boundaries in the optimal word sequence. Specifically, instead of "上空(うわそら)間(かん)", "空(そら)間(かん)" may be designated as the error range. In this case, the forbidden morpheme addition unit 121 can save the morpheme "上空(うわそら)", which partially contains the "空(そら)" designated as the error range, in the forbidden morpheme storage unit 202 as a forbidden morpheme. Embodiment 4 showed an example in which the language processing system shown in Figure 1 further includes the error range designation unit 120 and the forbidden morpheme addition unit 121, but these units may of course also be included in the language processing system shown in Figure 10 or Figure 22.
Similarly, in the Chinese example, as shown in Figure 29, "火车票" may be designated as the error range in step S404 instead of "着火车票". In this case too, the forbidden morpheme addition unit 121 can save the morpheme "着火 (zhao2huo3)", which partially contains the "火" designated as the error range, in the forbidden morpheme storage unit 202 as a forbidden morpheme.
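Mapping a designated error range onto misread morphemes, including the partial-overlap case just described, can be sketched as an interval overlap test. This is a hedged illustration rather than the method of the error range designation unit 120: the function name is hypothetical, the Japanese strings are the example sentence as reconstructed from its readings, and the first occurrence of the designated string is assumed to be the intended one.

```python
def misread_morphemes(sequence, designated):
    """Return the morphemes of `sequence` that overlap `designated`.

    `sequence` is the selected word sequence as (surface, reading) pairs.
    A morpheme counts as misread if any of its characters fall inside the
    designated range, so a range starting mid-morpheme still catches it.
    """
    text = "".join(surface for surface, _ in sequence)
    start = text.find(designated)      # assumption: first occurrence
    if start < 0:
        return []
    end = start + len(designated)
    found, pos = [], 0
    for surface, reading in sequence:
        nxt = pos + len(surface)
        if pos < end and start < nxt:  # character spans intersect
            found.append((surface, reading))
        pos = nxt
    return found


selected = [("主", "しゅ"), ("記憶", "きおく"), ("上空", "うわそら"),
            ("間", "かん"), ("が", "が")]
print(misread_morphemes(selected, "上空間"))  # range on morpheme boundaries
print(misread_morphemes(selected, "空間"))    # range starting mid-morpheme
```

Both calls identify the same two morphemes, which mirrors the statement that a range not aligned with morpheme boundaries still leads to the enclosing morpheme being registered as forbidden.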
(Embodiment 5)
The language processing system of Embodiment 5 differs from the language processing system shown in Figure 1 in that, as shown in Figure 30, the CPU 100e further includes a reading input unit 122, a comparison and extraction unit 123, and a forbidden morpheme addition unit 121. Suppose that for the input text "主記憶上空間が" the optimal sequence selection unit 112 has erroneously selected "主(しゅ) 記憶(きおく) 上空(うわそら) 間(かん) が" as the optimal word sequence. In this case, the reading input unit 122 accepts from the system user input of the correct reading "しゅきおくじょうくうかんが" of the input text "主記憶上空間が". The comparison and extraction unit 123 compares the reading of the erroneously selected optimal word sequence with the correct reading and extracts the difference part "うわそら", that is, the part of the erroneously selected reading that differs from the correct reading. The forbidden morpheme addition unit 121 saves the misread morpheme "上空(うわそら)", to which the reading of the difference part "うわそら" was assigned, in the forbidden morpheme storage unit 202 as a forbidden morpheme. The other components of the language processing system shown in Figure 30 are the same as in Figure 1, so their description is omitted.
Similarly, as shown in Figure 31, suppose that for the Chinese input text "你看他拿着火车票" the optimal sequence selection unit 112 has erroneously selected "你 (ni3)" "看 (kan4)" "他 (ta1)" "拿 (na2)" "着火 (zhao2huo3)" "车票 (che1piao4)" as the optimal word sequence. In this case, the reading input unit 122 accepts from the system user input of the correct reading "ni3 kan4 ta1 na2 zhe huo3che1 piao4" of the input text "你看他拿着火车票". The comparison and extraction unit 123 compares the reading of the erroneously selected optimal word sequence with the correct reading and extracts the difference part "zhao2huo3 che1piao4", that is, the part of the erroneously selected reading that does not match the correct reading "zhe huo3che1 piao4". The forbidden morpheme addition unit 121 saves the misread morphemes "着火 (zhao2huo3)" and "车票 (che1piao4)", to which the reading of the difference part was assigned, in the forbidden morpheme storage unit 202 as forbidden morphemes.
In addition, as shown in Figure 32, suppose that for the English input text "Drink much mate" the optimal sequence selection unit 112 has erroneously selected "drink (v)" "much (adv)" "mate (n) [ma:tei]" as the optimal word sequence. In this case, the reading input unit 122 accepts from the system user input of the correct reading "drink mats meit" of the input text "Drink much mate". The comparison and extraction unit 123 compares the reading of the erroneously selected optimal word sequence with the correct reading and extracts the difference part "ma:tei", that is, the part of the erroneously selected reading that differs from the correct reading "meit". The forbidden morpheme addition unit 121 saves the misread morpheme "mate (n) [ma:tei]", to which the reading of the difference part was assigned, in the forbidden morpheme storage unit 202 as a forbidden morpheme.
Next, the language processing method of Embodiment 5 is described with reference to the flowchart shown in Figure 33.
(a) Steps S500 to S503 shown in Figure 33 are carried out in the same way as steps S400 to S403 shown in Figure 28. Suppose that the optimal sequence selection unit 112 has erroneously selected "主(しゅ) 記憶(きおく) 上空(うわそら) 間(かん) が" as the optimal word sequence. The optimal sequence selection unit 112 then saves the erroneously selected optimal word sequence in the optimal sequence storage unit 204, and the output device 341 outputs it.
(b) In step S504, the reading input unit 122 accepts from the system user, via the input device 340, input of the correct reading "しゅきおくじょうくうかんが" of the text "主記憶上空間が". The reading input unit 122 saves the correct reading "しゅきおくじょうくうかんが" in the reading storage unit 205. In step S405, the comparison and extraction unit 123 reads the erroneously selected optimal word sequence "主(しゅ) 記憶(きおく) 上空(うわそら) 間(かん) が" from the optimal sequence storage unit 204 and reads the correct reading "しゅきおくじょうくうかんが" from the reading storage unit 205. The comparison and extraction unit 123 then compares the reading of the erroneously selected optimal word sequence with the correct reading and extracts the difference part "うわそら", that is, the part of the erroneously selected reading that differs from the correct reading.
(c) In step S505, the comparison and extraction unit 123 transfers to the forbidden morpheme addition unit 121 the misread morpheme "上空(うわそら)", which is contained in the erroneously selected optimal word sequence and to which the reading of the difference part "うわそら" was assigned. The forbidden morpheme addition unit 121 saves the misread morpheme "上空(うわそら)" in the forbidden morpheme storage unit 202 as a forbidden morpheme, and the language processing method of Embodiment 5 ends.
As described above, with the language processing system and language processing method of Embodiment 5, shown in Figures 30 and 33, from the next run onward no word sequence candidate containing the forbidden morpheme "上空(うわそら)" is selected as the optimal word sequence. Embodiment 5 showed an example in which the language processing system shown in Figure 1 further includes the reading input unit 122, the comparison and extraction unit 123, and the forbidden morpheme addition unit 121, but these units may of course also be included in the language processing system shown in Figure 16 or Figure 22.
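The comparison in Embodiment 5 can be sketched by trimming the prefix and suffix that the erroneous reading shares with the correct reading and treating what remains as the difference part. This is an assumption made for illustration: the text does not say how the comparison and extraction unit 123 aligns the two readings, the function names are hypothetical, the kana strings are the example as reconstructed from its readings, and a single contiguous difference is assumed.

```python
def differing_span(wrong, correct):
    """Return (start, end) of the part of `wrong` not shared with `correct`.

    The common prefix and the common suffix are trimmed; the remainder of
    `wrong` is taken as the difference part.
    """
    limit = min(len(wrong), len(correct))
    prefix = 0
    while prefix < limit and wrong[prefix] == correct[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and wrong[-1 - suffix] == correct[-1 - suffix]):
        suffix += 1
    return prefix, len(wrong) - suffix


def misread_by_reading(sequence, correct_reading):
    """Return the (surface, reading) pairs whose reading overlaps the
    difference part between the selected reading and the correct one."""
    wrong = "".join(reading for _, reading in sequence)
    start, end = differing_span(wrong, correct_reading)
    found, pos = [], 0
    for surface, reading in sequence:
        nxt = pos + len(reading)
        if pos < end and start < nxt:
            found.append((surface, reading))
        pos = nxt
    return found


selected = [("主", "しゅ"), ("記憶", "きおく"), ("上空", "うわそら"),
            ("間", "かん"), ("が", "が")]
print(misread_by_reading(selected, "しゅきおくじょうくうかんが"))
```

On this example the trimmed span is exactly "うわそら", so only "上空(うわそら)" is reported, matching the morpheme that the embodiment registers as forbidden. Readings that differ in several separate places would need a proper alignment instead of simple trimming.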
(Other embodiments)
Embodiments of the present invention have been described above, but the description and drawings that form part of this disclosure should not be understood as limiting the invention. From this disclosure, various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art. For example, it was explained that the reading input unit 122 shown in Figure 30 accepts input of the correct reading of the input text from the system user. Alternatively, the reading input unit 122 may accept from the system user input of a morpheme that forms part of the input text and to which the correct reading has been assigned. For example, when the optimal sequence selection unit 112 has erroneously selected "主(しゅ) 記憶(きおく) 上空(うわそら) 間(かん) が" as the optimal word sequence, the reading input unit 122 may accept input of the morpheme "空間(くうかん)" with its correct reading, and the comparison and extraction unit 123 may extract the morphemes "上空(うわそら)" and "間(かん)", which are inconsistent with the morpheme "空間(くうかん)".
Similarly, as shown in Figure 34, when for the Chinese input text "你看他拿着火车票" the optimal sequence selection unit 112 has erroneously selected "你 (ni3)" "看 (kan4)" "他 (ta1)" "拿 (na2)" "着火 (zhao2huo3)" "车票 (che1piao4)" as the optimal word sequence, the reading input unit 122 may accept input of the morpheme "火车票 (huo3 che1 piao4)" with its correct reading, and the comparison and extraction unit 123 may extract the morphemes "着火 (zhao2huo3)" and "车票 (che1piao4)", which are inconsistent with the morpheme "火车票 (huo3 che1 piao4)".
Similarly, as shown in Figure 35, when for the English input text "Drink much mate" the optimal sequence selection unit 112 has erroneously selected "drink (v)" "much (adv)" "mate (n) [ma:tei]" as the optimal word sequence, the reading input unit 122 may accept input of the morpheme "mate (n) [meit]" with its correct reading, and the comparison and extraction unit 123 may extract the morpheme "mate (n) [ma:tei]", which is inconsistent with the morpheme "mate (n) [meit]".
In the embodiments, an example was shown in which the audio file generation unit 116 generates an audio file for outputting the reading of the optimal word sequence. However, the system need not generate the audio file directly from the optimal word sequence. It may instead generate a pronunciation information (phonetic symbol) file from the optimal word sequence and then generate the audio file from that file. Also, Figure 1 shows an example in which the loudspeaker 342 is connected to the CPU 100a, but the loudspeaker 342 need not be connected to the CPU 100a, and the generated audio file may of course be used on another computer or sound system.
The language processing method described above can be expressed as a series of processes or operations connected in time sequence. Therefore, to execute the language processing method on the CPU 100a shown in Figure 1, the language processing method shown in Figure 5 can be realized by a computer program product that specifies the functions performed by the processor and other elements in the CPU 100a. Here, the computer program product is a recording medium, recording device, or the like that can exchange input and output with the CPU 100a. Recording media include memory devices, magnetic disk devices, optical disk devices, and other devices capable of recording a program. Thus, the present invention naturally includes various embodiments not described here. Accordingly, as is clear from the above description, the technical scope of the present invention is determined solely by the invention-specifying matters of the appropriate claims.

Claims (6)

1. A language processing system, characterized by comprising:
a forbidden morpheme storage unit that stores use-prohibited morphemes;
a sequence candidate generation unit that generates, from a continuously written text, a plurality of word sequence candidates, each written as a plurality of separated morphemes; and
an optimal sequence selection unit that reads the use-prohibited morphemes from the forbidden morpheme storage unit, excludes from the plurality of word sequence candidates any candidate containing a use-prohibited morpheme, and selects from the plurality of word sequence candidates the optimal word sequence having the highest likelihood of connection between the plurality of morphemes.
2. A language processing system, characterized by comprising:
a forbidden morpheme storage unit that stores use-prohibited morphemes;
a sequence candidate generation unit that reads the use-prohibited morphemes stored in the forbidden morpheme storage unit, prohibits use of the use-prohibited morphemes, and generates, from a continuously written text, a plurality of word sequence candidates, each written as a plurality of separated morphemes; and
an optimal sequence selection unit that selects from the plurality of word sequence candidates the optimal word sequence having the highest likelihood of connection between the plurality of morphemes.
3. The language processing system according to claim 1 or 2, characterized by further comprising:
an error range designation unit that accepts a designation of a misread morpheme in the optimal word sequence, the misread morpheme being a morpheme to which a reading different from the correct reading of the text has been assigned.
4. The language processing system according to claim 1 or 2, characterized by further comprising:
a comparison and extraction unit that compares the reading of the optimal word sequence with the correct reading of the text and extracts from the optimal word sequence a misread morpheme to which a reading different from the correct reading has been assigned.
5. The language processing system according to claim 3, characterized by further comprising:
a forbidden morpheme addition unit that adds the misread morpheme to the forbidden morpheme storage unit as a use-prohibited morpheme.
6. The language processing system according to claim 4, characterized by further comprising:
a forbidden morpheme addition unit that adds the misread morpheme to the forbidden morpheme storage unit as a use-prohibited morpheme.
CN2006101256010A 2005-08-24 2006-08-24 Language processing system Expired - Fee Related CN1920812B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2005242492A JP2007058509A (en) 2005-08-24 2005-08-24 Language processing system
JP2005-242492 2005-08-24
JP2005242492 2005-08-24

Publications (2)

Publication Number Publication Date
CN1920812A true CN1920812A (en) 2007-02-28
CN1920812B CN1920812B (en) 2011-02-02

Family

ID=37778538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006101256010A Expired - Fee Related CN1920812B (en) 2005-08-24 2006-08-24 Language processing system

Country Status (3)

Country Link
US (1) US7917352B2 (en)
JP (1) JP2007058509A (en)
CN (1) CN1920812B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101512518B (en) * 2006-09-07 2015-06-24 日本电气株式会社 Natural language processing system and dictionary registration system
US8103503B2 (en) * 2007-11-01 2012-01-24 Microsoft Corporation Speech recognition for determining if a user has correctly read a target sentence string
US20130151251A1 (en) * 2011-12-12 2013-06-13 Advanced Micro Devices, Inc. Automatic dialog replacement by real-time analytic processing
JP2014021136A (en) * 2012-07-12 2014-02-03 Yahoo Japan Corp Speech synthesis system
JP2014092838A (en) * 2012-11-01 2014-05-19 Nec Corp Morpheme analysis device, morpheme analysis program, and morpheme analysis method
US8831953B2 (en) 2013-01-16 2014-09-09 Vikas Vanjani Systems and methods for filtering objectionable content
WO2015134579A1 (en) 2014-03-04 2015-09-11 Interactive Intelligence Group, Inc. System and method to correct for packet loss in asr systems
JP6300596B2 (en) * 2014-03-27 2018-03-28 Kddi株式会社 Dictionary device, morpheme analyzer, data structure, morpheme analysis method and program
JP6300601B2 (en) * 2014-03-31 2018-03-28 Kddi株式会社 Dictionary device, morpheme analyzer, data structure, morpheme analysis method and program
US10083169B1 (en) * 2015-08-28 2018-09-25 Google Llc Topic-based sequence modeling neural networks
US9705618B1 (en) 2015-12-18 2017-07-11 Intel Corporation Systems, methods and devices for public announcements
US10572586B2 (en) * 2018-02-27 2020-02-25 International Business Machines Corporation Technique for automatically splitting words

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4829423A (en) * 1983-01-28 1989-05-09 Texas Instruments Incorporated Menu-based natural language understanding system
JPS61264472A (en) * 1985-05-20 1986-11-22 Toshiba Corp Document producing device
US5146405A (en) * 1988-02-05 1992-09-08 At&T Bell Laboratories Methods for part-of-speech determination and usage
JP3375979B2 (en) * 1991-09-09 2003-02-10 キヤノン株式会社 Character processing apparatus and method
JPH05165486A (en) 1991-12-18 1993-07-02 Oki Electric Ind Co Ltd Text voice transforming device
JPH08185197A (en) * 1994-12-28 1996-07-16 Fujitsu Ltd Japanese analyzing device and japanese text speech synthesizing device
US5828991A (en) * 1995-06-30 1998-10-27 The Research Foundation Of The State University Of New York Sentence reconstruction using word ambiguity resolution
US6182028B1 (en) * 1997-11-07 2001-01-30 Motorola, Inc. Method, device and system for part-of-speech disambiguation
US6098042A (en) * 1998-01-30 2000-08-01 International Business Machines Corporation Homograph filter for speech synthesis system
US6640006B2 (en) * 1998-02-13 2003-10-28 Microsoft Corporation Word segmentation in chinese text
US6076060A (en) * 1998-05-01 2000-06-13 Compaq Computer Corporation Computer method and apparatus for translating text to sound
US6078885A (en) * 1998-05-08 2000-06-20 At&T Corp Verbal, fully automatic dictionary updates by end-users of speech synthesis and recognition systems
US6185530B1 (en) * 1998-08-14 2001-02-06 International Business Machines Corporation Apparatus and methods for identifying potential acoustic confusibility among words in a speech recognition system
US6233553B1 (en) * 1998-09-04 2001-05-15 Matsushita Electric Industrial Co., Ltd. Method and system for automatically determining phonetic transcriptions associated with spelled words
US6587822B2 (en) * 1998-10-06 2003-07-01 Lucent Technologies Inc. Web-based platform for interactive voice response (IVR)
US6233718B1 (en) * 1998-10-19 2001-05-15 Dolby Laboratories Licensing Corporation Avoiding forbidden data patterns in coded audio data
JP2000194389A (en) * 1998-12-25 2000-07-14 Matsushita Electric Ind Co Ltd Information processor
US6731802B1 (en) * 2000-01-14 2004-05-04 Microsoft Corporation Lattice and method for identifying and normalizing orthographic variations in Japanese text
US7280964B2 (en) * 2000-04-21 2007-10-09 Lessac Technologies, Inc. Method of recognizing spoken language with recognition of language color
US7124080B2 (en) * 2001-11-13 2006-10-17 Microsoft Corporation Method and apparatus for adapting a class entity dictionary used with language models
US7496498B2 (en) * 2003-03-24 2009-02-24 Microsoft Corporation Front-end architecture for a multi-lingual text-to-speech system
US7580827B1 (en) * 2003-12-31 2009-08-25 Google Inc. Semantic unit recognition
US7437290B2 (en) * 2004-10-28 2008-10-14 Microsoft Corporation Automatic censorship of audio data for broadcast

Also Published As

Publication number Publication date
US7917352B2 (en) 2011-03-29
JP2007058509A (en) 2007-03-08
CN1920812B (en) 2011-02-02
US20070055496A1 (en) 2007-03-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110202

Termination date: 20120824