CN105426355A - Syllabic size based method and apparatus for identifying Tibetan syntax chunk - Google Patents
- Publication number
- CN105426355A
- Authority
- CN
- China
- Prior art keywords
- syllable
- syntactic
- chunk
- marker
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a syllable-granularity method and apparatus for identifying Tibetan syntactic chunks, and belongs to the field of machine translation within computer application technology. The method comprises: first, preprocessing an original Tibetan corpus to delete non-Tibetan text; then applying a pre-trained syntactic marker identification model M1 to obtain syntactic marker types; restoring the text whose syntactic marker type is an adhesion form, to obtain a standard corpus free of adhesion forms; and finally, applying a pre-trained syntactic chunk identification model M2 to the standard corpus to identify functional chunks directly. Compared with the prior art, the method and apparatus identify functional chunks without word segmentation or part-of-speech tagging. This reduces the time and space cost of preprocessing and avoids the poor functional chunk identification caused by errors in word segmentation and part-of-speech tagging.
Description
Technical field
The invention belongs to the field of computer application technology and relates to a syllable-granularity method and apparatus for identifying Tibetan syntactic chunks, for use in fields such as machine translation.
Background art
Automatic chunk identification is an active research topic in natural language processing. As a preprocessing step, chunk analysis greatly reduces the complexity of phrase-based syntactic parsing and provides basic support for further syntactic and semantic analysis. Because it simplifies syntactic analysis to a certain extent, it has been applied in many practical systems such as machine translation and question answering.
The goal of Tibetan syntactic chunk identification is to correctly mark the boundaries and types of the syntactic chunks that make up a Tibetan sentence. Existing chunk analysis research identifies syntactic chunks only after the corpus has been word-segmented and part-of-speech tagged. However, Tibetan word segmentation and part-of-speech tagging do not yet meet practical requirements, and their relatively high error rates greatly reduce the accuracy of the subsequent chunk identification stage. Through in-depth linguistic analysis, the present invention finds that, owing to the inherent characteristics of Tibetan, the syntactic markers that occur explicitly in Tibetan text carry semantic information useful for identifying chunk types; identifying these syntactic markers directly can therefore achieve the goal of chunk analysis.
Summary of the invention
The object of the invention is to solve the problem of identifying syntactic chunks in Tibetan intelligent information processing. It proposes a syllable-granularity method for identifying Tibetan syntactic chunks that takes the syllable directly as the unit of analysis. This avoids the drawback of conventional methods, which must first complete Tibetan word segmentation and part-of-speech tagging. It reduces the time and space cost of that preprocessing, and it also avoids the degradation of downstream syntactic chunk analysis caused directly by low segmentation and tagging accuracy.
A syllable-granularity method for identifying Tibetan syntactic chunks comprises the following steps:
Step 1: preprocess the input corpus text to obtain a normalized sentence corpus S;
Step 2: apply a pre-trained syntactic marker identification model M1 to S to obtain syntactic marker types;
Step 3: restore the text whose syntactic marker type obtained in step 2 is an adhesion form, obtaining a standard corpus that contains no adhesion forms;
Step 4: apply a pre-trained syntactic chunk identification model M2 to the standard corpus obtained in step 3 to perform chunk analysis and obtain the type identification result.
In a specific embodiment of the invention, the text preprocessing of step 1 comprises:
1. Creating and collecting corpus data, from sources including but not limited to Tibetan text in textbooks, scientific magazines, periodicals, newspapers, and websites;
2. Preprocessing the corpus data to delete meaningless data, i.e., text in other languages mixed into the Tibetan corpus;
3. Further splitting the text into sentences, segmenting the material into a sequence of sentence-level texts.
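The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes that "non-Tibetan text" means any character outside the Unicode Tibetan block (U+0F00 to U+0FFF) and that a sentence ends at a run of shad (U+0F0D) or double shad (U+0F0E) marks. The function names are hypothetical.

```python
import re

# Assumption: a run of shad / double shad marks ends a sentence.
SENTENCE_END = re.compile(r"([\u0F0D\u0F0E]+)")


def strip_non_tibetan(text: str) -> str:
    """Replace every character outside the Tibetan block with a space, then collapse spaces."""
    kept = "".join(ch if "\u0f00" <= ch <= "\u0fff" else " " for ch in text)
    return re.sub(r" +", " ", kept).strip()


def split_sentences(text: str) -> list[str]:
    """Cut text after each run of shad marks; the marks stay with their sentence."""
    parts = SENTENCE_END.split(text)  # [body, mark, body, mark, ..., tail]
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    tail = parts[-1].strip()  # trailing text without a closing shad
    if tail:
        sentences.append(tail)
    return sentences


def preprocess(corpus: str) -> list[str]:
    """Delete non-Tibetan text, then split the remainder into sentences."""
    return split_sentences(strip_non_tibetan(corpus))
```

Replacing foreign characters with spaces, rather than deleting them outright, keeps the Tibetan text on either side of an embedded Latin phrase from being fused together.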
In a specific embodiment of the invention, training the Tibetan syntactic marker identification model comprises:
1. Annotating the corpus with syntactic markers sentence by sentence, creating a syllable-level syntactic marker annotation scheme for the corpus;
2. Feeding the annotated corpus into a conditional random field (CRF) learner using specific feature templates, to train a CRF syntactic marker identification model.
In a specific example of the invention, restoring the adhesion forms comprises:
1. According to the syntactic marker type, splitting each syllable identified as an adhesion form according to the rule for that form, restoring it to single-syllable text;
2. After splitting, inserting the syllable delimiter (tsheg, ་) between the split syllables, which completes the restoration of the raw material.
In a specific embodiment of the invention, training the syntactic chunk identification model comprises:
1. Annotating each sentence with chunk types to create a chunk annotation scheme for the corpus, i.e., assigning every syllable of every sentence to a chunk type;
2. Feeding the annotated corpus into a CRF learner using specific feature templates, to train a CRF chunk type identification model.
A syllable-granularity apparatus for identifying Tibetan syntactic chunks comprises, connected in sequence, a text preprocessing module, a syntactic marker identification module, an adhesion form restoration module, and a chunk type identification module, wherein:
the text preprocessing module processes the input corpus text and splits it into sentences suitable for syntactic chunk analysis;
the syntactic marker identification module identifies syntactic markers in the sentences output by the text preprocessing module;
the adhesion form restoration module restores the adhesion forms in the original sentences according to the syntactic markers output by the syntactic marker identification module, obtaining sentences in non-adhesion written form annotated with syntactic markers;
the chunk type identification module identifies syntactic chunk types in the sentences output by the adhesion form restoration module and outputs the identification result.
Beneficial effects
The original Tibetan text is not word-segmented or part-of-speech tagged. Instead, chunk types are identified directly, with the syllable as the unit, by exploiting the intrinsic characteristics of Tibetan. This provides a new method of Tibetan chunk analysis that can support further Tibetan syntactic analysis, semantic analysis, and deeper intelligent processing.
Brief description of the drawings
To explain the technical solutions of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 is a flow diagram of the syllable-granularity Tibetan syntactic chunk identification method according to an embodiment of the invention.
Fig. 2 is a structural diagram of the syllable-granularity Tibetan syntactic chunk identification apparatus according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions of the embodiments of the invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, the syllable-granularity Tibetan syntactic chunk identification method of the invention comprises the following steps:
I. Text preprocessing: obtain the original Tibetan text and split it into sentences.
In this embodiment, corpus data are created and collected by manual entry and by crawling web text with a web crawler. Meaningless data are then deleted, and finally a rule-based method uses the sentence delimiter (shad, །) to cut the original text into units of simple sentences. Examples of sentence splitting are shown in Table 1.
Table 1 Example of Tibetan text preprocessing (sentence splitting)
II. Syntactic marker identification
(1) Training the syntactic marker identification model
In this embodiment, the pre-trained syntactic marker identification model M1 is applied to S to obtain syntactic marker types, so M1 must first be trained on a training corpus. The reason is as follows. Tibetan text is written in syllables separated by the tsheg (་); syllables combine into words, and words combine into sentences. In this respect Tibetan closely resembles Chinese, which also builds complete sentences from syllables. However, modern Tibetan has a special written form, the adhesion form, in which two or three syllables are fused together with no tsheg between them. Tibetan is also rich in syntactic markers. Here, syntactic markers are the formal markers of modern Tibetan (including case markers and auxiliary-word markers) that divide a sentence into syntactic chunks with different functions: for example, an adverbial chunk expressing time or place may be followed by a locative case marker, a subject chunk by an agentive case marker, and an object chunk by an objective case marker. These markers are the basis of syllable-granularity chunk analysis. For orthographic reasons, however, some case and auxiliary-word markers are written together with the preceding syllable, so that two syllables are abbreviated into one; this is the adhesion form described at the start of this paragraph. To make full use of case and auxiliary-word markers, we need not only the markers that stand as individual syllables but also the markers fused inside adhesion syllables, separated out; together these markers serve as the identifying features of syntactic function chunk boundaries. M1 is therefore trained by machine learning, and this embodiment describes the training process using a conditional random field (CRF) model as an example.
First, the text obtained in step I is manually annotated with syntactic markers using six types: SS, VV, RR, CC, M, and N. SS denotes a syllable in adhesion form with the agentive/instrumental case marker; VV denotes a syllable in adhesion form with the genitive marker; RR denotes a syllable in adhesion form with the dative/locative case marker; CC denotes a syllable in adhesion form with the punctuation-word marker; M denotes a case marker or auxiliary-word marker that stands as an individual syllable rather than in adhesion form; and N denotes a syllable that is not a syntactic marker, i.e., every syllable not labeled SS, VV, RR, CC, or M is labeled N.
Examples of adhesion-form annotation are shown in Table 2.
Table 2 Examples of adhesion-form annotation
Second, based on the text obtained in step I and the manual annotations, specific CRF feature templates are built for CRF training to obtain the final CRF syntactic marker identification model M1, so template selection is critical. Based on extensive experiments, this embodiment selects the following CRF features: syllable glyph features and the syntactic marker. Syllable glyph features: the written forms of the current syllable and its neighboring syllables; preferably the window is 5, i.e., the current syllable plus the two syllables before and the two after it. Syntactic marker: the syntactic marker type of the current syllable.
Further, during templating, a syllable at the beginning or end of a sentence has fewer than two syllables before or after it, and the missing positions are filled with NULL. Take sentence (2) of Table 1 in step I as an example, and suppose the current syllable is its first syllable. Because this syllable begins the sentence, the two positions before it are empty, so their features are NULL and NULL; the next two features are the two syllables that follow it. The features for this syllable are therefore NULL, NULL, the syllable itself, and its two following syllables. The other syllables are handled likewise, and the resulting templated training corpus is shown in Table 3.
Table 3 Example of the syntactic marker identification template
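A minimal sketch of the window-5 feature construction with NULL padding described above follows. It assumes sentences are first split into syllables on the tsheg (U+0F0B), with runs of shad marks kept as separate tokens; the tokenizer, the feature keys such as `syl[-2]`, and the helper names are hypothetical. During training, each feature row would be paired with that syllable's gold marker tag (SS, VV, RR, CC, M, or N).

```python
import re

PAD = "NULL"  # filler for positions before the start or after the end of a sentence

# Hypothetical tokenizer: a run of shad marks is one token; otherwise split on tsheg/whitespace.
TOKEN = re.compile(r"[\u0F0D\u0F0E]+|[^\u0F0B\u0F0D\u0F0E\s]+")


def syllabify(sentence: str) -> list[str]:
    """Split a Tibetan sentence into syllables, keeping punctuation as its own token."""
    return TOKEN.findall(sentence)


def marker_features(syllables: list[str], width: int = 2) -> list[dict[str, str]]:
    """For each syllable, record itself and `width` neighbours on each side (window 2*width+1)."""
    rows = []
    n = len(syllables)
    for i in range(n):
        row = {}
        for offset in range(-width, width + 1):
            j = i + offset
            row[f"syl[{offset:+d}]"] = syllables[j] if 0 <= j < n else PAD
        rows.append(row)
    return rows
```

For a sentence string, `marker_features(syllabify(sentence))` yields one feature dictionary per syllable, in sentence order.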
(2) Identification
Once M1 has been obtained, syllable glyph features are constructed for the input corpus according to the construction rules in the feature template and passed to M1, which outputs the syntactic marker for each syllable, i.e., SS, CC, VV, RR, M, or N.
III. Adhesion form restoration
Each syllable identified in the previous step as an adhesion-form syllable (SS, CC, VV, or RR) is split again using the marker combination rules for adhesion forms from step II, and a tsheg is inserted between the parts, restoring the two original individual syllables. For example, a syllable in the step II example whose adhesion tag is RR is split by the corresponding rule into its host syllable and the dative/locative marker. The splitting rules are shown in Table 4, where "/" marks the split position.
Table 4 Examples of the splitting method
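The restoration rule can be sketched as below. Table 4 is not reproduced in this text, so the suffix split off for each tag is an assumption based on standard Tibetan orthography (agentive ས, genitive འི, dative/locative ར); no CC rule is included because that marker cannot be recovered from the text above. Giving both halves of a split syllable the shortened tag (S, V, R), following the abbreviation used later for chunk identification, is also an assumption.

```python
TSHEG = "\u0f0b"  # the syllable delimiter (tsheg)

# Hypothetical split rules (not the patent's Table 4): tag -> fused marker suffix.
SPLIT_SUFFIX = {
    "SS": "\u0f66",        # agentive/instrumental case marker (sa)
    "VV": "\u0f60\u0f72",  # genitive marker ('i)
    "RR": "\u0f62",        # dative/locative case marker (ra)
}
SHORT_TAG = {"SS": "S", "VV": "V", "RR": "R", "CC": "C"}


def restore_adhesion(syllables: list[str], tags: list[str]) -> tuple[list[str], list[str]]:
    """Split each adhesion-form syllable into host syllable + marker syllable."""
    if len(syllables) != len(tags):
        raise ValueError("syllables and tags must have the same length")
    out_syl: list[str] = []
    out_tag: list[str] = []
    for syl, tag in zip(syllables, tags):
        suffix = SPLIT_SUFFIX.get(tag)
        if suffix and syl.endswith(suffix) and len(syl) > len(suffix):
            short = SHORT_TAG[tag]  # assumption: both halves carry the shortened tag
            out_syl += [syl[: -len(suffix)], suffix]
            out_tag += [short, short]
        else:
            # Not an adhesion form, or no rule applies: keep the syllable unchanged.
            out_syl.append(syl)
            out_tag.append(tag)
    return out_syl, out_tag


def to_text(syllables: list[str]) -> str:
    """Re-join syllables with the tsheg, as in the restored standard corpus."""
    return TSHEG.join(syllables)
```

The splitter only acts on syllables the marker model has already tagged as adhesion forms, so a syllable that merely ends in the same letter but is tagged N is never cut.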
IV. Syntactic chunk identification
(1) Training the syntactic chunk identification model
In this step, the pre-trained syntactic chunk identification model M2 is applied to the corpus to perform chunk analysis and obtain chunk type identification results, so M2 must first be trained on a training corpus. The training process is again described using a CRF model as an example:
First, on the basis of step III, the corpus is annotated with chunk types to establish a chunk type annotation scheme. The chunk types are: subject chunk (S), predicate chunk (P), object chunk (O), adverbial chunk (D), complement chunk (C), and, for convenience of processing, a syntactic marker chunk (M). As in step II, syntactic markers are the case markers and auxiliary-word markers of modern Tibetan that divide a sentence into functionally distinct syntactic chunks (for example, a locative case marker after an adverbial chunk of time or place, an agentive case marker after a subject chunk, or an objective case marker after an object chunk); they are the basis of syllable-granularity chunk analysis.
Second, the purpose of training in this step is to obtain the syntactic chunk identification model M2, and feature selection is critical to building it. Based on extensive experimental results, this embodiment selects the following CRF feature template: syllable features and their chunk type labels.
Syllable features: the current syllable, its neighboring syllables, and the syntactic marker of the current syllable. A window of 5, i.e., the current syllable plus the two syllables before and the two after it, gives better results.
During templating, a syllable at the beginning or end of a sentence has fewer than two syllables before or after it, and the missing positions are filled with NULL. Take sentence (2) in Table 1 as an example; its adhesion-form annotation is as follows:
Splitting it with the method of step III gives the following sequence:
For ease of processing, this embodiment abbreviates the syntactic markers SS, CC, VV, and RR after splitting to S, C, V, and R, respectively.
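The chunk-model features described above (the 5-syllable window plus the syntactic marker of the current syllable) can be sketched as follows. The sketch assumes its input is the restored syllable sequence together with one abbreviated marker tag per syllable. The feature names are hypothetical, and NULL padding follows the same rule as for M1.

```python
PAD = "NULL"  # filler for positions outside the sentence


def chunk_features(syllables: list[str], markers: list[str], width: int = 2) -> list[dict[str, str]]:
    """Window of 2*width+1 syllables around position i, plus the marker tag of syllable i."""
    if len(syllables) != len(markers):
        raise ValueError("each syllable needs exactly one marker tag")
    n = len(syllables)
    rows = []
    for i in range(n):
        row = {
            f"syl[{off:+d}]": syllables[i + off] if 0 <= i + off < n else PAD
            for off in range(-width, width + 1)
        }
        row["marker"] = markers[i]  # e.g. S, V, R, C, M or N after splitting
        rows.append(row)
    return rows
```

Only the current syllable's marker is included, matching the template described in the text; adding neighbouring markers would be a straightforward but untested variation.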
The chunk type label marks the chunk to which the current syllable belongs. For convenience in later processing, this embodiment marks not only the chunk but also the syllable's position within it, using two letters. The first letter gives the syllable's position in the chunk: B for beginning, I for inside, and E for end; a chunk consisting of a single syllable is marked with B only. The second letter gives the chunk type. Punctuation marks are labeled B alone. For example, "B-S" marks the first syllable of a subject chunk. The templated training examples obtained in this way are shown in Table 5.
Table 5 Example of the syntactic chunk type identification template
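The position-plus-type labelling scheme can be illustrated with a small encoder that turns annotated chunks into per-syllable labels. Representing a sentence as a list of (chunk type, syllables) pairs, with `None` as the type of a punctuation token, is an assumption made for this sketch.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[Optional[str], List[str]]  # (chunk type or None for punctuation, syllables)


def encode_chunks(chunks: List[Chunk]) -> List[str]:
    """Label each syllable with B/I/E position plus chunk type; punctuation gets a bare B."""
    labels: List[str] = []
    for ctype, syls in chunks:
        if ctype is None:
            labels.extend("B" for _ in syls)
            continue
        last = len(syls) - 1
        for k in range(len(syls)):
            if k == 0:
                pos = "B"  # first syllable (also the only one in a one-syllable chunk)
            elif k == last:
                pos = "E"
            else:
                pos = "I"
            labels.append(f"{pos}-{ctype}")
    return labels
```

For example, a three-syllable subject chunk followed by a one-syllable marker chunk is labelled `B-S I-S E-S B-M`.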
(2) Identification
Once M2 has been obtained, syllable features are constructed for the input corpus according to the construction rules in the feature template above and passed to M2, which outputs the chunk type label for each syllable. Writing each syllable's label back into the standard corpus from step III yields the final syntactic chunk type identification result.
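Writing the predicted labels back as chunks is the inverse of the labelling scheme. The decoder below is a sketch that assumes labels follow the scheme described above (B/I/E plus a type, and a bare B for punctuation). The text does not say how inconsistent label sequences are handled, so this version simply starts a new chunk whenever the type changes; that policy is an assumption.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[Optional[str], List[str]]


def decode_chunks(syllables: List[str], labels: List[str]) -> List[Chunk]:
    """Group syllables into (chunk type, syllables) pairs from B/I/E labels."""
    chunks: List[Chunk] = []
    current: Optional[Chunk] = None
    for syl, lab in zip(syllables, labels):
        if "-" not in lab:  # bare "B": a punctuation token forms its own unit
            if current:
                chunks.append(current)
                current = None
            chunks.append((None, [syl]))
            continue
        pos, ctype = lab.split("-", 1)
        if pos == "B" or current is None or current[0] != ctype:
            if current:
                chunks.append(current)
            current = (ctype, [syl])
        else:
            current[1].append(syl)
        if pos == "E":  # an E label closes the open chunk
            chunks.append(current)
            current = None
    if current:
        chunks.append(current)
    return chunks
```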
Based on the syllable-granularity Tibetan syntactic chunk identification method above, a corresponding syllable-granularity Tibetan syntactic chunk identification apparatus can be built for performing chunk analysis on an input corpus and obtaining chunk types. The apparatus is described in detail below.
As shown in Fig. 2, the syllable-granularity Tibetan syntactic chunk identification apparatus of the invention comprises, connected in sequence, a text preprocessing module, a syntactic marker identification module, an adhesion form restoration module, and a chunk type identification module, wherein:
the text preprocessing module processes the input corpus text and splits it into sentences suitable for syntactic chunk analysis;
the syntactic marker identification module identifies syntactic markers in the sentences output by the text preprocessing module;
the adhesion form restoration module restores the adhesion forms in the original sentences according to the syntactic markers output by the syntactic marker identification module, obtaining sentences in non-adhesion written form annotated with syntactic markers;
the chunk type identification module identifies syntactic chunk types in the sentences output by the adhesion form restoration module and outputs the identification result.
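The four modules connected in sequence can be sketched as a small pipeline class. The stage interfaces are assumptions: each stage is a plain callable, and the trained CRF models M1 and M2 are represented by functions that map syllable sequences to tag sequences. Syllabification is shown as a separate hypothetical stage because the text does not say which module performs it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Tags = List[str]


@dataclass
class ChunkRecognizer:
    preprocess: Callable[[str], List[str]]                       # text preprocessing module
    syllabify: Callable[[str], List[str]]                        # hypothetical helper stage
    marker_model: Callable[[List[str]], Tags]                    # stands in for M1
    restore: Callable[[List[str], Tags], Tuple[List[str], Tags]] # adhesion form restoration
    chunk_model: Callable[[List[str], Tags], Tags]               # stands in for M2

    def run(self, text: str) -> List[List[Tuple[str, str]]]:
        """Return, per sentence, (syllable, chunk label) pairs."""
        results = []
        for sentence in self.preprocess(text):
            syllables = self.syllabify(sentence)
            markers = self.marker_model(syllables)
            syllables, markers = self.restore(syllables, markers)
            labels = self.chunk_model(syllables, markers)
            results.append(list(zip(syllables, labels)))
        return results
```

Keeping each stage as an injected callable lets the rule-based stages and the two CRF models be swapped or tested independently, mirroring the module boundaries in Fig. 2.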
The basic principles, main features, and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the embodiments described; the embodiments and the description merely illustrate its principles. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope. The scope of protection is defined by the appended claims and their equivalents.
Claims (10)
1. A syllable-granularity method for identifying Tibetan syntactic chunks, characterized by comprising the following steps:
Step 1: preprocessing the input corpus text to obtain a normalized sentence corpus S;
Step 2: applying a pre-trained syntactic marker identification model M1 to S to obtain syntactic marker types;
Step 3: restoring the text whose syntactic marker type obtained in step 2 is an adhesion form, to obtain a standard corpus containing no adhesion forms;
Step 4: applying a pre-trained syntactic chunk identification model M2 to the standard corpus obtained in step 3 to perform chunk analysis and obtain the chunk type identification result.
2. The syllable-granularity method for identifying Tibetan syntactic chunks according to claim 1, characterized in that the text preprocessing comprises:
(1) creating and collecting corpus data, from sources including but not limited to Tibetan text in textbooks, scientific magazines, periodicals, newspapers, and websites;
(2) preprocessing the corpus data to delete non-Tibetan text;
(3) further splitting the text into sentences, segmenting the material into a sequence of sentence-level texts.
3. The syllable-granularity method for identifying Tibetan syntactic chunks according to claim 1, characterized in that the pre-trained syntactic marker identification model M1 is a model trained by the following steps:
(1) annotating S with syntactic markers sentence by sentence, creating a syllable-level syntactic marker annotation scheme for the corpus;
(2) feeding the annotated corpus into a CRF learner using specific feature templates to train a CRF syntactic marker identification model.
4. The syllable-granularity method for identifying Tibetan syntactic chunks according to claim 3, characterized in that the specific feature templates are syllable glyph features and syntactic markers.
5. The syllable-granularity method for identifying Tibetan syntactic chunks according to claim 4, characterized in that the syllable glyph features take the current syllable and the two adjacent syllables before and after it, and when fewer than two adjacent syllables exist before or after it, the missing positions are filled with NULL.
6. The syllable-granularity method for identifying Tibetan syntactic chunks according to claim 1, characterized in that restoring the adhesion forms is completed by the following steps:
(1) according to the syntactic marker type, splitting each syllable identified as an adhesion form according to the rule for that form, restoring it to single-syllable text;
(2) after splitting, inserting the tsheg (་) between the split syllables, completing the restoration of the raw material.
7. The syllable-granularity method for identifying Tibetan syntactic chunks according to claim 1, characterized in that the pre-trained syntactic chunk identification model M2 is a model trained by the following steps:
(1) annotating each sentence with chunk types, creating a chunk annotation scheme for the corpus, i.e., assigning each syllable of each sentence to a chunk type;
(2) feeding the annotated corpus into a CRF learner using specific feature templates to train a CRF chunk type identification model.
8. The syllable-granularity method for identifying Tibetan syntactic chunks according to any one of claims 1 to 7, characterized in that the specific feature templates are syllable features and their chunk type labels.
9. The syllable-granularity method for identifying Tibetan syntactic chunks according to claim 7, characterized in that the syllable features take the current syllable, the two adjacent syllables before and after it, and the syntactic marker of the current syllable.
10. A syllable-granularity apparatus for identifying Tibetan syntactic chunks, characterized by comprising, connected in sequence, a text preprocessing module, a syntactic marker identification module, an adhesion form restoration module, and a chunk type identification module, wherein:
the text preprocessing module deletes non-Tibetan text from the input corpus text and splits it into sentences suitable for syntactic chunk analysis;
the syntactic marker identification module applies the pre-trained syntactic marker identification model M1 to the sentences output by the text preprocessing module to obtain syntactic markers;
the adhesion form restoration module restores the adhesion forms in the original sentences according to the syntactic markers output by the syntactic marker identification module, obtaining sentences in non-adhesion written form annotated with syntactic markers;
the chunk type identification module applies the pre-trained syntactic chunk identification model M2 to the sentences output by the adhesion form restoration module to identify syntactic chunk types, and outputs the identification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510711234.1A CN105426355A (en) | 2015-10-28 | 2015-10-28 | Syllabic size based method and apparatus for identifying Tibetan syntax chunk |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510711234.1A CN105426355A (en) | 2015-10-28 | 2015-10-28 | Syllabic size based method and apparatus for identifying Tibetan syntax chunk |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105426355A true CN105426355A (en) | 2016-03-23 |
Family
ID=55504569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510711234.1A Pending CN105426355A (en) | 2015-10-28 | 2015-10-28 | Syllabic size based method and apparatus for identifying Tibetan syntax chunk |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105426355A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101446943A (en) * | 2008-12-10 | 2009-06-03 | 苏州大学 | Reference and counteraction method based on semantic role information in Chinese character processing |
CN104239294A (en) * | 2014-09-10 | 2014-12-24 | 华建宇通科技(北京)有限责任公司 | Multi-strategy Tibetan long sentence segmentation method for Tibetan to Chinese translation system |
- 2015-10-28 CN CN201510711234.1A patent/CN105426355A/en active Pending
Non-Patent Citations (6)
Title |
---|
TIANHANG WANG ET AL.: "Research on Recognition of Semantic Chunk Boundary in Tibetan", 《2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING》 * |
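Kang Caijun (康才畯) et al.: "Word-position-based segmentation of Tibetan adhesion forms", 《Computer Engineering and Applications》 * |
Li Lin (李琳) et al.: "Boundary identification of Tibetan syntactic functional chunks", 《Journal of Chinese Information Processing》 * |
Wang Tianhang (王天航) et al.: "Boundary identification of Tibetan syntactic functional chunks based on an error-driven learning strategy", 《Journal of Chinese Information Processing》 * |
Wang Tianhang (王天航) et al.: "Syllable-based boundary identification of Tibetan functional chunks", 《The 10th China Workshop on Machine Translation》 * |
Long Congjun (龙从军) et al.: "Research on multi-strategy Tibetan semantic role labeling", 《Journal of Chinese Information Processing》 * |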
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107992479A (en) * | 2017-12-25 | 2018-05-04 | 北京牡丹电子集团有限责任公司数字电视技术中心 | Word rank Chinese Text Chunking method based on transfer method |
CN108595434A (en) * | 2018-05-02 | 2018-09-28 | 武汉烽火普天信息技术有限公司 | Syntactic dependency method based on conditional random fields and rule adjustment |
CN109871537A (en) * | 2019-01-31 | 2019-06-11 | 沈阳雅译网络技术有限公司 | High-precision Thai sentence segmentation method |
CN109871537B (en) * | 2019-01-31 | 2022-12-27 | 沈阳雅译网络技术有限公司 | High-precision Thai sentence segmentation method |
CN112951206A (en) * | 2021-02-08 | 2021-06-11 | 天津大学 | Tibetan Tibet dialect spoken language identification method based on deep time delay neural network |
CN112951206B (en) * | 2021-02-08 | 2023-03-17 | 天津大学 | Tibetan Tibet dialect spoken language identification method based on deep time delay neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104408078B (en) | A kind of bilingual Chinese-English parallel corpora base construction method based on keyword | |
CN100568225C (en) | The Words symbolization processing method and the system of numeral and special symbol string in the text | |
CN104346319B (en) | Method and system for inspecting document style | |
DE60207593D1 (en) | A PRINTER SYSTEM | |
CN110046261A (en) | A kind of construction method of the multi-modal bilingual teaching mode of architectural engineering | |
DE102018007165A1 (en) | FORECASTING STYLES WITHIN A TEXT CONTENT | |
CN107392143A (en) | A kind of resume accurate Analysis method based on SVM text classifications | |
CN105159870A (en) | Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization | |
CN109241540A (en) | A kind of blind automatic switching method of Chinese based on deep neural network and system | |
Das et al. | A novel system for generating simple sentences from complex and compound sentences | |
CN105068990B (en) | A kind of English long sentence dividing method of more strategies of Machine oriented translation | |
CN109933796A (en) | A kind of bulletin text key message extracting method and equipment | |
CN105426355A (en) | Syllabic size based method and apparatus for identifying Tibetan syntax chunk | |
CN108664474A (en) | A kind of resume analytic method based on deep learning | |
CN104317786A (en) | Method and system for segmenting text paragraphs | |
CN103324607B (en) | Word method and device cut by a kind of Thai text | |
CN111178088A (en) | Configurable neural machine translation method oriented to XML document | |
CN111563372B (en) | Typesetting document content self-duplication checking method based on teaching book publishing | |
CN107526717B (en) | Method for automatically generating natural language text by structured process model | |
CN105740355A (en) | Aggregated text density based webpage body text extraction method and apparatus | |
CN106257442A (en) | Computer-aided translation method | |
Haaf et al. | Enabling the Encoding of Manuscripts within the DTABf: Extension and Modularization of the Format | |
CN109344389B (en) | Method and system for constructing Chinese blind comparison bilingual corpus | |
CN105447027A (en) | Acquisition method and device of PDF (portable document format) document directory | |
CN111897958B (en) | Ancient poetry classification method based on natural language processing |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160323 |