CN107423292A - Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process - Google Patents
Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process
- Publication number
- CN107423292A (application CN201710484050.5A)
- Authority
- CN
- China
- Prior art keywords
- chinese
- khmer
- name
- bilingual
- language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Abstract
The present invention relates to a Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process (HDP), and belongs to the technical field of natural language processing. The method first extracts Khmer-Chinese bilingual name pairs, then preprocesses the extracted bilingual name corpus, next implements an HDP model in code according to the hierarchical Dirichlet principle, and finally feeds the preprocessed corpus into the HDP model to obtain the Khmer-Chinese bilingual alignment result. The method provides strong support for Khmer-Chinese bilingual name translation, morphological analysis, syntactic analysis and machine translation. No research on Khmer-Chinese name syllable alignment has been reported so far, and the present invention achieves good results.
Description
Technical field
The present invention relates to a Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process, and belongs to the technical field of natural language processing.
Background art
Khmer-Chinese name syllable alignment is a key step in tasks such as word segmentation and part-of-speech tagging, and is the foundation of other higher-level applications. In all kinds of Khmer-Chinese information processing software and systems, translating Khmer and Chinese personal names is indispensable. With the continuous improvement of Internet search technology, Khmer-Chinese name syllable alignment methods are attracting growing attention, because the quality of the alignment determines search accuracy. Correct alignment of Khmer-Chinese name syllables can also improve upper-level Khmer applications such as morphological analysis, syntactic analysis, semantic analysis and machine translation.
Summary of the invention
The invention provides a Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process, in order to solve the problem of aligning Khmer and Chinese name syllables.
The technical scheme of the invention is a Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process, the specific steps of which are as follows:
Step1, first, according to the features of the web pages, a manually written program obtains the Khmer-Chinese bilingual name corpus; the corpus is preprocessed to obtain the Khmer-Chinese bilingual name-pair corpus required as input to the HDP model; the resulting corpus is then segmented and saved, yielding the Khmer name character string and Chinese name character sequence corpus;
Step2, Khmer-Chinese bilingual name syllable alignment is performed with an unsupervised hierarchical Dirichlet method: according to the hierarchical Dirichlet principle, a manually written program implements the hierarchical Dirichlet process, giving the hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment;
Step3, the Khmer name character string and Chinese name character sequence pair corpus obtained above is fed as the input corpus into the constructed hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment, the Khmer-Chinese bilingual name syllable alignment result is obtained, and the result is stored in the database.
The specific steps of Step1 are as follows:
Step1.1, first, the web page structure is examined and the page features are analysed; using a manually written program that exploits these features, a Khmer-Chinese bilingual parallel text corpus is crawled from Khmer-Chinese bilingual websites and saved to the database;
Step1.2, the crawled Khmer-Chinese bilingual parallel text corpus is denoised and cleaned of junk content to construct a sentence-level Khmer-Chinese bilingual parallel corpus, which is stored in the database;
Step1.3, the sentence-level Khmer-Chinese bilingual parallel corpus is taken from Step1.2, and a named entity extraction tool is used to recognise Khmer and Chinese personal names in it, giving a Khmer-Chinese bilingual name-pair corpus, which is stored in the database;
Step1.4, the Khmer-Chinese bilingual name-pair corpus is taken from the Step1.3 database; a Khmer character string segmentation tool segments the Khmer names into character strings, giving the Khmer name character string corpus; the Chinese names are split into Chinese character sequences; the resulting Khmer name character string and Chinese name character sequence corpus is stored in the database.
The specific steps of Step2 are as follows:
Step2.1, the Khmer name character string and Chinese name character sequence pair corpus is taken out;
Step2.2, the Khmer-Chinese bilingual name-pair corpus is analysed statistically, and Khmer-Chinese bilingual name syllable alignment is performed with an unsupervised learning method;
Step2.3, according to the hierarchical Dirichlet principle, a manually written program implements the hierarchical Dirichlet process, giving the hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment.
The specific steps of Step1.1 are as follows:
Step1.1.1, first, Khmer-Chinese corpus websites are collected manually, websites with Khmer-Chinese bilingual parallel corpora are selected, and they are saved to the database;
Step1.1.2, according to the structure of the Khmer-Chinese bilingual web pages, the page features are analysed; a program for extracting the Khmer-Chinese bilingual parallel corpus is written manually using the analysed features, the Khmer-Chinese bilingual parallel text corpus is extracted, and it is stored in the database.
The specific steps of Step1.2 are as follows:
Step1.2.1, the Khmer-Chinese bilingual parallel text corpus is taken from the database and the extracted text is filtered to remove invalid information and tags, giving a noise-free corpus;
Step1.2.2, the noise-free corpus obtained in Step1.2.1 is manually segmented into sentences, giving the sentence-level Khmer-Chinese bilingual parallel corpus, which is saved to the database.
The specific steps of Step1.3 are as follows:
Step1.3.1, the sentence-level Khmer-Chinese bilingual parallel corpus is taken from the Step1.2 database;
Step1.3.2, using the corpus from Step1.3.1, an existing named entity extraction tool recognises Khmer personal names in the Khmer sentences of the sentence-level parallel corpus, giving the Khmer name corpus;
Step1.3.3, using the corpus from Step1.3.1, an existing named entity extraction tool recognises Chinese personal names in the Chinese sentences of the sentence-level parallel corpus, giving the Chinese name corpus;
Step1.3.4, the Khmer-Chinese bilingual name-pair corpus obtained in Step1.3.2 and Step1.3.3 is stored in the database.
The specific steps of Step1.4 are as follows:
Step1.4.1, the Khmer-Chinese bilingual name-pair corpus is taken from the Step1.3 database;
Step1.4.2, using the Khmer-Chinese bilingual name entity corpus from Step1.4.1, the Khmer name of each bilingual name pair is segmented with the Khmer character string segmentation tool, giving the Khmer name character string corpus, which is stored in the database;
Step1.4.3, using the Khmer-Chinese bilingual name entity corpus from Step1.4.1, the Chinese name of each bilingual name pair is split into a Chinese character sequence, and the resulting Chinese name character sequences are stored in the database.
The beneficial effects of the invention are:
1. The proposed Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process aligns Khmer and Chinese name syllables effectively, and provides strong support for morphological analysis, syntactic analysis and upper-level machine translation of names.
2. At present, research on Khmer-Chinese name syllable alignment is scarce and no resources are available for such research; the invention fills this gap.
3. A comparison with GIZA++ shows that the proposed method outperforms the GIZA++ model.
Brief description of the drawings
Fig. 1 is the overall flow chart of Khmer-Chinese name translation in the present invention;
Fig. 2 is the modeling flow chart of Khmer-Chinese name translation in the present invention.
Embodiment
Embodiment 1: As shown in Figs. 1-2, a Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process, the specific steps of which are as follows:
Step1, first, according to the features of the web pages, a manually written program obtains the Khmer-Chinese bilingual name corpus; the corpus is preprocessed to obtain the Khmer-Chinese bilingual name-pair corpus required as input to the HDP model (hierarchical Dirichlet model); the resulting corpus is then segmented and saved for use in subsequent work, yielding the Khmer name character string and Chinese name character sequence corpus;
The specific steps of Step1 are as follows:
Step1.1, first, the web page structure is examined and the page features are analysed; using a manually written program that exploits these features, a Khmer-Chinese bilingual parallel text corpus is crawled from Khmer-Chinese bilingual websites and saved to the database for use in subsequent work;
The specific steps of Step1.1 are as follows:
Step1.1.1, first, Khmer-Chinese corpus websites are collected manually, websites with Khmer-Chinese bilingual parallel corpora are selected, and they are saved to the database;
Step1.1.2, according to the structure of the Khmer-Chinese bilingual web pages, the page features are analysed; a program for extracting the Khmer-Chinese bilingual parallel corpus is written manually using the analysed features, the Khmer-Chinese bilingual parallel text corpus is extracted, and it is stored in the database.
Step1.2, the crawled Khmer-Chinese bilingual parallel text corpus is denoised and cleaned of junk content to construct a sentence-level Khmer-Chinese bilingual parallel corpus, which is stored in the database for use in subsequent work;
The specific steps of Step1.2 are as follows:
Step1.2.1, the Khmer-Chinese bilingual parallel text corpus is taken from the database and the extracted text is filtered to remove invalid information and tags, giving a noise-free corpus;
Step1.2.2, the noise-free corpus obtained in Step1.2.1 is manually segmented into sentences, giving the sentence-level Khmer-Chinese bilingual parallel corpus, which is saved to the database.
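The filtering of Step1.2.1 and a rule-based first pass at the sentence segmentation of Step1.2.2 might be sketched as follows. This is only an illustration under stated assumptions: the patent does not disclose its filtering rules and describes the sentence segmentation as manual, so the function names `strip_markup` and `split_sentences`, the regular expressions, and the choice of sentence terminators (Khmer khan U+17D4, Chinese and Latin stops) are all hypothetical.

```python
import re

# Hypothetical helper names; the patent does not name its filtering rules.
SCRIPT_STYLE_RE = re.compile(r"(?is)<(script|style)\b.*?</\1>")
TAG_RE = re.compile(r"<[^>]+>")
# Assumed terminators: Khmer khan (U+17D4), Chinese full stop, fullwidth
# exclamation and question marks, and their ASCII counterparts.
SENTENCE_RE = re.compile(
    "[^\u17d4\u3002\uff01\uff1f!?]+[\u17d4\u3002\uff01\uff1f!?]?"
)


def strip_markup(raw):
    """Remove script/style blocks and tags, then collapse whitespace."""
    text = SCRIPT_STYLE_RE.sub(" ", raw)
    text = TAG_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Split cleaned text after each sentence terminator."""
    return [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]


cleaned = strip_markup("<p>你好。世界！</p><script>x=1</script>")
sentences = split_sentences(cleaned)
```

The output of such a pass would still need the manual checking that Step1.2.2 describes before the sentence pairs are stored.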
Step1.3, the sentence-level Khmer-Chinese bilingual parallel corpus is taken from Step1.2, and a named entity extraction tool is used to recognise Khmer and Chinese personal names in it, giving a Khmer-Chinese bilingual name-pair corpus, which is stored in the database for use in subsequent work;
The specific steps of Step1.3 are as follows:
Step1.3.1, the sentence-level Khmer-Chinese bilingual parallel corpus is taken from the Step1.2 database;
Step1.3.2, using the corpus from Step1.3.1, an existing named entity extraction tool recognises Khmer personal names in the Khmer sentences of the sentence-level parallel corpus, giving the Khmer name corpus;
Step1.3.3, using the corpus from Step1.3.1, an existing named entity extraction tool recognises Chinese personal names in the Chinese sentences of the sentence-level parallel corpus, giving the Chinese name corpus;
Step1.3.4, the Khmer-Chinese bilingual name-pair corpus obtained in Step1.3.2 and Step1.3.3 is stored in the database for use in subsequent work.
Step1.4, the Khmer-Chinese bilingual name-pair corpus is taken from the Step1.3 database; a Khmer character string segmentation tool segments the Khmer names into character strings, giving the Khmer name character string corpus; the Chinese names are split into Chinese character sequences; the resulting Khmer name character string and Chinese name character sequence corpus is stored in the database for use in subsequent work.
The specific steps of Step1.4 are as follows:
Step1.4.1, the Khmer-Chinese bilingual name-pair corpus is taken from the Step1.3 database;
Step1.4.2, using the Khmer-Chinese bilingual name entity corpus from Step1.4.1, the Khmer name of each bilingual name pair is segmented with the Khmer character string segmentation tool, giving the Khmer name character string corpus, which is stored in the database for use in subsequent work;
Step1.4.3, using the Khmer-Chinese bilingual name entity corpus from Step1.4.1, the Chinese name of each bilingual name pair is split into a Chinese character sequence, and the resulting Chinese name character sequences are stored in the database.
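The segmentation of Step1.4.2 and Step1.4.3 can be illustrated with a minimal sketch. The patent does not identify its Khmer character string segmentation tool, so the version below is an assumption: it approximates a Khmer orthographic cluster as a base letter followed by subscript consonants (COENG U+17D2 plus a consonant) and dependent vowels or signs, using Unicode Khmer block ranges. The names `khmer_clusters` and `hanzi_sequence` are hypothetical.

```python
import re

# Approximate Khmer cluster (an assumption, not the patent's tool):
# a base consonant or independent vowel (U+1780..U+17B3), then any number of
# COENG + consonant pairs or dependent vowels / diacritic signs.
KHMER_CLUSTER_RE = re.compile(
    "[\u1780-\u17b3]"
    "(?:\u17d2[\u1780-\u17a2]|[\u17b4-\u17d1\u17d3\u17dd])*"
)


def khmer_clusters(name):
    """Split a Khmer name into a list of approximate character clusters."""
    return KHMER_CLUSTER_RE.findall(name)


def hanzi_sequence(name):
    """Split a Chinese name into single Chinese characters (CJK Unified block)."""
    return [ch for ch in name if "\u4e00" <= ch <= "\u9fff"]


# The Khmer word for Cambodia, written with escapes so the code points are explicit.
clusters = khmer_clusters("\u1780\u1798\u17d2\u1796\u17bb\u1787\u17b6")
hanzi = hanzi_sequence("西哈努克")
```

Filtering to the CJK Unified Ideographs range also drops separators such as the middle dot that often appears in transliterated names; whether the patent's preprocessing does the same is not stated.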
Step2, Khmer-Chinese bilingual name syllable alignment is performed with an unsupervised hierarchical Dirichlet method: according to the hierarchical Dirichlet principle, a manually written program implements the hierarchical Dirichlet process, giving the hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment;
The specific steps of Step2 are as follows:
Step2.1, the Khmer name character string and Chinese name character sequence pair corpus is taken out;
Step2.2, the Khmer-Chinese bilingual name-pair corpus is analysed statistically, and Khmer-Chinese bilingual name syllable alignment is performed with an unsupervised learning method;
Step2.3, according to the hierarchical Dirichlet principle, a manually written program implements the hierarchical Dirichlet process, giving the hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment.
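For reference, the hierarchical Dirichlet process named in Step2 is usually written in the following standard form. The symbols are not taken from the patent text and are assumptions of this note: H is a base distribution, γ and α_0 are concentration parameters, j indexes groups of observations, and F is the observation distribution. How the patent maps name pairs and syllable units onto groups and observations is not stated.

```latex
\begin{aligned}
G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H) \\
G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0) \\
\theta_{ji} \mid G_j &\sim G_j \\
x_{ji} \mid \theta_{ji} &\sim F(\theta_{ji})
\end{aligned}
```

Because every group-level distribution G_j is drawn around the shared G_0, the groups reuse the same set of atoms, which is the property that lets an unsupervised model share alignment units across many name pairs.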
Step3, the Khmer name character string and Chinese name character sequence pair corpus obtained above is fed as the input corpus into the constructed hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment, the Khmer-Chinese bilingual name syllable alignment result is obtained, and the result is stored in the database for use in subsequent work.
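Storing the alignment result of Step3 might look like the sketch below. The patent does not name the database or its schema, so the use of SQLite, the table `name_alignment`, its columns, and the function `save_alignment` are all hypothetical; the strings in the usage lines are placeholders, not real alignment output.

```python
import sqlite3


def save_alignment(conn, khmer_name, chinese_name, unit_pairs):
    """Store one aligned name pair, one row per (Khmer unit, Chinese character)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS name_alignment ("
        "khmer_name TEXT, chinese_name TEXT, position INTEGER, "
        "khmer_unit TEXT, chinese_char TEXT)"
    )
    rows = [
        (khmer_name, chinese_name, pos, k_unit, c_char)
        for pos, (k_unit, c_char) in enumerate(unit_pairs)
    ]
    conn.executemany("INSERT INTO name_alignment VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
# Placeholder strings only; they stand in for model output.
stored = save_alignment(conn, "K1K2", "C1C2", [("K1", "C1"), ("K2", "C2")])
```

Keeping one row per aligned unit, with its position, lets later steps read back either whole name pairs or individual syllable correspondences.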
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to these embodiments; within the knowledge possessed by a person of ordinary skill in the art, various changes can be made without departing from the spirit of the invention.
Claims (8)
1. A Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process, characterized in that the specific steps of the method are as follows:
Step1, first, according to the features of the web pages, a manually written program obtains the Khmer-Chinese bilingual name corpus; the corpus is preprocessed to obtain the Khmer-Chinese bilingual name-pair corpus required as input to the HDP model; the resulting corpus is then segmented and saved, yielding the Khmer name character string and Chinese name character sequence corpus;
Step2, Khmer-Chinese bilingual name syllable alignment is performed with an unsupervised hierarchical Dirichlet method: according to the hierarchical Dirichlet principle, a manually written program implements the hierarchical Dirichlet process, giving the hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment;
Step3, the Khmer name character string and Chinese name character sequence pair corpus obtained above is fed as the input corpus into the constructed hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment, the Khmer-Chinese bilingual name syllable alignment result is obtained, and the result is stored in the database.
2. The Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process according to claim 1, characterized in that the specific steps of Step1 are as follows:
Step1.1, first, the web page structure is examined and the page features are analysed; using a manually written program that exploits these features, a Khmer-Chinese bilingual parallel text corpus is crawled from Khmer-Chinese bilingual websites and saved to the database;
Step1.2, the crawled Khmer-Chinese bilingual parallel text corpus is denoised and cleaned of junk content to construct a sentence-level Khmer-Chinese bilingual parallel corpus, which is stored in the database;
Step1.3, the sentence-level Khmer-Chinese bilingual parallel corpus is taken from Step1.2, and a named entity extraction tool is used to recognise Khmer and Chinese personal names in it, giving a Khmer-Chinese bilingual name-pair corpus, which is stored in the database;
Step1.4, the Khmer-Chinese bilingual name-pair corpus is taken from the Step1.3 database; a Khmer character string segmentation tool segments the Khmer names into character strings, giving the Khmer name character string corpus; the Chinese names are split into Chinese character sequences; the resulting Khmer name character string and Chinese name character sequence corpus is stored in the database.
3. The Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process according to claim 1, characterized in that the specific steps of Step2 are as follows:
Step2.1, the Khmer name character string and Chinese name character sequence pair corpus is taken out;
Step2.2, the Khmer-Chinese bilingual name-pair corpus is analysed statistically, and Khmer-Chinese bilingual name syllable alignment is performed with an unsupervised learning method;
Step2.3, according to the hierarchical Dirichlet principle, a manually written program implements the hierarchical Dirichlet process, giving the hierarchical Dirichlet model for Khmer-Chinese bilingual name syllable alignment.
4. The Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process according to claim 2, characterized in that the specific steps of Step1.1 are as follows:
Step1.1.1, first, Khmer-Chinese corpus websites are collected manually, websites with Khmer-Chinese bilingual parallel corpora are selected, and they are saved to the database;
Step1.1.2, according to the structure of the Khmer-Chinese bilingual web pages, the page features are analysed; a program for extracting the Khmer-Chinese bilingual parallel corpus is written manually using the analysed features, the Khmer-Chinese bilingual parallel text corpus is extracted, and it is stored in the database.
5. The Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process according to claim 2, characterized in that the specific steps of Step1.2 are as follows:
Step1.2.1, the Khmer-Chinese bilingual parallel text corpus is taken from the database and the extracted text is filtered to remove invalid information and tags, giving a noise-free corpus;
Step1.2.2, the noise-free corpus obtained in Step1.2.1 is manually segmented into sentences, giving the sentence-level Khmer-Chinese bilingual parallel corpus, which is saved to the database.
6. The Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process according to claim 2, characterized in that the specific steps of Step1.3 are as follows:
Step1.3.1, the sentence-level Khmer-Chinese bilingual parallel corpus is taken from the Step1.2 database;
Step1.3.2, using the corpus from Step1.3.1, an existing named entity extraction tool recognises Khmer personal names in the Khmer sentences of the sentence-level parallel corpus, giving the Khmer name corpus;
Step1.3.3, using the corpus from Step1.3.1, an existing named entity extraction tool recognises Chinese personal names in the Chinese sentences of the sentence-level parallel corpus, giving the Chinese name corpus;
Step1.3.4, the Khmer-Chinese bilingual name-pair corpus obtained in Step1.3.2 and Step1.3.3 is stored in the database.
7. The Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process according to claim 2, characterized in that the specific steps of Step1.4 are as follows:
Step1.4.1, the Khmer-Chinese bilingual name-pair corpus is taken from the Step1.3 database;
Step1.4.2, using the Khmer-Chinese bilingual name entity corpus from Step1.4.1, the Khmer name of each bilingual name pair is segmented with the Khmer character string segmentation tool, giving the Khmer name character string corpus, which is stored in the database;
Step1.4.3, using the Khmer-Chinese bilingual name entity corpus from Step1.4.1, the Chinese name of each bilingual name pair is split into a Chinese character sequence, and the resulting Chinese name character sequences are stored in the database.
8. The Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process according to claim 2, characterized in that, in Step1.4, the constructed Khmer-Chinese bilingual name entity database contains 1468 entries.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710484050.5A CN107423292A (en) | 2017-06-23 | 2017-06-23 | Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710484050.5A CN107423292A (en) | 2017-06-23 | 2017-06-23 | Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107423292A true CN107423292A (en) | 2017-12-01 |
Family
ID=60427350
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710484050.5A Pending CN107423292A (en) | 2017-06-23 | 2017-06-23 | Khmer-Chinese bilingual personal name syllable alignment method based on a hierarchical Dirichlet process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107423292A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104965925A (en) * | 2015-07-13 | 2015-10-07 | 广西达译商务服务有限责任公司 | Automatic Chinese-Khmer bilingual parallel text acquisition system and implementation method |
CN105095194A (en) * | 2014-05-23 | 2015-11-25 | 富士通株式会社 | Method and equipment for extraction of name dictionary and translation rule table |
CN105138548A (en) * | 2015-07-13 | 2015-12-09 | 广西达译商务服务有限责任公司 | System for automatically collecting Chinese-Thai bilingual parallel corpus and implementation method |
US20160253679A1 (en) * | 2015-02-24 | 2016-09-01 | Thomson Reuters Global Resources | Brand abuse monitoring system with infringement deteciton engine and graphical user interface |
CN106776560A (en) * | 2016-12-15 | 2017-05-31 | 昆明理工大学 | A kind of Kampuchean organization name recognition method |
-
2017
- 2017-06-23 CN CN201710484050.5A patent/CN107423292A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095194A (en) * | 2014-05-23 | 2015-11-25 | 富士通株式会社 | Method and equipment for extraction of name dictionary and translation rule table |
US20160253679A1 (en) * | 2015-02-24 | 2016-09-01 | Thomson Reuters Global Resources | Brand abuse monitoring system with infringement deteciton engine and graphical user interface |
CN104965925A (en) * | 2015-07-13 | 2015-10-07 | 广西达译商务服务有限责任公司 | Automatic Chinese-Khmer bilingual parallel text acquisition system and implementation method |
CN105138548A (en) * | 2015-07-13 | 2015-12-09 | 广西达译商务服务有限责任公司 | System for automatically collecting Chinese-Thai bilingual parallel corpus and implementation method |
CN106776560A (en) * | 2016-12-15 | 2017-05-31 | 昆明理工大学 | A kind of Kampuchean organization name recognition method |
Non-Patent Citations (2)
Title |
---|
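小木 (Xiaomu): "层次狄利克雷过程" [Hierarchical Dirichlet Process], 《HTTPS://WWW.DATALEARNER.COM/BLOG/1051487944219663》 * |
李婷婷 (Li Tingting): "基于非参数贝叶斯学习的多语言人名音译研究" [Research on multilingual personal name transliteration based on nonparametric Bayesian learning], 《中国优秀硕士学位论文全文数据库信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] * |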
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106570148B (en) | A kind of attribute extraction method based on convolutional neural networks | |
CN107463607B (en) | Method for acquiring and organizing upper and lower relations of domain entities by combining word vectors and bootstrap learning | |
CN105022725B (en) | A kind of text emotion trend analysis method applied to finance Web fields | |
CN104408078B (en) | A kind of bilingual Chinese-English parallel corpora base construction method based on keyword | |
CN109408642A (en) | A kind of domain entities relation on attributes abstracting method based on distance supervision | |
CN107861947B (en) | Method for identifying invitation named entities based on cross-language resources | |
CN105956052A (en) | Building method of knowledge map based on vertical field | |
CN107704558A (en) | A kind of consumers' opinions abstracting method and system | |
CN104199972A (en) | Named entity relation extraction and construction method based on deep learning | |
CN109271644A (en) | A kind of translation model training method and device | |
CN103886034A (en) | Method and equipment for building indexes and matching inquiry input information of user | |
CN103116578A (en) | Translation method integrating syntactic tree and statistical machine translation technology and translation device | |
CN102253930A (en) | Method and device for translating text | |
CN104899188A (en) | Problem similarity calculation method based on subjects and focuses of problems | |
CN104750820A (en) | Filtering method and device for corpuses | |
CN106126505B (en) | Parallel phrase learning method and device | |
CN104699797A (en) | Webpage data structured analytic method and device | |
CN109033166A (en) | A kind of character attribute extraction training dataset construction method | |
CN110134934A (en) | Text emotion analysis method and device | |
CN110674378A (en) | Chinese semantic recognition method based on cosine similarity and minimum editing distance | |
CN107436931B (en) | Webpage text extraction method and device | |
CN106202038A (en) | Synonym method for digging based on iteration and device | |
CN113407842B (en) | Model training method, theme recommendation reason acquisition method and system and electronic equipment | |
CN111061873A (en) | Multi-channel text classification method based on Attention mechanism | |
CN107451116A (en) | Raw big data statistical analysis technique in a kind of Mobile solution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171201 |