CN110046261B - Construction method of multi-modal bilingual parallel corpus of construction engineering - Google Patents
- Publication number
- CN110046261B (application CN201910323653.6A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- translation
- construction
- text
- bilingual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
          - G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/90—Details of database functions independent of the retrieved data types
          - G06F16/95—Retrieval from the web
            - G06F16/953—Querying, e.g. by the use of web search engines
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910323653.6A (CN110046261B, en) | 2019-04-22 | 2019-04-22 | Construction method of multi-modal bilingual parallel corpus of construction engineering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110046261A (en) | 2019-07-23 |
CN110046261B (en) | 2022-01-21 |
Family
ID=67278357
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910323653.6A (CN110046261B, en) | Active | 2019-04-22 | 2019-04-22 | Construction method of multi-modal bilingual parallel corpus of construction engineering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110046261B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110543844A (en) * | 2019-08-26 | 2019-12-06 | 中电科大数据研究院有限公司 | A Metadata Extraction Method for Government Affairs Metadata PDF Files |
CN110889295B (en) * | 2019-09-12 | 2021-10-01 | 华为技术有限公司 | Machine translation model, method, system and device for determining pseudo-professional parallel corpus |
CN110942765B (en) * | 2019-11-11 | 2022-05-27 | 珠海格力电器股份有限公司 | Method, device, server and storage medium for constructing corpus |
CN111209461A (en) * | 2019-12-30 | 2020-05-29 | 成都理工大学 | Bilingual corpus collection system based on public identification words |
CN111221965A (en) * | 2019-12-30 | 2020-06-02 | 成都信息工程大学 | Classification and sampling detection method based on bilingual corpus of public signs |
CN111241784A (en) * | 2019-12-30 | 2020-06-05 | 成都理工大学 | Methods of processing and sorting out public signage corpus resources |
CN112016604B (en) * | 2020-08-19 | 2021-03-26 | 华东师范大学 | Zero-resource machine translation method applying visual information |
CN112085985B (en) * | 2020-08-20 | 2022-05-10 | 安徽七天网络科技有限公司 | Student answer automatic scoring method for English examination translation questions |
CN114626390A (en) * | 2020-12-12 | 2022-06-14 | 郑州宝冶钢结构有限公司 | Method for improving translation efficiency based on steel structure engineering parallel corpus |
CN113268980A (en) * | 2021-04-29 | 2021-08-17 | 赵天诚 | Text recognition method and device, terminal equipment and storage medium |
CN115423578B (en) * | 2022-09-01 | 2023-12-05 | 广东博成网络科技有限公司 | Bid bidding method and system based on micro-service containerized cloud platform |
CN115688811A (en) * | 2022-09-20 | 2023-02-03 | 甲骨易(北京)语言科技股份有限公司 | Corpus alignment method combining rules and semantics |
CN118170933B (en) * | 2024-05-13 | 2024-08-13 | 之江实验室 | A method and device for constructing multimodal corpus data in scientific fields |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8825466B1 (en) * | 2007-06-08 | 2014-09-02 | Language Weaver, Inc. | Modification of annotated bilingual segment pairs in syntax-based machine translation |
CN104657351A (en) * | 2015-02-12 | 2015-05-27 | 中国科学院软件研究所 | Method and device for processing bilingual alignment corpora |
CN105005561A (en) * | 2015-07-07 | 2015-10-28 | 刘改琳 | Bilingual retrieval statistical translation system based on corpus |
CN105068997A (en) * | 2015-07-15 | 2015-11-18 | 清华大学 | Parallel corpus construction method and device |
CN106066870A (en) * | 2016-05-27 | 2016-11-02 | 南京信息工程大学 | A kind of bilingual teaching mode constructing system of linguistic context mark |
CN106919689B (en) * | 2017-03-03 | 2018-05-11 | 中国科学技术信息研究所 | Professional domain knowledge mapping dynamic fixing method based on definitions blocks of knowledge |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5611076A (en) * | 1994-09-21 | 1997-03-11 | Micro Data Base Systems, Inc. | Multi-model database management system engine for databases having complex data models |
CN101101752B (en) * | 2007-07-19 | 2010-12-01 | 华中科技大学 | A lip-reading recognition system for monosyllabic languages based on visual features |
US8600730B2 (en) * | 2011-02-08 | 2013-12-03 | Microsoft Corporation | Language segmentation of multilingual texts |
US20130332450A1 (en) * | 2012-06-11 | 2013-12-12 | International Business Machines Corporation | System and Method for Automatically Detecting and Interactively Displaying Information About Entities, Activities, and Events from Multiple-Modality Natural Language Sources |
CN104408078B (en) * | 2014-11-07 | 2019-02-12 | 北京第二外国语学院 | A kind of bilingual Chinese-English parallel corpora base construction method based on keyword |
CN105843802A (en) * | 2016-03-31 | 2016-08-10 | 长安大学 | Corpus intervention module and method in translation |
Non-Patent Citations (2)
Title |
---|
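"Design and Construction of a Multimodal, Multilingual Parallel Corpus of Diplomatic Discourse in the Internet Plus Context" (互联网+背景下多模态、多语种外交话语平行语料库设计与创建探析); Yang Mingxing et al.; Foreign Language Education (《外语教学》); 2018-11-10; Vol. 39, No. 6; p. 16, Section 3.4; p. 17, Sections 3.4-3.6 * |
"Construction of an Architecture Bilingual Parallel Corpus and Its Development of MTI Students' Critical Thinking Ability" (建筑双语平行语料库构建及其对MTI学生思辨能力的开发); Li Jiakun et al.; Journal of Shenyang Jianzhu University (《沈阳建筑大学学报》); 2018-10-15; Vol. 20, No. 5; p. 531, right column, paragraphs 3-4; p. 532 * |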
Also Published As
Publication number | Publication date |
---|---|
CN110046261A (en) | 2019-07-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110046261B (en) | Construction method of multi-modal bilingual parallel corpus of construction engineering | |
WO2022179149A1 (en) | Machine translation method and apparatus based on translation memory | |
Vel | Pre-processing techniques of text mining using computational linguistics and python libraries | |
CN110609983B (en) | Structured decomposition method for policy file | |
CN118170933B (en) | A method and device for constructing multimodal corpus data in scientific fields | |
CN104991890A (en) | Method for constructing Vietnamese dependency tree bank on basis of Chinese-Vietnamese vocabulary alignment corpora | |
CN106055623A (en) | Cross-language recommendation method and system | |
CN113159969A (en) | Financial long text rechecking system | |
CN111563372B (en) | Typesetting document content self-duplication checking method based on teaching book publishing | |
CN101464856A (en) | Alignment method and apparatus for parallel spoken language materials | |
CN112257442A (en) | Policy document information extraction method based on corpus expansion neural network | |
CN111897917A (en) | Rail transit industry term extraction method based on multi-modal natural language features | |
CN111353077A (en) | Intelligent creation algorithm-based converged media collecting, editing and distributing system | |
CN112488593B (en) | Auxiliary bid evaluation system and method for bidding | |
CN117194614A (en) | Text difference recognition method, device and computer readable medium | |
CN114064878A (en) | Natural language data marking method and system based on reinforcement learning | |
CN112836067A (en) | Intelligent searching method based on knowledge graph | |
CN108268669A (en) | A kind of crucial new word discovery method based on multidimensional words and phrases feature and sentiment analysis | |
CN103164398A (en) | Chinese-Uygur language electronic dictionary and automatic translating Chinese-Uygur language method thereof | |
Jindal et al. | Building english-punjabi parallel corpus for machine translation | |
CN114239579A (en) | Electric power feasibility study document extraction method and device based on regular expression and CRF model | |
Li | Key Technologies for Constructing Bilingual Corpus for English-Chinese Translation | |
CN111046663B (en) | An Intelligent Correction Method for Chinese Forms | |
CN106776590A (en) | A kind of method and system for obtaining entry translation | |
Gamal et al. | Survey of arabic machine translation, methodologies, progress, and challenges |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Gao Jinling, Zhang Congying, Wang Haifeng, Ding Mei, Bao Yuping, Gao Jiyun, Zhang Xiaohong, Wang Wei. Inventor before: Zhang Xiaohong, Wang Wei, Zhang Congying, Ding Mei, Gao Jinling, Bao Yuping |
GR01 | Patent grant | |