CN110046261A - Method for constructing a multimodal bilingual parallel corpus for architectural engineering - Google Patents

Method for constructing a multimodal bilingual parallel corpus for architectural engineering

Info

Publication number
CN110046261A
Authority
CN
China
Prior art keywords
corpus
text
bilingual
modal
building
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910323653.6A
Other languages
Chinese (zh)
Other versions
CN110046261B (en)
Inventor
张晓红
王薇
张聪颖
丁玫
高金岭
鲍玉平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Jianzhu University
Priority to CN201910323653.6A
Publication of CN110046261A
Application granted
Publication of CN110046261B
Legal status: Active

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of data processing and relates in particular to a method for constructing a multimodal bilingual parallel corpus for architectural engineering. The method comprises six steps: corpus screening; corpus extraction and proofreading; corpus segmentation and alignment; denoising; obtaining the parallel corpus; and corpus updating and expansion. The corpus provides abundant contrastive samples for architectural vocabulary. Segmentation is fine-grained and precision is high; the meanings of retrieved words and sentence patterns are all related to architecture, so irrelevant meanings are excluded; and users are given a large number of architecture-related bilingual parallel translation samples.

Description

Method for constructing a multimodal bilingual parallel corpus for architectural engineering
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a method for constructing a multimodal bilingual parallel corpus for architectural engineering.
Background art
Architectural English is the combination of the construction industry and English. It touches every stage of the industry, such as prequalification, tendering and bidding, construction, and quality evaluation. Stylistically, architectural English is scientific and technical text: it has its own specialized vocabulary and habits of expression, its mode is written, and its tone is formal. As China's share of the overseas construction market keeps growing and the domestic and overseas construction markets converge, architectural English is used more and more widely and translations of it appear in large numbers. Corpus linguistics offers concepts and methods that can serve as tools for research on, and teaching of, architectural English translation. Building an architectural English corpus that serves specialized English teaching and applied research in the architecture field is therefore an urgent and significant task, and a method for constructing a multimodal bilingual parallel corpus for architectural engineering is needed.
Computer-aided translation (CAT) means that, while a translator works, the system continuously and automatically stores the translator's output in the background and builds a database from it. In later translation work, when the same or a similar phrase or sentence appears again, the system automatically searches the database for the stored identical or similar content and offers it as a reference translation, sparing the translator repeated work. Combining a multimodal architectural engineering corpus with CAT can therefore greatly improve translation efficiency. Several problems remain, however. Architectural technical corpora are extremely rare both in China and abroad, and multimodal architectural corpora are unprecedented. Existing architectural corpora are compiled with little or no proofreading, so their format and content are not standardized. Corpus sources are not authoritative enough: some corpora collect all kinds of web text indiscriminately, which makes them noisy and impure and prevents them from being put to real use in CAT software. Most parallel corpora are currently aligned at the paragraph level, whereas in translation the most useful reference unit is the sentence, followed by fragments, phrases, and terms; translating whole paragraphs at once gives lower precision.
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art by providing a method for constructing a multimodal bilingual parallel corpus for architectural engineering. Segmentation is fine-grained and precision is high; the meanings of retrieved words and sentence patterns all belong to the architecture field, so irrelevant meanings are excluded; and users are given a large number of architecture-related bilingual parallel translation samples.
The method of the invention for constructing a multimodal bilingual parallel corpus for architectural engineering comprises the following steps:
(1) Corpus screening: the raw corpus is obtained by network download, scanning and recognition, manual entry, and web crawling. Its main sources are English-Chinese bilingual works on architecture formally published by national-level publishing houses, government document reports, officially certified materials, and audio, video, drawings, pictures, and the like from formal construction-industry meetings;
(2) Corpus extraction and proofreading: modern imaging technology is used to collect multimodal architectural engineering information (pictures, charts, drawings, video, audio, text, and so on), which is then mined and structured. The raw corpus on the server is then proofread through add, delete, modify, and query operations, its data are cleaned and unwanted content is removed, and it is saved once it has been checked and found correct. The bilingual corpus is then aligned sentence by sentence, on a paragraph basis, in the Tmxmall software;
(3) Corpus segmentation and alignment: the sentences aligned in step (2) are segmented so that each bilingual parallel sentence pair occupies no more than four lines as displayed in a Word document;
(4) Denoising: noise is reduced manually. Sentences or paragraphs whose translations are inaccurate are corrected, entered by hand, and saved to the corpus, which ensures accurate matching of the corpus during computer-aided translation;
(5) Annotation and transcription: a sound and sufficient data mining scheme is designed according to the research object and research needs. Different annotation layers are set up in the annotation software and the corpus is annotated from different perspectives and aspects, for example by annotating the construction contract corpus. Multimodal corpus annotation and retrieval software presents the transcribed content in sync with the audio and video and supports output in several forms, such as text, audio, and video;
(6) Obtaining the parallel corpus: the recognized text is machine translated in sequence, and the parallel corpus is obtained after correction by human translation;
(7) Corpus updating and expansion: corpus updating is controlled by an updating unit. From time to time the updating unit pops up recommended entries and their recommendation weights, and recommended entries are written to the corpus according to those weights. The recommendation weight is the number of times the word or sentence has been popped up as a recommended entry: for example, a word that has popped up 5 times has a weight of 5. When the weight exceeds 10, the word is written to the corpus, thereby updating and expanding the corpus.
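The weighting rule in step (7) can be illustrated with a minimal Python sketch. The class name `UpdatingUnit` and its methods are hypothetical and only model the counting rule described above (weight equals the number of pop-ups, and an entry is written once its weight exceeds 10); how the real updating unit selects entries to recommend is not specified in the text.

```python
from collections import Counter

WRITE_THRESHOLD = 10  # from the text: an entry is written once its weight exceeds 10


class UpdatingUnit:
    """Hypothetical updating unit that counts how often each entry is recommended."""

    def __init__(self, corpus=None):
        self.corpus = set(corpus or [])
        self.weights = Counter()

    def recommend(self, entry):
        """Record one pop-up of `entry`; return True if it was just written to the corpus."""
        self.weights[entry] += 1
        if self.weights[entry] > WRITE_THRESHOLD and entry not in self.corpus:
            self.corpus.add(entry)
            return True
        return False
```

With this sketch, the first ten recommendations of an entry only raise its weight, and the eleventh writes it to the corpus.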
In the corpus screening of step (1), the web crawler is built on the selenium web testing package for Python. It first crawls third-party websites such as Baidu Scholar for external links to the download pages of files in the relevant fields, then visits those external links and downloads the files by simulating clicks on page elements. The format of the downloaded files is converted, redundant and erroneous information is cleaned out, and the corresponding structural information is extracted. The converted text is then word-segmented, stop words are removed, and paragraphs without semantic content are filtered out, producing the base text used for analysis.
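The last stage of this pipeline (segmenting the converted text, removing stop words, and filtering out paragraphs without semantic content) might look like the following sketch for the English side of the corpus. The stop-word list, the `min_tokens` cutoff, and the function names are illustrative assumptions; the text does not say which segmenter or stop-word list is used, and Chinese text would need a dedicated word segmenter instead of the regular expression shown here.

```python
import re

# Hypothetical stop-word list for illustration; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}


def tokenize(text):
    """Lowercase English text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def build_base_text(paragraphs, min_tokens=3):
    """Return token lists for paragraphs that keep at least `min_tokens` content words."""
    base = []
    for paragraph in paragraphs:
        tokens = [t for t in tokenize(paragraph) if t not in STOP_WORDS]
        # Assumption: a paragraph with too few content words carries no usable meaning.
        if len(tokens) >= min_tokens:
            base.append(tokens)
    return base
```

Paragraphs that consist only of symbols, digits, or navigation fragments yield too few tokens and are dropped.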
The national-level publishing houses of step (1) include China Construction Industry Press and the like; the government document reports include meeting bulletins and the like; the officially certified materials include contract texts and the like; and the materials in other forms include audio, video, drawings, and pictures from formal construction-industry meetings. The main fields covered by the corpus include green building, architectural theory, construction bidding documents, construction contracts, building materials, and urban planning.
In step (2), a "text organizer" software tool on the computer tidies and replaces content that does not conform to English text conventions, namely double-byte characters and digits, full-width spaces, and redundant line breaks, so that nonstandard symbols and formatting in the text are normalized. This cleans the raw corpus data and keeps the text clean.
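The kind of normalization described for step (2) can be approximated with the Python standard library. This is a sketch under assumptions, not the behavior of the "text organizer" tool itself, which the text does not detail: the function name `clean_text` is hypothetical, and Unicode NFKC normalization is used here as a stand-in for converting full-width (double-byte) characters to ordinary ones. NFKC also rewrites other compatibility characters and full-width punctuation, so it suits the English side of the corpus rather than the Chinese side.

```python
import re
import unicodedata


def clean_text(text):
    """Normalize full-width characters and collapse redundant whitespace and line breaks."""
    # NFKC maps full-width letters, digits, and the ideographic space (U+3000)
    # to their ordinary ASCII counterparts.
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)   # runs of spaces or tabs become one space
    text = re.sub(r" ?\n ?", "\n", text)  # trim spaces around line breaks
    text = re.sub(r"\n{2,}", "\n", text)  # redundant line breaks become one
    return text.strip()
```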
In step (3), each bilingual parallel sentence pair is kept to no more than four lines as displayed in a Word document by the following steps:
S31: First, take the characters of lines one to three of the text to be translated and look for the last period, comma, or semicolon in the third line. If one is found, insert a line break at that period, comma, or semicolon and go to step S34; if none is found, go to step S32;
S32: Look for the last period, comma, or semicolon in the second line. If one is found, insert a line break there and go to step S34; if none is found in the second line, go to step S33;
S33: Check the first line in the same way, insert a line break at its last period, comma, or semicolon, and go to step S34;
S34: Continue with lines one to three of the text that follows the line break and repeat steps S31 to S33. This segments the corpus sentences effectively and ensures that each bilingual parallel sentence pair occupies no more than four lines as displayed in a Word document.
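Steps S31 to S34 can be sketched in Python as follows. Several details are assumptions made for illustration: a displayed line is modeled as a fixed number of characters (`line_width`), the punctuation set includes both ASCII and Chinese full-width marks, text that already fits in three lines is kept as the final segment, and a window with no punctuation at all is cut at its end (a case the steps above do not cover).

```python
PUNCTUATION = ".,;\u3002\uff0c\uff1b"  # period, comma, semicolon and their Chinese forms


def split_segments(text, line_width=40):
    """Cut `text` into segments of at most three display lines each (S31 to S34)."""
    segments = []
    rest = text.strip()
    while len(rest) > 3 * line_width:
        window = rest[: 3 * line_width]
        cut = None
        # S31, S32, S33: try the third line, then the second, then the first.
        for line_no in (2, 1, 0):
            line = window[line_no * line_width : (line_no + 1) * line_width]
            pos = max(line.rfind(mark) for mark in PUNCTUATION)
            if pos != -1:
                cut = line_no * line_width + pos + 1  # break right after the mark
                break
        if cut is None:
            cut = len(window)  # assumption: no punctuation found, cut at the window end
        segments.append(rest[:cut].strip())
        rest = rest[cut:].strip()  # S34: continue with the text after the break
    if rest:
        segments.append(rest)
    return segments
```

Because every segment spans at most three modeled lines, it stays within the four-line limit stated in step (3).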
Compared with the prior art, the beneficial effects of the invention are as follows. The method for constructing a multimodal bilingual parallel corpus for architectural engineering provides abundant contrastive samples for architectural vocabulary; the meanings of retrieved words and sentence patterns are all related to architecture, and irrelevant meanings are excluded. It gives users a large number of bilingual parallel translation samples, so that teachers have plentiful examples to draw on in class, which improves teaching quality, and students benefit in their study and research outside class. It gives users rich, usable professional text as a reference for further teaching, learning, research, and practice. It also gives the architecture field a more professional transcription platform: the corpus sources are authoritative, segmentation is fine-grained, and the content is highly specialized, so the corpus can be matched precisely to the industry, the matching rate of corpus entries improves, and the requirements of architectural engineering are met.
Brief description of the drawings
Fig. 1 is an example of a translation result of the invention;
Fig. 2 and Fig. 3 are examples of translation results for multimodal material according to the invention;
Fig. 4 is an example of a translation result from an online translation platform with a high market share;
Fig. 5 is an example of a translation result obtained with the corpus + CAT approach of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
It should also be noted that similar reference numerals and letters denote similar items in the drawings, so once an item has been defined in one drawing it need not be defined or explained again in subsequent drawings.
The invention is further described below by way of a specific embodiment.
Embodiment:
The method of this embodiment for constructing a multimodal bilingual parallel corpus for architectural engineering comprises the following steps:
(1) Corpus screening: the raw corpus is obtained by network download, scanning and recognition, manual entry, and web crawling. Its main sources are English-Chinese bilingual works on architecture formally published by national-level publishing houses, government document reports, officially certified materials, and audio, video, drawings, pictures, and the like from formal construction-industry meetings;
(2) Corpus extraction and proofreading: modern imaging technology is used to collect multimodal architectural engineering information (pictures, charts, drawings, video, audio, text, and so on), which is then mined and structured. The raw corpus on the server is then proofread through add, delete, modify, and query operations, its data are cleaned and unwanted content is removed, and it is saved once it has been checked and found correct. The bilingual corpus is then aligned sentence by sentence, on a paragraph basis, in the Tmxmall software;
(3) Corpus segmentation and alignment: the sentences aligned in step (2) are segmented so that each bilingual parallel sentence pair occupies no more than four lines as displayed in a Word document;
(4) Denoising: noise is reduced manually. Sentences or paragraphs whose translations are inaccurate are corrected, entered by hand, and saved to the corpus, which ensures accurate matching of the corpus during computer-aided translation;
(5) Annotation and transcription: a sound and sufficient data mining scheme is designed according to the research object and research needs. Different annotation layers are set up in the annotation software and the corpus is annotated from different perspectives and aspects, for example by annotating the construction contract corpus. Multimodal corpus annotation and retrieval software presents the transcribed content in sync with the audio and video and supports output in several forms, such as text, audio, and video;
(6) Obtaining the parallel corpus: the recognized text is machine translated in sequence, and the parallel corpus is obtained after correction by human translation;
(7) Corpus updating and expansion: corpus updating is controlled by an updating unit. From time to time the updating unit pops up recommended entries and their recommendation weights, and recommended entries are written to the corpus according to those weights. The recommendation weight is the number of times the word or sentence has been popped up as a recommended entry: for example, a word that has popped up 5 times has a weight of 5. When the weight exceeds 10, the word is written to the corpus, thereby updating and expanding the corpus.
In the corpus screening of step (1) of this embodiment, the web crawler is built on the selenium web testing package for Python. It first crawls third-party websites such as Baidu Scholar for external links to the download pages of files in the relevant fields, then visits those external links and downloads the files by simulating clicks on page elements. The format of the downloaded files is converted, redundant and erroneous information is cleaned out, and the corresponding structural information is extracted. The converted text is then word-segmented, stop words are removed, and paragraphs without semantic content are filtered out, producing the base text used for analysis.
The national-level publishing houses of step (1) include China Construction Industry Press and the like; the government document reports include meeting bulletins and the like; the officially certified materials include contract texts and the like; and the materials in other forms include audio, video, drawings, pictures, and the like from formal construction-industry meetings. The main fields covered by the corpus include green building, architectural theory, construction bidding documents, construction contracts, building materials, and urban planning.
In step (2), a "text organizer" software tool on the computer tidies and replaces content that does not conform to English text conventions, namely double-byte characters and digits, full-width spaces, and redundant line breaks, so that nonstandard symbols and formatting in the text are normalized. This cleans the raw corpus data and keeps the text clean.
In step (3) of this embodiment, each bilingual parallel sentence pair is kept to no more than four lines as displayed in a Word document by the following steps:
S31: First, take the characters of lines one to three of the text to be translated and look for the last period, comma, or semicolon in the third line. If one is found, insert a line break at that period, comma, or semicolon and go to step S34; if none is found, go to step S32;
S32: Look for the last period, comma, or semicolon in the second line. If one is found, insert a line break there and go to step S34; if none is found in the second line, go to step S33;
S33: Check the first line in the same way, insert a line break at its last period, comma, or semicolon, and go to step S34;
S34: Continue with lines one to three of the text that follows the line break and repeat steps S31 to S33. This segments the corpus sentences effectively and ensures that each bilingual parallel sentence pair occupies no more than four lines as displayed in a Word document.
The corpus of this embodiment is available in TMX and TXT formats, which can be imported directly into CAT software, and can also be provided in EXCEL format for viewing.
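As an illustration of the TMX format mentioned above, the following sketch writes sentence pairs as a minimal TMX 1.4 document using Python's standard `xml.etree.ElementTree` module. The function name `build_tmx`, the language codes, and the header values (such as the tool name) are assumptions chosen for the example; the text does not describe how the corpus files are actually produced.

```python
import xml.etree.ElementTree as ET

# The xml namespace is predefined, so this attribute is serialized as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en-US", tgt_lang="zh-CN"):
    """Build a TMX 1.4 element tree from (source, target) sentence pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "corpus-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per sentence pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)
```

The tree can then be saved with `build_tmx(pairs).write("corpus.tmx", encoding="utf-8", xml_declaration=True)`, where `corpus.tmx` is a placeholder file name.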
In use, the multimodal bilingual parallel corpus for architectural engineering of this embodiment is imported into computer-aided translation software as a translation memory (TM). When the text being translated contains a sentence pair or glossary entry that corresponds to the corpus, the software matches it automatically, which saves a great deal of time while maintaining the quality of construction-industry translation. Translation follows the "corpus + CAT" approach and integrates CAT with machine translation (MT), establishing a "CAT + machine translation (MT) + post-editing (PE)" workflow that improves translation efficiency and safeguards translation quality. A translation accuracy confidence is set for the corpus and CAT. The core of CAT is translation memory: whenever the same or a similar phrase appears, the system automatically offers the user the closest translation in the memory, and the user adopts, discards, or edits the repeated text as needed. A translation accuracy confidence is therefore set at translation time. If the confidence is not lower than 90, the translated result is adopted; if it is lower than 90, the translated result is discarded and corrected by human translation, and the human translation replaces the result produced by the corpus and CAT. The human translation is edited and entered in the editable area and saved to the corpus, realizing the "CAT + machine translation (MT) + post-editing (PE)" translation mode. The translation accuracy confidence is a setting preset inside the system and is used to judge translation accuracy. Continuous updating and expansion of the corpus keeps it growing at a steady pace, provides more corpus references for keeping terminology and translator style consistent, and supplies a reliable resource for architectural term extraction. Because the corpus of this embodiment is imported into the computer-aided translation software as a translation memory (TM), multimodal material can be stored in it directly: once multimodal material has been entered into the computer, the software can directly retrieve corpus items in picture, chart, drawing, video, audio, and other formats for translation, teaching, research, and so on.
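The translation-memory lookup with a confidence threshold can be sketched as below. Two assumptions are made and should be read as such: the threshold is interpreted so that results scoring at least 90 are reused while lower-scoring ones go to human correction, and the confidence itself is approximated by the similarity ratio from Python's `difflib.SequenceMatcher`, scaled to 0..100. The text does not define how the confidence is computed, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

CONFIDENCE_THRESHOLD = 90  # threshold named in the text


def best_match(sentence, memory):
    """Return (target, score) for the closest source sentence in `memory`; score is 0..100."""
    best_target, best_score = None, 0.0
    for source, target in memory.items():
        score = 100 * SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if score > best_score:
            best_target, best_score = target, score
    return best_target, best_score


def translate(sentence, memory, human_translate):
    """Reuse a stored translation when confident; otherwise ask a human and store the result."""
    target, score = best_match(sentence, memory)
    if score >= CONFIDENCE_THRESHOLD:
        return target
    corrected = human_translate(sentence)  # post-editing step
    memory[sentence] = corrected           # save the corrected pair to the corpus
    return corrected
```

Here `memory` is a plain dictionary from source sentences to translations, standing in for the translation memory, and `human_translate` is any callable that returns the human-corrected translation.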
The following works have been stored in the corpus of this embodiment: " architectural environment and energy source use Introduction to Engineering " (Chinese to English), " green Northern Europe: the city of sustainable development and building " (Chinese to English), FIDIC contract translations (Chinese to English), " ecocity And green building " (Chinese to English), " Fletcher architectural history " (English to Chinese), " construction material " (English to Chinese), " bar make: a principle, Diversified forms " (Chinese to English), " study of poetry slope sheet one of anticlimax at building " (Chinese to English), " design concept " (English to Chinese), " city City's sustainable development principles " (Chinese to English), " from concept to building 2 " (Chinese to English), " urban sustainable development and architectural design " (Chinese to English), and " international civil engineering work contract " (Chinese to English). In addition, the corpus of this embodiment has been tested in several translation projects. These projects involved a large amount of specialized knowledge from industries such as construction and electric power. The corpus handles the accuracy of specialized industry language and the logical coherence of scientific and technical text well, ensuring the efficiency and quality of the translation service.
This embodiment provides the translation result obtained by importing the corpus into CAT translation software, as shown in Fig. 1, which shows that the corpus of this embodiment matches accurately in CAT translation software. Fig. 2 and Fig. 3 show the translation results for a roof construction drawing obtained by the corpus using multimodal technology. Fig. 4 and Fig. 5 are for comparison: comparing them shows that, for architectural texts, the corpus of this embodiment gives higher matching precision and more accurate translation results.
The above embodiment is only a specific case of the invention. The scope of patent protection of the invention includes, but is not limited to, the product form and style of the above embodiment; any appropriate change or modification made by a person of ordinary skill in the art that is consistent with the claims of the invention falls within the scope of protection of the invention.

Claims (6)

1. A method for constructing a multimodal bilingual parallel corpus for architectural engineering, characterized by comprising the following steps:
(1) Corpus screening: the raw corpus is obtained by network download, scanning and recognition, manual entry, and web crawling. Its main sources are English-Chinese bilingual works on architecture formally published by national-level publishing houses, government document reports, officially certified materials, and audio, video, drawings, and pictures from formal construction-industry meetings;
(2) Corpus extraction and proofreading: modern imaging technology is used to collect multimodal architectural engineering information (pictures, charts, drawings, video, audio, and text), which is then mined and structured. The raw corpus on the server is then proofread through add, delete, modify, and query operations, its data are cleaned and unwanted content is removed, and it is saved once it has been checked and found correct. The bilingual corpus is then aligned sentence by sentence, on a paragraph basis, in the Tmxmall software;
(3) Corpus segmentation and alignment: the sentences aligned in step (2) are segmented so that each bilingual parallel sentence pair occupies no more than four lines as displayed in a Word document;
(4) Denoising: noise is reduced manually. Sentences or paragraphs whose translations are inaccurate are corrected, entered by hand, and saved to the corpus, which ensures accurate matching of the corpus during computer-aided translation;
(5) Annotation and transcription: different annotation layers are set up in the annotation software and the corpus is annotated from different perspectives and aspects. Multimodal corpus annotation and retrieval software presents the transcribed content in sync with the audio and video and supports output in several forms, namely text, audio, and video;
(6) Obtaining the parallel corpus: the recognized text is machine translated in sequence, and the parallel corpus is obtained after correction by human translation;
(7) Corpus updating and expansion: corpus updating is controlled by an updating unit. From time to time the updating unit pops up recommended entries and their recommendation weights, and recommended entries are written to the corpus according to those weights. The recommendation weight is determined by the number of times the word or sentence has been popped up as a recommended entry.
2. the construction method of the multi-modal bilingual teaching mode of architectural engineering according to claim 1, it is characterised in that: institute In the corpus screening process for stating step (1), the method for the web crawlers is using the selenium network test packet under python As the basis in crawler library, the external linkage of related fields file download page is crawled by Baidu's science third party website first, Then unify to enter these external linkages again, in such a way that page elements are clicked in simulation, related fields file is downloaded, to correlation The format of file is converted, and cleans redundancy and error message, and extract corresponding structural information, then to conversion after Text is segmented, removes stop words and filtering without semantic paragraph, and the base text for analysis is constructed.
3. the construction method of the multi-modal bilingual teaching mode of architectural engineering according to claim 1, it is characterised in that: institute Stating national level publishing house described in step (1) includes China Construction Industry Press, and the government document report includes that meeting is public Report, official's authentication material includes contract text, and the material of other forms includes the audio of building trade official meeting, video, figure Paper, picture;The field that corpus relates generally to include: green building, Construction Theory, building bidding documents, agreement for construction, construction material and Urban planning.
4. The construction method of the multi-modal bilingual parallel corpus of construction engineering according to claim 1, characterized in that: in step (2), the "text reorganizer" software on a computer is used to arrange and replace the full-width characters and numerals, full-width spaces and redundant line breaks that do not conform to English text conventions, and to tidy the symbols and formats in the text that do not conform to the specification, so that the raw corpus data is cleaned and the text is kept clean.
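The kind of clean-up described in claim 4 can be approximated in a few lines. The sketch below is not the "text reorganizer" software named in the claim; it is an assumed stand-in that maps full-width forms (U+FF01 to U+FF5E) and the full-width space (U+3000) to their half-width equivalents and collapses redundant line breaks. It is meant for the English side of the corpus, because it would also convert Chinese full-width punctuation.

```python
import re


def normalize_text(text):
    """Convert full-width characters to half-width and remove extra line breaks."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                  # full-width (ideographic) space
            chars.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:      # full-width ASCII variants
            chars.append(chr(code - 0xFEE0))
        else:
            chars.append(ch)
    result = "".join(chars).replace("\r\n", "\n")
    result = re.sub(r"[ \t]+\n", "\n", result)  # trailing spaces before a break
    result = re.sub(r"\n{2,}", "\n", result)    # runs of line breaks
    return result.strip()


cleaned = normalize_text("ＡＢＣ　１２３\n\n\nok")
```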
5. The construction method of the multi-modal bilingual parallel corpus of construction engineering according to claim 1, characterized in that: in step (3), each bilingual parallel sentence pair occupies no more than four lines in the visible Word document, the specific steps being as follows:
S31: first count the characters in the first three lines of the text to be translated and look for the last period, comma or semicolon in the third line; if one is found, insert a line break at that period, comma or semicolon and then execute step S34; if none is found, execute step S32;
S32: look for the last period, comma or semicolon in the second line; if one is found, insert a line break at that position and then execute step S34; if none is found in the second line, execute step S33;
S33: check the first line in the same way, insert a line break at the last period, comma or semicolon in the first line, and then execute step S34;
S34: continue with the first three lines of the text to be translated that follow the line break, and repeat steps S31 to S33; this effectively segments the corpus into clauses and ensures that each bilingual parallel sentence pair occupies no more than four lines in the visible Word document.
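Steps S31 to S34 can be sketched as a loop over a three-line window. The claim does not say how a "line" is measured, so the sketch assumes a fixed width of `row_width` characters per line; that parameter, the punctuation set and the fallback of cutting at the window end when no punctuation is found are all assumptions.

```python
PUNCTUATION = ".,;。，；"  # period, comma, semicolon (half-width and full-width)


def split_segments(text, row_width=40):
    """Cut text into segments of at most three assumed lines, ending on punctuation."""
    segments = []
    rest = text
    window_size = 3 * row_width
    while len(rest) > window_size:
        window = rest[:window_size]
        cut = None
        # S31, S32, S33: inspect the third, then second, then first line.
        for row in (2, 1, 0):
            start, end = row * row_width, (row + 1) * row_width
            last = max(window.rfind(p, start, end) for p in PUNCTUATION)
            if last != -1:
                cut = last + 1      # break right after the punctuation mark
                break
        if cut is None:
            cut = window_size       # assumed fallback, not described in the claim
        segments.append(rest[:cut].strip())
        rest = rest[cut:].lstrip()  # S34: continue with the remaining text
    if rest.strip():
        segments.append(rest.strip())
    return segments


sample = "a" * 50 + "," + "b" * 50 + "." + "c" * 30
parts = split_segments(sample, row_width=40)
```

With a 40-character line, the period falls in the third line of the first window, so the text is cut there and the remaining 30 characters form the second segment.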
6. The construction method of the multi-modal bilingual parallel corpus of construction engineering according to claim 1, characterized in that: the corpus formats include the TMX format and the TXT format, both of which can be imported directly into CAT software, and a visualized EXCEL format can also be provided.
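As an illustration of the export formats named in claim 6, the sketch below writes aligned sentence pairs as a minimal TMX 1.4 document and as tab-separated TXT using only the Python standard library. The language codes, the header values and the function names are assumptions; the claim does not specify them, and a real CAT tool may expect additional header metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs, src_lang="zh-CN", tgt_lang="en-US"):
    """Serialize (source, target) sentence pairs as a TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "corpus-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")     # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def to_txt(pairs):
    """Serialize pairs as tab-separated lines, one pair per line."""
    return "\n".join(f"{source}\t{target}" for source, target in pairs)


pairs = [("绿色建筑", "green building")]
tmx_text = to_tmx(pairs)
txt_text = to_txt(pairs)
```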
CN201910323653.6A 2019-04-22 2019-04-22 Construction method of multi-modal bilingual parallel corpus of construction engineering Active CN110046261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910323653.6A CN110046261B (en) 2019-04-22 2019-04-22 Construction method of multi-modal bilingual parallel corpus of construction engineering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910323653.6A CN110046261B (en) 2019-04-22 2019-04-22 Construction method of multi-modal bilingual parallel corpus of construction engineering

Publications (2)

Publication Number Publication Date
CN110046261A true CN110046261A (en) 2019-07-23
CN110046261B CN110046261B (en) 2022-01-21

Family

ID=67278357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910323653.6A Active CN110046261B (en) 2019-04-22 2019-04-22 Construction method of multi-modal bilingual parallel corpus of construction engineering

Country Status (1)

Country Link
CN (1) CN110046261B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5611076A (en) * 1994-09-21 1997-03-11 Micro Data Base Systems, Inc. Multi-model database management system engine for databases having complex data models
US8825466B1 (en) * 2007-06-08 2014-09-02 Language Weaver, Inc. Modification of annotated bilingual segment pairs in syntax-based machine translation
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
US20120203540A1 (en) * 2011-02-08 2012-08-09 Microsoft Corporation Language segmentation of multilingual texts
CN103488663A (en) * 2012-06-11 2014-01-01 国际商业机器公司 System and method for automatically detecting and interactively displaying information about entities, activities, and events from multiple-modality natural language sources
CN104408078A (en) * 2014-11-07 2015-03-11 北京第二外国语学院 Construction method for key word-based Chinese-English bilingual parallel corpora
CN104657351A (en) * 2015-02-12 2015-05-27 中国科学院软件研究所 Method and device for processing bilingual alignment corpora
CN105005561A (en) * 2015-07-07 2015-10-28 刘改琳 Bilingual retrieval statistical translation system based on corpus
CN105068997A (en) * 2015-07-15 2015-11-18 清华大学 Parallel corpus construction method and device
CN105843802A (en) * 2016-03-31 2016-08-10 长安大学 Corpus intervention module and method in translation
CN106066870A (en) * 2016-05-27 2016-11-02 南京信息工程大学 A kind of bilingual teaching mode constructing system of linguistic context mark
CN106919689B (en) * 2017-03-03 2018-05-11 中国科学技术信息研究所 Professional domain knowledge mapping dynamic fixing method based on definitions blocks of knowledge

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
WOJCIECH SKUT等: "a linguistically interpreted corpus of german newspaper text", 《COMPUTATION AND LANGUAGE》 *
李家坤等: ""建筑双语平行语料库构建及其对MTI学生思辨能力的开发"", 《沈阳建筑大学学报》 *
杨明星等: ""互联网+背景下多模态、多语种外交话语平行语料库设计与创建探析"", 《外语教学》 *
王俊超: "构建中国企业"走出去"外宣翻译的研究框架——基于500强企业网页外宣语料库", 《上海翻译》 *
路邈: "汉日口译语料库的构建及其在翻译教学研究中的应用", 《日语学习与研究》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543844A (en) * 2019-08-26 2019-12-06 中电科大数据研究院有限公司 metadata extraction method for government affair metadata PDF file
CN110889295A (en) * 2019-09-12 2020-03-17 华为技术有限公司 Machine translation model, and method, system and equipment for determining pseudo-professional parallel corpora
CN110889295B (en) * 2019-09-12 2021-10-01 华为技术有限公司 Machine translation model, and method, system and equipment for determining pseudo-professional parallel corpora
CN110942765A (en) * 2019-11-11 2020-03-31 珠海格力电器股份有限公司 Method, device, server and storage medium for constructing corpus
CN111209461A (en) * 2019-12-30 2020-05-29 成都理工大学 Bilingual corpus collection system based on public identification words
CN111221965A (en) * 2019-12-30 2020-06-02 成都信息工程大学 Classification sampling detection method based on bilingual corpus of public identification words
CN111241784A (en) * 2019-12-30 2020-06-05 成都理工大学 Processing and sorting method for language material resources of public identification languages
CN112016604B (en) * 2020-08-19 2021-03-26 华东师范大学 Zero-resource machine translation method applying visual information
CN112016604A (en) * 2020-08-19 2020-12-01 华东师范大学 Zero-resource machine translation method applying visual information
CN112085985A (en) * 2020-08-20 2020-12-15 安徽七天教育科技有限公司 Automatic student answer scoring method for English examination translation questions
CN112085985B (en) * 2020-08-20 2022-05-10 安徽七天网络科技有限公司 Student answer automatic scoring method for English examination translation questions
CN114626390A (en) * 2020-12-12 2022-06-14 郑州宝冶钢结构有限公司 Method for improving translation efficiency based on steel structure engineering parallel corpus
CN113268980A (en) * 2021-04-29 2021-08-17 赵天诚 Text recognition method and device, terminal equipment and storage medium
CN115423578A (en) * 2022-09-01 2022-12-02 广东博成网络科技有限公司 Bidding method and system based on micro-service containerization cloud platform
CN115423578B (en) * 2022-09-01 2023-12-05 广东博成网络科技有限公司 Bid bidding method and system based on micro-service containerized cloud platform
CN115688811A (en) * 2022-09-20 2023-02-03 甲骨易(北京)语言科技股份有限公司 Corpus alignment method combining rules and semantics
CN118170933A (en) * 2024-05-13 2024-06-11 之江实验室 Construction method and device of multi-mode corpus data oriented to scientific field

Also Published As

Publication number Publication date
CN110046261B (en) 2022-01-21

Similar Documents

Publication Publication Date Title
CN110046261A (en) A kind of construction method of the multi-modal bilingual teaching mode of architectural engineering
Brockett et al. Correcting ESL errors using phrasal SMT techniques
Hutchins Current commercial machine translation systems and computer-based translation tools: system types and their uses
US7383542B2 (en) Adaptive machine translation service
US8078451B2 (en) Interface and methods for collecting aligned editorial corrections into a database
CN101464856A (en) Alignment method and apparatus for parallel spoken language materials
CN113343717A (en) Neural machine translation method based on translation memory library
CN106156013A (en) The two-part machine translation method that a kind of regular collocation type phrase is preferential
Ortego Antón The design of TorreznoTRAD: The semiautomatic Spanish-English writing and translation aid tool
Gamal et al. Survey of arabic machine translation, methodologies, progress, and challenges
CN117473971A (en) Automatic generation method and system for bidding documents based on purchasing text library
CN106776590A (en) A kind of method and system for obtaining entry translation
Rosen Introducing a corpus of non-native Czech with automatic annotation
Ma et al. Corpus Support for Machine Translation at LDC.
AbuSa’aleek The adequacy and acceptability of machine translation in translating the Islamic texts
Garside The large-scale production of syntactically analysed corpora
NZUANKE et al. Technology and translation: Areas of convergence and divergence between machine translation and computer-assisted translation
Al-Obaidli et al. Bi-text alignment of movie subtitles for spoken english-arabic statistical machine translation
Sani et al. A Survey on the Machine Translation Methods for Indian Languages: Challenges, Availability, and Production of Parallel Corpora, Government Policies and Research Directions
CN115965017B (en) Multi-language input and analysis system and method based on development platform
Fan Exploring English Translation Strategies Oriented by Big Data Technology
Kovačević Technical Translation and the Internet
Fu Construction on Parallel Corpus System for English Translation of Liaoning Dialect
Jia et al. Research on the Role of Artificial Intelligence in the Core of Intelligent Translation Systems
Qumar et al. Emerging resources, enduring challenges: a comprehensive study of Kashmiri parallel corpus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Gao Jinling, Zhang Congying, Wang Haifeng, Ding Mei, Bao Yuping, Gao Jiyun, Zhang Xiaohong, Wang Wei
Inventor before: Zhang Xiaohong, Wang Wei, Zhang Congying, Ding Mei, Gao Jinling, Bao Yuping
GR01 Patent grant