CN108009160A - Corpus translation method and device containing named entity, electronic equipment and storage medium - Google Patents

Corpus translation method and device containing named entity, electronic equipment and storage medium

Info

Publication number
CN108009160A
Authority
CN
China
Prior art keywords
corpus
translation
named entity
translated
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711245629.2A
Other languages
Chinese (zh)
Inventor
李晓普
宋洪伟
程碧霄
闵可锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201711245629.2A
Publication of CN108009160A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a method and device for translating a corpus containing a named entity, an electronic device, and a storage medium. It relates to the field of machine translation and aims to solve the problem that existing machine translation methods translate corpora containing named entities with low accuracy. The method comprises: receiving a corpus to be translated that contains a named entity; translating the corpus to be translated with a machine learning model to obtain a first translation result, wherein the machine learning model, when translating the corpus, translates the named entity in it into a first character string; acquiring, from the corpus to be translated, the named entity corresponding to the first character string; translating the named entity into a target-language text string according to a preset translation rule; and replacing the first character string in the first translation result with the target-language text string to obtain a second translation result. The invention is applicable to various machine translation models.

Description

Corpus translation method and device containing named entity, electronic equipment and storage medium
Technical field
The present invention relates to the field of machine translation, and in particular to a method and device for translating a corpus containing named entities, an electronic device, and a storage medium.
Background
Machine translation is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). It is a branch of computational linguistics and one of the ultimate goals of artificial intelligence, and it has significant scientific research value. Translation is also itself a business area with commercial potential, and the growth of international exchange has further expanded the demand for it.
Machine translation methods based on deep learning have developed rapidly since they were proposed and have become a research hotspot in the machine translation field. At present, because corpus size is limited, the translation quality for named entities does not reach an acceptable level. Named entity recognition and translation is an important step of corpus preprocessing in statistical machine translation, and it has a significant influence on subsequent model training and system performance.
At present, named entity recognition and translation methods are mainly statistical. They train a translation model on manually annotated corpora so that the model learns recognition and translation knowledge from linguistic phenomena and then identifies and translates named entities automatically. However, statistical machine learning methods need the support of a large-scale corpus. When the corpus is small, the accuracy of named entity recognition and translation drops, which ultimately affects subsequent natural language processing tasks. In everyday corpora, the amount of material involving named entities is relatively small, so current translation models translate it with low accuracy.
Summary of the invention
In view of this, embodiments of the present invention provide a method and device for translating a corpus containing named entities, an electronic device, and a storage medium, which can solve the problem that existing machine translation methods have low accuracy when translating corpora containing named entities.
In a first aspect, an embodiment of the present invention provides a method for translating a corpus containing named entities, comprising:
receiving a corpus to be translated that contains a named entity;
translating the corpus to be translated with a machine learning model to obtain a first translation result, wherein, when the machine learning model translates the corpus to be translated, it translates the named entity in the corpus into a first character string;
obtaining, from the corpus to be translated, the named entity corresponding to the first character string;
translating the named entity into a target-language text string according to a preset translation rule;
replacing the first character string in the first translation result with the target-language text string to obtain a second translation result.
With reference to the first aspect, in a first embodiment of the first aspect, the machine learning model is obtained as follows:
preparing several groups of bilingual training corpora;
identifying the named entities in the bilingual training corpora;
normalizing the identified named entities in the bilingual training corpora according to preset normalization rules;
replacing each normalized named entity in the bilingual training corpora with a first character string according to a named entity recognition rule prepared in advance, wherein the recognition rule stores the correspondence between normalized text strings and first character strings;
inputting the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
With reference to the first aspect, in a second embodiment of the first aspect, the first character string in the first translation result carries position information of its corresponding named entity in the corpus to be translated;
obtaining, from the corpus to be translated, the named entity corresponding to the first character string comprises:
obtaining, according to the position information carried by the first character string, the source-language text string at the position indicated by the position information in the corpus to be translated, as the named entity corresponding to the first character string.
With reference to the first embodiment of the first aspect, in a third embodiment of the first aspect, preparing several groups of bilingual training corpora comprises:
obtaining several groups of bilingual corpora;
performing data cleaning on the bilingual corpora;
segmenting the cleaned Chinese corpus into words, and tokenizing the words in the Latin-script corpus.
With reference to the first aspect or any of the first to third embodiments of the first aspect, in a fourth embodiment of the first aspect, the named entity includes at least one of a person name, a place name, a currency, a date, or an ordinary number.
With reference to the first aspect or any of the first to third embodiments of the first aspect, in a fifth embodiment of the first aspect, the first character string is a character string composed of special characters.
In a second aspect, an embodiment of the present invention provides a device for translating a corpus containing named entities, comprising:
a receiving module, configured to receive a corpus to be translated that contains a named entity;
a first translation module, configured to translate the corpus to be translated with a machine learning model to obtain a first translation result, wherein, when the machine learning model translates the corpus to be translated, it translates the named entity in the corpus into a first character string;
a matching module, configured to obtain, from the corpus to be translated, the named entity corresponding to the first character string;
a second translation module, configured to translate the named entity into a target-language text string according to a preset translation rule;
a result generation module, configured to replace the first character string in the first translation result with the target-language text string to obtain a second translation result.
With reference to the second aspect, in a first embodiment of the second aspect, the device for translating a corpus containing named entities further comprises:
a corpus preparation module, configured to prepare several groups of bilingual training corpora;
an identification module, configured to identify the named entities in the bilingual training corpora;
a normalization module, configured to normalize the identified named entities in the bilingual training corpora according to preset normalization rules;
a replacement module, configured to replace each normalized named entity in the bilingual training corpora with a first character string according to a named entity recognition rule prepared in advance, wherein the recognition rule stores the correspondence between normalized text strings and first character strings;
a training module, configured to input the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
With reference to the second aspect, in a second embodiment of the second aspect, the first character string in the first translation result produced by the first translation module carries position information of its corresponding named entity in the corpus to be translated;
the matching module is specifically configured to obtain, according to the position information carried by the first character string, the source-language text string at the position indicated by the position information in the corpus to be translated, as the named entity corresponding to the first character string.
With reference to the first embodiment of the second aspect, in a third embodiment of the second aspect, the corpus preparation module comprises:
an acquisition submodule, configured to obtain several groups of bilingual corpora;
a cleaning submodule, configured to perform data cleaning on the bilingual corpora;
a processing submodule, configured to segment the cleaned Chinese corpus into words and to tokenize the words in the Latin-script corpus.
With reference to the second aspect or any of the first to third embodiments of the second aspect, in a fourth embodiment of the second aspect, the named entity includes at least one of a person name, a place name, a currency, a date, or an ordinary number.
With reference to the second aspect or any of the first to third embodiments of the second aspect, in a fifth embodiment of the second aspect, the first character string is a character string composed of special characters.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is placed inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit supplies power to each circuit or component of the electronic device; the memory stores executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the method for translating a corpus containing named entities described in any of the foregoing embodiments.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for translating a corpus containing named entities described in any of the foregoing embodiments.
In a fifth aspect, an embodiment of the present invention provides an application program for performing the method for translating a corpus containing named entities described in any of the foregoing embodiments.
With the method, device, electronic device, and storage medium for translating a corpus containing named entities provided by the embodiments of the present invention, and taking into account the regularities in how named entities are composed in the source and target languages, a machine learning model translates the corpus to be translated, which contains a named entity, into a first translation result containing a first character string. The named entity corresponding to the first character string in the first translation result is then translated separately. This achieves accurate translation of corpora containing named entities and realizes rule-based recognition and translation of named entities. The translation scheme provided by the invention does not depend on corpus scale and requires no annotation or training; it recognizes and translates corpora containing named entities with higher accuracy and improves the performance of statistical machine translation systems.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the method of the present invention for translating a corpus containing named entities;
Fig. 2 is a flowchart of the method for obtaining the machine learning model used in the corpus translation method of the present invention;
Fig. 3 is a flowchart of the method for preparing the bilingual training corpora of the present invention;
Fig. 4 is a structural diagram of embodiment one of the device of the present invention for translating a corpus containing named entities;
Fig. 5 is a structural diagram of embodiment two of the device of the present invention for translating a corpus containing named entities;
Fig. 6 is a structural diagram of the corpus preparation module 16;
Fig. 7 is a structural diagram of an embodiment of the electronic device of the present invention.
Detailed description
Because named entities such as times and numbers have fairly simple forms and follow fairly obvious regularities in how they are composed, the present invention translates them with a rule-based method. To make the technical problem to be solved, the technical solution, and the advantages of the invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
The method and device for translating a corpus containing named entities according to the embodiments of the present invention are described in detail below with reference to the drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a flowchart of an embodiment of the method of the present invention for translating a corpus containing named entities. As shown in Fig. 1, the method of this embodiment may include:
Step 101: receive a corpus to be translated that contains a named entity.
In this embodiment, the named entity includes at least one of a person name, a place name, a currency, a date, or an ordinary number. The corpus to be translated can be received through a human-computer interaction interface. Obviously, a corpus that contains no named entity can also be translated normally; details are not repeated here.
Step 102: translate the corpus to be translated with a machine learning model to obtain a first translation result.
When the machine learning model translates the corpus to be translated, it translates the named entity in the corpus into a first character string. Preferably, the first character string is a character string composed of special characters.
In this embodiment, a translation model is trained in advance to obtain the machine learning model. During machine translation, this model can translate a named entity in the corpus directly into the first character string corresponding to that named entity. For example, suppose the corpus to be translated received in step 101 is a Chinese sentence meaning "15 dollars an hour, and I hope to be paid at the end of each day." If the machine learning model has learned through prior training to translate a named entity of the form "number + currency" into "_MONEY_", then its translation of the corpus to be translated is: "i get _MONEY_ an hour, and i expect to be paid at the conclusion of each day."
Specifically, the first character string in the first translation result carries position information of its corresponding named entity in the corpus to be translated. The position information records, for example, the start position of the named entity in the corpus to be translated (such as the index of its first character) and the character length of the named entity.
Step 103: obtain, from the corpus to be translated, the named entity corresponding to the first character string.
In this embodiment, the source-language text string at the position indicated by the position information carried by the first character string can be obtained from the corpus to be translated and taken as the named entity corresponding to the first character string.
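The lookup in step 103 can be illustrated with a minimal sketch. The `Placeholder` class, its field names, and the Chinese sample string are assumptions made for illustration; the text only states that the position information records a start position and a character length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    """A first character string plus the position information it carries."""
    token: str   # e.g. "_MONEY_"
    start: int   # index of the entity's first character in the source
    length: int  # number of characters the entity spans


def source_entity(source: str, ph: Placeholder) -> str:
    """Return the source-language text string the placeholder points at."""
    end = ph.start + ph.length
    if ph.start < 0 or end > len(source):
        raise ValueError("position information falls outside the source text")
    return source[ph.start:end]


# Hypothetical source fragment; the entity "15美元" starts at index 3.
source = "一小时15美元"
print(source_entity(source, Placeholder("_MONEY_", start=3, length=4)))
```

Storing a start index and a length, rather than the entity text itself, keeps the placeholder small and leaves the source corpus as the single place the entity is read from.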
Step 104: translate the named entity into a target-language text string according to a preset translation rule.
In this embodiment, the named entity obtained in the previous step is translated into a target-language text string according to a translation rule. Any existing translation rule may be used, such as a given rule for translating Chinese into English, or the translation rule specified by a current machine translation model.
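As a rough illustration of what a preset translation rule for "number + currency" entities might look like, the sketch below uses a small lookup table. The table contents, the regular expression, and the function name are hypothetical; the text does not prescribe any particular rule format.

```python
import re

# Hypothetical rule table: source-language currency word -> (singular, plural).
CURRENCY_RULES = {
    "美元": ("dollar", "dollars"),
    "欧元": ("euro", "euros"),
}

# A number, optional whitespace, then the currency word.
_MONEY = re.compile(r"(\d+(?:\.\d+)?)\s*(\D+)")


def translate_money(entity: str) -> str:
    """Translate a 'number + currency' entity with fixed rules."""
    match = _MONEY.fullmatch(entity.strip())
    if match is None:
        raise ValueError(f"not a 'number + currency' entity: {entity!r}")
    number, currency = match.group(1), match.group(2).strip()
    if currency not in CURRENCY_RULES:
        raise KeyError(f"no translation rule for currency {currency!r}")
    singular, plural = CURRENCY_RULES[currency]
    unit = singular if float(number) == 1 else plural
    return f"{number} {unit}"


print(translate_money("15美元"))  # 15 dollars
```

Because the rule is a table lookup rather than a learned mapping, it needs no training data for the entity itself, which matches the motivation given for the rule-based step.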
Step 105: replace the first character string in the first translation result with the target-language text string to obtain a second translation result.
In this embodiment, a machine learning model translates the corpus to be translated, which contains a named entity, into a first translation result containing a first character string; the named entity corresponding to the first character string is then translated separately to obtain the second translation result. This achieves accurate translation of corpora containing named entities and realizes rule-based recognition and translation of named entities. The translation scheme does not depend on corpus scale and requires no annotation or training; it recognizes and translates corpora containing named entities with higher accuracy and improves the performance of statistical machine translation systems.
Fig. 2 is a flowchart of the method for obtaining the machine learning model used in the corpus translation method of the present invention, including:
Step 201: prepare several groups of bilingual training corpora.
The bilingual training corpora may cover multiple language pairs, for example mutually corresponding Chinese-English, Chinese-Japanese, or Chinese-German corpora.
In this embodiment, the bilingual training corpora may be prepared as shown in Fig. 3, with the following steps:
Step 301: obtain several groups of bilingual corpora;
Step 302: perform data cleaning on the bilingual corpora;
Step 303: segment the cleaned Chinese corpus into words, and tokenize the words in the Latin-script corpus.
For example, suppose one group of corresponding Chinese-English training corpora consists of a Chinese sentence meaning "The loss of producers without quotas was 200 million dollars." and the English sentence "producers without quotas were worse off by $ 200million." After step 201, this group is prepared as:
(Chinese, rendered here in English) "The loss of producers without quotas was 200 million dollars."
"producers without quotas were worse off by $ 200million."
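The cleaning and tokenization of steps 302 and 303 could be sketched as follows for the Latin-script side. The cleaning criteria (whitespace collapsing, dropping empty and duplicate pairs) and the token pattern are assumptions, since the text does not specify them. Chinese word segmentation normally relies on a dictionary or a dedicated segmenter and is left out here.

```python
import re

# Latin-script tokens: words (with inner apostrophes), numbers, or single symbols.
_TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:[.,]\d+)*|[^\sA-Za-z\d]")


def clean_pairs(pairs):
    """Collapse whitespace, then drop empty and duplicate sentence pairs."""
    seen, kept = set(), []
    for src, tgt in pairs:
        pair = (" ".join(src.split()), " ".join(tgt.split()))
        if all(pair) and pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


def tokenize_latin(text: str) -> list[str]:
    """Split Latin-script text into word, number, and symbol tokens."""
    return _TOKEN.findall(text)


# Hypothetical input pairs; the second one is dropped because its source is empty.
pairs = clean_pairs([("  源句  ", "worse off by $200million. "), ("", "orphan")])
print(tokenize_latin(pairs[0][1]))
```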
Step 202: identify the named entities in the bilingual training corpora.
The named entities may be identified manually or by machine.
For example, in the example of step 201 above, the named entity "200 million dollars" is identified in the Chinese sentence, and the named entity "$ 200million" is identified in "producers without quotas were worse off by $ 200million."
Step 203: normalize the identified named entities in the bilingual training corpora according to preset normalization rules.
In this embodiment, the normalized form can be defined as needed. For example, the named entities identified above in the corresponding bilingual training corpora are "200 million dollars" (in the Chinese sentence) and "$ 200million". Step 203 normalizes the Chinese numerals and the Chinese and English currency expressions to the form "number + currency", so that after normalization the bilingual training corpora become:
(Chinese, rendered here in English) "The loss of producers without quotas was 200000000.0 dollars."
"producers without quotas were worse off by 200000000.0 dollar."
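A minimal sketch of one such normalization rule, for the English side only, is given here. The scale words, the symbol table, and the function name are assumptions; the text only says that the normalized form is "number + currency" and can be defined as needed.

```python
import re

# Hypothetical scale words and currency symbols; a real rule set would be larger.
SCALES = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
SYMBOLS = {"$": "dollar"}

# Symbol, amount, and an optional scale word such as "million".
_ENGLISH_MONEY = re.compile(
    r"([$])\s*(\d+(?:\.\d+)?)(?:\s*(thousand|million|billion)\b)?",
    re.IGNORECASE,
)


def normalize_money(text: str) -> str:
    """Rewrite '$ 200million'-style amounts as '<float> <currency>'."""
    def repl(match: re.Match) -> str:
        symbol, digits, scale = match.groups()
        value = float(digits) * SCALES.get((scale or "").lower(), 1.0)
        return f"{value} {SYMBOLS[symbol]}"
    return _ENGLISH_MONEY.sub(repl, text)


print(normalize_money("producers without quotas were worse off by $ 200million."))
```

Normalizing both sides of a sentence pair to the same surface form means a single recognition pattern can later cover what were originally many different spellings of the same amount.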
Step 204: replace each normalized named entity in the bilingual training corpora with a first character string according to a named entity recognition rule prepared in advance.
The recognition rule prepared in advance stores the correspondence between normalized text strings and first character strings. For example, named entities of the "number + currency" class correspond to the first character string "_MONEY_", those of the "person name" class to "_NAME_", those of the "place name" class to "_STATE_", those of the "date" class to "_DATE_", and so on.
For example, if the recognition rule prepared in advance defines the fixed string "_MONEY_" as corresponding to "number + currency", then replacing the named entities in the example of steps 201-203 above gives:
(Chinese, rendered here in English) "The loss of producers without quotas was _MONEY_."
"producers without quotas were worse off by _MONEY_."
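The replacement in step 204 might be sketched as a list of pattern-to-placeholder rules applied to both sides of a sentence pair. Only the placeholder names come from the text; the regular expressions, the date format, and the Chinese sample sentence are assumptions for illustration.

```python
import re

# Hypothetical recognition rules: pattern of a normalized entity -> first character string.
RECOGNITION_RULES = [
    (re.compile(r"\d+(?:\.\d+)?\s*(?:dollars?|美元)"), "_MONEY_"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "_DATE_"),
]


def mask_entities(sentence: str) -> str:
    """Replace every normalized entity with its first character string."""
    for pattern, placeholder in RECOGNITION_RULES:
        sentence = pattern.sub(placeholder, sentence)
    return sentence


def mask_pair(source: str, target: str) -> tuple[str, str]:
    """Apply the same rules to both sides of a bilingual training pair."""
    return mask_entities(source), mask_entities(target)


# Hypothetical normalized pair.
print(mask_pair("损失是200000000.0美元。", "worse off by 200000000.0 dollar."))
```

Applying identical rules to source and target keeps the placeholders aligned across the pair, which is what lets the translation model learn to carry "_MONEY_" through unchanged.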
Step 205: input the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
In this embodiment, many such corresponding training corpora are input into the translation model for training. For example, the Chinese sentence meaning "The loss of producers without quotas was _MONEY_." and the English sentence "producers without quotas were worse off by _MONEY_." are input into the translation model. The machine learning model finally obtained by training can learn the named entity recognition rule, for example the association that "number + currency" is translated as "_MONEY_".
In this embodiment, training the machine learning model in advance enables it to translate a named entity in the corpus to be translated directly into the defined first character string.
A specific example is used below to describe the technical solution of the method embodiments of the present invention in detail.
(1) First, the corpus to be translated is received: a Chinese sentence meaning "15 dollars an hour, and I hope to be paid at the end of each day."
(2) The machine learning model translates the corpus to be translated, and the first translation result is: "i get _MONEY_ an hour, and i expect to be paid at the conclusion of each day."
(3) According to the position information carried by the first character string "_MONEY_" in the first translation result, "_MONEY_" is matched to the source-language span meaning "15 dollars".
(4) That span is translated as "15 dollars" according to the Chinese-English translation rule in use.
(5) "_MONEY_" in the first translation result "i get _MONEY_ an hour, and i expect to be paid at the conclusion of each day." is replaced with "15 dollars", giving the final second translation result: "i get 15 dollars an hour, and i expect to be paid at the conclusion of each day."
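The five steps above can be strung together in a short sketch. The trained model is replaced by a hard-coded stub, and the Chinese source sentence, the stub's return format, and the one-line translation rule are all assumptions made so the flow can be run end to end.

```python
def stub_model(source: str):
    """Stand-in for the trained model (an assumption, not a real model).

    Returns the first translation result and, per placeholder, the
    (start, length) of the corresponding entity in the source.
    """
    first = "i get _MONEY_ an hour, and i expect to be paid at the conclusion of each day."
    start = source.index("15美元")
    return first, {"_MONEY_": (start, len("15美元"))}


def rule_translate(entity: str) -> str:
    """Toy Chinese-English rule for 'number + 美元' entities."""
    return entity.replace("美元", " dollars")


def translate(source: str) -> str:
    first, positions = stub_model(source)          # step 102
    second = first
    for placeholder, (start, length) in positions.items():
        entity = source[start:start + length]      # step 103
        target = rule_translate(entity)            # step 104
        second = second.replace(placeholder, target, 1)  # step 105
    return second


# Hypothetical source sentence for the worked example.
print(translate("一小时15美元，我希望每天结束时拿到报酬。"))
```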
Fig. 4 is a structural diagram of embodiment one of the device of the present invention for translating a corpus containing named entities. As shown in Fig. 4, the device of this embodiment may include a receiving module 11, a first translation module 12, a matching module 13, a second translation module 14, and a result generation module 15. The receiving module 11 is configured to receive a corpus to be translated that contains a named entity. The first translation module 12 is configured to translate the corpus to be translated with a machine learning model to obtain a first translation result, wherein, when the machine learning model translates the corpus to be translated, it translates the named entity in the corpus into a first character string. The matching module 13 is configured to obtain, from the corpus to be translated, the named entity corresponding to the first character string. The second translation module 14 is configured to translate the named entity into a target-language text string according to a preset translation rule. The result generation module 15 is configured to replace the first character string in the first translation result with the target-language text string to obtain a second translation result.
The device of this embodiment can be used to carry out the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here.
Fig. 5 is a structural diagram of embodiment two of the device of the present invention for translating a corpus containing named entities. As shown in Fig. 5, the device of this embodiment builds on the device structure shown in Fig. 4 and further includes a corpus preparation module 16, an identification module 17, a normalization module 18, a replacement module 19, and a training module 20. The corpus preparation module 16 is configured to prepare several groups of bilingual training corpora. The identification module 17 is configured to identify the named entities in the bilingual training corpora. The normalization module 18 is configured to normalize the identified named entities in the bilingual training corpora according to preset normalization rules. The replacement module 19 is configured to replace each normalized named entity in the bilingual training corpora with a first character string according to a named entity recognition rule prepared in advance, wherein the recognition rule stores the correspondence between normalized text strings and first character strings. The training module 20 is configured to input the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
The device of this embodiment can be used to carry out the technical solution of the method embodiment shown in Fig. 2; its implementation principle and technical effect are similar and are not repeated here.
Preferably, the first character string in the first translation result produced by the first translation module 12 carries position information of its corresponding named entity in the corpus to be translated;
the matching module 13 is specifically configured to obtain, according to the position information carried by the first character string, the source-language text string at the position indicated by the position information in the corpus to be translated, as the named entity corresponding to the first character string.
Fig. 6 is a structural diagram of the corpus preparation module 16. As shown in Fig. 6, the corpus preparation module 16 may include an acquisition submodule 161, a cleaning submodule 162 and a processing submodule 163. The acquisition submodule 161 is used for obtaining several groups of bilingual corpora; the cleaning submodule 162 is used for performing data cleaning on the bilingual corpora; the processing submodule 163 is used for performing word segmentation on the cleaned Chinese corpus and label filling on the words in the Latin-script corpus.
The device of this embodiment can be used to carry out the technical solution of the method embodiment shown in Fig. 3; its implementation principle and technical effect are similar and are not repeated here.
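The preparation flow of submodules 161 to 163 might be sketched in Python as follows. This is an assumption-laden illustration: the cleaning criteria (whitespace normalization, dropping empty and duplicate pairs) are not specified in this section, the per-character split is only a stand-in for a real Chinese word segmenter, and the simple tokenizer for the Latin-script side stands in for the label-filling step, whose exact procedure is not described here.

```python
import re


def clean(pairs):
    """Assumed cleaning: collapse whitespace, drop empty and duplicate pairs."""
    seen, cleaned = set(), []
    for zh, en in pairs:
        zh, en = " ".join(zh.split()), " ".join(en.split())
        if not zh or not en or (zh, en) in seen:
            continue
        seen.add((zh, en))
        cleaned.append((zh, en))
    return cleaned


def segment_chinese(text):
    """Stand-in segmenter: one token per CJK character, other runs kept whole."""
    return re.findall(r"[\u4e00-\u9fff]|[^\u4e00-\u9fff\s]+", text)


def tokenize_latin(text):
    """Stand-in for Latin-script processing: split into words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)


def prepare(pairs):
    """Acquire (passed in), clean, then process both sides of each pair."""
    return [(segment_chinese(zh), tokenize_latin(en)) for zh, en in clean(pairs)]


if __name__ == "__main__":
    raw = [
        (" 我花了35元 ", "I spent 35 yuan."),
        ("", "orphan line"),
        ("我花了35元", "I spent 35 yuan."),
    ]
    print(prepare(raw))
```

In practice a dictionary-based or statistical segmenter would replace segment_chinese; the sketch only shows where each submodule sits in the flow.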
Preferably, the named entities translated by any of the above devices for translating a corpus containing named entities include at least one of a person name, a place name, a currency, a date or an ordinary number.
Preferably, the first character string used by any of the above devices for translating a corpus containing named entities is a character string composed of special characters.
Corresponding to the method provided by the embodiments of the present invention for translating a corpus containing named entities, an embodiment of the present invention also provides an electronic device. Fig. 7 is a structural diagram of one embodiment of the electronic device of the present invention, which can implement the flow of the embodiment shown in Fig. 1. As shown in Fig. 7, the electronic device may include a housing 21, a processor 22, a memory 23, a circuit board 24 and a power circuit 25, wherein the circuit board 24 is placed inside the space enclosed by the housing 21, and the processor 22 and the memory 23 are arranged on the circuit board 24; the power circuit 25 is used for supplying power to each circuit or component of the electronic device; the memory 23 is used for storing executable program code; and the processor 22 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 23, so as to perform the method of any of the foregoing embodiments for translating a corpus containing named entities.
The above electronic device exists in various forms, including but not limited to:
(1) Mobile communication devices: devices of this kind have mobile communication functions and have the provision of voice and data communication as their main purpose. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: devices of this kind belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID and UMPC devices, such as the iPad.
(3) Portable entertainment devices: devices of this kind can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus and so on; its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capacity, stability, reliability, security, scalability and manageability are higher.
(5) Other electronic devices with data interaction functions.
The present invention also provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the steps of the method of any of the foregoing embodiments for translating a corpus containing named entities.
An embodiment of the present invention further provides an application program for performing the method of any of the foregoing embodiments for translating a corpus containing named entities.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.
A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM) or the like.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (10)

  1. A method for translating a corpus containing named entities, characterized by comprising:
    receiving a corpus to be translated that contains a named entity;
    translating the corpus to be translated by a machine learning model to obtain a first translation result; wherein, when translating the corpus to be translated, the machine learning model translates the named entity in the corpus to be translated into a first character string;
    obtaining, from the corpus to be translated, the named entity corresponding to the first character string;
    translating the named entity into a target-language text string according to a preset translation rule;
    replacing the first character string in the first translation result with the target-language text string to obtain a second translation result.
  2. The method for translating a corpus containing named entities according to claim 1, characterized in that the machine learning model is obtained by the following method:
    preparing several groups of bilingual training corpora;
    identifying the named entities in the bilingual training corpora;
    normalizing the identified named entities in the bilingual training corpora according to preset normalization rules;
    replacing, according to named-entity recognition rules prepared in advance, each normalized named entity in the bilingual training corpora with a first character string; wherein the named-entity recognition rules store the correspondence between the normalized text strings and the first character strings;
    inputting the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
  3. The method for translating a corpus containing named entities according to claim 1, characterized in that the first character string in the first translation result carries the position information, in the corpus to be translated, of its corresponding named entity;
    the obtaining, from the corpus to be translated, of the named entity corresponding to the first character string comprises:
    obtaining from the corpus to be translated, according to the position information carried by the first character string, the source-language text string at the position indicated by the position information, as the named entity corresponding to the first character string.
  4. The method for translating a corpus containing named entities according to claim 2, characterized in that the preparing of several groups of bilingual training corpora comprises:
    obtaining several groups of bilingual corpora;
    performing data cleaning on the bilingual corpora;
    performing word segmentation on the cleaned Chinese corpus, and performing label filling on the words in the Latin-script corpus.
  5. The method for translating a corpus containing named entities according to any one of claims 1-4, characterized in that the named entity includes at least one of a person name, a place name, a currency, a date or an ordinary number.
  6. The method for translating a corpus containing named entities according to any one of claims 1-4, characterized in that the first character string is a character string composed of special characters.
  7. A device for translating a corpus containing named entities, characterized by comprising:
    a receiving module, for receiving a corpus to be translated that contains a named entity;
    a first translation module, for translating the corpus to be translated by a machine learning model to obtain a first translation result; wherein, when translating the corpus to be translated, the machine learning model translates the named entity in the corpus to be translated into a first character string;
    a matching module, for obtaining, from the corpus to be translated, the named entity corresponding to the first character string;
    a second translation module, for translating the named entity into a target-language text string according to a preset translation rule;
    a result generation module, for replacing the first character string in the first translation result with the target-language text string to obtain a second translation result.
  8. The device for translating a corpus containing named entities according to claim 7, characterized by further comprising:
    a corpus preparation module, for preparing several groups of bilingual training corpora;
    an identification module, for identifying the named entities in the bilingual training corpora;
    a normalization module, for normalizing the identified named entities in the bilingual training corpora according to preset normalization rules;
    a replacement module, for replacing, according to named-entity recognition rules prepared in advance, each normalized named entity in the bilingual training corpora with a first character string; wherein the named-entity recognition rules store the correspondence between the normalized text strings and the first character strings;
    a training module, for inputting the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
  9. An electronic device, characterized in that the electronic device comprises a housing, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is placed inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power circuit is used for supplying power to each circuit or component of the above client; the memory is used for storing executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the method for translating a corpus containing named entities according to any one of the preceding claims 1-6.
  10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the method for translating a corpus containing named entities according to any one of claims 1-6.
CN201711245629.2A 2017-11-30 2017-11-30 Corpus translation method and device containing named entity, electronic equipment and storage medium Pending CN108009160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711245629.2A CN108009160A (en) 2017-11-30 2017-11-30 Corpus translation method and device containing named entity, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711245629.2A CN108009160A (en) 2017-11-30 2017-11-30 Corpus translation method and device containing named entity, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN108009160A true CN108009160A (en) 2018-05-08

Family

ID=62055689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711245629.2A Pending CN108009160A (en) 2017-11-30 2017-11-30 Corpus translation method and device containing named entity, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108009160A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062909A (en) * 2018-07-23 2018-12-21 传神语联网网络科技股份有限公司 A kind of pluggable component
CN111144111A (en) * 2019-12-30 2020-05-12 北京世纪好未来教育科技有限公司 Translation method, device, equipment and storage medium
CN111222342A (en) * 2020-04-15 2020-06-02 北京金山数字娱乐科技有限公司 Translation method and device
CN113011141A (en) * 2021-03-17 2021-06-22 平安科技(深圳)有限公司 Buddha note model training method, Buddha note generation method and related equipment
WO2021134416A1 (en) * 2019-12-31 2021-07-08 深圳市优必选科技股份有限公司 Text transformation method and apparatus, computer device, and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1643511A (en) * 2002-03-11 2005-07-20 南加利福尼亚大学 Named entity translation
US20090319257A1 (en) * 2008-02-23 2009-12-24 Matthias Blume Translation of entity names
CN104298662A (en) * 2014-04-29 2015-01-21 中国专利信息中心 Machine translation method and translation system based on organism named entities
CN106874256A (en) * 2015-12-11 2017-06-20 北京国双科技有限公司 Name the method and device of entity in identification field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
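夏青 (Xia Qing): "Research on methods for acquiring Chinese-Khmer named entity translation equivalent pairs", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Series *
王松 (Wang Song): "Research on Chinese-English translation methods for Chinese organization names and addresses", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Series *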

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062909A (en) * 2018-07-23 2018-12-21 传神语联网网络科技股份有限公司 A kind of pluggable component
CN111144111A (en) * 2019-12-30 2020-05-12 北京世纪好未来教育科技有限公司 Translation method, device, equipment and storage medium
WO2021134416A1 (en) * 2019-12-31 2021-07-08 深圳市优必选科技股份有限公司 Text transformation method and apparatus, computer device, and computer readable storage medium
CN111222342A (en) * 2020-04-15 2020-06-02 北京金山数字娱乐科技有限公司 Translation method and device
CN111222342B (en) * 2020-04-15 2020-08-11 北京金山数字娱乐科技有限公司 Translation method and device
CN113011141A (en) * 2021-03-17 2021-06-22 平安科技(深圳)有限公司 Buddha note model training method, Buddha note generation method and related equipment

Similar Documents

Publication Publication Date Title
CN108009160A (en) Corpus translation method and device containing named entity, electronic equipment and storage medium
CN108959256B (en) Short text generation method and device, storage medium and terminal equipment
CN108345672A (en) Intelligent response method, electronic device and storage medium
CN108549646B (en) Neural network machine translation system based on capsule and information data processing terminal
CN107861954B (en) Information output method and device based on artificial intelligence
KR20190125863A (en) Multilingual translation device and multilingual translation method
CN108537176A (en) Recognition methods, device, terminal and the storage medium of target barrage
CN110222330B (en) Semantic recognition method and device, storage medium and computer equipment
CN104239289B (en) Syllabification method and syllabification equipment
CN111143556B (en) Automatic counting method and device for software function points, medium and electronic equipment
CN111311459B (en) Interactive question-setting method and system for international Chinese teaching
CN108304387B (en) Method, device, server group and storage medium for recognizing noise words in text
CN110046637A (en) A kind of training method, device and the equipment of contract paragraph marking model
Muñoz Cognitive and psycholinguistic approaches
CN115952272A (en) Method, device and equipment for generating dialogue information and readable storage medium
CN112287698A (en) Chapter translation method and device, electronic equipment and storage medium
CN104916177A (en) Electronic device and data output method of the electronic device
CN103678270B (en) Semantic primitive abstracting method and semantic primitive extracting device
Hu Analysis of the feasibility and advantages of using big data technology for English translation
CN111125550A (en) Interest point classification method, device, equipment and storage medium
CN113626576A (en) Method and device for extracting relational characteristics in remote supervision, terminal and storage medium
CN112466277A (en) Rhythm model training method and device, electronic equipment and storage medium
CN117131155A (en) Multi-category identification method, device, electronic equipment and storage medium
CN112100355A (en) Intelligent interaction method, device and equipment
EP3185132B1 (en) Method for writing a foreign language in a pseudo language phonetically resembling native language of the speaker

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180508