CN108009160A - Corpus translation method and device containing named entity, electronic equipment and storage medium - Google Patents
- Publication number
- CN108009160A (application number CN201711245629.2A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- translation
- named entity
- translated
- character string
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of the invention provide a method and device for translating a corpus containing named entities, an electronic device, and a storage medium. They relate to the field of machine translation and aim to solve the problem that existing machine translation methods translate corpora containing named entities with low accuracy. The method comprises: receiving a corpus to be translated that contains a named entity; translating the corpus with a machine learning model to obtain a first translation result, in which the model translates each named entity in the corpus into a first character string; obtaining, from the corpus to be translated, the named entity corresponding to the first character string; translating the named entity into a target-language string according to a preset translation rule; and replacing the first character string in the first translation result with the target-language string to obtain a second translation result. The invention is applicable to various machine translation models.
Description
Technical field
The present invention relates to the field of machine translation, and in particular to a method, device, electronic device, and storage medium for translating a corpus containing named entities.
Background
Machine translation is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). It is a branch of computational linguistics, one of the ultimate goals of artificial intelligence, and has significant scientific research value. Translation is also a business with considerable potential, and growing international exchange keeps expanding the demand for it.
Deep-learning-based machine translation has developed rapidly since it was proposed and has become the research focus of the field. However, because corpus size is limited, the translation quality for named entities does not yet reach an acceptable level. Named entity recognition and translation are important corpus-preprocessing steps in statistical machine translation and strongly affect subsequent model training and system performance.
At present, named entity recognition and translation rely mainly on statistical methods: a translation model is trained on manually annotated corpora, learns recognition and translation knowledge from linguistic phenomena, and then automatically identifies and translates named entities. Statistical machine learning, however, requires large-scale corpora. When the corpus is small, the accuracy of named entity recognition and translation drops, which in turn degrades downstream natural language processing tasks. Because everyday corpora containing named entities are relatively small, current translation models translate them with low accuracy.
Summary of the invention
In view of this, the embodiments of the present invention provide a method, device, electronic device, and storage medium for translating a corpus containing named entities, which can solve the problem that existing machine translation methods translate such corpora with low accuracy.
In a first aspect, an embodiment of the present invention provides a method for translating a corpus containing named entities, comprising:
receiving a corpus to be translated that contains a named entity;
translating the corpus to be translated with a machine learning model to obtain a first translation result, wherein, when translating the corpus, the machine learning model translates each named entity in it into a first character string;
obtaining, from the corpus to be translated, the named entity corresponding to the first character string;
translating the named entity into a target-language string according to a preset translation rule; and
replacing the first character string in the first translation result with the target-language string to obtain a second translation result.
With reference to the first aspect, in a first embodiment of the first aspect, the machine learning model is obtained as follows:
preparing several groups of bilingual training corpora;
identifying the named entities in the bilingual training corpora;
normalizing the identified named entities in the bilingual training corpora according to preset normalization rules;
replacing each normalized named entity in the bilingual training corpora with a first character string according to prepared named entity recognition rules, which store the correspondence between normalized text strings and first character strings; and
inputting the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
With reference to the first aspect, in a second embodiment of the first aspect, the first character string in the first translation result carries the position information of its corresponding named entity in the corpus to be translated;
obtaining, from the corpus to be translated, the named entity corresponding to the first character string comprises:
according to the position information carried by the first character string, obtaining from the corpus to be translated the source-language text string at the position indicated by the position information, as the named entity corresponding to the first character string.
With reference to the first embodiment of the first aspect, in a third embodiment of the first aspect, preparing several groups of bilingual training corpora comprises:
obtaining several groups of bilingual corpora;
performing data cleaning on the bilingual corpora; and
segmenting the cleaned Chinese corpus into words, and tokenizing the words in the Latin-script corpus.
With reference to the first aspect or any of the first to third embodiments of the first aspect, in a fourth embodiment of the first aspect, the named entity includes at least one of a person name, a place name, a currency amount, a date, or an ordinary number.
With reference to the first aspect or any of the first to third embodiments of the first aspect, in a fifth embodiment of the first aspect, the first character string is a string composed of special characters.
In a second aspect, an embodiment of the present invention provides a device for translating a corpus containing named entities, comprising:
a receiving module, configured to receive a corpus to be translated that contains a named entity;
a first translation module, configured to translate the corpus to be translated with a machine learning model to obtain a first translation result, wherein, when translating the corpus, the machine learning model translates each named entity in it into a first character string;
a matching module, configured to obtain, from the corpus to be translated, the named entity corresponding to the first character string;
a second translation module, configured to translate the named entity into a target-language string according to a preset translation rule; and
a result generation module, configured to replace the first character string in the first translation result with the target-language string to obtain a second translation result.
With reference to the second aspect, in a first embodiment of the second aspect, the device for translating a corpus containing named entities further comprises:
a corpus preparation module, configured to prepare several groups of bilingual training corpora;
an identification module, configured to identify the named entities in the bilingual training corpora;
a normalization module, configured to normalize the identified named entities in the bilingual training corpora according to preset normalization rules;
a replacement module, configured to replace each normalized named entity in the bilingual training corpora with a first character string according to prepared named entity recognition rules, which store the correspondence between normalized text strings and first character strings; and
a training module, configured to input the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
With reference to the second aspect, in a second embodiment of the second aspect, the first character string in the first translation result obtained by the first translation module carries the position information of its corresponding named entity in the corpus to be translated;
the matching module is specifically configured to obtain, according to the position information carried by the first character string, the source-language text string at the position indicated by the position information from the corpus to be translated, as the named entity corresponding to the first character string.
With reference to the first embodiment of the second aspect, in a third embodiment of the second aspect, the corpus preparation module comprises:
an acquisition submodule, configured to obtain several groups of bilingual corpora;
a cleaning submodule, configured to perform data cleaning on the bilingual corpora; and
a processing submodule, configured to segment the cleaned Chinese corpus into words and to tokenize the words in the Latin-script corpus.
With reference to the second aspect or any of the first to third embodiments of the second aspect, in a fourth embodiment of the second aspect, the named entity includes at least one of a person name, a place name, a currency amount, a date, or an ordinary number.
With reference to the second aspect or any of the first to third embodiments of the second aspect, in a fifth embodiment of the second aspect, the first character string is a string composed of special characters.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a housing, a processor, a memory, a circuit board, and a power supply circuit. The circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit supplies power to each circuit or component of the electronic device; the memory stores executable program code; and the processor runs a program corresponding to the executable program code by reading the code stored in the memory, so as to perform the method for translating a corpus containing named entities described in any of the foregoing embodiments.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for translating a corpus containing named entities described in any of the foregoing embodiments.
In a fifth aspect, an embodiment of the present invention provides an application program for performing the method for translating a corpus containing named entities described in any of the foregoing embodiments.
The method, device, electronic device, and storage medium for translating a corpus containing named entities provided by the embodiments of the present invention take into account the composition rules of named entities in both the source language and the target language. A machine learning model translates a corpus containing named entities into a first translation result containing first character strings, and the named entity corresponding to each first character string is then translated separately. This achieves accurate translation of corpora containing named entities and realizes rule-based named entity recognition and translation. The translation scheme provided by the invention does not depend on corpus size, requires no annotation or training, recognizes and translates corpora containing named entities with higher accuracy, and improves the performance of statistical machine translation systems.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the method for translating a corpus containing named entities according to the present invention;
Fig. 2 is a flowchart of the method for obtaining the machine learning model used in the method for translating a corpus containing named entities according to the present invention;
Fig. 3 is a flowchart of the method for preparing the bilingual training corpora according to the present invention;
Fig. 4 is a structural diagram of Embodiment 1 of the device for translating a corpus containing named entities according to the present invention;
Fig. 5 is a structural diagram of Embodiment 2 of the device for translating a corpus containing named entities according to the present invention;
Fig. 6 is a structural diagram of the corpus preparation module 16;
Fig. 7 is a structural diagram of an embodiment of the electronic device according to the present invention.
Detailed description of the embodiments
Named entities such as times and numbers have fairly simple forms and follow fairly obvious naming regularities, so the present invention translates them with a rule-based method. To make the technical problem, technical solution, and advantages of the invention clearer, the method and device for translating a corpus containing named entities according to the embodiments of the present invention are described in detail below with reference to the drawings and specific embodiments.
It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Fig. 1 is a flowchart of an embodiment of the method for translating a corpus containing named entities according to the present invention. As shown in Fig. 1, the method of this embodiment may include:
Step 101: receive a corpus to be translated that contains a named entity.
In this embodiment, the named entity includes at least one of a person name, a place name, a currency amount, a date, or an ordinary number. The corpus to be translated can be received through a human-computer interaction interface. A corpus that contains no named entity can of course also be translated normally; this is not described further here.
Step 102: translate the corpus to be translated with the machine learning model to obtain a first translation result.
When translating the corpus, the machine learning model translates each named entity in it into a first character string. Preferably, the first character string is a string composed of special characters.
In this embodiment, a translation model is trained in advance to obtain the machine learning model, which, during machine translation, translates each named entity in the corpus directly into the first character string corresponding to that entity. For example, suppose the corpus received in step 101 is "15 dollars an hour; I hope to be paid at the end of each day." If the machine learning model has learned during training to translate named entities composed of "number + currency" into "_MONEY_", its translation of this corpus is: "i get _MONEY_ an hour, and i expect to be paid at the conclusion of each day."
Specifically, the first character string in the first translation result carries the position information of its corresponding named entity in the corpus to be translated. The position information records, for example, the starting position of the named entity in the corpus (such as the index of its first character) and the length of the named entity in characters.
Step 103: obtain, from the corpus to be translated, the named entity corresponding to the first character string.
In this embodiment, according to the position information carried by the first character string, the source-language text string at the indicated position can be extracted from the corpus to be translated and used as the named entity corresponding to the first character string.
Step 104: translate the named entity into a target-language string according to a preset translation rule.
In this embodiment, the named entity obtained in the previous step is translated into a target-language string according to a translation rule. Any existing translation rule may be used, such as a rule for translating Chinese into English, or the translation rule specified by a current machine translation model.
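One possible rule for step 104 is sketched below in Python, assuming that money entities reach the rule in their original Chinese form: digits, an optional magnitude character (万 = 10^4, 亿 = 10^8), and a currency word. The rule table and the function name `translate_money_zh_en` are illustrative only, not taken from the patent.

```python
import re

# Hypothetical rule table: Chinese currency word -> English unit (plural form).
CURRENCY_ZH_EN = {"美元": "dollars", "欧元": "euros", "元": "yuan"}
# Chinese magnitude characters.
MAGNITUDE = {"万": 10_000, "亿": 100_000_000}

# "美元" and "欧元" are listed before "元" so the longer word wins.
MONEY_RE = re.compile(r"(\d+(?:\.\d+)?)(万|亿)?(美元|欧元|元)")


def translate_money_zh_en(entity: str) -> str:
    """Translate a Chinese money entity such as '2亿美元' into English."""
    m = MONEY_RE.fullmatch(entity.replace(" ", ""))
    if m is None:
        raise ValueError(f"no money rule matches {entity!r}")
    number, magnitude, currency = m.groups()
    value = float(number) * MAGNITUDE.get(magnitude, 1)
    # Whole amounts get thousands separators; fractional amounts are kept as-is.
    amount = f"{value:,.0f}" if value.is_integer() else f"{value:,}"
    return f"{amount} {CURRENCY_ZH_EN[currency]}"


print(translate_money_zh_en("15美元"))   # 15 dollars
print(translate_money_zh_en("2亿美元"))  # 200,000,000 dollars
```

Pluralization (for example "1 dollar") and sub-units such as cents are deliberately ignored here; a real rule set would need to handle them.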
Step 105: replace the first character string in the first translation result with the target-language string to obtain a second translation result.
In this embodiment, the machine learning model translates the corpus containing named entities into a first translation result containing first character strings, and the named entity corresponding to each first character string is then translated separately to obtain the second translation result. This achieves accurate translation of corpora containing named entities and realizes rule-based named entity recognition and translation. The translation scheme does not depend on corpus size, requires no annotation or training, recognizes and translates corpora containing named entities with higher accuracy, and improves the performance of statistical machine translation systems.
Fig. 2 is a flowchart of the method for obtaining the machine learning model used in the method for translating a corpus containing named entities according to the present invention. The method includes:
Step 201: prepare several groups of bilingual training corpora.
The bilingual training corpora may cover several language pairs, for example mutually corresponding Chinese-English, Chinese-Japanese, or Chinese-German corpora.
In this embodiment, the bilingual training corpora may be prepared as shown in Fig. 3, with the following steps:
Step 301: obtain several groups of bilingual corpora;
Step 302: perform data cleaning on the bilingual corpora;
Step 303: segment the cleaned Chinese corpus into words, and tokenize the words in the Latin-script corpus.
For example, a corresponding Chinese-English training pair is: "The loss of producers without quotas is 200,000,000 dollars of" (Chinese side) and "producers without quotas were worse off by $ 200million." After processing through step 201, the pair becomes:
"The loss of producers without quotas is 200,000,000 dollars."
"producers without quotas were worse off by $ 200million."
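Steps 302 and 303 might look roughly like this for the English side. The sketch assumes that "cleaning" means Unicode NFKC normalization, whitespace collapsing, and dropping empty or duplicate sentence pairs, and it uses a simple regular-expression tokenizer; Chinese word segmentation needs a dedicated segmenter and is not shown. All function names are hypothetical.

```python
import re
import unicodedata

# Numbers (with optional ',' or '.' groups and a glued unit such as "million"),
# words (with an optional apostrophe), or any other single non-space character.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*[A-Za-z]*|[A-Za-z]+(?:'[A-Za-z]+)?|\S")


def clean(text: str) -> str:
    """Apply NFKC (folds full-width letters, digits, symbols) and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def clean_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Step 302: clean both sides and drop empty or duplicate sentence pairs."""
    seen: set[tuple[str, str]] = set()
    kept = []
    for zh, en in pairs:
        pair = (clean(zh), clean(en))
        if pair[0] and pair[1] and pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


def tokenize_latin(text: str) -> list[str]:
    """Step 303 (Latin-script side): split a cleaned sentence into tokens."""
    return TOKEN_RE.findall(text)


print(tokenize_latin(clean("producers without quotas were worse off by $ 200million.")))
```

For the sample sentence this yields the tokens `producers` through `by`, then `$`, `200million`, and the final `.`; keeping "200million" as one token leaves it intact for the entity recognition in step 202.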
Step 202: identify the named entities in the bilingual training corpora.
The named entities may be identified manually or by machine.
For example, in the example of step 201, the named entity "200,000,000 dollars" is identified in "The loss of producers without quotas is 200,000,000 dollars.", and the named entity "$ 200million" is identified in "producers without quotas were worse off by $ 200million."
Step 203: normalize the identified named entities in the bilingual training corpora according to preset normalization rules.
In this embodiment, the normalized form can be defined as needed. For example, for the corresponding named entities "200,000,000 dollars" and "$ 200million" identified above, step 203 normalizes the Chinese numerals and the Chinese and English currency expressions into the form "number + currency", so that the bilingual training pair becomes:
"The loss of producers without quotas is 200000000.0 dollars."
"producers without quotas were worse off by 200000000.0 dollar."
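A minimal sketch of such a normalization rule in Python, assuming the canonical form is "<value with one decimal> dollar" as in the normalized English sentence above, and assuming the raw Chinese amounts use digits with an optional 万 or 亿 followed by 美元. The rule tables and `normalize_money` are hypothetical; the point is that both sides of a pair collapse to the same string.

```python
import re

# Hypothetical normalization rules; canonical form is "<float, one decimal> dollar".
EN_SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
ZH_SCALE = {"万": 1e4, "亿": 1e8}  # Chinese magnitude characters

NUMBER = r"(\d+(?:,\d{3})*(?:\.\d+)?)"
EN_MONEY = re.compile(r"\$\s*" + NUMBER + r"\s*(thousand|million|billion)?", re.IGNORECASE)
ZH_MONEY = re.compile(NUMBER + r"\s*(万|亿)?\s*美元")


def normalize_money(entity: str) -> str:
    """Step 203: rewrite an English or Chinese dollar amount as '<value> dollar'."""
    text = entity.strip()
    if m := EN_MONEY.fullmatch(text):
        scale = EN_SCALE.get((m.group(2) or "").lower(), 1.0)
    elif m := ZH_MONEY.fullmatch(text):
        scale = ZH_SCALE.get(m.group(2) or "", 1.0)
    else:
        raise ValueError(f"no normalization rule matches {entity!r}")
    value = float(m.group(1).replace(",", "")) * scale
    return f"{value:.1f} dollar"


print(normalize_money("$ 200million"))  # 200000000.0 dollar
print(normalize_money("2亿美元"))        # 200000000.0 dollar
```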
Step 204: replace each normalized named entity in the bilingual training corpora with a first character string according to prepared named entity recognition rules.
The prepared recognition rules store the correspondence between normalized text strings and first character strings. For example, named entities of the "number + currency" class correspond to the first character string "_MONEY_", "person name" entities to "_NAME_", "place name" entities to "_STATE_", and "date" entities to "_DATE_".
For example, if the prepared recognition rules define the fixed string "_MONEY_" as corresponding to "number + currency", replacing the named entities in the example of steps 201 to 203 yields:
"The loss of producers without quotas is _MONEY_."
"producers without quotas were worse off by _MONEY_."
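The replacement in step 204 can be sketched in Python as below. The class-to-placeholder table follows the examples listed for step 204; the function names and the extra check that both sides of a pair contain the same entity classes are assumptions added for illustration.

```python
# Hypothetical recognition rules: entity class -> first character string,
# following the examples listed for step 204.
RECOGNITION_RULES = {
    "MONEY": "_MONEY_",  # number + currency
    "NAME": "_NAME_",    # person name
    "PLACE": "_STATE_",  # place name
    "DATE": "_DATE_",    # date
}


def mask(sentence: str, entities: list[tuple[str, str]]) -> str:
    """Replace each normalized entity (class, surface text) with its first character string."""
    for entity_class, surface in entities:
        if surface not in sentence:
            raise ValueError(f"entity {surface!r} not found in {sentence!r}")
        sentence = sentence.replace(surface, RECOGNITION_RULES[entity_class], 1)
    return sentence


def mask_pair(src: str, tgt: str,
              src_entities: list[tuple[str, str]],
              tgt_entities: list[tuple[str, str]]) -> tuple[str, str]:
    """Mask both sides of a training pair; reject it if the entity classes differ."""
    if sorted(c for c, _ in src_entities) != sorted(c for c, _ in tgt_entities):
        raise ValueError("source and target contain different entity classes")
    return mask(src, src_entities), mask(tgt, tgt_entities)


print(mask_pair(
    "The loss of producers without quotas is 200000000.0 dollars.",
    "producers without quotas were worse off by 200000000.0 dollar.",
    [("MONEY", "200000000.0 dollars")],
    [("MONEY", "200000000.0 dollar")],
))
```

Rejecting pairs whose entity classes do not line up is one way to keep the translation model from learning to drop or invent placeholders.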
Step 205: input the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
In this embodiment, many corresponding training pairs are input into the translation model for training, for example "The loss of producers without quotas is _MONEY_." and "producers without quotas were worse off by _MONEY_." The machine learning model obtained by training learns the named entity recognition rules, such as the association that translates "number + currency" into "_MONEY_".
By training the machine learning model in advance, this embodiment enables the model to translate named entities in the corpus to be translated directly into the defined first character strings.
A specific example is used below to describe the technical solution of the method embodiment of the present invention in detail.
(1) First, the corpus to be translated is received: "15 dollars an hour; I hope to be paid at the end of each day."
(2) The machine learning model translates the corpus and obtains the first translation result: "i get _MONEY_ an hour, and i expect to be paid at the conclusion of each day."
(3) According to the position information carried by the first character string "_MONEY_" in the first translation result, "_MONEY_" is matched to "15 dollars".
(4) The entity "15 dollars" (in the Chinese source text) is translated into the English string "15 dollars" according to the Chinese-to-English translation rule in use.
(5) "_MONEY_" in the first translation result "i get _MONEY_ an hour, and i expect to be paid at the conclusion of each day." is replaced with "15 dollars", giving the final second translation result: "i get 15 dollars an hour, and i expect to be paid at the conclusion of each day."
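The five steps of the worked example can be strung together as in the Python sketch below. It rests on several assumptions: the trained model is replaced by a hard-coded stub, the model is assumed to return its placeholders with character offsets, the Chinese sentence is an illustrative reconstruction of the source (not quoted from the patent), and the one-line `money_rule` stands in for whatever translation rule is actually used.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Placeholder:
    token: str   # first character string emitted by the model, e.g. "_MONEY_"
    start: int   # start index of the named entity in the source text
    length: int  # length of the named entity in characters


# Assumed model interface: source text -> (first translation result, placeholders in output order).
Model = Callable[[str], tuple[str, list[Placeholder]]]


def translate_with_entities(source: str, model: Model, rule: Callable[[str], str]) -> str:
    first_result, placeholders = model(source)                      # step 102
    second_result = first_result
    for ph in placeholders:
        entity = source[ph.start:ph.start + ph.length]              # step 103
        target = rule(entity)                                       # step 104
        second_result = second_result.replace(ph.token, target, 1)  # step 105
    return second_result


SOURCE = "一小时15美元，我希望每天结束时结清。"  # illustrative reconstruction


def stub_model(source: str) -> tuple[str, list[Placeholder]]:
    """Stand-in for the trained model, hard-coded to the worked example."""
    first = "i get _MONEY_ an hour, and i expect to be paid at the conclusion of each day."
    return first, [Placeholder("_MONEY_", start=source.index("15美元"), length=4)]


def money_rule(entity: str) -> str:
    """Toy Chinese-to-English money rule: '<digits>美元' -> '<digits> dollars'."""
    m = re.fullmatch(r"(\d+)美元", entity)
    if m is None:
        raise ValueError(f"no rule for {entity!r}")
    return f"{m.group(1)} dollars"


print(translate_with_entities(SOURCE, stub_model, money_rule))
```

Because each first character string is consumed with a single-occurrence replace, two `_MONEY_` tokens in one sentence each receive their own entity, provided the model reports the placeholders in the order they appear in its output.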
Fig. 4 is a structural diagram of Embodiment 1 of the device for translating a corpus containing named entities according to the present invention. As shown in Fig. 4, the device of this embodiment may include a receiving module 11, a first translation module 12, a matching module 13, a second translation module 14, and a result generation module 15. The receiving module 11 receives a corpus to be translated that contains a named entity. The first translation module 12 translates the corpus with the machine learning model to obtain a first translation result; when translating the corpus, the model translates each named entity in it into a first character string. The matching module 13 obtains, from the corpus to be translated, the named entity corresponding to the first character string. The second translation module 14 translates the named entity into a target-language string according to a preset translation rule. The result generation module 15 replaces the first character string in the first translation result with the target-language string to obtain a second translation result.
The device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effects are similar and are not repeated here.
Fig. 5 is a structural diagram of Embodiment 2 of the device for translating a corpus containing named entities according to the present invention. As shown in Fig. 5, building on the structure of the device shown in Fig. 4, the device of this embodiment further includes a corpus preparation module 16, an identification module 17, a normalization module 18, a replacement module 19, and a training module 20. The corpus preparation module 16 prepares several groups of bilingual training corpora. The identification module 17 identifies the named entities in the bilingual training corpora. The normalization module 18 normalizes the identified named entities according to preset normalization rules. The replacement module 19 replaces each normalized named entity in the bilingual training corpora with a first character string according to prepared named entity recognition rules, which store the correspondence between normalized text strings and first character strings. The training module 20 inputs the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training to obtain the machine learning model.
The device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 2; its implementation principle and technical effects are similar and are not repeated here.
Preferably, the first character string in the first translation result obtained by the first translation module 12 carries the position information of its corresponding named entity in the corpus to be translated;
the matching module 13 is specifically configured to obtain, according to the position information carried by the first character string, the source-language text string at the position indicated by the position information from the corpus to be translated, as the named entity corresponding to the first character string.
Fig. 6 is a structural diagram of the corpus preparation module 16. As shown in Fig. 6, the corpus preparation module 16 may include an acquisition submodule 161, a cleaning submodule 162, and a processing submodule 163. The acquisition submodule 161 obtains several groups of bilingual corpora; the cleaning submodule 162 performs data cleaning on the bilingual corpora; and the processing submodule 163 segments the cleaned Chinese corpus into words and tokenizes the words in the Latin-script corpus.
The device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 3; its implementation principle and technical effects are similar and are not repeated here.
Preferably, the named entities translated by any of the above devices for translating a corpus containing named entities include at least one of a person name, a place name, a currency amount, a date, or an ordinary number.
Preferably, the first character string used by any of the above devices for translating a corpus containing named entities is a string composed of special characters.
Corresponding to the method for translating a corpus containing named entities provided by the embodiments of the present invention, an embodiment of the present invention further provides an electronic device. Fig. 7 is a structural diagram of an embodiment of the electronic device of the present invention, which can implement the flow of the embodiment shown in Fig. 1. As shown in Fig. 7, the electronic device may include a housing 21, a processor 22, a memory 23, a circuit board 24, and a power supply circuit 25. The circuit board 24 is arranged inside the space enclosed by the housing 21, and the processor 22 and the memory 23 are arranged on the circuit board 24; the power supply circuit 25 supplies power to each circuit or component of the electronic device; the memory 23 stores executable program code; and the processor 22 runs a program corresponding to the executable program code by reading the code stored in the memory 23, so as to perform the method for translating a corpus containing named entities of any of the foregoing embodiments.
The above electronic device exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices have mobile communication functions, and their main purpose is voice and data communication. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g., iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and the like. Its architecture is similar to that of a general-purpose computer. However, because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, manageability, and so on.
(5) Other electronic devices with data interaction functions.
The present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of the method for translating a corpus containing named entities of any of the foregoing embodiments.
An embodiment of the present invention further provides an application program for performing the method for translating a corpus containing named entities of any of the foregoing embodiments.
It should be noted that herein, relational terms such as first and second and the like are used merely to a reality
Body or operation are distinguished with another entity or operation, are deposited without necessarily requiring or implying between these entities or operation
In any this actual relation or order.
A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above description covers merely specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any change or replacement that a person familiar with the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
- 1. A method for translating a corpus containing named entities, characterized by comprising: receiving a corpus to be translated that contains a named entity; translating the corpus to be translated through a machine learning model to obtain a first translation result, wherein, when translating the corpus to be translated, the machine learning model translates the named entity in the corpus to be translated into a first character string; obtaining, from the corpus to be translated, the named entity corresponding to the first character string; translating the named entity into a target-language text string according to a preset translation rule; and replacing the first character string in the first translation result with the target-language text string to obtain a second translation result.
- 2. The method for translating a corpus containing named entities according to claim 1, characterized in that the machine learning model is obtained by: preparing several groups of bilingual training corpora; identifying the named entities in the bilingual training corpora; normalizing the identified named entities in the bilingual training corpora according to preset normalization rules; replacing each normalized named entity in the bilingual training corpora with a first character string according to pre-prepared named entity recognition rules, wherein the recognition rules store the correspondence between normalized text strings and first character strings; and inputting the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training, so as to obtain the machine learning model.
- 3. The method for translating a corpus containing named entities according to claim 1, characterized in that the first character string in the first translation result carries positional information of its corresponding named entity in the corpus to be translated; and obtaining, from the corpus to be translated, the named entity corresponding to the first character string comprises: obtaining, according to the positional information carried by the first character string, the source-language text string at the position indicated by the positional information in the corpus to be translated, as the named entity corresponding to the first character string.
- 4. The method for translating a corpus containing named entities according to claim 2, characterized in that preparing several groups of bilingual training corpora comprises: obtaining several groups of bilingual corpora; performing data cleaning on the bilingual corpora; and segmenting the cleaned Chinese corpora into words and labeling the words in the Latin-script corpora.
- 5. The method for translating a corpus containing named entities according to any one of claims 1 to 4, characterized in that the named entity includes at least one of a person name, a place name, a currency, a date, or an ordinary number.
- 6. The method for translating a corpus containing named entities according to any one of claims 1 to 4, characterized in that the first character string is a string composed of special characters.
- 7. A device for translating a corpus containing named entities, characterized by comprising: a receiving module, configured to receive a corpus to be translated that contains a named entity; a first translation module, configured to translate the corpus to be translated through a machine learning model to obtain a first translation result, wherein, when translating the corpus to be translated, the machine learning model translates the named entity in the corpus to be translated into a first character string; a matching module, configured to obtain, from the corpus to be translated, the named entity corresponding to the first character string; a second translation module, configured to translate the named entity into a target-language text string according to a preset translation rule; and a result generation module, configured to replace the first character string in the first translation result with the target-language text string to obtain a second translation result.
- 8. The device for translating a corpus containing named entities according to claim 7, characterized by further comprising: a corpus preparation module, configured to prepare several groups of bilingual training corpora; a recognition module, configured to identify the named entities in the bilingual training corpora; a normalization module, configured to normalize the identified named entities in the bilingual training corpora according to preset normalization rules; a replacement module, configured to replace each normalized named entity in the bilingual training corpora with a first character string according to pre-prepared named entity recognition rules, wherein the recognition rules store the correspondence between normalized text strings and first character strings; and a training module, configured to input the bilingual training corpora, in which the named entities have been replaced with first character strings, into a translation model for training, so as to obtain the machine learning model.
- 9. An electronic device, characterized in that the electronic device comprises a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or component of the electronic device; the memory is configured to store executable program code; and the processor reads the executable program code stored in the memory and runs the program corresponding to the executable program code, so as to perform the method for translating a corpus containing named entities according to any one of claims 1 to 6.
- 10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for translating a corpus containing named entities according to any one of claims 1 to 6 are implemented.
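Read together, claims 1 and 3 describe a four-step pipeline: model translation, entity lookup, rule-based entity translation, and replacement. The following sketch wires those steps up under stated assumptions:

- `stub_model` is a hypothetical stand-in for the trained machine learning model.
- The `@@start:end@@` placeholder format is assumed, because the patent does not define one.
- Entity translation is delegated to any callable, such as a lexicon lookup.

```python
import re
from typing import Callable

# Hypothetical first-character-string format carrying the entity position:
# "@@<start>:<end>@@" with character offsets into the source (end exclusive).
PLACEHOLDER = re.compile(r"@@(\d+):(\d+)@@")


def translate_with_entities(source: str,
                            model: Callable[[str], str],
                            translate_entity: Callable[[str], str]) -> str:
    """Claim 1 as a pipeline: model output -> entity lookup by position ->
    rule-based entity translation -> placeholder replacement."""
    first_result = model(source)  # entities appear as first character strings

    def replace(m) -> str:
        start, end = int(m.group(1)), int(m.group(2))
        entity = source[start:end]        # named entity at the carried position
        return translate_entity(entity)   # target-language text string

    return PLACEHOLDER.sub(replace, first_result)  # second translation result


# A stub standing in for the trained machine learning model (hypothetical).
def stub_model(src: str) -> str:
    return {"张三明天去北京": "@@0:2@@ will go to @@5:7@@ tomorrow"}[src]


lexicon = {"张三": "Zhang San", "北京": "Beijing"}
print(translate_with_entities("张三明天去北京", stub_model,
                              lambda e: lexicon.get(e, e)))
# Zhang San will go to Beijing tomorrow
```

`re.sub` with a callback replaces all first character strings in one pass, so a translated entity that happens to contain `@@` is never rescanned.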
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711245629.2A CN108009160A (en) | 2017-11-30 | 2017-11-30 | Corpus translation method and device containing named entity, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711245629.2A CN108009160A (en) | 2017-11-30 | 2017-11-30 | Corpus translation method and device containing named entity, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108009160A true CN108009160A (en) | 2018-05-08 |
Family
ID=62055689
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711245629.2A Pending CN108009160A (en) | 2017-11-30 | 2017-11-30 | Corpus translation method and device containing named entity, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108009160A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1643511A (en) * | 2002-03-11 | 2005-07-20 | 南加利福尼亚大学 | Named entity translation |
US20090319257A1 (en) * | 2008-02-23 | 2009-12-24 | Matthias Blume | Translation of entity names |
CN104298662A (en) * | 2014-04-29 | 2015-01-21 | 中国专利信息中心 | Machine translation method and translation system based on organism named entities |
CN106874256A (en) * | 2015-12-11 | 2017-06-20 | 北京国双科技有限公司 | Name the method and device of entity in identification field |
Non-Patent Citations (2)
Title |
---|
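夏青 (Xia Qing): "Research on Methods for Acquiring Chinese-Khmer Named Entity Translation Equivalent Pairs", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Series * |
王松 (Wang Song): "Research on Chinese-English Translation Methods for Chinese Organization Names and Addresses", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Series * |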
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109062909A (en) * | 2018-07-23 | 2018-12-21 | 传神语联网网络科技股份有限公司 | A kind of pluggable component |
CN111144111A (en) * | 2019-12-30 | 2020-05-12 | 北京世纪好未来教育科技有限公司 | Translation method, device, equipment and storage medium |
WO2021134416A1 (en) * | 2019-12-31 | 2021-07-08 | 深圳市优必选科技股份有限公司 | Text transformation method and apparatus, computer device, and computer readable storage medium |
CN111222342A (en) * | 2020-04-15 | 2020-06-02 | 北京金山数字娱乐科技有限公司 | Translation method and device |
CN111222342B (en) * | 2020-04-15 | 2020-08-11 | 北京金山数字娱乐科技有限公司 | Translation method and device |
CN113011141A (en) * | 2021-03-17 | 2021-06-22 | 平安科技(深圳)有限公司 | Buddha note model training method, Buddha note generation method and related equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180508 |