CN109284503A - Translation sentence-ending judgment method and system - Google Patents
Translation sentence-ending judgment method and system
- Publication number
- CN109284503A (application CN201811226769.XA)
- Authority
- CN
- China
- Prior art keywords
- sentence
- paragraph
- text
- processed
- currently pending
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application proposes a translation sentence-ending judgment method and system that can accurately identify, from text to be processed, whether a run of continuous text has ended and constitutes a sentence, thereby completing the sentence-ending judgment. The system includes a text import device, a paragraph identification device, a sentence identification device, a semantic combination device, and a reliability discrimination device. The invention identifies sentences with complete meaning in the text to be processed on a semantic basis rather than by using punctuation marks as the criterion.
Description
Technical field
The application belongs to the field of machine learning, and more particularly relates to a translation sentence-ending judgment method and system.
Background
In translation, a long text to be translated usually has to be segmented. A necessary condition is that each subsection after segmentation is a complete, independent piece of corpus; the first and second halves of one sentence must not be cut into different subsections. In addition, translation usually relies on machine translation: the translator uploads the content of the text to be translated to a machine translation tool. Although existing MT engines accept a whole passage at once, the results of that approach are poor, so translators generally need to upload single complete sentences one at a time to obtain relatively complete results. In another scenario, the translated result must be checked for correctness, and the text is again uploaded for checking in units of complete sentences. The key problem in all of these processes is how to segment the text into complete sentences.
A simple approach is to use sentence-terminating symbols as the criterion: if a run of continuous text ends with a full stop, question mark, or exclamation mark, the sentence is considered to have ended and the continuous text is considered a complete sentence. On this basis, sentence-end detection, and hence sentence segmentation, can be implemented by detecting specific symbols. This approach achieves the desired effect only if the text to be processed was written in strict observance of punctuation rules.
In practice, few people follow punctuation rules strictly. Many writers use no full stop anywhere except at the end of a paragraph or article, and instead run on with commas or semicolons throughout; question marks and exclamation marks are also abused in various special writing styles (such as the so-called "roar style"). The punctuation-based approach alone therefore cannot accurately identify the sentences with complete meaning in a text.
Summary of the invention
To solve the above problems, and in particular the need to segment sentences with complete meaning accurately during translation, the application proposes a translation sentence-ending judgment method and system that can accurately identify, from text to be processed, whether a run of continuous text has ended and constitutes a sentence, thereby completing the sentence-ending judgment.
In a first aspect of the invention, a translation sentence-ending judgment system is provided. The system includes a text import device, a paragraph identification device, a sentence identification device, a semantic combination device, and a reliability discrimination device.
In a specific implementation, the text to be processed is imported into the system through the text import device, and the paragraph identification device is then run.
The paragraph identification device performs preliminary processing on the imported text and obtains a set of paragraph subsections in units of paragraphs; for example, it identifies the beginning and end of each paragraph and can also identify the end of the full text. The set of paragraph subsections then enters the sentence identification device paragraph by paragraph.
The sentence identification device processes the set of paragraph subsections paragraph by paragraph. The specific processing steps include:
(1) starting from the first unread character of the current paragraph, continuously read the following characters until a pause symbol is read; the continuous characters read constitute the sentence to be processed;
(2) extract multiple sentence trunk words from the sentence to be processed; a sentence trunk word is a notional word that carries action meaning;
(3) input the multiple sentence trunk words into the semantic combination device, which outputs at least one comparative sentence based on a cloud corpus;
(4) input the sentence to be processed and the at least one comparative sentence into the reliability discrimination device;
(5) the reliability discrimination device outputs a discrimination result.
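Step (1) above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `read_candidate` and the contents of `PAUSE_SYMBOLS` are assumptions chosen for the example, since the text leaves the exact symbol set to the implementer.

```python
from typing import Tuple

# Hypothetical pause-symbol set (assumption); the patent does not fix its contents.
PAUSE_SYMBOLS = set("。？！、，；.?!,;")


def read_candidate(paragraph: str, start: int) -> Tuple[str, int]:
    """Read from `start` up to and including the next pause symbol.

    Returns the characters read and the index of the first unread character.
    If no pause symbol remains, the rest of the paragraph is returned.
    """
    end = start
    while end < len(paragraph):
        end += 1
        if paragraph[end - 1] in PAUSE_SYMBOLS:
            break
    return paragraph[start:end], end
```

Calling the function repeatedly with the returned index walks through a paragraph one candidate at a time; whether a candidate really is a complete sentence is decided by the later steps.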
Detecting a pause symbol means that the continuous characters read so far may constitute a complete sentence with independent meaning, so they are treated as a potential candidate sentence. A potential candidate sentence needs further judgment to determine whether it really is a sentence with complete, independent meaning. Each potential candidate sentence is taken as a sentence to be processed and passed to the next step.
This next step is the core of the technical solution of the application. The processing is designed as follows:
Multiple sentence trunk words are extracted from the sentence to be processed;
The multiple sentence trunk words are input into the semantic combination device, which outputs at least one comparative sentence based on the cloud corpus.
Through automatic learning on a large-scale corpus, the application can learn to compose text and sentences automatically.
A comparative sentence generated from the cloud corpus on the basis of the multiple sentence trunk words extracted from the sentence to be processed is itself an independent sentence with complete meaning.
Next, the current sentence to be processed is compared with the generated comparative sentence to judge whether the current sentence to be processed is an independent sentence. This process is carried out by the reliability discrimination device described herein.
Specifically:
The sentence to be processed and the at least one comparative sentence are input into the reliability discrimination device;
The reliability discrimination device outputs a discrimination result.
The judgment criterion can be one of the following or a combination thereof:
◆ compare the lengths of the current sentence to be processed and the generated comparative sentence, and judge whether the length difference is within a first threshold range;
◆ compare the similarity of the current sentence to be processed and the generated comparative sentence, and judge whether the similarity is within a second threshold range.
The length difference is simple to obtain and easy to implement; the similarity comparison can use an existing text similarity comparison method from the prior art, which is not described further here.
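The two criteria can be sketched as below. The patent does not name a similarity method, so Python's standard `difflib.SequenceMatcher` is swapped in here purely as a stand-in; the function names and the default thresholds (`max_diff=5`, `min_ratio=0.6`) are assumptions for illustration, not values from the source.

```python
from difflib import SequenceMatcher
from typing import Iterable


def length_ok(candidate: str, comparative: str, max_diff: int = 5) -> bool:
    """First criterion: the length difference lies within a threshold."""
    return abs(len(candidate) - len(comparative)) <= max_diff


def similarity_ok(candidate: str, comparative: str, min_ratio: float = 0.6) -> bool:
    """Second criterion: character-level similarity lies within a threshold."""
    return SequenceMatcher(None, candidate, comparative).ratio() >= min_ratio


def is_complete(candidate: str, comparatives: Iterable[str],
                require_both: bool = True) -> bool:
    """Accept the candidate if any comparative sentence satisfies the criteria.

    `require_both` switches between the "and" and "or" combinations.
    """
    for comparative in comparatives:
        a = length_ok(candidate, comparative)
        b = similarity_ok(candidate, comparative)
        if (a and b) if require_both else (a or b):
            return True
    return False
```

Accepting a match against any one of several comparative sentences is one possible reading of "at least one comparative sentence"; a stricter design could require agreement with all of them.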
If the length difference satisfies the first threshold range condition and/or the similarity satisfies the second threshold range condition, the reliability discrimination device determines that the current sentence to be processed is a complete sentence.
At this point the current sentence to be processed has been processed and identified, and can be used for the actual operation (segmentation, upload, and so on). The technical solution then continues reading characters and repeats steps (1) to (5) above; that is, it reads the next sentence to be processed and determines whether it constitutes a complete sentence.
If the length difference does not satisfy the first threshold range condition and/or the similarity does not satisfy the second threshold range condition, the current sentence to be processed is not a complete sentence, which indicates that more characters belonging to the sentence follow it. The technical solution therefore further comprises: continuing to read the unread characters after the current pause symbol until the next pause symbol is read, and appending the characters read to the current sentence to be processed.
In this way the number of characters in the current sentence to be processed increases and more sentence trunk words can be obtained; steps (2) to (5) are then repeated to judge whether the sentence to be processed is a complete sentence.
The technical solution can thus be implemented with computer instructions. Identification and judgment form an iterative loop that includes an inner loop for each single sentence to be processed, whose termination condition is that the current sentence to be processed constitutes a complete sentence, after which identification and judgment move on to the next sentence to be processed. When the text to be processed is input in units of paragraphs, the termination condition of the processing is reading the paragraph end mark; when the full text is input, the termination condition is reading the full-text end mark.
Therefore, in a second aspect of the invention, a computer-implemented recognition method is provided for identifying sentences with complete, independent meaning in the current text to be processed. The method includes the following steps:
S1: read the current unprocessed paragraph of the current text to be processed;
S2: starting from the first unread character of the current unprocessed paragraph, continuously read characters;
S3: judge whether the character currently read is a pause symbol; if so, go to step S4; otherwise, repeat step S2;
S4: extract multiple sentence trunk words from the current sentence to be processed formed by the characters read;
S5: output at least one comparative sentence according to the multiple sentence trunk words;
S6: judge whether the current sentence to be processed constitutes a complete sentence, based on a comparison of the at least one comparative sentence with the current sentence to be processed;
S7: judge whether the current pause symbol is the full-text end mark; if so, end processing; otherwise, go to step S8;
S8: judge whether the current pause symbol is a paragraph end mark; if so, go to step S1; otherwise, go to step S2.
Step S5 specifically includes: inputting the multiple sentence trunk words into a machine learning engine based on a cloud corpus, which outputs at least one comparative sentence.
Step S6 includes: comparing the lengths of the current sentence to be processed and the at least one comparative sentence, and judging whether the length difference is within a third threshold range; and/or comparing the similarity of the current sentence to be processed and the at least one comparative sentence, and judging whether the similarity is within a fourth threshold range.
Further, if the length difference and/or the similarity is within the corresponding threshold range, the current sentence to be processed is judged to constitute a complete sentence.
Further, the threshold ranges are adjustable. A threshold range adjustment module can be provided to adjust the size of the first to fourth threshold ranges.
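The control flow of steps S1 to S8 can be sketched as a pair of nested loops. This is a simplified illustration under stated assumptions: steps S4 to S6 are folded into a caller-supplied `judge` callback, the `PAUSE` set is a hypothetical ASCII-only example, and the end of a paragraph is handled by flushing whatever remains, a choice the source does not spell out.

```python
from typing import Callable, List

# Hypothetical pause symbols for the example (assumption).
PAUSE = set(",;.?!")


def split_sentences(paragraphs: List[str],
                    judge: Callable[[str], bool]) -> List[str]:
    """Segment paragraphs into sentences, extending a candidate until `judge` accepts it."""
    sentences = []
    for paragraph in paragraphs:            # S1: next unprocessed paragraph
        current = ""
        for ch in paragraph:                # S2: read characters in order
            current += ch
            if ch not in PAUSE:             # S3: not a pause symbol, keep reading
                continue
            if judge(current):              # S4-S6: stand-in for the semantic judgment
                sentences.append(current.strip())
                current = ""
            # otherwise the candidate keeps growing past this pause symbol
        if current.strip():                 # S8: paragraph end closes the candidate
            sentences.append(current.strip())
    return sentences                        # S7: full text exhausted
```

With a toy judge such as `lambda s: len(s.split()) >= 3`, the input `["I came, I saw it all. Done"]` is not cut at the first comma, because the two-word candidate is rejected and extended to the next pause symbol.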
A third aspect of the invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a computer memory and processor, implement the aforementioned computer-implemented recognition method for identifying sentences with complete, independent meaning in the current text to be processed.
The technical solution of the invention achieves at least the following notable effects:
◆ sentences with complete meaning in the text to be processed are identified semantically rather than by using punctuation marks as the criterion;
◆ the judgment criterion is based on large-scale semantic learning combined with machine learning techniques;
◆ although automatic article generation by semantics-based robots belongs to the prior art, the invention applies it to translation corpus identification for the first time; moreover, its purpose differs from the prior art: text is generated not for its own sake but to serve as a judgment criterion;
◆ the prior art generates an entire article from given keywords and requires the output article to be unique and as accurate as possible, whereas the invention is concerned with the diversity of the results output from a small number of given keywords, which makes them better suited to the judgment.
Further implementations and advantages of the invention are described in the specific embodiments section.
Brief description of the drawings
Fig. 1 is a block diagram of the translation sentence-ending judgment system of the invention.
Fig. 2 is a flow chart of the computer implementation of the method of the invention.
Specific embodiments
Referring to Fig. 1, a translation sentence-ending judgment system of the invention includes a text import device, a paragraph identification device, a sentence identification device, a semantic combination device, and a reliability discrimination device.
In this embodiment, the text to be processed is imported into the system through the text import device, and the paragraph identification device is then run.
The paragraph identification device performs preliminary processing on the imported text and obtains a set of paragraph subsections in units of paragraphs; for example, it identifies the beginning and end of each paragraph and can also identify the end of the full text. The set of paragraph subsections then enters the sentence identification device paragraph by paragraph.
The sentence identification device processes the set of paragraph subsections paragraph by paragraph. The specific processing steps include:
(1) starting from the first unread character of the current paragraph, continuously read the following characters until a pause symbol is read; the continuous characters read constitute the sentence to be processed;
(2) extract multiple sentence trunk words from the sentence to be processed; a sentence trunk word is a notional word that carries action meaning;
(3) input the multiple sentence trunk words into the semantic combination device, which outputs at least one comparative sentence based on a cloud corpus;
(4) input the sentence to be processed and the at least one comparative sentence into the reliability discrimination device;
(5) the reliability discrimination device outputs a discrimination result.
The first unread character of the current paragraph can be a single character or word, or a punctuation mark that can begin a paragraph or sentence, such as a single or double opening quotation mark.
Normally, if the text to be processed used punctuation strictly according to the rules, it would suffice to read up to a full stop, question mark, or exclamation mark to form a complete sentence; as noted above, however, texts to be processed do not necessarily follow this standard. To solve this problem, the application abandons the punctuation-based judgment of the prior art and instead reads from the first unread character of the current paragraph until a pause symbol is read; the continuous characters read constitute the sentence to be processed.
A pause symbol here is a punctuation mark that can indicate a pause in a sentence, including the full stop, question mark, exclamation mark, pause mark (enumeration comma), comma, quotation marks (single closing quote, single opening quote), semicolon, and other symbols that make a sentence pause. Dashes, book title marks, brackets, and the like do not cause a sentence pause and are not treated as pause symbols. Although a colon does introduce a pause, the part after the colon is normally regarded as a continuation of the preceding sentence, so the colon is not treated as a pause symbol either. In addition, because the technical solution includes a paragraph identification device, the pause symbols also include the paragraph end mark and the full-text end mark identified by the paragraph identification device.
The above examples are illustrative rather than exhaustive; in a specific implementation, those skilled in the art can establish a pause symbol set in advance for subsequent lookup.
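A pre-established pause symbol set of this kind might look like the sketch below. The concrete characters, the choice of closing quotation marks, and the two sentinel values standing in for the paragraph end mark and full-text end mark are all assumptions made for illustration; the source does not specify how those marks are encoded.

```python
# Stand-in sentinels for the marks produced by paragraph identification (assumption).
PARAGRAPH_END = "\u2029"   # Unicode paragraph separator, used here only as a placeholder
FULL_TEXT_END = "\u0003"   # ASCII end-of-text, used here only as a placeholder

# Full stop, question mark, exclamation mark, pause mark, comma, semicolon
# (Chinese and ASCII forms), plus closing quotation marks and the two end marks.
PAUSE_SYMBOLS = frozenset("。？！、，；.?!,;") | {
    "\u2019", "\u201d", PARAGRAPH_END, FULL_TEXT_END,
}

# Dash, book title marks, brackets, and colon: explicitly not pause symbols.
NON_PAUSE_SYMBOLS = frozenset("\u2014《》（）()：:")


def is_pause_symbol(ch: str) -> bool:
    """Look up a single character in the pre-established pause symbol set."""
    return ch in PAUSE_SYMBOLS
```

Keeping the set as data rather than hard-coding comparisons makes it straightforward to adjust for a different language or house style.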
Detecting a pause symbol means that the continuous characters read so far may constitute a complete sentence with independent meaning, so they are treated as a potential candidate sentence. A potential candidate sentence needs further judgment to determine whether it really is a sentence with complete, independent meaning. Each potential candidate sentence is taken as a sentence to be processed and passed to the next step.
This next step is the core of the technical solution of the application. The processing is designed as follows:
Multiple sentence trunk words are extracted from the sentence to be processed;
The multiple sentence trunk words are input into the semantic combination device, which outputs at least one comparative sentence based on the cloud corpus.
Specifically, the sentence to be processed consists of multiple words, some of which are notional words and some function words. A notional word is a word with actual meaning, such as "today", "next", "estimating", "submit", or "line". A function word usually expresses a connective or modifying relationship and carries no actual meaning on its own, such as "so", "and", "described", "the", "should", "does", or "such". Natural language processing already offers techniques for segmenting notional and function words; the standards for segmentation or identification may differ, but the meaning is consistent, so they are not described further here.
Building on prior-art segmentation of notional and function words, the application extracts multiple sentence trunk words from the sentence to be processed; the sentence trunk words here can be the notional words in the current sentence to be processed.
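As a rough illustration of trunk word extraction, the sketch below filters function words out of an English sentence. It is an assumption-laden simplification: the regular-expression tokenizer only suits space-delimited languages (Chinese text would need a word segmenter), and the `FUNCTION_WORDS` list is a small hypothetical stop list, not one taken from the source.

```python
import re
from typing import List

# Small hypothetical function-word list (assumption), partly drawn from the examples above.
FUNCTION_WORDS = {"so", "and", "the", "should", "does", "such", "be", "a", "an", "of", "to"}


def extract_trunk_words(sentence: str) -> List[str]:
    """Return the notional words of a sentence, in order, as its trunk words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [token for token in tokens if token not in FUNCTION_WORDS]
```

For example, "The estimate should be submitted today." yields `estimate`, `submitted`, and `today`, which would then be passed on to the semantic combination device.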
Next, the multiple sentence trunk words are input into the semantic combination device, which outputs at least one comparative sentence based on the cloud corpus.
Through automatic learning on a large-scale corpus, the application can learn to compose text and sentences automatically.
Similar machine learning techniques exist in the prior art, for example the news-writing robots and automatic article-writing robots implemented in recent years. From a few trunk words (keywords, prompt words) entered by the user, these robots automatically generate a news release or article whose quality approaches that of a professional news writer, to the point that readers cannot tell that the article was written by a robot.
The inventors found that such machine learning tools are all built by automatic learning on large-scale corpora. The application can therefore likewise provide a cloud-based corpus for machine learning and build a machine learning engine on it, namely the semantic combination device of the invention. The extracted sentence trunk words are then input into the semantic combination device, which outputs at least one comparative sentence based on the cloud corpus, in a manner similar to the news-writing and article-writing robots mentioned above.
The invention does not need to output a whole news release or an entire article; it only needs to output one complete sentence. The machine learning engine of the invention can therefore be simpler and faster, and its output can be multiple sentences that are each complete in meaning and fully independent, rather than a single result. For this purpose it works better than existing news-writing and article-writing robots, because the inventors have creatively applied the technique to the special needs of translation.
A comparative sentence generated from a large-scale corpus on the basis of the multiple sentence trunk words extracted from the sentence to be processed is necessarily itself an independent sentence with complete meaning.
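The interface of the semantic combination device (trunk words in, several complete comparative sentences out) can be illustrated with a deliberately simple stand-in. The sketch below does not generate text with a learned model as the patent describes; it swaps in keyword-overlap retrieval over a tiny in-memory list, and both `CORPUS` and `comparative_sentences` are hypothetical names invented for the example.

```python
from typing import List, Sequence

# Tiny in-memory stand-in for the cloud corpus (assumption).
CORPUS = [
    "the report was submitted today",
    "we estimate the cost tomorrow",
    "the team submitted the estimate today",
]


def comparative_sentences(trunk_words: Sequence[str],
                          corpus: Sequence[str] = CORPUS,
                          limit: int = 3) -> List[str]:
    """Return up to `limit` corpus sentences, ranked by trunk-word overlap."""
    wanted = set(trunk_words)
    scored = []
    for sentence in corpus:
        overlap = len(wanted & set(sentence.split()))
        if overlap:
            scored.append((overlap, sentence))
    # Stable sort: ties keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:limit]]
```

Returning several candidates rather than one mirrors the emphasis on diversity of output: the discrimination step then has more than one complete sentence to compare against.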
Next, the current sentence to be processed is compared with the generated comparative sentence to judge whether the current sentence to be processed is an independent sentence. This process is carried out by the reliability discrimination device described herein.
Specifically:
The sentence to be processed and the at least one comparative sentence are input into the reliability discrimination device;
The reliability discrimination device outputs a discrimination result.
The judgment criterion can be one of the following or a combination thereof:
◆ compare the lengths of the current sentence to be processed and the generated comparative sentence, and judge whether the length difference is within a first threshold range;
◆ compare the similarity of the current sentence to be processed and the generated comparative sentence, and judge whether the similarity is within a second threshold range.
The length difference is simple to obtain and easy to implement; the similarity comparison can use an existing text similarity comparison method from the prior art, which is not described further here.
If the length difference satisfies the first threshold range condition and/or the similarity satisfies the second threshold range condition, the reliability discrimination device determines that the current sentence to be processed is a complete sentence.
At this point the current sentence to be processed has been processed and identified, and can be used for the actual operation (segmentation, upload, and so on). The technical solution then continues reading characters and repeats steps (1) to (5) above; that is, it reads the next sentence to be processed and determines whether it constitutes a complete sentence.
If the length difference does not satisfy the first threshold range condition and/or the similarity does not satisfy the second threshold range condition, the current sentence to be processed is not a complete sentence, which indicates that more characters belonging to the sentence follow it. The technical solution therefore further comprises: continuing to read the unread characters after the current pause symbol until the next pause symbol is read, and appending the characters read to the current sentence to be processed.
In this way the number of characters in the current sentence to be processed increases and more sentence trunk words can be obtained; steps (2) to (5) are then repeated to judge whether the sentence to be processed is a complete sentence.
Referring to Fig. 2, a computer-implemented recognition method is provided. In this embodiment, the method comprises steps S1 to S8 of Fig. 2.
Specifically, the function of each step is as follows:
S1: read the current unprocessed paragraph of the current text to be processed;
S2: starting from the first unread character of the current unprocessed paragraph, continuously read characters;
S3: judge whether the character currently read is a pause symbol; if so, go to step S4; otherwise, repeat step S2;
S4: extract multiple sentence trunk words from the current sentence to be processed formed by the characters read;
S5: output at least one comparative sentence according to the multiple sentence trunk words;
S6: identify whether the current sentence to be processed is a complete sentence, based on a comparison of the at least one comparative sentence with the current sentence to be processed;
S7: judge whether the current pause symbol is the full-text end mark; if so, end processing; otherwise, go to step S8;
S8: judge whether the current pause symbol is a paragraph end mark; if so, go to step S1; otherwise, go to step S2.
Claims (10)
1. a kind of translation Statement Completion judges system, which includes text gatherer, paragraph identification device, sentence identification dress
It sets, semantic combination device and reliability discriminant device;The text gatherer imports text to be processed, the paragraph identification
Device carries out preliminary treatment to the text to be processed of importing, obtains the paragraph subdivision set as unit of paragraph;
It is characterized by:
The sentence identification device is handled the paragraph subdivision set according to as unit of paragraph,
Specifically processing step includes:
(1) continuous since first for working as previous paragraphs does not read character to read remaining character, until reading pause symbol;
The continuation character of reading constitutes sentence to be processed;
(2) multiple sentence trunk words are extracted from the sentence to be processed;
(3) the multiple sentence trunk word is inputted into the semantic combination device, the semantic combination device is based on cloud corpus
Library exports at least one comparative sentence;
(4) by the sentence to be processed, it is described at least one compare the input reliability discriminant device;
The reliability discriminant device output differentiates result.
2. the system as claimed in claim 1, wherein the semantic combination device is based on cloud corpus and exports at least one ratio
Compared with sentence, specifically include: cloud corpus generates ratio on the basis of the multiple sentence trunk words extracted from the sentence to be processed
Compared with sentence, the comparative sentence is the independent sentence for having complete meaning.
3. system as claimed in claim 1 or 2, wherein the reliability discriminant device output differentiates as a result, specifically including:
The comparative sentence of currently pending sentence and generation is compared, whether predetermined condition is met based on comparison condition, output differentiates knot
Fruit.
4. system as claimed in claim 3, wherein further include predetermined condition setup module, for adjusting the predetermined condition
Range.
5. A computer-implemented recognition method, characterized in that the method comprises the following steps:
S1: reading the current unprocessed paragraph of the text currently to be processed;
S2: starting from the first unread character of the current unprocessed paragraph, continuously reading characters;
S3: judging whether the character currently read is a pause symbol; if so, proceeding to step S4; otherwise, repeating step S2;
S4: extracting multiple sentence trunk words from the currently pending sentence formed by the characters read;
S5: outputting at least one comparative sentence according to the multiple sentence trunk words;
S6: identifying, based on a comparison of the at least one comparative sentence with the currently pending sentence, whether the currently pending sentence constitutes a complete sentence;
S7: judging whether the current pause symbol is an end-of-text marker; if so, ending the processing; otherwise, proceeding to step S8;
S8: judging whether the current pause symbol is an end-of-paragraph marker; if so, returning to step S1; otherwise, returning to step S2.
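The control flow of steps S1 to S8 can be sketched as a loop. This is a hedged illustration under several assumptions that the claim does not state: paragraphs are given as separate strings (so the end-of-paragraph and end-of-text checks of S7 and S8 become the end of a string and the end of the input), the pending sentence restarts after every pause symbol, and the three callables `extract_trunk_words`, `generate_comparatives` and `is_complete` are hypothetical stand-ins for the trunk-word extractor, the corpus-based generator and the comparison of S6.

```python
from typing import Callable, Iterable, List, Tuple

PAUSE_SYMBOLS = set(",.;:!?")  # assumed; not enumerated in the claim


def recognize(
    paragraphs: Iterable[str],
    extract_trunk_words: Callable[[str], List[str]],
    generate_comparatives: Callable[[List[str]], List[str]],
    is_complete: Callable[[str, List[str]], bool],
) -> List[Tuple[str, bool]]:
    """Return (pending sentence, complete?) for every pause symbol read."""
    results: List[Tuple[str, bool]] = []
    for paragraph in paragraphs:                  # S1: next unprocessed paragraph
        start = 0                                 # first unread character
        for pos, ch in enumerate(paragraph):      # S2: read character by character
            if ch not in PAUSE_SYMBOLS:           # S3: not a pause symbol, keep reading
                continue
            sentence = paragraph[start:pos + 1]
            trunk = extract_trunk_words(sentence)         # S4
            comparatives = generate_comparatives(trunk)   # S5
            results.append((sentence, is_complete(sentence, comparatives)))  # S6
            start = pos + 1                       # S8 (no branch): back to S2
        # End of string stands in for the end-of-paragraph marker (S8 -> S1);
        # exhausting `paragraphs` stands in for the end-of-text marker (S7).
    return results
```

Any trailing characters that are not followed by a pause symbol are left unprocessed in this sketch. Passing the three components as callables keeps the loop independent of how trunk words are extracted or how comparative sentences are produced.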
6. The method as claimed in claim 5, wherein step S5 specifically comprises: inputting the multiple sentence trunk words into a machine learning engine based on a cloud corpus, which outputs at least one comparative sentence.
7. The method as claimed in claim 5 or 6, wherein step S6 comprises: comparing the lengths of the currently pending sentence and the at least one comparative sentence, and judging whether the length difference is within a third threshold range; and/or performing a similarity comparison between the currently pending sentence and the at least one comparative sentence, and judging whether the similarity is within a fourth threshold range.
8. The method as claimed in claim 7, further comprising: if the length difference and/or the similarity is within the corresponding threshold range, identifying that the currently pending sentence constitutes a complete sentence.
9. The method as claimed in claim 8, wherein the threshold range is adjustable.
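The length and similarity checks of claims 7 to 9 can be sketched as follows. The claims do not specify a similarity measure or concrete threshold values, so everything here is an assumption made for illustration: `difflib.SequenceMatcher` from the Python standard library stands in for the similarity comparison, the default ranges in `Thresholds` are arbitrary, and the `require_both` flag models the "and/or" wording.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


@dataclass
class Thresholds:
    # Hypothetical defaults; the claims only state that the ranges are adjustable.
    length_diff: Tuple[int, int] = (0, 5)          # "third threshold range"
    similarity: Tuple[float, float] = (0.8, 1.0)   # "fourth threshold range"
    require_both: bool = True                      # True: "and", False: "or"


def in_range(value: float, bounds: Tuple[float, float]) -> bool:
    low, high = bounds
    return low <= value <= high


def is_complete(
    sentence: str,
    comparatives: Iterable[str],
    thresholds: Optional[Thresholds] = None,
) -> bool:
    """Treat the sentence as complete if any comparative sentence passes."""
    t = thresholds or Thresholds()
    for comp in comparatives:
        length_ok = in_range(abs(len(sentence) - len(comp)), t.length_diff)
        # Character-level similarity in [0, 1]; an assumed measure.
        similarity = SequenceMatcher(None, sentence, comp).ratio()
        sim_ok = in_range(similarity, t.similarity)
        passed = (length_ok and sim_ok) if t.require_both else (length_ok or sim_ok)
        if passed:
            return True
    return False
```

Adjusting the ranges, as claim 9 allows, then amounts to passing a different `Thresholds` instance, for example `Thresholds(length_diff=(0, 20), require_both=False)`.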
10. A computer-readable storage medium having computer-executable instructions stored thereon, the executable instructions being executed by means of a computer's memory and processor to implement the computer-implemented recognition method of any one of the preceding claims 5-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811226769.XA CN109284503B (en) | 2018-10-22 | 2018-10-22 | Translation statement ending judgment method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811226769.XA CN109284503B (en) | 2018-10-22 | 2018-10-22 | Translation statement ending judgment method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109284503A (en) | 2019-01-29 |
CN109284503B (en) | 2023-08-18 |
Family
ID=65178226
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811226769.XA Active CN109284503B (en) | 2018-10-22 | 2018-10-22 | Translation statement ending judgment method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109284503B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101923540A (en) * | 2010-07-20 | 2010-12-22 | 陈洁 | Language translation quality auditing method |
US20120209587A1 (en) * | 2011-02-16 | 2012-08-16 | Kabushiki Kaisha Toshiba | Machine translation apparatus, machine translation method and computer program product for machine tranalation |
CN104750687A (en) * | 2013-12-25 | 2015-07-01 | 株式会社东芝 | Method for improving bilingual corpus, device for improving bilingual corpus, machine translation method and machine translation device |
CN107305550A (en) * | 2016-04-19 | 2017-10-31 | 中兴通讯股份有限公司 | A kind of intelligent answer method and device |
CN107766325A (en) * | 2017-09-27 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Text joining method and its device |
CN108519970A (en) * | 2018-02-06 | 2018-09-11 | 平安科技(深圳)有限公司 | The identification method of sensitive information, electronic device and readable storage medium storing program for executing in text |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321532A (en) * | 2019-06-06 | 2019-10-11 | 数译(成都)信息技术有限公司 | Language pre-processes punctuate method, computer equipment and computer readable storage medium |
CN111326154A (en) * | 2020-03-02 | 2020-06-23 | 珠海格力电器股份有限公司 | Voice interaction method and device, storage medium and electronic equipment |
CN112464644A (en) * | 2020-12-04 | 2021-03-09 | 北京中科凡语科技有限公司 | Automatic sentence-breaking model establishing method and automatic sentence-breaking method |
CN112464644B (en) * | 2020-12-04 | 2024-03-29 | 北京中科凡语科技有限公司 | Automatic sentence-breaking model building method and automatic sentence-breaking method |
CN112711662A (en) * | 2021-03-29 | 2021-04-27 | 贝壳找房(北京)科技有限公司 | Text acquisition method and device, readable storage medium and electronic equipment |
CN113836905A (en) * | 2021-09-24 | 2021-12-24 | 网易(杭州)网络有限公司 | Theme extraction method and device, terminal and storage medium |
CN113836905B (en) * | 2021-09-24 | 2023-08-08 | 网易(杭州)网络有限公司 | Theme extraction method, device, terminal and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109284503B (en) | 2023-08-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109284503A (en) | Translate Statement Completion judgment method and system | |
CN107729300B (en) | Text similarity processing method, device and equipment and computer storage medium | |
US10714089B2 (en) | Speech recognition method and device based on a similarity of a word and N other similar words and similarity of the word and other words in its sentence | |
CN105975499B (en) | A kind of text subject detection method and system | |
CN107480143B (en) | Method and system for segmenting conversation topics based on context correlation | |
US20160306783A1 (en) | Method and apparatus for phonetically annotating text | |
CN104050256A (en) | Initiative study-based questioning and answering method and questioning and answering system adopting initiative study-based questioning and answering method | |
WO2014117553A1 (en) | Method and system of adding punctuation and establishing language model | |
US9811517B2 (en) | Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text | |
WO2017177809A1 (en) | Word segmentation method and system for language text | |
EP3620994A1 (en) | Methods, apparatuses, devices, and computer-readable storage media for determining category of entity | |
US20180157646A1 (en) | Command transformation method and system | |
CN108280057A (en) | A kind of microblogging rumour detection method based on BLSTM | |
US20200243082A1 (en) | Dialog system and dialog method | |
CN110427612A (en) | Based on multilingual entity disambiguation method, device, equipment and storage medium | |
CN112016271A (en) | Language style conversion model training method, text processing method and device | |
CN113468894A (en) | Dialogue interaction method and device, electronic equipment and computer-readable storage medium | |
CN109325237B (en) | Complete sentence recognition method and system for machine translation | |
EP4060526A1 (en) | Text processing method and device | |
TR202022040A1 (en) | A METHOD OF MEASURING TEXT SUMMARY SUCCESS THAT IS SENSITIVE TO SUBJECT CLASSIFICATION AND A SUMMARY SYSTEM USING THIS METHOD | |
CN104408036A (en) | Correlated topic recognition method and device | |
CN114090885B (en) | Product title core word extraction method, related device and computer program product | |
US20220245340A1 (en) | Electronic device for processing user's inquiry, and operation method of the electronic device | |
CN116049370A (en) | Information query method and training method and device of information generation model | |
CN111401069A (en) | Intention recognition method and intention recognition device for conversation text and terminal |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20231219. Address after: 610, Floor 6, Block A, No. 2, Lize Middle Second Road, Chaoyang District, Beijing 100102. Patentee after: Zhongguancun Technology Leasing Co.,Ltd. Address before: 430073 5th floor, building E2, Guanggu e city, Middle Software Park Road, Donghu hi tech Development Zone, Wuhan City, Hubei Province. Patentee before: TRANSN IOL TECHNOLOGY Co.,Ltd. |