CN109284503A - Method and system for judging sentence completion in translation - Google Patents

Method and system for judging sentence completion in translation

Info

Publication number
CN109284503A
CN109284503A CN201811226769.XA CN201811226769A CN109284503A CN 109284503 A CN109284503 A CN 109284503A CN 201811226769 A CN201811226769 A CN 201811226769A CN 109284503 A CN109284503 A CN 109284503A
Authority
CN
China
Prior art keywords
sentence
paragraph
text
processed
currently pending
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811226769.XA
Other languages
Chinese (zh)
Other versions
CN109284503B (en)
Inventor
何恩培
郑丽华
王莲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongguancun Technology Leasing Co ltd
Original Assignee
Expressive Language Networking Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Expressive Language Networking Polytron Technologies Inc
Priority to CN201811226769.XA
Publication of CN109284503A
Application granted
Publication of CN109284503B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present application proposes a method and system for judging sentence completion in translation, which can accurately identify whether a run of continuous text in a text to be processed has ended and forms a sentence, and thereby judge whether the sentence is complete. The system includes a text import device, a paragraph identification device, a sentence identification device, a semantic combination device and a reliability discrimination device. The invention identifies sentences with complete meaning in the text to be processed on a semantic basis rather than by using punctuation marks as the criterion.

Description

Method and system for judging sentence completion in translation
Technical field
The application belongs to the field of machine learning, and in particular relates to a method and system for judging sentence completion in translation.
Background
In translation, a long text to be translated usually has to be segmented. A necessary condition is that each sub-part after segmentation be a complete, independent piece of corpus; the first and second halves of one sentence must not be cut into different sub-parts. Translation also usually relies on machine translation, with the translator uploading the text to a machine translation tool. Although existing machine translation engines accept whole passages, the results of whole-passage translation are poor, so translators generally upload complete sentences one at a time to obtain a reasonably complete result. In another scenario, a finished translation must be checked for correctness, and the text is again uploaded for checking in units of complete sentences. The key problem throughout is how to segment the text into complete sentences.
A simple approach uses sentence-terminating symbols as the criterion: if a run of continuous text ends with a full stop, question mark or exclamation mark, the sentence is taken to have ended and the run is taken to be a complete sentence. On this basis, sentence-end detection, and hence sentence segmentation, can be implemented by detecting specific symbols. This only achieves the desired effect if the text to be processed strictly followed the rules of punctuation when it was written.
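The punctuation-based baseline described above can be sketched in a few lines. This is only an illustration, not code from the patent: the function name `split_on_terminators` and the exact terminator set are assumptions.

```python
# Sentence terminators: full stop, question mark, exclamation mark, in
# full-width and ASCII forms. The exact set is an assumption for this sketch.
TERMINATORS = frozenset("。？！.?!")


def split_on_terminators(text):
    """Cut the text after every terminator and return the pieces."""
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in TERMINATORS:
            sentences.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:  # text after the last terminator, if any
        sentences.append(tail)
    return sentences


# Well-punctuated text is cut as expected.
print(split_on_terminators("It rained. We stayed home!"))
# Text that strings clauses together with commas comes back as one piece.
print(split_on_terminators(
    "it rained, we stayed home, the match was cancelled, nobody complained."))
```

The second call shows the weakness the background points to: when writers use commas where full stops belong, the baseline finds no cut points at all.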
In current usage, however, few people punctuate strictly by the rules. Many writers use a full stop only at the end of a paragraph or article and otherwise string clauses together with commas or semicolons, to say nothing of the abuse of question marks and exclamation marks in various special styles (such as the "roaring style"). The punctuation-based approach alone therefore cannot accurately identify the sentences with complete meaning in a text.
Summary of the invention
To solve the above problems, and in particular the need to accurately segment sentences with complete meaning during translation, the application proposes a method and system for judging sentence completion in translation, which can accurately identify whether a run of continuous text in the text to be processed has ended and forms a sentence, and thereby judge whether the sentence is complete.
In a first aspect of the invention, a system for judging sentence completion in translation is provided. The system includes a text import device, a paragraph identification device, a sentence identification device, a semantic combination device and a reliability discrimination device.
In a specific implementation, the text to be processed is imported into the system by the text import device, and the paragraph identification device is then run.
The paragraph identification device performs preliminary processing on the imported text and obtains a set of paragraph sub-parts in units of paragraphs; for example, it identifies the beginning and end of each paragraph, and can also identify the end of the full text. The set of paragraph sub-parts then enters the sentence identification device paragraph by paragraph.
The sentence identification device processes the set of paragraph sub-parts paragraph by paragraph. The specific processing steps include:
(1) starting from the first unread character of the current paragraph, reading characters continuously until a pause symbol is read; the characters read constitute the sentence to be processed;
(2) extracting multiple sentence trunk words from the sentence to be processed; a sentence trunk word is a content word that carries actual meaning;
(3) inputting the sentence trunk words into the semantic combination device, which outputs at least one comparison sentence based on a cloud corpus;
(4) inputting the sentence to be processed and the at least one comparison sentence into the reliability discrimination device;
(5) the reliability discrimination device outputs a discrimination result.
Detecting a pause symbol means that the characters read so far may form a complete sentence with independent meaning, so they are treated as a potential candidate sentence. Further judgment is needed to determine whether a candidate really is a sentence with complete, independent meaning; each candidate is therefore passed to the next step as the sentence to be processed.
This next step is the core of the technical solution of the application. It is designed as follows:
Multiple sentence trunk words are extracted from the sentence to be processed;
The sentence trunk words are input into the semantic combination device, which outputs at least one comparison sentence based on the cloud corpus.
Through automatic learning on a large-scale corpus, the application can learn from text and write sentences automatically. The comparison sentence generated from the cloud corpus on the basis of the trunk words extracted from the sentence to be processed is itself an independent sentence with complete meaning.
Next, the current sentence to be processed is compared with the generated comparison sentence to judge whether it is an independent sentence. This is done by the reliability discrimination device described herein.
Specifically:
The sentence to be processed and the at least one comparison sentence are input into the reliability discrimination device;
The reliability discrimination device outputs a discrimination result.
The judgment criterion can be one of the following, or a combination of them:
◆ comparing the lengths of the current sentence to be processed and the generated comparison sentence, and judging whether the length difference is within a first threshold range;
◆ comparing the similarity of the current sentence to be processed and the generated comparison sentence, and judging whether the similarity is within a second threshold range.
Obtaining the length difference is simple and easy to implement; the similarity comparison can use any existing text similarity method from the prior art, which is not described further here.
If the length difference satisfies the first threshold range condition, and/or the similarity satisfies the second threshold range condition, the reliability discrimination device determines that the current sentence to be processed is a complete sentence.
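The two criteria and their "and/or" combination can be sketched as follows. The patent does not name a similarity measure, so this sketch swaps in `difflib.SequenceMatcher` from the Python standard library as one existing text similarity method; the function name `is_complete`, its parameters and the default threshold values are all assumptions.

```python
from difflib import SequenceMatcher


def is_complete(candidate, comparisons, max_len_diff=4, min_similarity=0.6,
                require_both=False):
    """Judge a candidate against the generated comparison sentences.

    Length criterion: absolute character-count difference <= max_len_diff.
    Similarity criterion: SequenceMatcher ratio >= min_similarity.
    require_both selects "and" (True) or "or" (False) between the two.
    All thresholds here are illustrative assumptions.
    """
    for comparison in comparisons:
        len_ok = abs(len(candidate) - len(comparison)) <= max_len_diff
        sim_ok = SequenceMatcher(None, candidate, comparison).ratio() >= min_similarity
        passed = (len_ok and sim_ok) if require_both else (len_ok or sim_ok)
        if passed:
            return True  # one matching comparison sentence is enough
    return False


print(is_complete("the cat sat on the mat", ["the cat sat on the mat"]))
print(is_complete("the cat", ["the cat sat on the mat today"]))
```

A fragment such as "the cat" fails both criteria against a much longer comparison sentence, which is the signal to keep reading past the current pause symbol.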
At this point the current sentence to be processed has been handled and identified, and can be used for the actual operation (segmentation, upload, and so on). The technical solution then continues reading characters and repeats steps (1)-(5); that is, it reads the next sentence to be processed and determines whether it is a complete sentence.
If the length difference does not satisfy the first threshold range condition, and/or the similarity does not satisfy the second threshold range condition, the current sentence to be processed is not a complete sentence, which means that further characters belonging to it follow. The technical solution therefore further includes: continuing to read the unread characters after the current pause symbol until the next pause symbol is read, and appending the characters read to the current sentence to be processed.
The current sentence to be processed now has more characters, so more sentence trunk words can be extracted; steps (2)-(5) are then repeated to judge whether it is a complete sentence.
The technical solution can thus be implemented as a computer program. Identification and judgment form an iterative loop containing an inner loop for each sentence to be processed, whose termination condition is that the current sentence to be processed forms a complete sentence, after which the next sentence to be processed is identified and judged. When the text is input in units of paragraphs, processing terminates on reading the paragraph end marker; when the full text is input, processing terminates on reading the full-text end marker.
Accordingly, in a second aspect of the invention, a computer-implemented identification method is provided for identifying sentences with complete, independent meaning in the text currently being processed. The method includes the following steps:
S1: reading the current unprocessed paragraph of the text to be processed;
S2: reading characters continuously, starting from the first unread character of the current unprocessed paragraph;
S3: judging whether the character currently read is a pause symbol; if so, proceeding to step S4; otherwise, repeating step S2;
S4: extracting multiple sentence trunk words from the current sentence to be processed formed by the characters read;
S5: outputting at least one comparison sentence according to the sentence trunk words;
S6: judging, by comparing the at least one comparison sentence with the current sentence to be processed, whether the current sentence to be processed forms a complete sentence;
S7: judging whether the current pause symbol is the full-text end marker; if so, ending processing; otherwise, proceeding to step S8;
S8: judging whether the current pause symbol is a paragraph end marker; if so, returning to step S1; otherwise, returning to step S2.
Step S5 specifically includes: inputting the sentence trunk words into a machine learning engine based on a cloud corpus, which outputs at least one comparison sentence.
Step S6 includes: comparing the lengths of the current sentence to be processed and the at least one comparison sentence, and judging whether the length difference is within a third threshold range; and/or comparing the similarity of the current sentence to be processed and the at least one comparison sentence, and judging whether the similarity is within a fourth threshold range.
Further, if the length difference and/or the similarity is within the corresponding threshold range, the current sentence to be processed is judged to form a complete sentence.
Further, the threshold ranges are adjustable. A threshold range adjustment module can be provided for adjusting the size of the first to fourth threshold ranges.
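One way to picture an adjustable threshold range is a small object holding a lower and an upper bound. The class name `ThresholdRange`, its methods and the sample bounds are hypothetical; the patent only states that the ranges can be adjusted.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRange:
    """A closed interval [low, high] used as a threshold range (hypothetical)."""
    low: float
    high: float

    def contains(self, value):
        """Return True when the value lies inside the range."""
        return self.low <= value <= self.high

    def adjust(self, low=None, high=None):
        """Change one or both bounds, rejecting an inverted range."""
        new_low = self.low if low is None else low
        new_high = self.high if high is None else high
        if new_low > new_high:
            raise ValueError("low must not exceed high")
        self.low, self.high = new_low, new_high


# Illustrative values only: a length-difference range of 0 to 4 characters.
length_range = ThresholdRange(0, 4)
print(length_range.contains(5))   # outside the range
length_range.adjust(high=6)       # widen the range
print(length_range.contains(5))   # now inside
```

Keeping each range in its own object lets the length and similarity criteria be tuned separately for different text styles.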
In a third aspect of the invention, a computer-readable storage medium is provided, on which computer-executable instructions are stored; the instructions are executed by means of a computer memory and processor to implement the computer-implemented identification method described above, for identifying sentences with complete, independent meaning in the text currently being processed.
The technical solution of the invention achieves at least the following effects:
◆ sentences with complete meaning in the text to be processed are identified on a semantic basis rather than by using punctuation marks as the criterion;
◆ the judgment criterion is based on large-scale semantic learning and draws on machine learning techniques;
◆ although semantics-based automatic article generation by robots belongs to the prior art, the invention applies it to translation corpus identification for the first time; moreover, its purpose differs from that of the prior art: the text is generated not for its own sake but to serve as a judgment criterion;
◆ the prior art generates a whole article from existing keywords and requires the output article to be unique and as accurate as possible, whereas the invention is concerned with the diversity of the outputs generated from a small number of keywords, which makes the judgment more accurate.
Further implementations and advantages of the invention are set out in the detailed description.
Brief description of the drawings
Fig. 1 is a block diagram of the system for judging sentence completion in translation according to the invention.
Fig. 2 is a flow chart of the computer implementation of the method of the invention.
Detailed description
Referring to Fig. 1, a system for judging sentence completion in translation according to the invention includes a text import device, a paragraph identification device, a sentence identification device, a semantic combination device and a reliability discrimination device.
In this embodiment, the text to be processed is imported into the system by the text import device, and the paragraph identification device is then run.
The paragraph identification device performs preliminary processing on the imported text and obtains a set of paragraph sub-parts in units of paragraphs; for example, it identifies the beginning and end of each paragraph, and can also identify the end of the full text. The set of paragraph sub-parts then enters the sentence identification device paragraph by paragraph.
The sentence identification device processes the set of paragraph sub-parts paragraph by paragraph. The specific processing steps include:
(1) starting from the first unread character of the current paragraph, reading characters continuously until a pause symbol is read; the characters read constitute the sentence to be processed;
(2) extracting multiple sentence trunk words from the sentence to be processed; a sentence trunk word is a content word that carries actual meaning;
(3) inputting the sentence trunk words into the semantic combination device, which outputs at least one comparison sentence based on a cloud corpus;
(4) inputting the sentence to be processed and the at least one comparison sentence into the reliability discrimination device;
(5) the reliability discrimination device outputs a discrimination result.
The first unread character of the current paragraph can be a single character or word, or a punctuation mark that can open a paragraph or sentence, such as a single opening quotation mark (‘) or a double opening quotation mark (“).
Ordinarily, if the text to be processed were punctuated strictly according to the rules, it would suffice to read up to a full stop, question mark or exclamation mark to obtain a complete sentence. As noted above, however, texts in practice do not necessarily follow this standard. The application therefore abandons the symbol-based judgment of the prior art: it reads from the first unread character of the current paragraph until a pause symbol is read, and the characters read constitute the sentence to be processed.
A pause symbol here is a punctuation mark that can indicate a pause in a sentence, including the full stop, question mark, exclamation mark, enumeration comma (、), comma, quotation marks (single closing quote, single opening quote), semicolon and so on. Dashes, title marks (《》), brackets and the like do not cause a sentence pause and are not treated as pause symbols. A colon does mark a pause, but the part after the colon is normally regarded as a continuation of what precedes it, so the colon is not treated as a pause symbol either. In addition, because the technical solution includes a paragraph identification device, the pause symbols also include the paragraph end marker and the full-text end marker identified by that device.
The above examples are illustrative rather than exhaustive; in a specific implementation, those skilled in the art can define a set of pause symbols in advance for subsequent lookup.
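A predefined pause symbol set and the read-until-pause step might look like the sketch below. The ASCII counterparts in the set, the sentinel characters chosen for the paragraph and full-text end markers, and the function name `read_until_pause` are assumptions made for illustration.

```python
# Hypothetical sentinels for the markers produced by paragraph identification.
PARAGRAPH_END = "\n"
TEXT_END = "\0"

# Full stop, question mark, exclamation mark, enumeration comma, comma,
# semicolon and single quotes, plus ASCII counterparts (an assumption).
# Colons, dashes, title marks and brackets are deliberately left out.
PAUSE_SYMBOLS = frozenset("。？！、，；‘’.?!,;") | {PARAGRAPH_END, TEXT_END}


def read_until_pause(text, start=0):
    """Read from index start up to and including the first pause symbol.

    Returns (candidate, next_index). If no pause symbol is found, the rest
    of the text is returned.
    """
    for i in range(start, len(text)):
        if text[i] in PAUSE_SYMBOLS:
            return text[start:i + 1], i + 1
    return text[start:], len(text)


text = "Note: it rained, so we left."
candidate, position = read_until_pause(text)
print(candidate)                          # the colon does not end the candidate
print(read_until_pause(text, position))   # continue after the first pause
```

Using a set for lookup keeps the membership test cheap and makes it easy to add or remove symbols for a given language.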
Detecting a pause symbol means that the characters read so far may form a complete sentence with independent meaning, so they are treated as a potential candidate sentence. Further judgment is needed to determine whether a candidate really is a sentence with complete, independent meaning; each candidate is therefore passed to the next step as the sentence to be processed.
This next step is the core of the technical solution of the application. It is designed as follows:
Multiple sentence trunk words are extracted from the sentence to be processed;
The sentence trunk words are input into the semantic combination device, which outputs at least one comparison sentence based on the cloud corpus.
Specifically, the sentence to be processed consists of multiple words, some of which are content words and some function words. A content word is a word with actual meaning, such as "today", "next", "estimate", "submit" or "line". A function word usually expresses a connective or modifying relationship and carries no actual meaning on its own, such as "so", "and", "said", "the", "should", "does" or "such". Natural language processing already offers techniques for segmenting out content words and function words; their segmentation or identification criteria may differ, but the underlying meaning is the same, and they are not described further here.
Using such prior-art segmentation of content words and function words, the application extracts multiple sentence trunk words from the sentence to be processed; the sentence trunk words can be the content words of the current sentence to be processed.
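As a rough illustration of trunk word extraction, the sketch below filters English tokens against a small function word list. Both the list and the function name `extract_trunk_words` are assumptions; for Chinese text a word segmenter would be needed first, and the patent leaves the choice of technique to the prior art.

```python
import re

# A tiny, hand-picked function word list; a real system would use a
# part-of-speech tagger or a much larger list.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "so", "such", "of", "to", "in", "on",
    "is", "are", "was", "were", "be", "do", "does", "should", "will",
    "that", "this", "it",
})


def extract_trunk_words(sentence):
    """Return the content words of a sentence, in order of appearance."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [token for token in tokens if token not in FUNCTION_WORDS]


print(extract_trunk_words("The report should be submitted tomorrow,"))
```

The longer the candidate grows, the more trunk words this step yields, which is why extending a candidate past a pause symbol gives the later comparison more to work with.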
Next, the sentence trunk words are input into the semantic combination device, which outputs at least one comparison sentence based on the cloud corpus.
Through automatic learning on a large-scale corpus, the application can learn from text and write sentences automatically. Similar machine learning techniques exist in the prior art, for example the robot news writers and automatic article-writing robots of recent years. From a few trunk words (keywords, prompt words) entered by the user, these robots automatically generate a news report or article whose quality approaches that of a professional news writer, to the point that readers cannot tell that the article was written by a robot.
The inventors found that such machine learning tools all rely on automatic learning from large-scale corpora. The application can therefore likewise provide a cloud-based corpus for machine learning and build a machine learning engine on it, namely the semantic combination device of the invention. The extracted sentence trunk words are input into the semantic combination device, which outputs at least one comparison sentence based on the cloud corpus, much as the robot news writers and automatic article-writing robots mentioned above do their work.
The invention does not need to output a whole news report or article, only complete sentences. Its machine learning engine can therefore be simpler and faster, and it can output several sentences that are each complete in meaning and fully independent, rather than a single result, which suits this purpose better than existing robot news writers and automatic article-writing robots. This follows from the inventors' application of the technique to the particular needs of translation.
A comparison sentence generated from the large-scale corpus on the basis of the trunk words extracted from the sentence to be processed is necessarily itself an independent sentence with complete meaning.
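The patent's semantic combination device is a machine learning engine trained on a cloud corpus, and its internals are not disclosed. As a deliberately simplified stand-in, the sketch below retrieves, rather than generates, comparison sentences: it ranks the sentences of a small in-memory corpus by how many trunk words they contain. The function name `comparison_sentences`, the corpus and the `top_k` parameter are all hypothetical.

```python
import re


def comparison_sentences(trunk_words, corpus, top_k=2):
    """Return up to top_k corpus sentences sharing the most trunk words.

    This is keyword-overlap retrieval, a stand-in for the generative
    engine described in the text, not an implementation of it.
    """
    wanted = set(trunk_words)
    scored = []
    for sentence in corpus:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        overlap = len(wanted & words)
        if overlap:
            scored.append((overlap, sentence))
    # Stable sort: sentences with equal overlap keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_k]]


corpus = [
    "The report was submitted yesterday.",
    "It rained all day.",
    "Submit the report tomorrow morning.",
]
print(comparison_sentences(["report", "submitted", "tomorrow"], corpus))
```

Returning several candidates instead of one mirrors the point made above: the judgment benefits from a diverse set of complete sentences to compare against.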
Next, the current sentence to be processed is compared with the generated comparison sentence to judge whether it is an independent sentence. This is done by the reliability discrimination device described herein.
Specifically:
The sentence to be processed and the at least one comparison sentence are input into the reliability discrimination device;
The reliability discrimination device outputs a discrimination result.
The judgment criterion can be one of the following, or a combination of them:
◆ comparing the lengths of the current sentence to be processed and the generated comparison sentence, and judging whether the length difference is within a first threshold range;
◆ comparing the similarity of the current sentence to be processed and the generated comparison sentence, and judging whether the similarity is within a second threshold range.
Obtaining the length difference is simple and easy to implement; the similarity comparison can use any existing text similarity method from the prior art, which is not described further here.
If the length difference satisfies the first threshold range condition, and/or the similarity satisfies the second threshold range condition, the reliability discrimination device determines that the current sentence to be processed is a complete sentence.
At this point the current sentence to be processed has been handled and identified, and can be used for the actual operation (segmentation, upload, and so on). The technical solution then continues reading characters and repeats steps (1)-(5); that is, it reads the next sentence to be processed and determines whether it is a complete sentence.
If the length difference does not satisfy the first threshold range condition, and/or the similarity does not satisfy the second threshold range condition, the current sentence to be processed is not a complete sentence, which means that further characters belonging to it follow. The technical solution therefore further includes: continuing to read the unread characters after the current pause symbol until the next pause symbol is read, and appending the characters read to the current sentence to be processed.
The current sentence to be processed now has more characters, so more sentence trunk words can be extracted; steps (2)-(5) are then repeated to judge whether it is a complete sentence.
Referring to Fig. 2, a computer-implemented identification method is provided. In this embodiment, the method comprises steps S1-S8 of Fig. 2.
Specifically, each step performs the following function:
S1: reading the current unprocessed paragraph of the text to be processed;
S2: reading characters continuously, starting from the first unread character of the current unprocessed paragraph;
S3: judging whether the character currently read is a pause symbol; if so, proceeding to step S4; otherwise, repeating step S2;
S4: extracting multiple sentence trunk words from the current sentence to be processed formed by the characters read;
S5: outputting at least one comparison sentence according to the sentence trunk words;
S6: identifying, by comparing the at least one comparison sentence with the current sentence to be processed, whether the current sentence to be processed is a complete sentence;
S7: judging whether the current pause symbol is the full-text end marker; if so, ending processing; otherwise, proceeding to step S8;
S8: judging whether the current pause symbol is a paragraph end marker; if so, returning to step S1; otherwise, returning to step S2.
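The control flow of steps S1-S8 can be sketched as a pair of nested loops. In this sketch the paragraphs are given as a list of strings, so the end of each string plays the role of the paragraph end marker and the end of the list plays the role of the full-text end marker. Steps S4-S6 are folded into a caller-supplied `judge` callback. The names `segment_text` and `judge`, the pause symbol set, and the handling of text still pending at the end of a paragraph are assumptions, since the patent does not spell them out.

```python
PAUSE_SYMBOLS = frozenset("。？！、，；.?!,;")


def segment_text(paragraphs, judge):
    """Follow steps S1-S8 and return the sentences found.

    judge(candidate) stands in for steps S4-S6 and returns True when the
    candidate is considered a complete sentence.
    """
    sentences = []
    for paragraph in paragraphs:              # S1: next unprocessed paragraph
        candidate = ""
        for ch in paragraph:                  # S2: read the next character
            candidate += ch
            if ch not in PAUSE_SYMBOLS:       # S3: not a pause symbol
                continue
            if judge(candidate.strip()):      # S4-S6: complete sentence?
                sentences.append(candidate.strip())
                candidate = ""                # start a new candidate
            # Otherwise keep the candidate and continue reading (back to S2).
        # S8: paragraph end. Assumption: pending text is emitted as-is.
        if candidate.strip():
            sentences.append(candidate.strip())
    return sentences                          # S7: end of the full text


# A toy judge for demonstration: at least four words counts as complete.
def toy_judge(candidate):
    return len(candidate.split()) >= 4


text = ["it rained, we stayed home, the match was cancelled, nobody complained"]
for sentence in segment_text(text, toy_judge):
    print(sentence)
```

In a fuller version, `judge` would chain trunk word extraction, comparison sentence generation and the reliability check; passing it in as a callback keeps the loop structure separate from those components.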

Claims (10)

1. A system for judging sentence completion in translation, the system comprising a text import device, a paragraph identification device, a sentence identification device, a semantic combination device and a reliability discrimination device; the text import device imports a text to be processed, and the paragraph identification device performs preliminary processing on the imported text to obtain a set of paragraph sub-parts in units of paragraphs;
characterized in that:
the sentence identification device processes the set of paragraph sub-parts paragraph by paragraph,
the specific processing steps comprising:
(1) starting from the first unread character of the current paragraph, reading characters continuously until a pause symbol is read; the characters read constitute the sentence to be processed;
(2) extracting multiple sentence trunk words from the sentence to be processed;
(3) inputting the sentence trunk words into the semantic combination device, which outputs at least one comparison sentence based on a cloud corpus;
(4) inputting the sentence to be processed and the at least one comparison sentence into the reliability discrimination device;
the reliability discrimination device outputs a discrimination result.
2. The system of claim 1, wherein the semantic combination device outputting at least one comparison sentence based on the cloud corpus specifically comprises: generating, from the cloud corpus, a comparison sentence on the basis of the sentence trunk words extracted from the sentence to be processed, the comparison sentence being an independent sentence with complete meaning.
3. The system of claim 1 or 2, wherein the reliability discrimination device outputting a discrimination result specifically comprises: comparing the current sentence to be processed with the generated comparison sentence, and outputting the discrimination result according to whether the comparison meets a predetermined condition.
4. The system of claim 3, further comprising a predetermined condition setting module for adjusting the range of the predetermined condition.
5. A computer-implemented recognition method, characterized in that the method comprises the following steps:
S1: reading the current unprocessed paragraph of the text currently to be processed;
S2: reading characters successively, starting from the first unread character of the current unprocessed paragraph;
S3: judging whether the character currently read is a pause symbol; if so, proceeding to step S4; otherwise, repeating step S2;
S4: extracting a plurality of sentence trunk words from the currently pending sentence formed by the characters read;
S5: outputting at least one comparative sentence according to the plurality of sentence trunk words;
S6: identifying, based on a comparison of the at least one comparative sentence with the currently pending sentence, whether the currently pending sentence constitutes a complete sentence;
S7: judging whether the current pause symbol is a full-text end marker; if so, ending the processing; otherwise, proceeding to step S8;
S8: judging whether the current pause symbol is a paragraph end marker; if so, returning to step S1; otherwise, returning to step S2.
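The control flow of steps S1 to S8 can be sketched as a single character loop. This is a minimal sketch under stated assumptions, not the patented implementation: the pause-symbol set, the use of `"\n"` as paragraph end marker and `"\0"` as full-text end marker, and the function name `split_sentences` are hypothetical. Trunk word extraction, comparative sentence generation, and the completeness check are passed in as callables because the claim does not fix them. The sketch also assumes that an incomplete pending sentence keeps growing until a later pause symbol, and that any remainder is emitted at a paragraph or text end.

```python
from typing import Callable, List

PAUSE_SYMBOLS = set(",;.!?，；。！？")  # assumed pause symbols
PARAGRAPH_END = "\n"                   # assumed paragraph end marker
TEXT_END = "\0"                        # assumed full-text end marker


def split_sentences(
    text: str,
    extract_trunk_words: Callable[[str], List[str]],
    generate_comparisons: Callable[[List[str]], List[str]],
    is_complete: Callable[[str, List[str]], bool],
) -> List[str]:
    """Walk the text character by character, following steps S1 to S8."""
    sentences: List[str] = []
    pending = ""
    for ch in text + TEXT_END:                      # S1/S2: read characters in order
        at_boundary = ch in (PARAGRAPH_END, TEXT_END)
        if not at_boundary:
            pending += ch
        if not (at_boundary or ch in PAUSE_SYMBOLS):
            continue                                # S3: not a pause symbol, keep reading
        if pending.strip():
            trunk = extract_trunk_words(pending)            # S4
            comparisons = generate_comparisons(trunk)       # S5
            complete = is_complete(pending, comparisons)    # S6
            # Assumption: emit on a positive judgment, or flush at a boundary.
            if complete or at_boundary:
                sentences.append(pending.strip())
                pending = ""
        if at_boundary:
            pending = ""                            # drop whitespace-only leftovers
        if ch == TEXT_END:                          # S7: end of the full text
            break
        # S8: after a paragraph end the loop continues with the next paragraph (S1);
        # after any other pause symbol it continues reading characters (S2).
    return sentences
```

With toy callables, for example a whitespace tokenizer and a rule that accepts any pending sentence of three or more words, the function splits `"Hello, my friend. How are you?\nFine."` at the pauses where the rule is satisfied and flushes the last fragment at the end of the text.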
6. The method as claimed in claim 5, wherein step S5 specifically comprises: inputting the plurality of sentence trunk words into a machine learning engine based on a cloud corpus, which outputs at least one comparative sentence.
7. The method as claimed in claim 5 or 6, wherein step S6 comprises: comparing the lengths of the currently pending sentence and the at least one comparative sentence, and judging whether the length difference is within a third threshold range; and/or performing a similarity comparison between the currently pending sentence and the at least one comparative sentence, and judging whether the similarity is within a fourth threshold range.
8. The method as claimed in claim 7, further comprising: if the length difference and/or the similarity is within the corresponding threshold range, identifying that the currently pending sentence constitutes a complete sentence.
9. The method as claimed in claim 8, wherein the threshold range is adjustable.
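The length and similarity checks of claims 7 to 9 can be illustrated as follows. This is a sketch under assumptions: the claims do not name a similarity measure, so the ratio from Python's standard `difflib.SequenceMatcher` is swapped in; length is measured in characters; and the function name, the default threshold values, and the `require_both` switch (modelling the "and/or") are hypothetical. The thresholds are ordinary parameters, which is one simple way to make them adjustable.

```python
from difflib import SequenceMatcher
from typing import Iterable


def is_complete_sentence(
    pending: str,
    comparisons: Iterable[str],
    max_length_diff: int = 3,      # assumed "third threshold" (characters)
    min_similarity: float = 0.8,   # assumed "fourth threshold" (ratio in [0, 1])
    require_both: bool = True,     # True: length AND similarity; False: either one
) -> bool:
    """Judge completeness by comparing the pending sentence to each comparative sentence."""
    for candidate in comparisons:
        length_ok = abs(len(pending) - len(candidate)) <= max_length_diff
        similarity = SequenceMatcher(None, pending, candidate).ratio()
        similarity_ok = similarity >= min_similarity
        passed = (length_ok and similarity_ok) if require_both else (length_ok or similarity_ok)
        if passed:
            return True
    return False
```

Loosening `max_length_diff` or lowering `min_similarity` makes the judgment more permissive, and tightening them makes it stricter, which mirrors the adjustable threshold range of claim 9.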
10. A computer-readable storage medium having computer-executable instructions stored thereon, wherein the executable instructions are executed by a computer memory and processor to implement the computer-implemented recognition method of any one of the preceding claims 5 to 9.
CN201811226769.XA 2018-10-22 2018-10-22 Translation statement ending judgment method and system Active CN109284503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811226769.XA CN109284503B (en) 2018-10-22 2018-10-22 Translation statement ending judgment method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811226769.XA CN109284503B (en) 2018-10-22 2018-10-22 Translation statement ending judgment method and system

Publications (2)

Publication Number Publication Date
CN109284503A true CN109284503A (en) 2019-01-29
CN109284503B CN109284503B (en) 2023-08-18

Family

ID=65178226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811226769.XA Active CN109284503B (en) 2018-10-22 2018-10-22 Translation statement ending judgment method and system

Country Status (1)

Country Link
CN (1) CN109284503B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321532A (en) * 2019-06-06 2019-10-11 数译(成都)信息技术有限公司 Language pre-processes punctuate method, computer equipment and computer readable storage medium
CN111326154A (en) * 2020-03-02 2020-06-23 珠海格力电器股份有限公司 Voice interaction method and device, storage medium and electronic equipment
CN112464644A (en) * 2020-12-04 2021-03-09 北京中科凡语科技有限公司 Automatic sentence-breaking model establishing method and automatic sentence-breaking method
CN112711662A (en) * 2021-03-29 2021-04-27 贝壳找房(北京)科技有限公司 Text acquisition method and device, readable storage medium and electronic equipment
CN113836905A (en) * 2021-09-24 2021-12-24 网易(杭州)网络有限公司 Theme extraction method and device, terminal and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923540A (en) * 2010-07-20 2010-12-22 陈洁 Language translation quality auditing method
US20120209587A1 (en) * 2011-02-16 2012-08-16 Kabushiki Kaisha Toshiba Machine translation apparatus, machine translation method and computer program product for machine tranalation
CN104750687A (en) * 2013-12-25 2015-07-01 株式会社东芝 Method for improving bilingual corpus, device for improving bilingual corpus, machine translation method and machine translation device
CN107305550A (en) * 2016-04-19 2017-10-31 中兴通讯股份有限公司 A kind of intelligent answer method and device
CN107766325A (en) * 2017-09-27 2018-03-06 百度在线网络技术(北京)有限公司 Text joining method and its device
CN108519970A (en) * 2018-02-06 2018-09-11 平安科技(深圳)有限公司 The identification method of sensitive information, electronic device and readable storage medium storing program for executing in text

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321532A (en) * 2019-06-06 2019-10-11 数译(成都)信息技术有限公司 Language pre-processes punctuate method, computer equipment and computer readable storage medium
CN111326154A (en) * 2020-03-02 2020-06-23 珠海格力电器股份有限公司 Voice interaction method and device, storage medium and electronic equipment
CN112464644A (en) * 2020-12-04 2021-03-09 北京中科凡语科技有限公司 Automatic sentence-breaking model establishing method and automatic sentence-breaking method
CN112464644B (en) * 2020-12-04 2024-03-29 北京中科凡语科技有限公司 Automatic sentence-breaking model building method and automatic sentence-breaking method
CN112711662A (en) * 2021-03-29 2021-04-27 贝壳找房(北京)科技有限公司 Text acquisition method and device, readable storage medium and electronic equipment
CN113836905A (en) * 2021-09-24 2021-12-24 网易(杭州)网络有限公司 Theme extraction method and device, terminal and storage medium
CN113836905B (en) * 2021-09-24 2023-08-08 网易(杭州)网络有限公司 Theme extraction method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN109284503B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN109284503A (en) Translate Statement Completion judgment method and system
CN107729300B (en) Text similarity processing method, device and equipment and computer storage medium
US10714089B2 (en) Speech recognition method and device based on a similarity of a word and N other similar words and similarity of the word and other words in its sentence
CN105975499B (en) A kind of text subject detection method and system
CN107480143B (en) Method and system for segmenting conversation topics based on context correlation
US20160306783A1 (en) Method and apparatus for phonetically annotating text
CN104050256A (en) Initiative study-based questioning and answering method and questioning and answering system adopting initiative study-based questioning and answering method
WO2014117553A1 (en) Method and system of adding punctuation and establishing language model
US9811517B2 (en) Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text
WO2017177809A1 (en) Word segmentation method and system for language text
EP3620994A1 (en) Methods, apparatuses, devices, and computer-readable storage media for determining category of entity
US20180157646A1 (en) Command transformation method and system
CN108280057A (en) A kind of microblogging rumour detection method based on BLSTM
US20200243082A1 (en) Dialog system and dialog method
CN110427612A (en) Based on multilingual entity disambiguation method, device, equipment and storage medium
CN112016271A (en) Language style conversion model training method, text processing method and device
CN113468894A (en) Dialogue interaction method and device, electronic equipment and computer-readable storage medium
CN109325237B (en) Complete sentence recognition method and system for machine translation
EP4060526A1 (en) Text processing method and device
TR202022040A1 (en) A METHOD OF MEASURING TEXT SUMMARY SUCCESS THAT IS SENSITIVE TO SUBJECT CLASSIFICATION AND A SUMMARY SYSTEM USING THIS METHOD
CN104408036A (en) Correlated topic recognition method and device
CN114090885B (en) Product title core word extraction method, related device and computer program product
US20220245340A1 (en) Electronic device for processing user's inquiry, and operation method of the electronic device
CN116049370A (en) Information query method and training method and device of information generation model
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231219

Address after: 610, Floor 6, Block A, No. 2, Lize Middle Second Road, Chaoyang District, Beijing 100102

Patentee after: Zhongguancun Technology Leasing Co.,Ltd.

Address before: 430073 5th floor, building E2, Guanggu e city, Middle Software Park Road, Donghu hi tech Development Zone, Wuhan City, Hubei Province

Patentee before: TRANSN IOL TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right