CN109284503B - Translation statement ending judgment method and system - Google Patents


Info

Publication number
CN109284503B
Authority
CN
China
Legal status
Active
Application number
CN201811226769.XA
Other languages
Chinese (zh)
Other versions
CN109284503A (en)
Inventor
何恩培
郑丽华
王莲
Current Assignee
Zhongguancun Technology Leasing Co ltd
Original Assignee
Transn Iol Technology Co ltd
Priority date
2018-10-22
Filing date
2018-10-22
Publication date
Application filed by Transn Iol Technology Co ltd
Priority to CN201811226769.XA
Publication of CN109284503A
Application granted
Publication of CN109284503B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a method and a system for judging the end of a translation sentence, which can accurately identify, within a text to be processed, whether a run of continuous text has ended and forms a sentence, thereby determining where the sentence ends. The system comprises a text importing device, a paragraph identifying device, a sentence identifying device, a semantic combining device and a credibility judging device. The application recognizes sentences with complete meaning in the text to be processed on a semantic basis, rather than using punctuation marks as the judgment criterion.

Description

Translation statement ending judgment method and system
Technical Field
The present application belongs to the field of machine learning, and relates in particular to a method and system for judging the end of a translation sentence.
Background
In translation, a long text usually has to be segmented. One requirement of segmentation is that each segment should be a complete and independent unit of text: the first and second halves of a sentence must not be split into different segments. In addition, translation is usually assisted by machine translation, and the translator uploads the text to be translated into a machine translation tool. Although existing machine translation engines accept whole passages, the results obtained this way are poor, so translators usually upload one complete sentence at a time to obtain relatively complete results. In another scenario, a translation must be checked for correctness, and again the text must be uploaded in units of complete sentences. All of this raises an important problem: how to segment the text so that each segment is a complete sentence.
A simple approach relies on sentence-ending symbols: a run of continuous text that ends with a period, question mark or exclamation mark is generally considered to end a sentence and to constitute a complete sentence. On this basis, sentence-end detection can be implemented by detecting these specific symbols, which completes the sentence segmentation. However, this approach only works as intended if the text to be processed strictly follows the rules of punctuation usage.
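To make this limitation concrete, the following minimal Python sketch implements the prior-art baseline just described: it splits text after every period, question mark or exclamation mark (Chinese and ASCII forms). The symbol set and the function name `split_by_end_symbols` are illustrative assumptions, not part of the original disclosure.

```python
from __future__ import annotations

import re

# Assumed sentence-ending symbols: Chinese and ASCII period, question mark
# and exclamation mark.
END_SYMBOLS = "。？！.?!"

# Zero-width split point right after any end symbol, so the symbol stays
# attached to the sentence it closes (needs Python 3.7+ for empty matches).
_SPLIT_AFTER_END = re.compile(f"(?<=[{re.escape(END_SYMBOLS)}])")


def split_by_end_symbols(text: str) -> list[str]:
    """Prior-art baseline: a sentence ends wherever an end symbol appears."""
    parts = _SPLIT_AFTER_END.split(text)
    # Drop empty fragments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]
```

A passage that strings commas together, such as `"the draft is done, it goes out tomorrow, review Friday"`, comes back as a single chunk, because no end symbol ever appears.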
In today's language environment, however, few people use punctuation strictly according to the rules. Many people use a period only at the end of a paragraph or an article, and otherwise run commas all the way through or string semicolons together; likewise, misuse of question marks and exclamation marks is common in certain special genres (e.g., "growling"-style writing). Therefore, sentences with complete meaning cannot be recognized accurately by the above judgment alone.
Disclosure of Invention
To solve these problems, in particular the need to accurately segment sentences with complete meaning during translation, the application provides a method and a system for judging the end of a translation sentence, which can accurately identify, within a text to be processed, whether a run of continuous text has ended and forms a sentence, thereby determining where the sentence ends.
In a first aspect of the present application, there is provided a translation sentence end judgment system including a text importing device, a paragraph identifying device, a sentence identifying device, a semantic combining device and a credibility judging device. In a concrete implementation, the text to be processed is imported into the system through the text importing device, and the paragraph identifying device is then run;
the paragraph identifying device performs preliminary processing on the imported text to be processed to obtain a set of paragraph sub-parts, one per paragraph; for example, it identifies the beginning and end of each paragraph, and it can also identify the end of the full text. The paragraph sub-parts are then passed to the sentence identifying device one paragraph at a time;
the sentence identifying device processes the paragraph sub-part set paragraph by paragraph, in the following steps:
(1) Continuously reading characters, starting from the first unread character of the current paragraph, until a pause symbol is read; the continuous characters read form a sentence to be processed;
(2) Extracting a plurality of sentence trunk words from the sentence to be processed; the sentence trunk words are content words with actual meaning;
(3) Inputting the plurality of sentence trunk words into the semantic combining device, which outputs at least one comparison sentence based on a cloud corpus;
(4) Inputting the sentence to be processed and the at least one comparison sentence into the credibility judging device;
(5) The credibility judging device outputs a judgment result.
Detecting a pause symbol means that the continuous characters read so far are likely to form a complete sentence with independent meaning, so they are treated as a candidate sentence. The candidate still needs further judgment to determine whether it really is a complete, independent sentence; it is therefore taken as the sentence to be processed and passed to the next processing step.
This next processing step is the core of the technical solution of the application. The approach is as follows:
extracting a plurality of sentence trunk words from the sentence to be processed;
inputting the plurality of sentence trunk words into the semantic combining device, which outputs at least one comparison sentence based on a cloud corpus.
By learning automatically from a large-scale corpus, the application can learn text and compose sentences automatically. A comparison sentence generated from the cloud corpus using the trunk words extracted from the sentence to be processed is, by construction, a complete and independent sentence.
The current sentence to be processed is then compared with the generated comparison sentences to judge whether it is an independent sentence; this is done by the credibility judging device.
This specifically comprises the following steps:
inputting the sentence to be processed and the at least one comparison sentence into the credibility judging device;
the credibility judging device outputs a judgment result.
The specific decision criteria may be one of the following or a combination of them:
comparing the lengths of the current sentence to be processed and the generated comparison sentence, and judging whether the length difference is within a first threshold range;
comparing the similarity of the current sentence to be processed and the generated comparison sentence, and judging whether the similarity is within a second threshold range.
The length difference is simple to obtain; the similarity comparison can use any existing text similarity method from the prior art, which is not described in detail here.
If the length difference satisfies the first threshold range and/or the similarity satisfies the second threshold range, the credibility judging device judges that the current sentence to be processed is a complete sentence;
at this point, the current sentence to be processed has been processed and recognized, and can be used in actual operations (segmentation, uploading, etc.). The technical solution of the application then continues reading characters and repeats steps (1) to (5), that is, it reads the next sentence to be processed and judges whether it forms a complete sentence;
if the length difference does not satisfy the first threshold range and/or the similarity does not satisfy the second threshold range, the current sentence to be processed is not a complete sentence, which indicates that more characters belonging to this sentence follow it. The technical solution of the application therefore further includes: continuously reading the unread characters after the current pause symbol until the next pause symbol is read, and appending the characters read to the current sentence to be processed;
the current sentence to be processed thus grows longer and yields more sentence trunk words, and steps (2) to (5) are then repeated to judge whether it is now a complete sentence.
The technical solution of the application can therefore be implemented as a computer program. Identification and judgment proceed as an iterative loop containing an inner loop over a single sentence to be processed; the inner loop terminates once the current sentence to be processed forms a complete sentence, after which the next sentence to be processed is identified and judged. When the text to be processed is input paragraph by paragraph, processing terminates when a paragraph end marker is read; when it is input as a full text, processing terminates when the full-text end marker is read.
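The credibility judgment described above (length difference within a first threshold range and/or similarity within a second) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: lengths are counted in characters, each threshold range is modelled as a single bound (length difference at most `max_len_diff`, similarity at least `min_similarity`), a candidate is accepted if any one comparison sentence passes, and `difflib.SequenceMatcher.ratio()` stands in for "an existing text similarity method". All function names and default values are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Iterable


def length_difference(candidate: str, comparison: str) -> int:
    """Absolute difference in character count between the two sentences."""
    return abs(len(candidate) - len(comparison))


def similarity(candidate: str, comparison: str) -> float:
    """Similarity in [0, 1]; difflib stands in for any prior-art measure."""
    return SequenceMatcher(None, candidate, comparison).ratio()


def is_complete_sentence(
    candidate: str,
    comparisons: Iterable[str],
    max_len_diff: int = 10,        # hypothetical first threshold range: [0, 10]
    min_similarity: float = 0.6,   # hypothetical second threshold range: [0.6, 1]
    require_both: bool = False,    # True reads "and/or" as "and", False as "or"
) -> bool:
    """Accept the candidate if any comparison sentence meets the criteria."""
    for comparison in comparisons:
        length_ok = length_difference(candidate, comparison) <= max_len_diff
        similar_ok = similarity(candidate, comparison) >= min_similarity
        passed = (length_ok and similar_ok) if require_both else (length_ok or similar_ok)
        if passed:
            return True
    return False
```

For example, the fragment `"We submit"` is rejected against `"We will submit the report today"` (the length difference is 22 and the similarity is 0.45), while `"We submit the report today"` passes both tests. The `require_both` flag simply exposes the two readings of "and/or"; both bounds would be tuned in practice.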
Accordingly, in a second aspect of the present application, there is provided a computer-implemented translation sentence end judgment method for identifying sentences with complete and independent meaning in the current text to be processed, the method comprising the following steps:
S1: reading the current unprocessed paragraph of the current text to be processed;
S2: continuously reading characters, starting from the first unread character of the current unprocessed paragraph;
S3: judging whether the currently read character is a pause symbol; if yes, going to step S4; otherwise, repeating step S2;
S4: extracting a plurality of sentence trunk words from the current sentence to be processed formed by the characters read;
S5: outputting at least one comparison sentence according to the plurality of sentence trunk words;
S6: judging whether the current sentence to be processed forms a complete sentence, based on a comparison between the at least one comparison sentence and the current sentence to be processed;
S7: judging whether the current pause symbol is the full-text end marker; if so, ending the processing; otherwise, going to step S8;
S8: judging whether the current pause symbol is a paragraph end marker; if so, going to step S1; otherwise, going to step S2.
Step S5 specifically includes: inputting the plurality of sentence trunk words into a machine learning engine based on a cloud corpus, which outputs at least one comparison sentence;
step S6 includes: comparing the lengths of the current sentence to be processed and the at least one comparison sentence, and judging whether the length difference is within a third threshold range; and/or comparing the similarity between the current sentence to be processed and the at least one comparison sentence, and judging whether the similarity is within a fourth threshold range;
further, if the length difference and/or the similarity are within the corresponding threshold ranges, the current sentence to be processed is judged to form a complete sentence;
further, the threshold ranges may be adjustable: a threshold range adjustment module may be provided for adjusting the first to fourth threshold ranges.
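Steps S1 to S8 can be sketched as a loop in Python. Assumptions: paragraphs arrive as a list of strings, the end of each string plays the role of the paragraph end marker (and the end of the last one the full-text end marker), the pause-symbol set is illustrative, and S4 to S6 are collapsed into a caller-supplied `is_complete` function standing in for trunk-word extraction, comparison-sentence generation and the threshold tests. Text left over at a paragraph end is accepted as-is, which the method itself does not specify. The name `judge_sentence_ends` is hypothetical.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical pause-symbol set (Chinese and ASCII forms).
PAUSE_SYMBOLS = frozenset("。？！、，；.?!,;")


def judge_sentence_ends(
    paragraphs: list[str], is_complete: Callable[[str], bool]
) -> list[list[str]]:
    """Return the complete sentences recognized in each paragraph."""
    result: list[list[str]] = []
    for paragraph in paragraphs:                   # S1 (S8 returns here)
        sentences: list[str] = []
        pending = ""                               # current sentence to be processed
        last = len(paragraph) - 1
        for index, char in enumerate(paragraph):   # S2: read the next unread character
            pending += char
            at_paragraph_end = index == last
            # S3: keep reading until a pause symbol or the paragraph end marker.
            if char not in PAUSE_SYMBOLS and not at_paragraph_end:
                continue
            # S4-S6: judge the candidate. Assumption: at the paragraph end the
            # remaining text is accepted even if the judge rejects it.
            if is_complete(pending) or at_paragraph_end:
                if pending.strip():
                    sentences.append(pending.strip())
                pending = ""
            # Otherwise keep `pending` and read on to the next pause symbol.
        result.append(sentences)
    # S7: the full-text end marker is the end of the last paragraph.
    return result
```

With a toy judge that accepts any candidate of six or more words, `judge_sentence_ends(["The draft is done, it goes out tomorrow, review Friday.", "Thanks!"], lambda s: len(s.split()) >= 6)` merges the first two comma-delimited segments into one sentence: the first segment alone is rejected, so reading continues to the next pause symbol and the new characters are appended.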
In a third aspect of the present application, a computer-readable storage medium is provided, on which computer-executable instructions are stored; when executed by means of a computer memory and a processor, the instructions implement the foregoing translation sentence end judgment method of the present application, which identifies sentences with complete and independent meaning in the current text to be processed.
The technical solution of the application achieves at least the following notable effects:
sentences with complete meaning in the text to be processed are recognized on a semantic basis rather than by using punctuation marks as the judgment criterion;
the judgment criterion is based on large-scale semantic learning and incorporates advanced machine learning techniques;
although automatic article generation by semantic robots belongs to the prior art, it is applied here to translation corpus recognition for the first time; moreover, unlike the prior art, the aim of the application is not to generate text for its own sake, but to use the generated text as a judgment criterion;
the prior art generates a whole article from given keywords and requires that output to be unique and as accurate as possible, whereas the application emphasizes the diversity of the outputs generated from a few given keywords, which makes the judgment more accurate.
Further embodiments and advantages of the present application will be described in detail in the detailed description.
Drawings
Fig. 1 is a block diagram of the translation sentence end judgment system of the present application.
Fig. 2 is a computer-implemented flowchart of the method of the present application.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF INVENTION
Referring to fig. 1, the system for judging the end of a translation sentence of the present application includes a text importing device, a paragraph identifying device, a sentence identifying device, a semantic combining device, and a credibility judging device.
In this embodiment, the text to be processed is imported into the system through the text importing device, and the paragraph identifying device is then run;
the paragraph identifying device performs preliminary processing on the imported text to be processed to obtain a set of paragraph sub-parts, one per paragraph; for example, it identifies the beginning and end of each paragraph, and it can also identify the end of the full text. The paragraph sub-parts are then passed to the sentence identifying device one paragraph at a time;
the sentence identifying device processes the paragraph sub-part set paragraph by paragraph, in the following steps:
(1) Continuously reading characters, starting from the first unread character of the current paragraph, until a pause symbol is read; the continuous characters read form a sentence to be processed;
(2) Extracting a plurality of sentence trunk words from the sentence to be processed; the sentence trunk words are content words with actual meaning;
(3) Inputting the plurality of sentence trunk words into the semantic combining device, which outputs at least one comparison sentence based on a cloud corpus;
(4) Inputting the sentence to be processed and the at least one comparison sentence into the credibility judging device;
(5) The credibility judging device outputs a judgment result.
The first unread character of the current paragraph may be a single character, a word, or a punctuation mark that can appear at the beginning of a paragraph or sentence, such as a left single quotation mark (‘) or a left double quotation mark (“);
normally, if the text to be processed used punctuation strictly according to the rules, a complete sentence would be formed only when a period, question mark or exclamation mark is read; but as mentioned above, texts in practice do not strictly follow this standard. To solve this problem, the present application abandons the symbol-based judgment of the prior art: it starts reading from the first unread character of the current paragraph until a pause symbol is read, and the continuous characters read constitute the sentence to be processed.
A pause symbol here is a punctuation mark that, when read, can indicate a pause in the sentence. Pause symbols include the period, question mark, exclamation mark, enumeration comma (、), comma, quotation marks (right single quotation mark, left single quotation mark), semicolon and the like, all of which can cause a temporary pause in a sentence. By contrast, dashes, book-title marks, brackets and the like do not cause a pause and are not regarded as pause symbols. Although a colon does mark a pause, the part after a colon is generally considered a continuation of the preceding sentence, so a colon is not regarded as a pause symbol either. In addition, because the technical solution of the application includes a paragraph identifying device, the pause symbols also include the paragraph end marker and the full-text end marker identified by that device.
The above examples are illustrative rather than exhaustive; in a specific implementation, those skilled in the art may predefine a set of pause symbols to be queried in the subsequent judgment.
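A predefined pause-symbol set and the reading of step (1) might look like the following Python sketch. The exact set below (including the closing quotation marks) and the name `read_candidates` are assumptions for illustration; colons, dashes, brackets and book-title marks are left out, following the description above, and the end of the string stands in for the paragraph end marker.

```python
from __future__ import annotations

# Hypothetical pause-symbol set: period, question mark, exclamation mark,
# enumeration comma, comma, semicolon (Chinese and ASCII forms) and closing
# quotation marks. Colons, dashes, brackets and book-title marks are excluded.
PAUSE_SYMBOLS = frozenset("。？！、，；.?!,;’”")


def read_candidates(paragraph: str, pause_symbols: frozenset[str] = PAUSE_SYMBOLS) -> list[str]:
    """Cut a paragraph into candidate segments, each ending at a pause symbol."""
    candidates: list[str] = []
    start = 0
    for index, char in enumerate(paragraph):
        if char in pause_symbols:
            # The continuous characters read so far form one candidate.
            candidates.append(paragraph[start:index + 1])
            start = index + 1
    # The paragraph end marker also acts as a pause symbol.
    if paragraph[start:].strip():
        candidates.append(paragraph[start:])
    return candidates
```

Each returned segment is only a candidate; deciding whether it (or it together with the following segments) forms a complete sentence is left to the later steps.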
Detecting a pause symbol means that the continuous characters read so far are likely to form a complete sentence with independent meaning, so they are treated as a candidate sentence. The candidate still needs further judgment to determine whether it really is a complete, independent sentence; it is therefore taken as the sentence to be processed and passed to the next processing step.
This next processing step is the core of the technical solution of the application. The approach is as follows:
extracting a plurality of sentence trunk words from the sentence to be processed;
inputting the plurality of sentence trunk words into the semantic combining device, which outputs at least one comparison sentence based on a cloud corpus.
Specifically, the sentence to be processed consists of a number of words, some of which are content words (real words) and some function words (imaginary words). Content words have actual meaning, such as "today", "get off work", "estimated", "submit", "line", etc.; function words generally serve to connect or modify and do not carry actual meaning on their own, such as "then", "and", "the", "should", "does", "such", etc. In natural language processing, there is prior art for separating content words from function words; the segmentation or recognition criteria may differ, but the underlying meaning is the same, so this is not repeated here.
Based on this prior art for separating content words from function words, the method extracts a plurality of sentence trunk words from the sentence to be processed; the sentence trunk words may be the content words in the current sentence to be processed;
next, the plurality of sentence trunk words are input into the semantic combining device, which outputs at least one comparison sentence based on a cloud corpus.
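Trunk-word extraction can be approximated by filtering out function words. This minimal Python sketch assumes English text split into words with a simple regular expression and a small hypothetical function-word list drawn from the examples above; Chinese input would first need a word segmenter, and a real system would more likely rely on a part-of-speech tagger than on a fixed list. The name `extract_trunk_words` is hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical function-word list: the examples from the text plus a few
# common English articles and prepositions.
FUNCTION_WORDS = frozenset({"then", "and", "the", "should", "does", "such", "a", "an", "of", "to"})

# A word: lowercase letters, optionally joined by hyphens or apostrophes.
_WORD = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def extract_trunk_words(sentence: str) -> list[str]:
    """Keep the content words of a sentence, in their original order."""
    tokens = _WORD.findall(sentence.lower())
    return [token for token in tokens if token not in FUNCTION_WORDS]
```

For example, `extract_trunk_words("Then we should submit the report today.")` keeps `we`, `submit`, `report` and `today`, which would then be passed to the semantic combining device.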
By learning automatically from a large-scale corpus, the application can learn text and compose sentences automatically. Similar machine learning techniques already exist in the prior art, such as the robot news writers and automatic article-writing robots realized in recent years. From a few trunk words (keywords, prompt words) input by a user, these robots can automatically generate a news draft or an article at a level comparable to that of a professional news writer, to the point that readers may be unable to tell that the article was written by a robot.
The inventors found that such machine learning tools are all based on automatic learning from a large corpus. The present application can therefore likewise provide a cloud-based corpus for machine learning in order to build a machine learning engine, namely the semantic combining device of the application. The extracted sentence trunk words are input into the semantic combining device, which then outputs at least one comparison sentence based on the cloud corpus, in a manner similar to the robot news writers and automatic article-writing robots described above.
Of course, the application does not need to output a whole news draft or article, only a complete sentence, so the machine learning engine can be simpler and faster. Moreover, the output can be several completely independent sentences with complete meaning rather than a single result, which suits this purpose better than existing robot news writers and automatic article writers; this is because the inventors creatively adapted these tools to the specific needs of translation.
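The contract of the semantic combining device (trunk words in, several complete sentences out) can be expressed as a small Python interface. `ComparisonGenerator`, `TemplateGenerator` and `comparison_sentences` are hypothetical names, and the template stub is only a placeholder for the cloud-corpus machine learning engine: it performs no learning and merely lets the rest of a pipeline be exercised.

```python
from __future__ import annotations

from typing import Protocol, Sequence


class ComparisonGenerator(Protocol):
    """Hypothetical contract for the semantic combining device."""

    def generate(self, trunk_words: Sequence[str], n: int = 3) -> list[str]:
        """Return up to n complete, independent sentences built from trunk_words."""
        ...


class TemplateGenerator:
    """Toy stand-in that wraps trunk words in fixed sentence frames."""

    FRAMES = ("{core}.", "It is reported that {core}.", "{core}, as planned.")

    def generate(self, trunk_words: Sequence[str], n: int = 3) -> list[str]:
        core = " ".join(trunk_words)
        if not core:
            return []
        sentences = [frame.format(core=core) for frame in self.FRAMES[:n]]
        # Capitalise the first letter so each output reads as a sentence.
        return [s[0].upper() + s[1:] for s in sentences]


def comparison_sentences(generator: ComparisonGenerator, trunk_words: Sequence[str]) -> list[str]:
    """Step (3): ask the generator for several diverse comparison sentences."""
    return generator.generate(trunk_words)
```

Returning several sentences rather than one mirrors the emphasis on output diversity: the credibility judging device gets more than one reference to compare the sentence to be processed against.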
Because it is generated from a large-scale corpus using the trunk words extracted from the sentence to be processed, the comparison sentence is a complete, independent sentence.
The current sentence to be processed is then compared with the generated comparison sentences to judge whether it is an independent sentence; this is done by the credibility judging device.
This specifically comprises the following steps:
inputting the sentence to be processed and the at least one comparison sentence into the credibility judging device;
the credibility judging device outputs a judgment result.
The specific decision criteria may be one of the following or a combination of them:
comparing the lengths of the current sentence to be processed and the generated comparison sentence, and judging whether the length difference is within a first threshold range;
comparing the similarity of the current sentence to be processed and the generated comparison sentence, and judging whether the similarity is within a second threshold range.
The length difference is simple to obtain; the similarity comparison can use any existing text similarity method from the prior art, which is not described in detail here.
If the length difference satisfies the first threshold range and/or the similarity satisfies the second threshold range, the credibility judging device judges that the current sentence to be processed is a complete sentence;
at this point, the current sentence to be processed has been processed and recognized, and can be used in actual operations (segmentation, uploading, etc.). The technical solution of the application then continues reading characters and repeats steps (1) to (5), that is, it reads the next sentence to be processed and judges whether it forms a complete sentence;
if the length difference does not satisfy the first threshold range and/or the similarity does not satisfy the second threshold range, the current sentence to be processed is not a complete sentence, which indicates that more characters belonging to this sentence follow it. The technical solution of the application therefore further includes: continuously reading the unread characters after the current pause symbol until the next pause symbol is read, and appending the characters read to the current sentence to be processed;
the current sentence to be processed thus grows longer and yields more sentence trunk words, and steps (2) to (5) are then repeated to judge whether it is now a complete sentence.
Referring to Fig. 2, a computer-implemented method for judging the end of a translation sentence is provided; in this embodiment, the method specifically includes steps S1 to S8 of Fig. 2.
Specifically, each step performs the following function:
S1: reading the current unprocessed paragraph of the current text to be processed;
S2: continuously reading characters, starting from the first unread character of the current unprocessed paragraph;
S3: judging whether the currently read character is a pause symbol; if yes, going to step S4; otherwise, repeating step S2;
S4: extracting a plurality of sentence trunk words from the current sentence to be processed formed by the characters read;
S5: outputting at least one comparison sentence according to the plurality of sentence trunk words;
S6: identifying whether the current sentence to be processed is a complete sentence, based on a comparison between the at least one comparison sentence and the current sentence to be processed;
S7: judging whether the current pause symbol is the full-text end marker; if so, ending the processing; otherwise, going to step S8;
S8: judging whether the current pause symbol is a paragraph end marker; if so, going to step S1; otherwise, going to step S2.

Claims (5)

1. A translation sentence end judgment system, comprising a text importing device, a paragraph identifying device, a sentence identifying device, a semantic combining device and a credibility judging device; wherein the text importing device imports a text to be processed, and the paragraph identifying device performs preliminary processing on the imported text to be processed to obtain a set of paragraph sub-parts, one per paragraph;
characterized in that:
the sentence identifying device processes the paragraph sub-part set paragraph by paragraph, in the following specific steps:
(1) continuously reading characters, starting from the first unread character of the current paragraph, until a pause symbol is read; the continuous characters read form a sentence to be processed;
(2) extracting a plurality of sentence trunk words from the sentence to be processed;
(3) inputting the plurality of sentence trunk words into the semantic combining device, which generates, based on the cloud corpus, a comparison sentence from the plurality of sentence trunk words extracted from the sentence to be processed, the comparison sentence being an independent sentence with complete meaning;
(4) inputting the sentence to be processed and the comparison sentence into the credibility judging device;
the credibility judging device outputs a judgment result based on a comparison condition;
the comparison condition includes one or a combination of the following:
comparing the lengths of the current sentence to be processed and the generated comparison sentence, and judging whether the length difference is within a first threshold range; and comparing the similarity between the current sentence to be processed and the generated comparison sentence, and judging whether the similarity is within a second threshold range.
2. The translation sentence end judgment system according to claim 1, further comprising a preset condition setting module for adjusting the range of the preset conditions.
3. A computer-implemented translation sentence ending judgment method, comprising the following steps:
S1: reading the current unprocessed paragraph of the current text to be processed;
S2: continuously reading characters, starting from the first unread character of the current unprocessed paragraph;
S3: judging whether the currently read character is a pause symbol; if so, proceeding to step S4; otherwise, repeating step S2;
S4: extracting a plurality of sentence trunk words from the current sentence to be processed, which is formed by the characters read so far;
S5: outputting at least one comparison sentence according to the plurality of sentence trunk words;
S6: comparing the lengths of the current sentence to be processed and the at least one comparison sentence, and judging whether the length difference is within a third threshold range; and/or comparing the similarity between the current sentence to be processed and the at least one comparison sentence, and judging whether the similarity is within a fourth threshold range;
if the length difference and/or the similarity is within the corresponding threshold range, identifying the current sentence to be processed as a complete sentence;
S7: judging whether the current pause symbol is a full-text end marker; if so, ending the processing; otherwise, proceeding to step S8;
S8: judging whether the current pause symbol is a paragraph end marker; if so, returning to step S1; otherwise, returning to step S2;
wherein step S5 specifically comprises: inputting the plurality of sentence trunk words into a machine learning engine based on a cloud corpus, which generates the comparison sentences from the plurality of sentence trunk words, the comparison sentences being independent sentences with complete meanings.
4. The computer-implemented translation sentence ending judgment method according to claim 3, wherein the threshold ranges are adjustable.
5. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor and memory of a computer, implement all the steps of the computer-implemented translation sentence ending judgment method according to claim 3 or 4.
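The control flow of claim 3, with the comparison condition of claim 1 plugged in at step S6, can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the pause-symbol set, the stop-word-based trunk-word extractor, the threshold defaults, and the use of `difflib.SequenceMatcher.ratio()` as the similarity measure are all assumptions, and the helper names (`extract_trunk_words`, `judge_complete`, `segment`) are hypothetical. The cloud-corpus engine of step S5 is not reproduced; the caller supplies a `generate` function in its place. The default judge applies both comparison conditions together, although the claims allow either one alone.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# Hypothetical pause symbols: ASCII and full-width Chinese punctuation.
PAUSE_SYMBOLS = set(",;:.!?，；：。！？")
# Hypothetical stop words used by the toy trunk-word extractor (S4).
STOPWORDS = {"a", "an", "the", "of", "on", "in", "and", "to", "is", "was"}


def extract_trunk_words(sentence: str) -> List[str]:
    """Toy stand-in for S4: keep word tokens that are not stop words."""
    return [w for w in re.findall(r"\w+", sentence) if w.lower() not in STOPWORDS]


def within(value: float, bounds: Tuple[float, float]) -> bool:
    """True if value lies in the inclusive (low, high) threshold range."""
    low, high = bounds
    return low <= value <= high


def judge_complete(sentence: str, comparison: str,
                   length_range: Tuple[float, float] = (0, 5),
                   similarity_range: Tuple[float, float] = (0.6, 1.0)) -> bool:
    """S6 sketch: the length-difference and similarity checks must both pass.

    The ranges are adjustable arguments (cf. claims 2 and 4); the defaults are
    illustrative only. SequenceMatcher.ratio() stands in for the unspecified
    similarity measure.
    """
    length_diff = abs(len(sentence) - len(comparison))
    similarity = SequenceMatcher(None, sentence, comparison).ratio()
    return within(length_diff, length_range) and within(similarity, similarity_range)


def segment(text: str,
            generate: Callable[[List[str]], str],
            judge: Callable[[str, str], bool] = judge_complete) -> List[str]:
    """Walk the text following steps S1-S8 and return the identified sentences.

    `generate` plays the role of the cloud-corpus comparison-sentence engine (S5).
    """
    sentences: List[str] = []
    for paragraph in (p for p in text.split("\n") if p.strip()):  # S1
        buffer = ""
        for ch in paragraph:                    # S2: read the next character
            buffer += ch
            if ch not in PAUSE_SYMBOLS:         # S3: not a pause symbol yet
                continue
            candidate = buffer.strip()
            comparison = generate(extract_trunk_words(candidate))  # S4, S5
            if judge(candidate, comparison):    # S6: complete sentence found
                sentences.append(candidate)
                buffer = ""
            # S7/S8: leaving the inner loop means a paragraph end (back to S1);
            # leaving the outer loop means the full-text end.
        if buffer.strip():
            # Assumption: text still pending at a paragraph end is kept as-is.
            sentences.append(buffer.strip())
    return sentences


if __name__ == "__main__":
    def toy_generate(words: List[str]) -> str:
        # Hypothetical stand-in for the cloud-corpus engine: echo the trunk words.
        return " ".join(words) + "."

    text = "Hello world.\nThe cat sat, on the mat."
    # Toy judge: accept once the comparison sentence has at least three words.
    print(segment(text, toy_generate, judge=lambda s, c: len(c.split()) >= 3))
    # prints ['Hello world.', 'The cat sat, on the mat.']
```

In the usage example, "The cat sat," is rejected at the comma, so reading continues to the full stop, as in the S2/S3 loop. "Hello world." fails the toy check but survives only because of the paragraph-end assumption, which the claims themselves do not specify.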
CN201811226769.XA 2018-10-22 2018-10-22 Translation statement ending judgment method and system Active CN109284503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811226769.XA CN109284503B (en) 2018-10-22 2018-10-22 Translation statement ending judgment method and system

Publications (2)

Publication Number Publication Date
CN109284503A CN109284503A (en) 2019-01-29
CN109284503B true CN109284503B (en) 2023-08-18

Family

ID=65178226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811226769.XA Active CN109284503B (en) 2018-10-22 2018-10-22 Translation statement ending judgment method and system

Country Status (1)

Country Link
CN (1) CN109284503B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321532A (en) * 2019-06-06 2019-10-11 数译(成都)信息技术有限公司 Language pre-processes punctuate method, computer equipment and computer readable storage medium
CN111326154B (en) * 2020-03-02 2022-11-22 珠海格力电器股份有限公司 Voice interaction method and device, storage medium and electronic equipment
CN112464644B (en) * 2020-12-04 2024-03-29 北京中科凡语科技有限公司 Automatic sentence-breaking model building method and automatic sentence-breaking method
CN112711662A (en) * 2021-03-29 2021-04-27 贝壳找房(北京)科技有限公司 Text acquisition method and device, readable storage medium and electronic equipment
CN113836905B (en) * 2021-09-24 2023-08-08 网易(杭州)网络有限公司 Theme extraction method, device, terminal and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101923540A (en) * 2010-07-20 2010-12-22 陈洁 Language translation quality auditing method
CN104750687A (en) * 2013-12-25 2015-07-01 株式会社东芝 Method for improving bilingual corpus, device for improving bilingual corpus, machine translation method and machine translation device
CN107305550A (en) * 2016-04-19 2017-10-31 中兴通讯股份有限公司 A kind of intelligent answer method and device
CN107766325A (en) * 2017-09-27 2018-03-06 百度在线网络技术(北京)有限公司 Text joining method and its device
CN108519970A (en) * 2018-02-06 2018-09-11 平安科技(深圳)有限公司 The identification method of sensitive information, electronic device and readable storage medium storing program for executing in text

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP5666937B2 (en) * 2011-02-16 2015-02-12 株式会社東芝 Machine translation apparatus, machine translation method, and machine translation program

Similar Documents

Publication Publication Date Title
CN109284503B (en) Translation statement ending judgment method and system
US10114809B2 (en) Method and apparatus for phonetically annotating text
WO2017177809A1 (en) Word segmentation method and system for language text
CN111753531A (en) Text error correction method and device based on artificial intelligence, computer equipment and storage medium
US9811517B2 (en) Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text
CN107341143B (en) Sentence continuity judgment method and device and electronic equipment
WO2014117553A1 (en) Method and system of adding punctuation and establishing language model
CN105760359B (en) Question processing system and method thereof
CN111046660B (en) Method and device for identifying text professional terms
CN110083832B (en) Article reprint relation identification method, device, equipment and readable storage medium
CN110096572B (en) Sample generation method, device and computer readable medium
CN110866095A (en) Text similarity determination method and related equipment
US20150055866A1 (en) Optical character recognition by iterative re-segmentation of text images using high-level cues
WO2020232864A1 (en) Data processing method and related apparatus
CN112016271A (en) Language style conversion model training method, text processing method and device
CN109325237B (en) Complete sentence recognition method and system for machine translation
CN110929514B (en) Text collation method, text collation apparatus, computer-readable storage medium, and electronic device
CN112632956A (en) Text matching method, device, terminal and storage medium
US11755659B2 (en) Document search device, document search program, and document search method
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN109446321B (en) Text classification method, text classification device, terminal and computer readable storage medium
CN108021918B (en) Character recognition method and device
CN110929749B (en) Text recognition method, text recognition device, text recognition medium and electronic equipment
CN114722153A (en) Intention classification method and device
CN107870905B (en) Method for identifying specific vocabulary

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2023-12-19

Address after: 610, Floor 6, Block A, No. 2, Lize Middle Second Road, Chaoyang District, Beijing 100102

Patentee after: Zhongguancun Technology Leasing Co.,Ltd.

Address before: 430073 5th floor, building E2, Guanggu e city, Middle Software Park Road, Donghu hi tech Development Zone, Wuhan City, Hubei Province

Patentee before: TRANSN IOL TECHNOLOGY Co.,Ltd.