CN111090986A - Method for correcting errors of official document - Google Patents


Info

Publication number
CN111090986A
Authority
CN
China
Prior art keywords
error
document
model
word
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911197178.9A
Other languages
Chinese (zh)
Inventor
李建华
谢可
庄莉
梁懿
苏江文
王秋琳
刘泽三
邱镇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd, Fujian Yirong Information Technology Co Ltd, Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A method for correcting errors in official documents comprises the following steps. Document type detection: a document type recognition model is trained with machine learning and classifies each document as a notice, circular, reply, report, letter, meeting minutes, or request. Error detection: the text is segmented with a Chinese word segmenter, errors are detected at both character granularity and word granularity, and the suspected errors found at the two granularities are merged into a candidate set of suspected error positions. A bidirectional character-level N-gram LM deep learning model scores the characters in each sentence; low-scoring positions are treated as positions to be corrected and are combined with their context for dictionary lookup, and when none of the combinations can be found in the dictionary, the position is treated as a wrong character and added to the error position candidate set. The scheme incorporates review requirements such as document-writing conventions, incomplete content, unclear subject, grammar error correction, fluency detection, and context association, and adapts and combines existing technical solutions. According to the applicant's tests, it effectively improves error correction for enterprise electronic official documents.

Description

Method for correcting errors of official document
Technical Field
The invention relates to the field of text analysis, and in particular to a method for assisting error correction of official documents.
Background
With the continuing advance of informatization and the rapid development of paperless office work, business departments at all levels generate large numbers of electronic documents that serve as information resources for enterprise production and operation. Document quality control and management directly affect an enterprise's image and office efficiency, and ensuring the quality of enterprise official documents in particular is highly challenging and specialized work. Providing real-time, ubiquitous guidance, error correction, and assistance gives drafters comprehensive help throughout the drafting process and strengthens quality management of official document content at the source.
Although quality problems in enterprise official documents are varied and take different forms, they fall broadly into two categories: form and content. Formal problems are typified by errors in element layout and format; content problems are typified by deviations in element content. An official document management system can intelligently guide and control the document format, writing conventions, and similar aspects in real time, and can build the company's document approval rules and operational management and control principles into the error correction and checking of electronic official documents through a clear and friendly human-computer interface. This can greatly improve the quality of an enterprise's official document management, promote standardized and informatized development, and support the growth of the enterprise.
The invention provides a method and a system for correcting errors in enterprise electronic official documents. It makes full use of the characteristics of enterprise official documents and designs targeted algorithms and solutions, thereby improving the accuracy, coverage, and effectiveness of error correction for such documents.
Disclosure of Invention
A method for correcting errors in official documents is therefore needed, to solve the problem that error correction for specific types of documents is not comprehensive enough.
To achieve this object, the inventors provide a method for correcting errors in official documents, comprising: a document type detection step, in which a document type recognition model is trained with machine learning and documents are classified as notice, circular, reply, report, letter, meeting minutes, or request;
an error detection step, in which the text is segmented with a Chinese word segmenter, errors are detected at both character granularity and word granularity, and the suspected errors found at the two granularities are merged into a candidate set of suspected error positions;
scoring the characters in a sentence with a bidirectional character-level N-gram LM deep learning model, treating low-scoring positions as positions to be corrected, combining each such position with its context for dictionary lookup, and, when none of the combinations can be found in the dictionary, treating the position as a wrong character and adding it to the error position candidate set;
using a traditional language model to judge whether the input word sequence conforms to a given grammar, analyzing and scoring the syntactic structure of sentences that conform to the grammar, and adding structures that score below a threshold to the convention-error candidate set;
a knowledge calculation step, in which errors are corrected using local knowledge from two dimensions, text association and text understanding. Associated-knowledge correction comprises supplementing accurate local knowledge related to the original title by searching a standard corpus with the original erroneous title or by context pattern matching, and using that local knowledge to assist error correction ranking. Text-understanding correction comprises performing semantic analysis on the text to obtain semantic features, representing them with an LSTM model, and applying them to the error correction ranking model.
Further, the method includes a candidate recall step: candidate recalls are generated in combination with official document conventions and content detection, and error correction candidates are generated based on an HMM and graph-theoretic methods.
Specifically, building the document type recognition model comprises the following steps:
based on a dictionary matching method, searching the text for the vocabulary in the word bank of document type K;
extracting the lexical expression of each title from the text, screening out newly found lexical expression patterns and adding them to the candidate pattern library of type K, computing the score of each candidate pattern, and adding patterns whose score exceeds a threshold T1 to the pattern library T of type K.
Compared with the prior art, the method makes full use of the characteristics of enterprise electronic official documents (distinct political nature, drafting and issuance by statutory authors, statutory authority and administrative binding force, strict timeliness, and a specific format). It incorporates review requirements such as document-writing conventions, incomplete content, unclear subject, grammar error correction, fluency detection, and context association, and adapts and combines existing technical solutions. According to the applicant's tests, it effectively improves error correction for enterprise electronic official documents.
Drawings
FIG. 1 is a flow chart according to an embodiment of the present invention;
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
The method performs comprehensive error detection on a given enterprise official document, covering spelling errors, word errors, grammatical errors, syntactic errors, writing-convention errors, and the company's operational management and control specifications. The core steps of document error correction are error detection, candidate recall, and error correction evaluation and ranking. Unlike existing methods, this method targets error correction for enterprise electronic official documents: it combines rule-based and deep learning approaches, adds the review requirements of the enterprise's document handling regulations and management and control specifications, and builds an error correction approach tailored to enterprise official documents. Enterprise electronic official documents mainly comprise notices, circulars, replies, reports, letters, meeting minutes, correspondence, and requests, and have the following notable characteristics:
1) A clear, formal set of writing conventions. These govern the external presentation of the various official document formats, that is, the rules for each document element regarding typeface, font size, placement, and arrangement. To preserve the solemnity and appearance of official documents, every document element follows a relatively uniform, specific, and strict format.
With respect to writing conventions, for example in drafting official document titles, common problems include misattributed document types, misused document types, combined document types, and missing elements. Typical examples:
"Combined document types" means that two statutory document types are used together in one title. Common errors include "report on ……", "petition on ……", and "notice (announcement, resolution) on the decision (reply) regarding ……". Error correction selects a single document type according to the document content and the relationship between sender and recipient; for example, a "request report" addressed to a unit with which there is no subordinate relationship is changed to a "letter".
"Missing elements": an official document title generally consists of three elements, namely the issuing unit, the subject matter, and the document type. These cannot be omitted arbitrarily, and even where omission is permitted, two of the three elements should not be omitted together. Common errors include "Decision of XX Company" and "Amendment of XX Company on ……". A title consisting only of the document type is also non-compliant, for example "Notice" or "(X Fa [19XX] No. 140) Notice".
2) A subject with core content. The content is the concrete presentation built around one subject, that is, the specific expression of each document element in a particular document. How should the secrecy level and urgency of a given document be determined? How should the document number be assigned? The title wording, the distinction between main recipients and copied recipients, the identification of countersigning units, and the structure, language, and punctuation of the body all have explicit requirements.
With respect to content, for example, the company's document management rules require documents with specific functions to be countersigned by specific functional units: large conferences and event arrangements involving group-company leaders must be countersigned by the General Office; personnel management and labor wage matters must be countersigned by the human resources department; financial management matters must be countersigned by the finance department; legal matters such as litigation and arbitration must be countersigned by the legal department. To guide drafters in identifying the correct countersigning units, the system uses keywords drawn from information such as the document type and title to determine, as far as possible, which functional departments should countersign, and displays recommended or mandatory options by default in the countersigning-unit field. As further examples, documents involving company secrets should state the secrecy level and the secrecy period, and documents at the "top secret" and "confidential" levels should also state the copy number. The issuing-unit mark should use the full name or standardized abbreviation of the issuing unit; for jointly issued documents, the host unit is listed first. The document number should comprise the company's issuing code, the year, and the serial number; a jointly issued document carries only the host unit's document number.
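The keyword-based recommendation of countersigning units described above can be sketched as a simple lookup. This is a minimal illustration, not the patented implementation: the table `COUNTERSIGN_RULES`, its keywords, and the function name `recommend_countersigners` are all hypothetical, and the department names are only paraphrased from the examples in the text.

```python
from typing import Dict, List

# Hypothetical keyword table. Department names paraphrase the examples in the
# text; the keywords themselves are illustrative assumptions.
COUNTERSIGN_RULES: Dict[str, List[str]] = {
    "General Office": ["conference", "event arrangement", "group leadership"],
    "Human Resources Department": ["personnel", "wage", "salary"],
    "Finance Department": ["financial", "budget"],
    "Legal Department": ["litigation", "arbitration"],
}


def recommend_countersigners(title: str, doc_type: str = "") -> List[str]:
    """Return departments whose keywords occur in the document type or title."""
    text = f"{doc_type} {title}".lower()
    return [
        dept
        for dept, keywords in COUNTERSIGN_RULES.items()
        if any(keyword in text for keyword in keywords)
    ]
```

A real system would presumably match Chinese keywords after word segmentation and distinguish recommended from mandatory countersigners; the sketch only shows the lookup itself.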
Referring to FIG. 1, the invention makes full use of the above characteristics of enterprise official documents and improves the algorithm and workflow of existing document error correction. The main flow is as follows:
Step 1: Prepare the data, dictionaries, and error models. Collect sample data from standard enterprise official documents of past years, and label, classify, and archive them by document type. In accordance with the relevant official document handling regulations, and combining the standard wording and common typical errors of each document type, build dictionaries covering spelling errors, word errors, grammatical and syntactic errors, writing-convention errors, and the company's operational management and control specifications. In addition to the general phonetic-similarity and shape-similarity dictionaries, build a language model, word model, grammar model, syntax model, and convention model for each document type; these assist the judgments made during error detection.
On top of traditional spelling error correction, an error correction review model is added: multi-dimensional error detection indicators are established according to the writing conventions of enterprise official documents and the review requirements for subject content, and a corresponding scoring mechanism is set.
For example, for the title convention problem described above, the title elements are the issuing unit, the subject matter, and the document type, and the model is built as follows:
(1) Based on a dictionary matching method, search the text for the vocabulary in the word bank of document type K.
(2) Extract the lexical expression of each title from the text, screen out newly found lexical expression patterns, and add them to the candidate pattern library of type K.
The patterns involve subject words, relation words, patterns, and auxiliary stop words. A combination is the basis for generating a pattern; a pattern is produced by annotating a combination with its result.
A pattern consists of a combination plus a result;
the format is "combination => entity 1, entity 2, relation;";
the combination is a generalized sentence, and "entity 1, entity 2, relation" is collectively called the result;
in the pattern "about + COMPANY + TYPE => 2, -1, -2;", "about + COMPANY + TYPE" is the combination. In the result, the first field, 2, gives the position of the entity COMPANY; the second field, -1, indicates that the slot is empty; the third field represents the relation. If the third field is positive, it has the same meaning as the first field and gives the position of an entity; if it is negative, it is a normalized relation, where -1 means the title conforms to the convention and -2 means it does not.
(3) Compute the score of each candidate pattern, and add patterns whose score exceeds a threshold T1 to the pattern library T of type K.
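To make the pattern format concrete, the following sketch parses a pattern string of the form "combination => entity 1, entity 2, relation;". It is only an illustration under stated assumptions: the names `TitlePattern` and `parse_pattern` are hypothetical, positions are assumed to be 1-based indices into the "+"-separated combination (which matches the example, where 2 points at COMPANY), and the labels "compliant" and "non-compliant" are this sketch's wording for the -1 and -2 relation codes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TitlePattern:
    slots: List[str]          # the generalized sentence, split on "+"
    entity1: Optional[str]    # slot named by the first result field, or None
    entity2: Optional[str]    # slot named by the second result field, or None
    relation: str             # a slot name, "compliant", or "non-compliant"


def _slot(slots: List[str], index: int) -> Optional[str]:
    # Assumption: positive values are 1-based positions, -1 means "empty".
    return slots[index - 1] if index > 0 else None


def parse_pattern(line: str) -> TitlePattern:
    combination, result = line.rstrip("; \t").split("=>")
    slots = [part.strip() for part in combination.split("+")]
    first, second, rel = (int(field) for field in result.split(","))
    if rel > 0:
        relation = slots[rel - 1]      # positive: position of an entity
    elif rel == -1:
        relation = "compliant"         # normalized relation: meets the convention
    elif rel == -2:
        relation = "non-compliant"     # normalized relation: violates the convention
    else:
        raise ValueError(f"unknown relation code: {rel}")
    return TitlePattern(slots, _slot(slots, first), _slot(slots, second), relation)
```

For the example pattern in the text, `parse_pattern("about + COMPANY + TYPE => 2, -1, -2;")` yields COMPANY as the first entity, no second entity, and a non-compliant relation.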
Subsequent error detection matches against the pattern library of the identified document type: titles of that type are recognized by pattern matching, a score is computed for each title, and titles scoring below a threshold t2 are added to the error candidate set. The score, score(T | K), of title pattern T for document type K is computed by the following formula:
[Formula image BDA0002294960390000061, defining score(T | K) in terms of N(T | K) and N(T); not reproduced in the text]
N(T | K) is the total number of title instances of type K mined using pattern T, and N(T) is the total number of title instances of all types mined using pattern T.
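The scoring formula itself is an image that is not reproduced in this text, so its exact form is unknown. One plausible form consistent with the two definitions above is the ratio N(T | K) / N(T), that is, the share of a pattern's matches that fall in type K. The sketch below uses that ratio purely as an assumption; the function names `pattern_scores` and `select_patterns` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pattern_scores(matches: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Score each (pattern, doc_type) pair from one (pattern, doc_type) entry per title.

    Assumption: score(T | K) = N(T | K) / N(T). The patent's actual formula
    is not available in the text and may differ.
    """
    matches = list(matches)
    n_tk = Counter(matches)                            # N(T | K)
    n_t = Counter(pattern for pattern, _ in matches)   # N(T)
    return {(t, k): count / n_t[t] for (t, k), count in n_tk.items()}


def select_patterns(scores: Dict[Tuple[str, str], float],
                    doc_type: str, threshold: float) -> List[str]:
    """Keep the patterns of one document type whose score exceeds the threshold."""
    return sorted(t for (t, k), s in scores.items() if k == doc_type and s > threshold)
```

Under this assumed form, a pattern that matches titles of many different types scores low for each of them, which fits the stated goal of keeping only patterns above the threshold T1 for type K.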
Step 2: Automatically identify the document type. Document type recognition is a text classification problem whose main stages are text data preparation, text preprocessing, text feature processing, model training, model evaluation, and model output. Using enterprise official documents in various formats from past years, a model is trained with machine learning to build a document type classifier that automatically identifies the type of a document. The scheme is based on the open-source HanLP component: classification training uses the naive Bayes algorithm, a classification model is output, the document type is identified, and the corresponding language models, dictionaries, and auxiliary error detection models are loaded.
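The text names HanLP's naive Bayes classifier but gives no code. The sketch below is not the HanLP API; it is a small, self-contained multinomial naive Bayes with add-one smoothing, written only to illustrate the technique. The class name `NaiveBayesDocTypeClassifier` is hypothetical, and the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class NaiveBayesDocTypeClassifier:
    """Multinomial naive Bayes over tokens, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, samples: List[Tuple[List[str], str]]) -> "NaiveBayesDocTypeClassifier":
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens: List[str]) -> Optional[str]:
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            logp = math.log(n_docs / total_docs)          # class prior
            for token in tokens:                          # smoothed likelihoods
                logp += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

In the workflow described above, the predicted label would then select which type-specific language models, dictionaries, and pattern libraries to load.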
Step 3: Error detection. Error detection determines whether the text contains errors that need to be corrected. The method uses both rules and a deep learning model; the main error detection steps are as follows:
1) Rules: the text is segmented with a Chinese word segmenter, errors are detected at both character granularity and word granularity, and the suspected errors found at the two granularities are merged into a candidate set of suspected error positions.
2) Deep learning model: a bidirectional character-level N-gram LM scores the characters in the sentence, and low-scoring positions are treated as positions to be corrected. Each such position is combined with its context for dictionary lookup; when none of the combinations can be found in the dictionary, the position is treated as a wrong character and added to the error position candidate set.
3) On top of the traditional language model, writing conventions are checked with language models for word, sentence, and grammar analysis. Within a sentence, words stand in certain combinatory relations, and the sentence can be divided into different constituents according to those relations. The structure of an input word sequence (usually a sentence) is judged for conformity with a given grammar; the syntactic structure of sentences that conform is analyzed and scored, and structures scoring below a threshold are added to the convention-error candidate set.
The pattern matching result for the currently selected file includes the entities, relations, and patterns that occurred, the sentences involved, the weights, and the matched relation words. One example is a title in which two verbs appear.
4) A knowledge calculation stage is added. It supplies more accurate local knowledge for error correction from two dimensions, text association and text understanding, and addresses the problem of generalizing knowledge in low-frequency domains.
For associated knowledge, a large amount of accurate local knowledge related to the original title can be supplemented by searching a standard corpus with the original erroneous title or by context pattern matching. This precise local knowledge is used to assist error correction ranking.
For text understanding, correcting errors with a purely statistical language model, without understanding what the sentence expresses, is clearly inadequate. The generalization problem for low-frequency domain knowledge has to be addressed through a global understanding of the sentence content and an understanding of each of its constituents. Specifically, semantic analysis is performed on the text to obtain semantic features (for example with recurrent neural networks, RNNs), and an LSTM (Long Short-Term Memory) model is used. Compared with an ordinary RNN, the LSTM changes only the hidden layer, yet it represents long- and short-term dependencies better; applied to the error correction ranking model, it yields better correction results.
5) The four steps above are combined to perform comprehensive error detection for spelling errors, word errors, grammatical and syntactic errors, writing-convention errors, and the company's operational management and control specifications, producing the final error candidate set.
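Item 2) of the error detection step can be illustrated with a small count-based model. This is a sketch under stated assumptions rather than the patented model: it uses bigrams with add-one smoothing, averages the forward log-probability P(c_i | c_{i-1}) and the backward log-probability P(c_i | c_{i+1}) for each character, and uses a fixed threshold. The names `BiCharBigramLM` and `suspect_positions`, the padding symbols, and the two-character dictionary lookup are all hypothetical choices.

```python
import math
from collections import Counter
from typing import List, Set


class BiCharBigramLM:
    """Character bigram model read in both directions, with add-one smoothing."""

    def __init__(self, corpus: List[str]) -> None:
        self.fwd: Counter = Counter()        # (previous char, current char)
        self.bwd: Counter = Counter()        # (next char, current char)
        self.ctx_prev: Counter = Counter()   # how often a char acts as "previous"
        self.ctx_next: Counter = Counter()   # how often a char acts as "next"
        self.vocab: Set[str] = set()
        for sentence in corpus:
            padded = "^" + sentence + "$"    # sentence boundary symbols
            self.vocab.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.fwd[(a, b)] += 1
                self.ctx_prev[a] += 1
                self.bwd[(b, a)] += 1
                self.ctx_next[b] += 1

    def score(self, sentence: str) -> List[float]:
        """Mean of forward and backward log-probability for each character."""
        padded = "^" + sentence + "$"
        v = len(self.vocab)
        scores = []
        for i in range(1, len(padded) - 1):
            prev, cur, nxt = padded[i - 1], padded[i], padded[i + 1]
            p_fwd = (self.fwd[(prev, cur)] + 1) / (self.ctx_prev[prev] + v)
            p_bwd = (self.bwd[(nxt, cur)] + 1) / (self.ctx_next[nxt] + v)
            scores.append((math.log(p_fwd) + math.log(p_bwd)) / 2)
        return scores


def suspect_positions(lm: BiCharBigramLM, sentence: str,
                      dictionary: Set[str], threshold: float) -> List[int]:
    """Flag low-scoring positions whose context combinations are not dictionary words."""
    suspects = []
    for i, s in enumerate(lm.score(sentence)):
        if s >= threshold:
            continue
        # Combine the position with its left and right neighbour.
        combos = [sentence[i - 1:i + 1] if i > 0 else "",
                  sentence[i:i + 2] if i + 1 < len(sentence) else ""]
        if not any(c in dictionary for c in combos if len(c) == 2):
            suspects.append(i)
    return suspects
```

With a corpus of well-formed titles such as 关于召开会议的通知, a title ending in 通和 instead of 通知 receives its lowest score at the final character, and because 通和 is not a dictionary word that position is flagged.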
Step 4: Candidate recall. Different recall strategies are used for different types of errors.
1) For error types such as phonetic confusion and shape confusion, a corresponding confusion set is generated, and replacements are selected using the confusion set, rules, word lists, or a language model.
2) For error types such as non-standard wording, non-compliant writing practice, and incorrect content attributes, pattern replacement is performed according to the rule base.
3) Candidate recalls are generated in combination with official document conventions and content detection, and error correction candidates are generated based on an HMM and graph-theoretic methods. Using the model's own scoring function, candidates can be pre-screened while they are being generated. The erroneous text is regarded as having been produced from the original text through a transition matrix. Given the erroneous text (denoted S), the task is to find the most probable original text (denoted T). Applying Bayes' formula, P(S) is constant for a given S, P(T) is the prior probability, and P(S | T) is the transition probability; both can be obtained by building a language model and a transition matrix (also called an error model) from a training corpus.
$$\hat{T} = \arg\max_{T} P(T \mid S) = \arg\max_{T} \frac{P(S \mid T)\,P(T)}{P(S)} = \arg\max_{T} P(S \mid T)\,P(T)$$
Two factors determine the generation of an error correction candidate: the language model of the candidate T, and a conditional probability model, also called the error model. The main difference between the various methods lies in the error model. If only substitution errors are considered, it can be understood as an aligned character-level error model.
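The selection rule described here, choosing the T that maximizes P(S | T) P(T), can be sketched for the substitution-only case. This is a simplified illustration, not the HMM and graph-based generator named in the text: it considers only single-character substitutions drawn from a confusion set, assumes the confusion set is symmetric, and uses a uniform error model in which a character is kept with probability `p_keep`. All function names and the `p_keep` parameter are hypothetical.

```python
import math
from typing import Callable, Dict, List


def single_substitutions(observed: str, confusion: Dict[str, List[str]]) -> List[str]:
    """The observed text plus every variant with one character swapped."""
    candidates = [observed]
    for i, ch in enumerate(observed):
        for alt in confusion.get(ch, []):
            candidates.append(observed[:i] + alt + observed[i + 1:])
    return candidates


def channel_logprob(observed: str, original: str,
                    confusion: Dict[str, List[str]], p_keep: float = 0.9) -> float:
    """log P(S | T) under an aligned, substitution-only error model (assumed uniform)."""
    logp = 0.0
    for s_ch, t_ch in zip(observed, original):
        if s_ch == t_ch:
            logp += math.log(p_keep)
        else:
            n_alt = max(len(confusion.get(t_ch, [])), 1)
            logp += math.log((1.0 - p_keep) / n_alt)
    return logp


def best_original(observed: str, confusion: Dict[str, List[str]],
                  lm_logprob: Callable[[str], float], p_keep: float = 0.9) -> str:
    """argmax over T of log P(S | T) + log P(T); P(S) is constant and dropped."""
    return max(
        single_substitutions(observed, confusion),
        key=lambda t: lm_logprob(t) + channel_logprob(observed, t, confusion, p_keep),
    )
```

Here `lm_logprob` stands in for the language model P(T); any function returning a log-probability for a candidate string can be passed in.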
Step 5: Error correction evaluation and ranking. After error detection has located all suspected errors, all candidate treatments are retrieved and each candidate is substituted in; the candidates are ranked with a language model, in a manner similar to translation models, to obtain the best correction. Evaluation is based on language models: sentences are scored by perplexity and mutual information, by the forward algorithm, and by character-level and other language models, and a discriminative model is trained on multiple kinds of statistical features. If no candidate sentence scores higher than the original sentence, or the improvement over the original score does not exceed a threshold, the original sentence is considered error-free. Otherwise, the highest-scoring candidate sentence is output as the correction.
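The decision rule at the end of this step can be sketched as follows. This is a minimal illustration that covers only the final comparison, not the discriminative model or the feature set: `score` is any sentence-level scoring function where higher is better (for example, negative perplexity), and the names `perplexity`, `choose_correction`, and `margin` are hypothetical.

```python
import math
from typing import Callable, Sequence


def perplexity(logprobs: Sequence[float]) -> float:
    """Perplexity of a sentence from its per-character natural-log probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))


def choose_correction(original: str, candidates: Sequence[str],
                      score: Callable[[str], float], margin: float = 0.0) -> str:
    """Return the best candidate only if it beats the original by more than `margin`."""
    base = score(original)
    best = max(candidates, key=score, default=original)
    if score(best) - base > margin:
        return best
    return original   # no candidate is clearly better: treat the sentence as error-free
```

The margin plays the role of the threshold mentioned above: it guards against replacing a correct sentence with a candidate that scores only marginally higher.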
Step 6: Intelligent recommendation. Building on error correction, and according to the error type and the word and syntax analysis results, the system recommends standard expressions for the same document type, model examples, and related material libraries as reference.
The steps above constitute the method for correcting errors in official electronic documents. An electronic official document error correction system built on this method can correct errors in all kinds of official documents, including notices, circulars, replies, reports, letters, meeting minutes, correspondence, and requests. It provides real-time, ubiquitous guidance, error correction, and assistance, gives drafters comprehensive help throughout the drafting process, and strengthens quality management of official document content at the source.
It should be noted that, although the above embodiments have been described herein, the invention is not limited thereto. Therefore, based on the innovative concepts of the present invention, the technical solutions of the present invention can be directly or indirectly applied to other related technical fields by making changes and modifications to the embodiments described herein, or by using equivalent structures or equivalent processes performed in the content of the present specification and the attached drawings, which are included in the scope of the present invention.

Claims (3)

1. A method for correcting the error of a document of official documents comprises the steps of detecting the document type, training a document type recognition model by using machine learning, and classifying the document type into the types of notification, report, batch, report, letter, meeting summary and request;
the error detection step comprises the steps of cutting words through a Chinese word segmentation device, detecting errors from the aspects of word granularity and word granularity, and integrating suspected error results of the two granularity detections to form a suspected error position candidate set;
using a bidirectional character level N-gram LM deep learning model to score characters in a sentence, regarding the part with low score as a position to be corrected, combining the position to be corrected with a context for dictionary word searching, and when all combinations cannot be searched in a dictionary, regarding the combinations as wrong words and adding a wrong position candidate set;
judging whether the input word sequence is in accordance with a given grammar through a traditional language model, analyzing the syntactic structure of a sentence in accordance with the grammar, scoring, and bringing the syntactic structure which is lower than a threshold value into a standard error candidate set;
a knowledge calculation step, which is to correct errors by using local knowledge of text association and text understanding two dimensions, wherein the associated knowledge correction comprises supplementing accurate local knowledge related to an original title by a mode of searching or context pattern matching of the original wrong title in a standard corpus, and using the local knowledge to assist in error correction sequencing; the text understanding and error correction comprises the steps of carrying out semantic analysis on a text to obtain semantic features, and utilizing an LSTMs model to express and apply the semantic features to an error correction sequencing model.
2. The method for correcting errors in official documents according to claim 1, further comprising generating recall candidates based on HMMs and graph-theoretic methods, in combination with official document writing specifications and content detection.
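Claim 2 does not say how the HMM is built, so the following is only one plausible reading, sketched under stated assumptions: the hidden states are the intended characters, the observations are the written characters, and Viterbi decoding finds the best path through the candidate lattice. The confusion set, the emission probabilities (0.9 and 0.1) and the bigram table are all hypothetical.

```python
import math

def viterbi_correct(observed, confusion, bigram_logp, floor=math.log(1e-4)):
    """Return the most probable intended string for a non-empty observed string."""
    def emit(cand, obs):
        # Assumed emission model: a written character is usually the intended one.
        return math.log(0.9) if cand == obs else math.log(0.1)

    def candidates(obs):
        return sorted({obs} | confusion.get(obs, set()))

    # best[c] = (log score, path) of the best path ending in candidate c
    best = {c: (emit(c, observed[0]), [c]) for c in candidates(observed[0])}
    for obs in observed[1:]:
        step = {}
        for c in candidates(obs):
            step[c] = max(
                (s + bigram_logp.get((p, c), floor) + emit(c, obs), path + [c])
                for p, (s, path) in best.items()
            )
        best = step
    return "".join(max(best.values())[1])

# Hypothetical confusion set and transition log-probabilities.
confusion = {"任": {"认"}}
bigram_logp = {
    ("认", "真"): math.log(0.5),
    ("真", "执"): math.log(0.3),
    ("执", "行"): math.log(0.6),
}
print(viterbi_correct("任真执行", confusion, bigram_logp))  # 认真执行
```

Viterbi decoding is a shortest-path search over a layered graph, which is one way HMMs and graph methods can be combined for candidate generation; a fuller recall step would keep several high-scoring paths rather than only the best one.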
3. The method according to claim 1, wherein building the document type recognition model comprises the steps of:
searching the text, by dictionary matching, for vocabulary in the lexicon of document type K; and
extracting a lexical expression pattern for each title from the text, screening out newly added lexical expression patterns and adding them to the candidate pattern library of type K, calculating a score for each candidate pattern, and adding the patterns whose score is greater than a threshold T1 to the pattern library T of type K.
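The pattern-library update in claim 3 can be sketched as follows. The claim does not define the pattern form or the scoring function, so both are assumptions here: a pattern is a lexicon hit replaced by the placeholder `<K>` plus the next two characters, and the score is the share of a pattern's occurrences that fall in documents labelled with type K. All names and sample documents are hypothetical.

```python
from collections import Counter

def extract_patterns(text, lexicon, width=2):
    """Dictionary matching: emit '<K>' + right context for every lexicon hit."""
    patterns = set()
    for word in lexicon:
        start = text.find(word)
        while start != -1:
            right = text[start + len(word):start + len(word) + width]
            if len(right) == width:
                patterns.add("<K>" + right)
            start = text.find(word, start + 1)
    return patterns

def update_pattern_library(docs, lexicon, doc_type, library, t1):
    """Score newly seen patterns and admit those scoring above threshold t1."""
    in_type, overall = Counter(), Counter()
    for text, label in docs:
        for p in extract_patterns(text, lexicon):
            overall[p] += 1
            if label == doc_type:
                in_type[p] += 1
    candidates = {p for p in overall if p not in library}  # newly added patterns
    scores = {p: in_type[p] / overall[p] for p in candidates}
    accepted = {p for p, s in scores.items() if s > t1}
    return library | accepted, scores

# Hypothetical labelled documents and lexicon for type K = "notice".
docs = [
    ("特此通知如下", "notice"),
    ("现将有关事项通知如下", "notice"),
    ("已通知相关人员", "report"),
]
library, scores = update_pattern_library(docs, {"通知"}, "notice", set(), t1=0.5)
print(library)  # {'<K>如下'}
```

Running the update repeatedly with a growing lexicon would let the pattern library and lexicon bootstrap each other, which appears to be the intent of the candidate library in the claim.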
CN201911197178.9A 2019-11-29 2019-11-29 Method for correcting errors of official document Pending CN111090986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911197178.9A CN111090986A (en) 2019-11-29 2019-11-29 Method for correcting errors of official document

Publications (1)

Publication Number Publication Date
CN111090986A true CN111090986A (en) 2020-05-01

Family

ID=70393249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911197178.9A Pending CN111090986A (en) 2019-11-29 2019-11-29 Method for correcting errors of official document

Country Status (1)

Country Link
CN (1) CN111090986A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020037A (en) * 2012-12-05 2013-04-03 福建亿榕信息技术有限公司 Official document standardized calibration system
US20140214401A1 (en) * 2013-01-29 2014-07-31 Tencent Technology (Shenzhen) Company Limited Method and device for error correction model training and text error correction
CN109657052A (en) * 2018-12-12 2019-04-19 中国科学院文献情报中心 A kind of abstract of a thesis contains the abstracting method and device of fine granularity Knowledge Element

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
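Zhang Yangsen et al.: "Current Status and Prospects of Automatic Proofreading Technology for Chinese Text", Journal of Chinese Information Processing *
Zhang Yangsen et al.: "Research on the Knowledge Base and Its Construction Method in an Automatic Chinese Text Error Detection System", Journal of Chinese Computer Systems *
Zhang Yangsen et al.: "A Survey of Automatic Text Proofreading Technology", Application Research of Computers *
最AI的小PAI: "A Key Link in Upper-Layer NLP Applications: A Brief Overview of Chinese Error Correction Technology", Zhihu *
Chen Zhipeng et al.: "Chinese Error Correction for Search Engines Based on an N-gram Statistical Model", Journal of China Academy of Electronics and Information Technology *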

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753529A (en) * 2020-06-03 2020-10-09 杭州云嘉云计算有限公司 Chinese text error correction method based on pinyin identity or similarity
CN111753529B (en) * 2020-06-03 2021-07-27 杭州云嘉云计算有限公司 Chinese text error correction method based on pinyin identity or similarity
CN111680634A (en) * 2020-06-10 2020-09-18 平安科技(深圳)有限公司 Document file processing method and device, computer equipment and storage medium
CN111680634B (en) * 2020-06-10 2023-08-01 平安科技(深圳)有限公司 Document file processing method, device, computer equipment and storage medium
CN111444706A (en) * 2020-06-15 2020-07-24 四川大学 Referee document text error correction method and system based on deep learning
WO2021135444A1 (en) * 2020-06-28 2021-07-08 平安科技(深圳)有限公司 Text error correction method and apparatus based on artificial intelligence, computer device and storage medium
CN111753544A (en) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 Document error correction method, device, equipment and medium based on RPA and AI
CN113919326A (en) * 2020-07-07 2022-01-11 阿里巴巴集团控股有限公司 Text error correction method and device
CN111651978A (en) * 2020-07-13 2020-09-11 深圳市智搜信息技术有限公司 Entity-based lexical examination method and device, computer equipment and storage medium
WO2022012687A1 (en) * 2020-07-17 2022-01-20 武汉联影医疗科技有限公司 Medical data processing method and system
CN111881679A (en) * 2020-08-04 2020-11-03 医渡云(北京)技术有限公司 Text standardization processing method and device, electronic equipment and computer medium
CN111881679B (en) * 2020-08-04 2022-12-23 医渡云(北京)技术有限公司 Text standardization processing method and device, electronic equipment and computer medium
WO2021189803A1 (en) * 2020-09-03 2021-09-30 平安科技(深圳)有限公司 Text error correction method and apparatus, electronic device, and storage medium
CN112016304A (en) * 2020-09-03 2020-12-01 平安科技(深圳)有限公司 Text error correction method and device, electronic equipment and storage medium
CN112396459A (en) * 2020-11-19 2021-02-23 上海源慧信息科技股份有限公司 Cloud auditing method for shopping certificate verification
CN112836493B (en) * 2020-12-04 2023-03-14 国家计算机网络与信息安全管理中心 Transcribed text proofreading method and storage medium
CN112836493A (en) * 2020-12-04 2021-05-25 国家计算机网络与信息安全管理中心 Transcribed text proofreading method and storage medium
CN112949291A (en) * 2021-03-02 2021-06-11 赛飞特工程技术集团有限公司 Report error correction system and method
CN112949291B (en) * 2021-03-02 2022-05-06 赛飞特工程技术集团有限公司 Report error correction system and method
CN113704498A (en) * 2021-09-01 2021-11-26 云知声(上海)智能科技有限公司 Intelligent auditing method and system for document
CN114138934A (en) * 2021-11-25 2022-03-04 腾讯科技(深圳)有限公司 Method, device and equipment for detecting text continuity and storage medium
CN116090441A (en) * 2022-12-30 2023-05-09 永中软件股份有限公司 Chinese spelling error correction method integrating local semantic features and global semantic features
CN116090441B (en) * 2022-12-30 2023-10-20 永中软件股份有限公司 Chinese spelling error correction method integrating local semantic features and global semantic features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200501)