CN110610000A - Key name context error detection method and system - Google Patents

Info

Publication number
CN110610000A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910737596.6A
Other languages
Chinese (zh)
Inventor
张勇
朱立松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCTV INTERNATIONAL NETWORKS WUXI Co Ltd
Original Assignee
CCTV INTERNATIONAL NETWORKS WUXI Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-12
Filing date
2019-08-12
Publication date
2019-12-24
Application filed by CCTV INTERNATIONAL NETWORKS WUXI Co Ltd
Priority to CN201910737596.6A
Publication of CN110610000A

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a method and a system for detecting key name context errors. The method comprises the following steps: determining the set of key names that need to be audited; selecting N consecutive characters as a dark word, or selecting a key name, and taking its context; segmenting the context of the dark word or key name with a word segmentation algorithm; vectorizing the word segmentation result; inputting the vectorized result into a machine learning classifier, which outputs a prediction; and, when the key name belongs to the key name set but differs from the classifier output, judging that the name appears in a wrong context and prompting an auditor to review it closely. The system comprises an input module, a dark word selection module, a name calibration module, a context taking module, a word segmentation module, a word vectorization module, a classifier module and an alarm module. The advantages of the invention are that dark words that carry a hidden reference can be identified, and key names that appear in the wrong context can be identified.

Description

Key name context error detection method and system
Technical Field
The invention relates to a method and a system for detecting a key name context error, belonging to the technical field of text information processing.
Background
The Internet has become a network in which everyone participates. From matters as large as national political life to matters as small as daily household necessities, nothing is untouched by it. The Internet is a virtual space in which anyone can participate and speak, so it naturally tends toward entertainment; but when the Internet touches on serious topics, it must be strictly regulated to preserve their seriousness.
This work appears simple but presents many challenges in practice. Taking the name "Zhu Yuanzhang" as an example, the common kinds of misspelling are listed below:
1) Homophone errors: characters with the same pronunciation as the three characters of "Zhu Yuanzhang" are used to refer to Zhu Yuanzhang, for example "pig element", "Zhu Chao", "Zhu Yuan ledger" and the like. This type of error usually arises when a netizen types with a Chinese pinyin input method.
2) Abbreviation errors: for example, "ZYZ" or "zhuyuanzhang" is used instead of "Zhu Yuanzhang".
3) Complete errors: for example, the correct original sentence should be "Zhu Yuanzhang defeated the army of the Yuan empire, defeated all other strong enemies, and established a new dynasty", while the wrong sentence is "Pig Eight defeated the army of the Yuan empire, defeated all other strong enemies, and established a new dynasty". Here the three characters of "Zhu Yuanzhang" do not appear in the sentence, yet it is evident that "Pig Eight" actually refers to Zhu Yuanzhang. Moreover, this way of referring to him is itself insulting and is very likely to stir up public opinion. Because "Pig Eight" differs from "Zhu Yuanzhang" in both pronunciation and characters, review becomes harder, and an auditor can only infer the reference from the context.
4) Context errors: for example, the correct original sentence would be "The Overlord of Chu was one of the most powerful enemies of Liu Bang of Han, but he too was finally defeated", while the wrong sentence is "The Overlord of Chu was one of the most powerful enemies of Zhu Yuanzhang, but he too was finally defeated". Here the three characters of "Zhu Yuanzhang" are spelled correctly, but Zhu Yuanzhang has been confused with Liu Bang.
5) Context-irrelevant cases, such as the sentence "On weekends Zhu Yuanzhang helps the children with their homework and cooks at home, living the life of a homemaker". It may be true that someone who happens to share the name "Zhu Yuanzhang" lives as a homemaker, but because "Zhu Yuanzhang" is a well-known historical figure, such a sentence is inappropriate. An irrelevant context is also treated as a context error.
To deal with the situations above, the prior art generally uses keyword scanning to assist manual review. A keyword scanning system scans the text and highlights the keywords designated by the auditors to draw their attention. Because the possible errors are so varied, review remains difficult, and a keyword scanning system gives human auditors only limited help. Specifically, the prior art scans the text with a computer and matches the three correct characters of "Zhu Yuanzhang" together with known possible errors, such as "Pig Eight", "dried pork slice", "Zhu Chao", "ZYZ" and "zhuyuanzhang". The matches are presented to a human auditor in highlighted form for manual review.
The prior art has the following defects:
1) Keyword scanning cannot enumerate all possible error patterns. Besides the common errors listed above, many other specific errors may occur, such as "pig weight eight".
2) Keyword scanning produces a large number of matches, so it cannot effectively assist auditors, shorten review time, or improve review efficiency.
3) The prior art can only mark keywords that a human auditor should focus on, whereas deciding whether an error actually exists requires a comprehensive judgment that takes the context into account. In the complete-error case listed above, for example, the prior art is ineffective because no keyword is matched in the wrong sentence.
4) For context errors and irrelevant contexts, existing keyword marking is likewise powerless. The auditor must have some knowledge of political history: only someone who knows that Zhang Fei and Zhu Yuanzhang did not live in the same era can spot the error in such a sentence.
Disclosure of Invention
To overcome the above defects of the prior art, the invention provides a method and a system for detecting key name context errors. The context surrounding a name is analysed comprehensively to predict whether a key name has been written completely wrongly, and whether a key name appears in a wrong or irrelevant context.
The technical solution of the invention is as follows: a key name context error detection method comprises the following steps:
step 1: determine the set of key names that need to be audited, the set including an "other" element;
step 2: select N consecutive characters in an article, a sentence or a paragraph as a dark word, and take the context of the dark word; or select a key name in the article, sentence or paragraph and take the context of the key name;
step 3: segment the context of the dark word or key name with a word segmentation algorithm;
step 4: vectorize the word segmentation result;
step 5: input the vectorized word segmentation result into a machine learning classifier, which outputs whether the context belongs to one of the key names in the set determined in step 1 and, if so, which one;
step 6: when the dark word is not one of the names in the key name set and the classifier output is not "other", judge the dark word to be a wrong word and prompt an auditor to review it closely;
and when the key name is one of the names in the key name set and differs from the classifier output, judge that the name appears in a wrong context and prompt an auditor to review it closely.
Preferably, in step 1, the key name set NAMES = { "α", "β", "γ", …, "NONE" } is determined, where "α", "β", "γ" and so on are names that need to be focused on, and "NONE" denotes others.
Preferably, in step 2, the context of the dark word or key name is taken as follows: the M characters to the left of the dark word or key name are taken as its preceding context, and the M characters to its right are taken as its following context; when the dark word or key name appears at the beginning of a sentence, there is no preceding context or fewer than M characters of it; when it appears at the end of a sentence, there is no following context or fewer than M characters of it.
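The context window described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `take_context` and its parameters are hypothetical, and the sketch clamps the window at the boundaries of the given text rather than detecting sentence boundaries.

```python
from typing import Tuple


def take_context(text: str, start: int, length: int, m: int) -> Tuple[str, str]:
    """Return (preceding, following) context of text[start:start + length].

    Each side holds at most m characters; it is shorter (possibly empty)
    when the span sits near the beginning or the end of the text.
    """
    if start < 0 or length <= 0 or start + length > len(text):
        raise ValueError("span must lie inside the text")
    preceding = text[max(0, start - m):start]
    following = text[start + length:start + length + m]
    return preceding, following


sentence = "abcdefghijklmnopqrstuvwxyz"
print(take_context(sentence, 12, 2, 5))  # span "mn" in the middle
print(take_context(sentence, 0, 3, 5))   # span "abc" at the very beginning
```

The second call shows the boundary case from the text: a span at the start of the input simply has an empty preceding context.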
Preferably, in step 4, the word segmentation result is vectorized as follows: each word or character is converted into a D-dimensional vector; if the preceding context has K words and the following context also has K words, the context yields a data matrix of D rows and 2K columns, denoted $C_{D \times 2K}$; when there are fewer than K words, the missing columns are filled with zero vectors.
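A minimal sketch of how such a D-row, 2K-column matrix might be assembled is given below. Several details are assumptions made for illustration only: `toy_embed` is a hypothetical stand-in for a trained embedding, padding columns are appended after the real words on each side, and when a side has more than K words the K words nearest the target are kept (the text does not specify either choice).

```python
from typing import Callable, List, Sequence

Vector = List[float]


def context_matrix(
    left_words: Sequence[str],
    right_words: Sequence[str],
    embed: Callable[[str], Vector],
    d: int,
    k: int,
) -> List[Vector]:
    """Return the D x 2K context matrix as a list of D rows of 2K floats."""
    if d <= 0 or k <= 0:
        raise ValueError("d and k must be positive")

    def columns(words: Sequence[str]) -> List[Vector]:
        vectors = [list(embed(w)) for w in words]
        if any(len(v) != d for v in vectors):
            raise ValueError("embed() must return D-dimensional vectors")
        # Fill the missing columns with zero vectors.
        vectors += [[0.0] * d for _ in range(k - len(vectors))]
        return vectors

    # Assumption: if a side has more than K words, keep the K nearest the target.
    cols = columns(left_words[-k:]) + columns(right_words[:k])
    # Transpose so that column j of the matrix is the vector of the j-th word.
    return [[col[row] for col in cols] for row in range(d)]


def toy_embed(word: str) -> Vector:
    """Hypothetical 2-dimensional embedding used only for this demo."""
    return [float(len(word)), float(ord(word[0]) % 7)]


matrix = context_matrix(["a", "bb"], ["ccc"], toy_embed, d=2, k=3)
for row in matrix:
    print(row)
```

With K = 3, the two preceding words occupy columns 0 and 1, column 2 is zero padding, the single following word occupies column 3, and columns 4 and 5 are zero padding.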
Preferably, in step 5, $C_{D \times 2K}$ is input into the machine learning classifier, whose possible outputs include the case in which the context does not belong to any key name in the set determined in step 1, namely "NONE".
Preferably, in step 6, when the dark word does not belong to the set NAMES - { "NONE" } and the classifier output is not "NONE", the dark word is judged to be a wrong word and an auditor is prompted to review it closely, where NAMES - { "NONE" } is the set difference between the set NAMES and the set { "NONE" };
and when the key name belongs to the set NAMES - { "NONE" } and differs from the classifier output, the name is judged to appear in a wrong context and an auditor is prompted to review it closely.
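The two-branch decision rule of step 6 can be written down directly. The sketch below is illustrative: the function name `judge` and the return labels `"wrong_word"` and `"wrong_context"` are hypothetical names chosen here, not terms from the patent.

```python
from typing import Optional, Set

NONE = "NONE"


def judge(candidate: str, predicted: str, names: Set[str]) -> Optional[str]:
    """Apply the step 6 rule to one candidate and its classifier output.

    Returns "wrong_context", "wrong_word", or None when nothing is flagged.
    """
    key_names = names - {NONE}  # the set difference NAMES - {"NONE"}
    if candidate in key_names:
        # A key name whose context the classifier assigns elsewhere.
        return "wrong_context" if predicted != candidate else None
    # A dark word whose context looks like that of some key name.
    return "wrong_word" if predicted != NONE else None


names = {"alpha", "beta", NONE}
print(judge("alpha", "alpha", names))    # key name in its own context
print(judge("alpha", "beta", names))     # key name in another name's context
print(judge("xy", "alpha", names))       # dark word standing in for a key name
print(judge("xy", NONE, names))          # ordinary text
```

Note that a key name whose context is classified as "NONE" is also flagged, which corresponds to the context-irrelevant case described in the background.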
A key name context error detection system comprises:
an input module, used for inputting a given text to be audited and the set of key names that need to be audited;
a dark word selection module, used for treating any N consecutive characters in the text to be audited as a dark word;
a name calibration module, used for directly marking the names from the key name set that appear in the text to be audited;
a context taking module, used for selecting the context of the dark word chosen by the dark word selection module or of the name marked by the name calibration module;
a word segmentation module, used for segmenting the context selected by the context taking module with a word segmentation algorithm;
a word vectorization module, used for converting the words obtained by the word segmentation module into D-dimensional vectors; with K words taken from each of the preceding and following contexts, a matrix of D rows and 2K columns is obtained to represent the context of the dark word or name, and columns are filled with zero vectors when there are fewer than K words;
a classifier module, used for taking as input the matrix of D rows and 2K columns output by the word vectorization module and outputting an element of the key name set, the classifier module being a machine learning classifier;
an alarm module, used for judging that a dark word refers to a certain key name when the classifier module predicts that the context of the dark word belongs to that key name, raising an alarm and causing the output module to highlight it so as to prompt an auditor to review it closely; or, when the context of a key name A is input into the classifier module and the output of the classifier module is not A, judging that the name appears in a wrong context, raising an alarm and causing the output module to highlight it so as to prompt an auditor to review it closely;
and an output module, used for outputting the highlight display signal transmitted by the alarm module.
Preferably, the text to be audited is an article, a sentence or a paragraph;
the set of key names that need to be audited is NAMES = { "α", "β", "γ", …, "NONE" }, where "α", "β", "γ" and so on are names that need to be focused on, and "NONE" denotes others;
when the context taking module selects the context of the dark word or name, the M characters to the left of the dark word or name are taken as its preceding context and the M characters to its right are taken as its following context; when the dark word or name appears at the beginning of a sentence, there is no preceding context or fewer than M characters of it, and when it appears at the end of a sentence, there is no following context or fewer than M characters of it;
the matrix of D rows and 2K columns is denoted $C_{D \times 2K}$;
the classifier module predicting that the context of a dark word belongs to a certain key name means that the dark word does not belong to the set NAMES - { "NONE" } and the classifier output is not "NONE"; the context of a key name A being input into the classifier with an output that is not A means that the key name belongs to the set NAMES - { "NONE" } and differs from the classifier output.
The advantages of the invention are as follows: by comprehensively analysing the context surrounding a name, it can predict whether a key name has been written completely wrongly and whether a key name appears in a wrong or irrelevant context; dark words that carry a hidden reference can be identified, and key names that appear in the wrong context can be identified; auditors are thereby effectively assisted, the time needed to audit Internet content is shortened, and auditing efficiency and accuracy are improved.
Drawings
FIG. 1 is a schematic diagram of the architecture of a key name context error detection system according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and specific embodiments.
Research in linguistics and in computer natural language processing indicates that the meaning of a word is determined not by the word itself but by its context. A word on its own conveys a particular meaning only when a large body of language usage has given it that meaning, so that the meaning is relatively stable.
Example 1: in the wrong sentence "Pig Eight defeated the army of the Yuan empire, defeated all other strong enemies, and established a new dynasty", a human reader can easily infer that "Pig Eight" means Zhu Yuanzhang, drawing on the context and on prior background knowledge. Many people defeated armies of the Yuan empire, but only one of them went on to establish a new dynasty: Zhu Yuanzhang. Based on this context and background, the human reader therefore infers that "Pig Eight" refers to Zhu Yuanzhang. The example shows that although "Pig Eight" differs from "Zhu Yuanzhang" in pronunciation and in written form, and the wording is completely wrong, the reader can still infer the reference because the context plays such a large role.
Example 2: in the wrong sentence "On weekends Zhu Yuanzhang helps the children with their homework and cooks at home, living the life of a homemaker", the three characters of "Zhu Yuanzhang" are entirely correct, but the sentence as a whole is wrong because the name is paired with the wrong context. Because Zhu Yuanzhang is a well-known historical figure, a large body of language usage has given the three characters "Zhu Yuanzhang" a very stable meaning, namely the founding emperor of the Ming dynasty rather than a housewife. The name "Zhu Yuanzhang" and the homemaking context are therefore wrongly matched.
Examples
A key name context error detection method comprises the following steps:
step 1: first, the set NAMES of key names to be audited is determined, for example NAMES = { "Zhu Yuanzhang", "elbow king", "Liu Bang", "NONE" }, where "Zhu Yuanzhang" and the like are names of persons requiring attention, and "NONE" means others.
step 2: for any article, sentence or paragraph, any N consecutive characters are selected as a word. For convenience of description this word is called a "dark word", because it may covertly refer to a politically sensitive person. The context of the word is then taken as follows: the M characters to the left of the word are its preceding context, and the M characters to the right of the word are its following context. When the dark word appears at the beginning of a sentence, there is no preceding context or fewer than M characters of it; when the dark word appears at the end of a sentence, there is no following context or fewer than M characters of it.
Pig Eight defeated the army of the Yuan empire, defeated all other strong enemies, and established a new dynasty
For the above sentence, two examples are given (character counts refer to the original Chinese text):
Example 1: with N = 2 and M = 10, the two characters meaning "defeated" (second occurrence) are selected as the dark word; its preceding context is [Eight defeated the army of the Yuan empire,] and its following context is [all other strong enemies, established].
Example 2: with N = 3 and M = 10, the three characters rendered as "other" are selected as the dark word; its preceding context is [army of the Yuan empire, defeated] and its following context is [all strong enemies, established a new].
Alternatively, for any article, sentence or paragraph, the key names are selected and the context of each key name is taken. The context of a key name is selected in the same way as the context of a dark word.
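The two ways of choosing a span in step 2 (every window of N consecutive characters, or every occurrence of a key name) might be sketched as follows. The function names are hypothetical, and plain substring search is used for the key names as one simple way to realise the matching.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, str]  # (start index, text of the span)


def dark_word_candidates(text: str, n: int) -> List[Span]:
    """Every window of n consecutive characters is a candidate dark word."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]


def key_name_occurrences(text: str, names: Iterable[str]) -> List[Span]:
    """Find every occurrence of every non-empty key name by substring search."""
    hits: List[Span] = []
    for name in names:
        if not name:
            continue
        pos = text.find(name)
        while pos != -1:
            hits.append((pos, name))
            pos = text.find(name, pos + 1)
    return sorted(hits)


print(dark_word_candidates("abcd", 2))
print(key_name_occurrences("xAyAz", ["A"]))
```

A text of length L yields L - N + 1 dark word candidates, which is why long texts produce many candidates to classify.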
step 3: the preceding and following contexts are segmented separately. For example, the preceding context [Eight defeated the army of the Yuan empire] is divided into the word sequence [ "Eight", "defeated", (past-tense particle), "Yuan empire", "of", "army" ], and the following context [all other strong enemies, established] is divided into the word sequence [ (past-tense particle), "other", "all", "strong enemies", ",", "established" ]. Chinese word segmentation is a common prior-art algorithm in Chinese natural language processing and is not described here. Instead of word segmentation, the context may simply be split into individual characters and punctuation marks.
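The character-level alternative mentioned at the end of step 3 needs no segmentation library. A minimal sketch, with a hypothetical function name and an illustrative Chinese fragment chosen for this example (it is not quoted from the patent):

```python
from typing import List


def split_characters(context: str) -> List[str]:
    """Split a context into single characters and punctuation marks.

    Whitespace is dropped; every other character becomes its own token.
    """
    return [ch for ch in context if not ch.isspace()]


# Illustrative fragment meaning roughly "defeated, established".
print(split_characters("打败了,建立"))
```

A real word segmenter would instead group characters into multi-character words, which gives the later vectorization step more meaningful units.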
step 4: the word segmentation result is vectorized, that is, each word (or character) is converted into a D-dimensional vector. Assume the preceding context has K words and the following context also has K words; when there are fewer than K words, the missing columns are filled with zero vectors. The context thus yields a data matrix of D rows and 2K columns, denoted $C_{D \times 2K}$.
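Written out column by column, the matrix of step 4 can be expressed as below. The symbols $u_i$ (the i-th word of the preceding context), $w_i$ (the i-th word of the following context) and $\mathbf{v}(\cdot)$ (the word-to-vector mapping) are notation introduced here for illustration; the placement of the padding columns is likewise an assumption.

```latex
C_{D \times 2K} =
\begin{bmatrix}
\mathbf{v}(u_1) & \cdots & \mathbf{v}(u_K) & \mathbf{v}(w_1) & \cdots & \mathbf{v}(w_K)
\end{bmatrix},
\qquad
\mathbf{v}(\cdot) \in \mathbb{R}^{D},
\qquad
\mathbf{v}(u_i) = \mathbf{0} \text{ or } \mathbf{v}(w_i) = \mathbf{0}
\text{ when the corresponding word is missing.}
```

For instance, with D = 100 and K = 5, every dark word or key name is represented by a 100 x 10 matrix regardless of how much context was actually available.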
step 5: $C_{D \times 2K}$ is used as the input of a machine learning classifier, whose output indicates which name (possibly "NONE") in the key name set given in step 1 the context should belong to.
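The text leaves the choice of classifier open. Purely to make the input and output of step 5 concrete, the sketch below uses a nearest-centroid classifier over the flattened matrix; this technique is swapped in for illustration and is not stated in the patent, and the class and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def flatten(matrix: Matrix) -> List[float]:
    """Turn a D x 2K matrix into one feature vector of length 2KD."""
    return [x for row in matrix for x in row]


class NearestCentroidClassifier:
    """Toy classifier: predict the label whose mean training vector is closest."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, matrices: Sequence[Matrix], labels: Sequence[str]) -> "NearestCentroidClassifier":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for matrix, label in zip(matrices, labels):
            vec = flatten(matrix)
            if label not in sums:
                sums[label] = [0.0] * len(vec)
                counts[label] = 0
            sums[label] = [a + b for a, b in zip(sums[label], vec)]
            counts[label] += 1
        self.centroids = {y: [a / counts[y] for a in s] for y, s in sums.items()}
        return self

    def predict(self, matrix: Matrix) -> str:
        vec = flatten(matrix)
        return min(self.centroids, key=lambda y: math.dist(vec, self.centroids[y]))


# Tiny made-up training set with D = 2 and K = 1 (so each matrix is 2 x 2).
train = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
clf = NearestCentroidClassifier().fit(train, ["alpha", "NONE"])
print(clf.predict([[0.9, 0.0], [0.0, 0.1]]))
```

In practice the classifier would be trained on many labelled contexts of each key name, plus contexts labelled "NONE"; any model that maps the matrix to an element of NAMES fits the description.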
step 6: when the dark word does not belong to the set NAMES - { "NONE" } (the set difference between the set NAMES and the set { "NONE" }) and the classifier output is not "NONE", the dark word is judged to be a wrong word and an auditor is prompted to review it closely.
When the key name belongs to the set NAMES - { "NONE" } and differs from the classifier output, the name is judged to appear in a wrong context and an auditor is prompted to review it closely.
As shown in FIG. 1, a key name context error detection system comprises:
The input module, used for inputting the given text to be audited and the designated name set. For example, the text to be audited is an article on a political subject or a news report. The name set is the set of key names that the audit is concerned with, as in the example of step 1 above.
The dark word selection module, used for treating any N consecutive characters in the text to be audited as a dark word. When the text to be audited is long, there are therefore many possible dark words.
The name calibration module, used for directly marking the names from the name set that appear in the text to be audited. This step requires only simple matching.
The context taking module, used for selecting the context of the dark word or of the marked name according to the result of the dark word selection or of the name calibration.
The word segmentation module, used for segmenting the context with a word segmentation algorithm. Unlike English, Chinese is written as a continuous string of characters, and words are not separated by spaces.
The word vectorization module, used for converting words into D-dimensional vectors; if K words are taken from each of the preceding and following contexts, a matrix of D rows and 2K columns is obtained to represent the context of a given dark word or name. This representation is convenient for computer processing.
The classifier module, used for taking as input the matrix of D rows and 2K columns output by the word vectorization module and outputting an element of the name set. The classifier can be trained with machine learning methods.
The alarm module, used for judging that a dark word refers to a certain key name when the classifier predicts that the context of the dark word belongs to that key name (that is, the dark word does not belong to the set NAMES - { "NONE" } and the classifier output is not "NONE"), raising an alarm and highlighting it to prompt an auditor to review it closely; or, when the context of a key name A is input into the classifier and the classifier output is not A (that is, the key name belongs to the set NAMES - { "NONE" } and differs from the classifier output), judging that the name appears in a wrong context, raising an alarm and highlighting it to prompt an auditor to review it closely.
The output module, used for outputting the highlight display signal transmitted by the alarm module.
All the above components are prior art, and those skilled in the art can use any model and existing design that can implement their corresponding functions.
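How the modules might be wired together is sketched below. This is an assumption-laden illustration rather than the system of FIG. 1: the function name `audit`, the alert tuple layout, and the decision to fold vectorization and classification into a single injected `classify` callable are all choices made here for brevity.

```python
from typing import Callable, List, Set, Tuple

NONE = "NONE"
Alert = Tuple[int, str, str]  # (start index, flagged text, reason)


def audit(
    text: str,
    names: Set[str],
    n: int,
    m: int,
    segment: Callable[[str], List[str]],
    classify: Callable[[List[str], List[str]], str],
) -> List[Alert]:
    """Run selection, context taking, segmentation, classification and alarm."""
    if n <= 0:
        raise ValueError("n must be positive")
    key_names = {name for name in names if name and name != NONE}
    spans = set()
    # Dark word selection: every window of n consecutive characters.
    for i in range(len(text) - n + 1):
        spans.add((i, text[i:i + n]))
    # Name calibration: plain substring matching of the key names.
    for name in key_names:
        pos = text.find(name)
        while pos != -1:
            spans.add((pos, name))
            pos = text.find(name, pos + 1)

    alerts: List[Alert] = []
    for start, word in sorted(spans):
        end = start + len(word)
        left = segment(text[max(0, start - m):start])    # preceding context
        right = segment(text[end:end + m])               # following context
        predicted = classify(left, right)
        if word in key_names:
            if predicted != word:
                alerts.append((start, word, "wrong_context"))
        elif predicted != NONE:
            alerts.append((start, word, "wrong_word"))
    return alerts


# Demo with a character-level segmenter and a classifier that always says NONE.
demo = audit("ab Z cd", {"Z", NONE}, n=1, m=2,
             segment=list, classify=lambda left, right: NONE)
print(demo)
```

Because every window is classified, the cost grows with the length of the text; the output list plays the role of the highlight signal passed to the output module.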
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various changes and modifications can be made without departing from the inventive concept of the present invention, and these changes and modifications are all within the scope of the present invention.

Claims (8)

1. A key name context error detection method, characterized by comprising the following steps:
step 1: determine the set of key names that need to be audited, the set including an "other" element;
step 2: select N consecutive characters in an article, a sentence or a paragraph as a dark word, and take the context of the dark word; or select a key name in the article, sentence or paragraph and take the context of the key name;
step 3: segment the context of the dark word or key name with a word segmentation algorithm;
step 4: vectorize the word segmentation result;
step 5: input the vectorized word segmentation result into a machine learning classifier, which outputs whether the context belongs to one of the key names in the set determined in step 1 and, if so, which one;
step 6: when the dark word is not one of the names in the key name set and the classifier output is not "other", judge the dark word to be a wrong word and prompt an auditor to review it closely;
and when the key name is one of the names in the key name set and differs from the classifier output, judge that the name appears in a wrong context and prompt an auditor to review it closely.
2. The key name context error detection method according to claim 1, wherein in step 1 the key name set NAMES = { "α", "β", "γ", …, "NONE" } is determined, where "α", "β", "γ" and so on are names that need to be focused on, and "NONE" denotes others.
3. The key name context error detection method according to claim 2, wherein in step 2 the context of the dark word or key name is taken as follows: the M characters to the left of the dark word or key name are taken as its preceding context, and the M characters to its right are taken as its following context; when the dark word or key name appears at the beginning of a sentence, there is no preceding context or fewer than M characters of it; when it appears at the end of a sentence, there is no following context or fewer than M characters of it.
4. The key name context error detection method according to claim 3, wherein in step 4 the word segmentation result is vectorized as follows: each word or character is converted into a D-dimensional vector; if the preceding context has K words and the following context also has K words, the context yields a data matrix of D rows and 2K columns, denoted $C_{D \times 2K}$; when there are fewer than K words, the missing columns are filled with zero vectors.
5. The key name context error detection method according to claim 4, wherein in step 5 $C_{D \times 2K}$ is input into the machine learning classifier, whose possible outputs include the case in which the context does not belong to any key name in the set determined in step 1, namely "NONE".
6. The key name context error detection method according to claim 5, wherein in step 6, when the dark word does not belong to the set NAMES - { "NONE" } and the classifier output is not "NONE", the dark word is judged to be a wrong word and an auditor is prompted to review it closely, where NAMES - { "NONE" } is the set difference between the set NAMES and the set { "NONE" };
and when the key name belongs to the set NAMES - { "NONE" } and differs from the classifier output, the name is judged to appear in a wrong context and an auditor is prompted to review it closely.
7. A key name context error detection system, characterized by comprising:
an input module, used for inputting a given text to be audited and the set of key names that need to be audited;
a dark word selection module, used for treating any N consecutive characters in the text to be audited as a dark word;
a name calibration module, used for directly marking the names from the key name set that appear in the text to be audited;
a context taking module, used for selecting the context of the dark word chosen by the dark word selection module or of the name marked by the name calibration module;
a word segmentation module, used for segmenting the context selected by the context taking module with a word segmentation algorithm;
a word vectorization module, used for converting the words obtained by the word segmentation module into D-dimensional vectors; with K words taken from each of the preceding and following contexts, a matrix of D rows and 2K columns is obtained to represent the context of the dark word or name, and columns are filled with zero vectors when there are fewer than K words;
a classifier module, used for taking as input the matrix of D rows and 2K columns output by the word vectorization module and outputting an element of the key name set, the classifier module being a machine learning classifier;
an alarm module, used for judging that a dark word refers to a certain key name when the classifier module predicts that the context of the dark word belongs to that key name, raising an alarm and causing the output module to highlight it so as to prompt an auditor to review it closely; or, when the context of a key name A is input into the classifier module and the output of the classifier module is not A, judging that the name appears in a wrong context, raising an alarm and causing the output module to highlight it so as to prompt an auditor to review it closely;
and an output module, used for outputting the highlight display signal transmitted by the alarm module.
8. The system according to claim 7, wherein the text to be audited is an article, a sentence or a paragraph;
the set of key names that need to be audited is NAMES = { "α", "β", "γ", …, "NONE" }, where "α", "β", "γ" and so on are names that need to be focused on, and "NONE" denotes others;
when the context taking module selects the context of the dark word or name, the M characters to the left of the dark word or name are taken as its preceding context and the M characters to its right are taken as its following context; when the dark word or name appears at the beginning of a sentence, there is no preceding context or fewer than M characters of it, and when it appears at the end of a sentence, there is no following context or fewer than M characters of it;
the matrix of D rows and 2K columns is denoted $C_{D \times 2K}$;
the classifier module predicting that the context of a dark word belongs to a certain key name means that the dark word does not belong to the set NAMES - { "NONE" } and the classifier output is not "NONE"; the context of a key name A being input into the classifier with an output that is not A means that the key name belongs to the set NAMES - { "NONE" } and differs from the classifier output.
CN201910737596.6A 2019-08-12 2019-08-12 Key name context error detection method and system Pending CN110610000A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910737596.6A CN110610000A (en) 2019-08-12 2019-08-12 Key name context error detection method and system

Publications (1)

Publication Number Publication Date
CN110610000A (en) 2019-12-24

Family

ID=68889946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910737596.6A Pending CN110610000A (en) 2019-08-12 2019-08-12 Key name context error detection method and system

Country Status (1)

Country Link
CN (1) CN110610000A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1193779A (en) * 1997-03-13 1998-09-23 国际商业机器公司 Method for dividing sentences in Chinese language into words and its use in error checking system for texts in Chinese language
US20160357731A1 (en) * 2014-01-28 2016-12-08 Somol Zorzin Gmbh Method for Automatically Detecting Meaning and Measuring the Univocality of Text
CN106527756A (en) * 2016-10-26 2017-03-22 长沙军鸽软件有限公司 Method and device for intelligently correcting input information
CN108304366A (en) * 2017-03-21 2018-07-20 腾讯科技(深圳)有限公司 A kind of hypernym detection method and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
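Ni Ji et al.: "Research on Chinese person name recognition based on a credibility model" (基于可信度模型的中文人名识别研究), Journal of Chinese Information Processing (《中文信息学报》) *
Zhou Kun et al.: "A Chinese person name recognition method based on ontology and rule matching" (一种基于本体论和规则匹配的中文人名识别方法), Microcomputer Information (《微计算机信息》) *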

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191224