CN110941720B - Knowledge base-based specific personnel information error correction method - Google Patents

Publication number
CN110941720B
Authority
CN
China
Prior art keywords
name
similarity
person
information
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910865592.6A
Other languages
Chinese (zh)
Other versions
CN110941720A (en)
Inventor
黄瑞章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Cloud Pioneer Tech Co ltd
Guizhou University
Original Assignee
Guizhou Cloud Pioneer Tech Co ltd
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-12
Filing date
2019-09-12
Publication date
2023-06-09
Application filed by Guizhou Cloud Pioneer Tech Co ltd, Guizhou University
Priority to CN201910865592.6A
Publication of CN110941720A
Application granted
Publication of CN110941720B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/374 Thesaurus
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a knowledge base-based method for correcting errors in information about specific persons, and relates to the technical field of computer character recognition. The method uses a Double-LSTM boundary model to identify and extract the names and other information (such as titles) of specific persons in a text under inspection, calculates the similarity between the extracted information and the entries in a knowledge base of specific persons, and judges whether the names and related information in the text are correct. Correct names are collected into a correct-name information base for the text, and suspected wrong names are screened out. For each suspected wrong name, similarity is first calculated against the correct-name information base of the text; other auxiliary information is then matched against the knowledge base of specific persons, and the suspected wrong information is corrected. The method addresses the difficulty that wrongly written characters change the semantics of a sentence and make names hard to recognize, improves the recognition of names and titles, and corrects specific names and their related information in the text directly, end to end.

Description

Knowledge base-based specific personnel information error correction method
Technical Field
The invention relates to the technical field of computer character recognition, in particular to a specific personnel information error correction method based on a knowledge base.
Background
Most current error correction techniques are limited to calculating the edit distance between a target field and common words, and correcting the field to the candidate word with the smallest edit distance among those below an edit distance threshold. In real text applications, however, edit distance comparison alone cannot accurately determine whether the target field is wrong. Information in the context can often help to find and correct errors, yet the prior art rarely extracts contextual information from the text and uses it for error correction. Likewise, the candidate libraries that the prior art matches against the target field usually contain only the candidate words themselves and lack related auxiliary information, which reduces the accuracy of both error detection and correction.
Existing person name entity extraction methods mostly use sequence labeling models. In recent years many neural network techniques have been applied to sequence labeling, with good results in some application scenarios. In sentences that contain errors, however, sequence labeling performs much worse at entity extraction, and at person name extraction in particular, because when the model encounters a wrongly written character it often cannot determine whether that character forms a new word or belongs to one of the neighbouring words.
Disclosure of Invention
The invention aims to provide a specific personnel information error correction method based on a knowledge base, so as to solve the problems in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
A knowledge base-based method for correcting errors in specific personnel information comprises the following steps:
S1, preprocessing a target text and establishing a common error dictionary;
S2, matching the target text against the common error dictionary and correcting the errors found;
S3, recognizing the preprocessed text to obtain person names and/or title information;
S4, comparing the recognized person names with the person names in the knowledge base and calculating their similarity.
Preferably, the preprocessing in step S1 includes sentence splitting: the text is split into sentences at the sentence-delimiting characters it contains.
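As a minimal sketch of the sentence-splitting step, the snippet below splits text at a set of delimiter characters using Python's standard `re` module. The delimiter set (Chinese and ASCII sentence-final punctuation plus line breaks) and the function name `split_sentences` are assumptions made for illustration; the source states only that the text is split at its sentence-delimiting characters.

```python
import re

# Assumed delimiter set: Chinese and ASCII sentence-final punctuation plus line breaks.
SENTENCE_DELIMITERS = "。!?;!?;\n\r"

def split_sentences(text):
    """Split text at runs of delimiter characters and drop empty pieces."""
    pattern = "[" + re.escape(SENTENCE_DELIMITERS) + "]+"
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]

print(split_sentences("A。B;C!\nD"))
```

Each returned sentence would then be processed independently by the later steps.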
Preferably, step S2 specifically includes: computation takes each sentence as input; for each text sequence, string matching is used to check whether the input sequence contains an error listed in the common error dictionary; if so, the erroneous field is stored as a recognition result and corrected; if no common error is found, the process proceeds to step S3.
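The dictionary lookup in step S2 can be sketched as plain substring matching. The dictionary contents below are hypothetical English stand-ins (the actual dictionary of FIG. 2 is not reproduced in this text), and `correct_common_errors` is an illustrative name rather than anything defined by the patent.

```python
# Hypothetical common error dictionary: wrong form -> correct form.
COMMON_ERRORS = {"recieve": "receive", "adress": "address"}

def correct_common_errors(sentence, error_dict):
    """Return the corrected sentence and the list of (wrong, correct) hits."""
    hits = []
    # Try longer entries first so a short entry cannot pre-empt a longer one.
    for wrong in sorted(error_dict, key=len, reverse=True):
        if wrong in sentence:
            hits.append((wrong, error_dict[wrong]))
            sentence = sentence.replace(wrong, error_dict[wrong])
    return sentence, hits

fixed, hits = correct_common_errors("Please recieve mail at this adress.", COMMON_ERRORS)
print(fixed)
print(hits)
```

The returned hits correspond to the stored recognition results; a sentence with no hits would simply move on to step S3.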
Preferably, the text recognition in step S3 is performed as follows:
S3.1, using the HanLP tool to assist a Double-LSTM boundary recognition model in recognizing person names, titles and similar information in sentences;
S3.2, extracting the pinyin features and Wubi (five-stroke) features of the name strings.
Preferably, step S3.1 specifically includes:
1) Traversing each character in the sentence to be recognized and, taking the current character as the center, dividing the sentence into a left clause and a right clause;
2) Inputting the left clause and the right clause into two different LSTMs for encoding;
3) Concatenating the encoded vectors and inputting them into a fully connected layer for classification, to judge whether the current character is an entity start boundary;
4) Taking the 2-grams and 3-grams that begin at a boundary as candidate person names, segmenting the sentence with the HanLP tool, and recognizing person names by the part-of-speech tag nr;
5) Recognizing titles by the part-of-speech tag nnt after word segmentation, and searching the context of each title for the nearest person name as the person to whom the title belongs.
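The data flow around the boundary classifier in steps 1) to 4) can be sketched without the neural network itself. In the snippet below the two LSTM encoders and the fully connected layer are replaced by a pluggable callable, `is_start_boundary`, and `toy_boundary` is a deliberately trivial stand-in; both names, and the English sample sentence, are assumptions for illustration. HanLP segmentation is not shown.

```python
def split_contexts(sentence):
    """For each character, yield (index, left clause, right clause).

    The current character itself is excluded from both clauses.
    """
    for i in range(len(sentence)):
        yield i, sentence[:i], sentence[i + 1:]

def candidate_names(sentence, is_start_boundary):
    """Collect the 2-grams and 3-grams that begin at predicted start boundaries."""
    candidates = []
    for i, left, right in split_contexts(sentence):
        if is_start_boundary(left, sentence[i], right):
            for n in (2, 3):
                if i + n <= len(sentence):
                    candidates.append(sentence[i:i + n])
    return candidates

# Stand-in classifier (assumption): flags a boundary right after the marker "by ".
def toy_boundary(left, current, right):
    return left.endswith("by ")

print(candidate_names("hosted by ABCD", toy_boundary))
```

In a real system the callable would wrap the trained Double-LSTM model, which sees only the left and right clauses and is therefore less sensitive to a wrongly written current character.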
Preferably, step S3.2 specifically includes:
Extracting the pinyin features of the name string, namely the pinyin of each character, with flat-tongue and retroflex initials unified and nasal initials unified as lateral initials; and extracting the Wubi features of the name string, namely the Wubi code of each character.
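The pinyin unification can be sketched as a rewrite of syllable initials. The snippet assumes toneless pinyin syllables are already available (obtaining them from Chinese characters, and looking up Wubi codes, would need an external table that is not shown). Mapping retroflex initials to their flat-tongue counterparts is an assumed direction; the source states explicitly only that nasal initials are unified as lateral ones.

```python
# Assumed direction of unification: retroflex -> flat-tongue, nasal n -> lateral l.
INITIAL_MAP = (("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l"))
VOWELS = "aeiouvü"

def normalize_syllable(syllable):
    """Unify the initial of one toneless pinyin syllable."""
    for source, target in INITIAL_MAP:
        rest = syllable[len(source):]
        # Only rewrite a true initial: it must be followed by a vowel.
        if syllable.startswith(source) and rest and rest[0] in VOWELS:
            return target + rest
    return syllable

def pinyin_features(syllables):
    """Normalize every syllable of a name, ignoring letter case."""
    return [normalize_syllable(s.lower()) for s in syllables]

print(pinyin_features(["zhao", "wen", "quan"]))
print(pinyin_features(["Li", "Ning", "Yu"]))
```

After this normalization, names that differ only by a flat-tongue/retroflex or n/l confusion have identical pinyin features and therefore a pinyin edit distance of zero.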
Preferably, the step S4 specifically includes:
S4.1, judging whether the recognized person name matches a person name in the knowledge base: if it is the name of a specific person in the knowledge base, it is stored in the 'text specific person name set'; otherwise it is stored in the 'suspected wrong person name set';
S4.2, calculating the similarity between the suspected wrong person name and the names of the specific persons in the text; when the similarity is greater than the threshold, the name is corrected to that specific person's name; otherwise the process proceeds to step S4.3;
S4.3, calculating the similarity between the suspected wrong person name and the person names in the knowledge base; if the similarity is greater than the threshold, the name is corrected to the knowledge base name; otherwise the name is judged not to be a name that needs correction.
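The S4.1 to S4.3 cascade can be sketched as follows. The similarity function is passed in as a parameter; `toy_similarity`, the Latin-alphabet names and the threshold of 0.6 are hypothetical placeholders, not values from the patent.

```python
def correct_names(recognized, kb_names, similarity, threshold):
    """Return {suspect: corrected name or None} following the S4.1-S4.3 cascade."""
    text_known = [n for n in recognized if n in kb_names]      # S4.1: names confirmed by the KB
    suspects = [n for n in recognized if n not in kb_names]    # S4.1: suspected wrong names
    result = {}
    for suspect in suspects:
        result[suspect] = None  # None: judged not to need correction
        # S4.2 tries names confirmed in the text first; S4.3 falls back to the whole KB.
        for pool in (text_known, sorted(kb_names)):
            if not pool:
                continue
            best = max(pool, key=lambda name: similarity(suspect, name))
            if similarity(suspect, best) > threshold:
                result[suspect] = best
                break
    return result

def toy_similarity(a, b):
    """Placeholder: share of positions holding the same character."""
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))

kb = {"Alice", "Robert"}
print(correct_names(["Alice", "Alise", "Rupert", "Zed"], kb, toy_similarity, 0.6))
```

Checking the names already confirmed in the same text first reflects the preference stated in the abstract: a misspelling is most likely to refer to a person who is mentioned correctly elsewhere in the document.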
Preferably, step S4.2 specifically includes:
Person name similarity calculation: person name similarity = name spelling similarity + title similarity; the two terms are calculated as follows.
Name spelling similarity: the edit distances between the specific person's name and the suspected wrong name are calculated for the character strings, the pinyin and the Wubi codes respectively, where the pinyin (or Wubi) edit distance is the average of the edit distances between the pinyin (or Wubi) codes of the individual characters; the weighted average of the three edit distances is taken as the comprehensive distance. The comprehensive distance is compared with a given threshold: if it is smaller than the threshold, name spelling similarity = threshold - comprehensive distance; otherwise name spelling similarity = 0. The threshold can be set manually according to how loose or strict the spelling similarity requirement of the specific application is, and usually lies in the range 0 to 1;
Title similarity: title similarity = (number of elements in the intersection of the current person name's title set and the title set of the specific person in the knowledge base) / (number of elements in the current person name's title set); if the current person name's title set is not empty but the intersection is empty, the title similarity is negative.
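A minimal sketch of the name spelling similarity follows. Several details are assumptions made here because the source does not fix them: edit distances are normalized by the longer length so that they are comparable with a threshold in the range 0 to 1, characters are aligned by position with empty padding, and the weights (0.4, 0.3, 0.3) and threshold 0.5 are arbitrary. The pinyin and Wubi codes in the usage line are placeholders, not real codes.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]

def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

def per_character_distance(codes_a, codes_b):
    """Average distance between the codes of characters at the same position."""
    length = max(len(codes_a), len(codes_b))
    if length == 0:
        return 0.0
    padded_a = list(codes_a) + [""] * (length - len(codes_a))
    padded_b = list(codes_b) + [""] * (length - len(codes_b))
    return sum(normalized_distance(x, y) for x, y in zip(padded_a, padded_b)) / length

def spelling_similarity(name_a, name_b, pinyin_a, pinyin_b, wubi_a, wubi_b,
                        weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Threshold minus the weighted average distance, or 0 if the names are too far apart."""
    distances = (normalized_distance(name_a, name_b),
                 per_character_distance(pinyin_a, pinyin_b),
                 per_character_distance(wubi_a, wubi_b))
    combined = sum(w * d for w, d in zip(weights, distances)) / sum(weights)
    return threshold - combined if combined < threshold else 0.0

# Placeholder codes for two two-character names that differ in the second character.
print(spelling_similarity("AB", "AC", ["a", "bo"], ["a", "bu"], ["xx", "yy"], ["xx", "yz"]))
```

Identical names give the maximum score, equal to the threshold, and the score falls linearly to zero as the combined distance approaches the threshold.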
Preferably, step S4.3 specifically includes:
Person name similarity II calculation: person name similarity II = name spelling similarity II + title similarity II; the two terms are calculated as follows.
Name spelling similarity II: the edit distances between the name of the specific person in the knowledge base and the suspected wrong name are calculated for the character strings, the pinyin and the Wubi codes respectively, and the weighted average of the three edit distances is taken as the comprehensive distance. The comprehensive distance is compared with a given threshold: if it is smaller than the threshold, name spelling similarity II = threshold - comprehensive distance; otherwise name spelling similarity II = 0.
Title similarity II: title similarity II = (number of elements in the intersection of the suspected wrong name's title set and the title set of the specific person in the knowledge base) / (number of elements in the suspected wrong name's title set); if the suspected wrong name's title set is not empty but the intersection is empty, title similarity II is negative.
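The title term and its combination with the spelling term can be sketched with Python sets. The penalty of -0.2 is the value given in Example 1; returning 0 when no titles were found for the suspected name is an assumption, since the source does not cover that case. The function names and sample titles are illustrative only.

```python
NO_OVERLAP_PENALTY = -0.2  # value used in Example 1; elsewhere only "negative" is required

def title_similarity(suspect_titles, kb_titles, penalty=NO_OVERLAP_PENALTY):
    """Share of the suspect's titles that the knowledge base person also holds."""
    suspect_titles, kb_titles = set(suspect_titles), set(kb_titles)
    if not suspect_titles:
        return 0.0  # assumption: no titles found means no evidence either way
    shared = suspect_titles & kb_titles
    if not shared:
        return penalty
    return len(shared) / len(suspect_titles)

def name_similarity(spelling_similarity, suspect_titles, kb_titles):
    """Person name similarity = name spelling similarity + title similarity."""
    return spelling_similarity + title_similarity(suspect_titles, kb_titles)

print(title_similarity({"chairman", "CEO"}, {"chairman"}))
print(title_similarity({"secretary"}, {"chairman"}))
print(name_similarity(0.15, {"chairman"}, {"chairman", "CEO"}))
```

Because the title term can be negative, a name that is close in spelling but appears with entirely different titles can still fall below the correction threshold.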
The beneficial effects of the invention are as follows:
The invention provides and implements a complete knowledge base-based method for correcting errors in specific personnel information. First, rather than only calculating the edit distance between a person name recognized in the input text and the person names in the knowledge base, the method also extracts information such as the titles of the specific persons in the sentence as auxiliary evidence and compares it with the information in the knowledge base. Error correction decisions therefore draw on the contextual semantic information in the sentence and become more reasonable and accurate, and information other than person names can also be recognized and corrected. Second, the invention uses a Double-LSTM model when recognizing names and other information, which avoids the difficulty that names cannot be recognized when a sentence contains wrongly written characters. For each character of the sentence, the model extracts the information to its left and right, excluding the current character itself, which reduces the influence of a wrongly written character on the semantics of the whole sentence and improves the recognition of person names and titles.
Drawings
FIG. 1 is a flowchart of an implementation of the knowledge base-based specific personnel information error correction method in Example 1;
FIG. 2 is an example of the common error dictionary in Example 1;
FIG. 3 is a schematic diagram of the Double-LSTM model employed in Example 1.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the invention.
Example 1
The embodiment provides a specific personnel information error correction method based on a knowledge base, which comprises the following steps:
1) Splitting the input text into sentences. The models and methods below operate on one sentence at a time as the sequence unit. The input text is split at the sentence-delimiting symbols (, |;; \n\r).
2) Matching against the common error dictionary: a common error dictionary is maintained. For each input sequence, string matching is used to check whether the sequence contains an error listed in the dictionary; if so, the erroneous field is stored as a recognition result and corrected. Fields already recognized through the common error dictionary are excluded from the subsequent calculations. The common error dictionary is shown in FIG. 2.
3) Recognizing person names in the sentence: the method combines the HanLP word segmentation tool with a Double-LSTM boundary recognition model. The boundary recognition model identifies entity start boundaries; its role is to improve person name recognition and to avoid the case where a wrongly written name prevents the word from being recognized correctly.
First, the Double-LSTM model is used to recognize name boundaries in each input sequence: each character in the sequence is traversed and, taking the current character as the center, the sentence is divided into a left clause and a right clause, which are input into two different LSTM models for encoding. The encoded vectors are concatenated and input into a fully connected layer for classification. This is a binary classification problem that judges whether the current character is an entity start boundary. The Double-LSTM model is shown in FIG. 3.
Next, according to the boundary recognition results, the 2-grams and 3-grams that begin at a boundary are taken as candidate person names and compared with the names of the specific persons in the knowledge base; candidates whose edit distance is 1 or 2 are retained as suspected specific person names. The suspected specific person names are added to the word segmentation dictionary, the sentence is segmented, and person names are recognized by the part-of-speech tag nr.
4) Recognizing titles in the sentence, assigning each title to a person name by distance, and obtaining the title features: titles are recognized by the part-of-speech tag nnt after word segmentation, and the nearest person name in the context of the title (except the number of words and the number of words) is taken as the person to whom the title belongs.
5) Extracting the pinyin and Wubi features of the name strings: for each name string, pinyin features are extracted (the pinyin of each character, with flat-tongue and retroflex initials unified and nasal initials unified as lateral initials), together with Wubi features (the Wubi code of each character).
6) Judging whether the person name matches a knowledge base name: if the recognized person name is the name of a specific person in the knowledge base, it is stored in the specific person name set of the text; otherwise it is stored in the suspected wrong person name set.
7) Calculating person name similarity I between the suspected wrong names and the names of specific persons in the text: the similarity between each name in the suspected wrong person name set and each name in the specific person name set is calculated. Person name similarity I consists of two parts, name spelling similarity I and title similarity I.
Name spelling similarity I: the edit distances between the two names are calculated for the character strings, the pinyin and the Wubi codes respectively, where the pinyin (or Wubi) edit distance is the average of the edit distances between the pinyin (or Wubi) codes of the individual characters. The weighted average of the three edit distances is taken as the comprehensive distance and compared with a given threshold: if the comprehensive distance is smaller than the threshold, name spelling similarity I = threshold - comprehensive distance; otherwise name spelling similarity I = 0. The threshold can be set according to the specific application.
Title similarity I = (number of elements in the intersection of the current person name's title set and the specific person's title set) / (number of elements in the current person name's title set). If the current person name's title set is not empty but the intersection is empty, title similarity I is negative; in this embodiment its value is -0.2.
8) Calculating person name similarity II of suspected wrong person names and knowledge base person names:
Person name similarity II calculation: person name similarity II = name spelling similarity II + title similarity II; the two terms are calculated as follows.
Name spelling similarity II: the edit distances between the name of the specific person in the knowledge base and the suspected wrong name are calculated for the character strings, the pinyin and the Wubi codes respectively, and the weighted average of the three edit distances is taken as the comprehensive distance. The comprehensive distance is compared with a given threshold: if it is smaller than the threshold, name spelling similarity II = threshold - comprehensive distance; otherwise name spelling similarity II = 0.
Title similarity II: title similarity II = (number of elements in the intersection of the suspected wrong name's title set and the title set of the specific person in the knowledge base) / (number of elements in the suspected wrong name's title set). If the suspected wrong name's title set is not empty but the intersection is empty, title similarity II is negative; in this embodiment its value is -0.2.
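The scores of steps 7) and 8) share one form, which can be written compactly as below. The symbols are notation introduced here for illustration and do not appear in the patent: $d_s$, $d_p$ and $d_w$ are the string, pinyin and Wubi edit distances, $w_s$, $w_p$ and $w_w$ their weights, $\tau$ the threshold, $T_x$ the title set of the suspected wrong name, and $T_k$ the title set of the person it is compared with.

```latex
\begin{aligned}
D &= \frac{w_s d_s + w_p d_p + w_w d_w}{w_s + w_p + w_w} \\
S_{\text{spell}} &=
  \begin{cases}
    \tau - D, & D < \tau \\
    0, & \text{otherwise}
  \end{cases} \\
S_{\text{title}} &=
  \begin{cases}
    \dfrac{|T_x \cap T_k|}{|T_x|}, & T_x \cap T_k \neq \emptyset \\
    -0.2, & T_x \neq \emptyset \text{ and } T_x \cap T_k = \emptyset
  \end{cases} \\
S &= S_{\text{spell}} + S_{\text{title}}
\end{aligned}
```

As a purely illustrative instance with assumed values $\tau = 0.5$ and $D = 0.35$, the spelling term is $0.5 - 0.35 = 0.15$; if the title sets do not overlap, the title term is $-0.2$ and the total is $S = 0.15 - 0.2 = -0.05$, so a positive correction threshold would not be reached.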
Example 2
This embodiment takes a specific passage of text as an example and corrects its information errors with the method of Example 1, as follows:
1) Information such as the names and titles of specific persons is extracted from various web pages and assembled into a knowledge base of specific personnel information.
2) Words that are commonly miswritten in information about specific persons on the Internet are collected to form the common error dictionary.
3) A text to be checked is input and processed by the recognition and error correction steps of the method to obtain the result. Example input and output:
a) Input sample:
{"docId": "9", "title": ": university alumni New Year communication meeting held", "text": "Educational foundation news: on January 14, the university alumni New Year communication meeting was held. Nearly 30 alumni representatives who have long cared about and supported the school's education and contributed to the development of its undertakings gathered with school leaders and the heads of the relevant functional departments to celebrate the New Year together and contribute to the school's future development. Alumni such as Fuhua International Group president Zhao Yong, Blue Cursor Communication Group chairman and chief executive officer Zhao Wenquan, and Eastern Cambridge Educational Group president Yu Yue attended, together with university auxiliary school, educational foundation auxiliary school, auxiliary office Wang Bo, alumni office, alumni conference auxiliary meeting and secretary Li Wensheng, the nostalgic scientific city school area raising office main principal Li Hang, industry party work principal auxiliary book, asset management limited company president Wei Junmin, party work office auxiliary principal, educational foundation auxiliary secret, Geng Shu, Zhao Lin and others. The communication meeting was hosted by educational foundation secretary Li Ningyu."}
b) Recognition result:
{"sense": ["Fuhua International Group president Zhao Yong, Blue Cursor Communication Group chairman and chief executive officer Zhao Wenquan"], "correct": "Zhao Wenquan", "wrong": "Zhao Wenquan"}
{"sense": ["The communication meeting was hosted by educational foundation secretary Li Ningyu"], "correct": "Li Yuning", "wrong": "Li Ningyu"}
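The second result above is a transposition: the given-name syllables "Ning" and "Yu" are swapped. As a rough illustration of why such a name passes the candidate filter of Example 1 (an edit distance of 1 or 2 from a knowledge base name), the sketch below computes a Levenshtein distance over romanized syllables. Working on romanized syllables instead of the original Chinese characters is an assumption made here, because only the translated names are available in this text.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of tokens."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]

wrong = ["Li", "Ning", "Yu"]     # name as written in the input text
correct = ["Li", "Yu", "Ning"]   # name stored in the knowledge base

distance = edit_distance(wrong, correct)
print(distance)             # 2: a swap of two adjacent tokens counts as two substitutions
print(1 <= distance <= 2)   # True: retained as a suspected specific person name
```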
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the invention, and these are also intended to fall within the scope of protection of the invention.

Claims (4)

1. A knowledge base-based method for correcting errors in specific personnel information, characterized by comprising the following steps:
S1, preprocessing a target text and establishing a common error dictionary;
S2, matching the target text against the common error dictionary and correcting the errors found;
S3, recognizing the target text to obtain person names and/or title information;
S4, comparing the recognized person names with the person names in the text person name information base and in the knowledge base, and calculating the person name similarity;
S5, judging whether the name-related information in the target text is correct, correcting erroneous items, and adding correct information to the text person name information base;
the text recognition in step S3 is performed as follows:
S3.1, using the HanLP tool to assist a Double-LSTM boundary recognition model in recognizing person names and title information in sentences;
S3.2, extracting the pinyin features and Wubi features of the name strings;
step S3.1 specifically includes:
1) Traversing each character in the target text and, taking the current character as the center, dividing the sentence into a left clause and a right clause;
2) Inputting the left clause and the right clause into two different LSTMs for encoding;
3) Concatenating the encoded vectors and inputting them into a fully connected layer for classification, to judge whether the current character is an entity start boundary;
4) Taking the 2-grams and 3-grams that begin at a boundary as candidate person names, segmenting the sentence with the HanLP tool, and recognizing person names by the part-of-speech tag nr;
5) Recognizing titles by the part-of-speech tag nnt after word segmentation, and searching the context of each title for the nearest person name as the person to whom the title belongs;
step S4 specifically includes:
S4.1, judging whether the recognized person name matches a person name in the knowledge base: if it is the name of a specific person in the knowledge base, it is stored in the 'text specific person name set'; otherwise it is stored in the 'suspected wrong person name set';
S4.2, calculating person name similarity I between the suspected wrong person name and the names of the specific persons in the text; when person name similarity I is greater than the threshold, the name is corrected to that specific person's name; otherwise the process proceeds to step S4.3;
S4.3, calculating person name similarity II between the suspected wrong person name and the person names in the knowledge base; if person name similarity II is greater than the threshold, the name is corrected to the knowledge base name; otherwise the name is judged not to be a name that needs correction;
step S4.2 specifically includes:
Person name similarity I calculation: person name similarity I = name spelling similarity I + title similarity I; the two terms are calculated as follows;
Name spelling similarity I: the edit distances between the specific person's name and the suspected wrong name are calculated for the character strings, the pinyin and the Wubi codes respectively, and the weighted average of the three edit distances is taken as the comprehensive distance; the comprehensive distance is compared with a given threshold: if it is smaller than the threshold, name spelling similarity I = threshold - comprehensive distance; otherwise name spelling similarity I = 0; the threshold may be set according to the specific application;
Title similarity I: title similarity I = (number of elements in the intersection of the current person name's title set and the title set of the specific person in the knowledge base) / (number of elements in the current person name's title set); if the current person name's title set is not empty but the intersection is empty, title similarity I is negative;
step S4.3 specifically includes:
Person name similarity II calculation: person name similarity II = name spelling similarity II + title similarity II; the two terms are calculated as follows;
Name spelling similarity II: the edit distances between the name of the specific person in the knowledge base and the suspected wrong name are calculated for the character strings, the pinyin and the Wubi codes respectively, and the weighted average of the three edit distances is taken as the comprehensive distance; the comprehensive distance is compared with a given threshold: if it is smaller than the threshold, name spelling similarity II = threshold - comprehensive distance; otherwise name spelling similarity II = 0;
Title similarity II: title similarity II = (number of elements in the intersection of the suspected wrong name's title set and the title set of the specific person in the knowledge base) / (number of elements in the suspected wrong name's title set); if the suspected wrong name's title set is not empty but the intersection is empty, title similarity II is negative.
2. The knowledge base-based specific personnel information error correction method according to claim 1, wherein the preprocessing in step S1 comprises sentence preprocessing, in which the text is segmented into sentences according to the sentence-delimiting characters in the text; the method performs its calculation with the sentence as the sequence unit.
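Sentence segmentation of this kind might look like the following Python sketch. The set of delimiter characters is an assumption, since the claim does not list them, and `split_sentences` is a hypothetical name.

```python
import re
from typing import List

# Assumed delimiter set: common Chinese and ASCII sentence-ending punctuation.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping each trailing delimiter."""
    parts = (m.strip() for m in _SENTENCE.findall(text))
    return [p for p in parts if p]
```

Each returned sentence would then be processed as one sequence unit by the later steps.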
3. The knowledge base-based specific personnel information error correction method of claim 1, wherein step S2 specifically comprises: taking the sentence features as input and, for each text sequence, using string matching to check whether the input sequence contains an error listed in a common error dictionary; if it does, storing the erroneous field as a recognition result and correcting it; if no common error is found, proceeding to step S3.
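A minimal Python sketch of the dictionary lookup in step S2 is given below. It assumes the common error dictionary maps each known wrong string to its correction; that structure, the function name, and the longest-match-first ordering are illustrative choices, not details taken from the claim.

```python
from typing import Dict, List, Tuple


def correct_common_errors(sentence: str,
                          error_dict: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return the corrected sentence and the list of erroneous fields found.

    An empty list means no common error matched, in which case the
    caller would continue with step S3.
    """
    found: List[str] = []
    corrected = sentence
    # Try longer entries first so a short entry cannot split a longer match.
    for wrong in sorted(error_dict, key=len, reverse=True):
        if wrong in corrected:
            found.append(wrong)
            corrected = corrected.replace(wrong, error_dict[wrong])
    return corrected, found
```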
4. The knowledge base-based specific personnel information error correction method according to claim 1, wherein step S3.2 specifically comprises:
extracting the pinyin features of the name string, including the pinyin of each character, with flat-tongue (versus retroflex) initials and lateral versus nasal initials unified, nasal initials being unified as lateral initials; and extracting the Wubi features of the name string, including the Wubi code of each character.
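The pinyin unification can be sketched in Python as below. The input is assumed to be toneless pinyin syllables already produced by some character-to-pinyin conversion, which is not shown. Mapping retroflex initials (zh, ch, sh) to flat-tongue ones (z, c, s) is an assumption about the direction of unification; mapping the nasal initial n to the lateral l follows the claim. Function names are hypothetical.

```python
from typing import Iterable, List

# Assumed direction: retroflex initials are folded into flat-tongue initials.
_RETROFLEX_TO_FLAT = (("zh", "z"), ("ch", "c"), ("sh", "s"))


def normalize_syllable(syllable: str) -> str:
    """Unify easily confused initials of one toneless pinyin syllable."""
    s = syllable.lower()
    for retroflex, flat in _RETROFLEX_TO_FLAT:
        if s.startswith(retroflex):
            return flat + s[len(retroflex):]
    if s.startswith("n"):
        # Nasal initial unified as the lateral initial, per the claim.
        return "l" + s[1:]
    return s


def pinyin_features(syllables: Iterable[str]) -> List[str]:
    """Normalized pinyin for each character of a name."""
    return [normalize_syllable(s) for s in syllables]
```

With this normalization, names that differ only by such initials yield identical pinyin features, so their pinyin edit distance is zero.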

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910865592.6A CN110941720B (en) 2019-09-12 2019-09-12 Knowledge base-based specific personnel information error correction method

Publications (2)

Publication Number Publication Date
CN110941720A (en) 2020-03-31
CN110941720B (en) 2023-06-09

Family

ID=69906180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910865592.6A Active CN110941720B (en) 2019-09-12 2019-09-12 Knowledge base-based specific personnel information error correction method

Country Status (1)

Country Link
CN (1) CN110941720B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582169B (en) * 2020-05-08 2023-10-10 腾讯科技(深圳)有限公司 Image recognition data error correction method, device, computer equipment and storage medium
CN112000767A (en) * 2020-07-31 2020-11-27 深思考人工智能科技(上海)有限公司 Text-based information extraction method and electronic equipment
CN112380842A (en) * 2020-11-25 2021-02-19 北京明略软件系统有限公司 Name error correction method and device, computer equipment and readable storage medium
CN112528663B (en) * 2020-12-18 2024-02-20 中国南方电网有限责任公司 Text error correction method and system in power grid field scheduling scene
CN116341531B (en) * 2023-02-28 2023-10-10 人民网股份有限公司 Knowledge-driven character information extraction and inspection method and device
CN116341543B (en) * 2023-05-31 2023-09-19 安徽商信政通信息技术股份有限公司 Method, system, equipment and storage medium for identifying and correcting personal names

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014117549A1 (en) * 2013-01-29 2014-08-07 Tencent Technology (Shenzhen) Company Limited Method and device for error correction model training and text error correction
CN107045496A (en) * 2017-04-19 2017-08-15 畅捷通信息技术股份有限公司 The error correction method and error correction device of text after speech recognition
CN107741928A (en) * 2017-10-13 2018-02-27 四川长虹电器股份有限公司 A kind of method to text error correction after speech recognition based on field identification
CN109359291A (en) * 2018-08-28 2019-02-19 昆明理工大学 A kind of name entity recognition method
CN110135551A (en) * 2019-05-15 2019-08-16 西南交通大学 A kind of robot chat method of word-based vector sum Recognition with Recurrent Neural Network

Also Published As

Publication number Publication date
CN110941720A (en) 2020-03-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant