CN114220112A - Person name card oriented arbitrary relationship extraction method and system - Google Patents


Info

Publication number
CN114220112A
Authority
CN
China
Prior art keywords
character
name
unit
correction
work unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111544385.4A
Other languages
Chinese (zh)
Inventor
李佳静
瞿签新
林润
汪严博
高小涵
张贵鹏
张泽豪
郝亚鑫
曾伟豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology Beijing CUMTB
Original Assignee
China University of Mining and Technology Beijing CUMTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology Beijing CUMTB
Priority to CN202111544385.4A
Publication of CN114220112A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a business-card-oriented employment relationship extraction method comprising the following steps: step 1, obtaining a business card picture and preprocessing it; step 2, extracting the text in the preprocessed business card picture to obtain text regions; step 3, identifying three entities in the text regions, namely the name, the work unit and the position; step 4, correcting the name, work unit and position identified in step 3; and step 5, forming triples that express the employment relationship from the corrected name, work unit and position, and storing the triples in an electronic business card database. The invention also discloses a business-card-oriented employment relationship extraction system, which enables the employment relationships on business cards to be entered and stored automatically and supports managing and expanding a personal network of contacts.

Description

Business-card-oriented employment relationship extraction method and system
Technical Field
The invention relates to the technical field of search and information extraction, and in particular to a business-card-oriented employment relationship extraction method and system.
Background
The business card is an important carrier of identity information in today's business communication and daily life. In everyday use it helps establish contact, deepen impressions and build initial business trust, and it is a cost-effective tool for raising personal influence and increasing the likelihood of cooperation. In earlier years, people usually converted the contents of business cards into electronic information by manual entry and stored and managed it on digital storage devices. This approach has two drawbacks. First, it is inefficient and cannot cope when a large amount of data must be processed. Second, it is costly: even simple entry work requires repetitive labor by staff at a computer, considerable time and effort are later spent managing and maintaining the database, and the database is often difficult to connect with other people's databases. As communication becomes more frequent, the demand for business card entry keeps growing, and automatic entry and storage of business cards by technical means has become both feasible and urgent.
Existing business card processing methods generally only extract contact information such as mobile phone numbers and email addresses. The employment relationship on a business card, however, is important for organizing and managing a personal network of contacts. The employment relationship is expressed as a <name, work unit, position> triple, and existing methods do not solve the following problems:
(1) identifying the three entities, namely the name, the work unit and the position, in the text extraction result;
(2) correcting wrongly recognized characters according to the characteristics of each entity;
(3) pairing multiple work units and positions to generate correct <name, work unit, position> triples.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a business-card-oriented employment relationship extraction method and system.
To solve the above technical problems, the invention adopts the following technical scheme:
The invention provides a business-card-oriented employment relationship extraction method, which comprises the following steps:
step 1, obtaining a business card picture and preprocessing the business card picture;
step 2, extracting the text in the preprocessed business card picture to obtain text regions;
step 3, identifying three entities in the text regions, the three entities being the name, the work unit and the position;
step 4, correcting the name, the work unit and the position identified in step 3;
step 5, forming a plurality of triples expressing the employment relationship from the corrected name, work unit and position, and storing the triples in an electronic business card database, wherein each triple is <name, work unit, position>.
As a further refinement of the business-card-oriented employment relationship extraction method, the business card picture in step 1 is obtained by photographing, by a web crawler, or from a user;
the preprocessing comprises the following steps:
if the business card picture contains several business cards, the picture is first split into individual business cards, and binarization, noise smoothing, and skew angle detection and correction are then applied to each individual business card.
As a further refinement of the business-card-oriented employment relationship extraction method, the extraction in step 2 comprises text detection and text recognition.
As a further refinement of the business-card-oriented employment relationship extraction method,
in step 1, a picture training and test set is automatically generated for the preprocessed business card pictures;
in step 2, text extraction uses the business card pictures in the automatically generated picture training and test set, wherein the training and test set is generated automatically by producing pictures of Chinese characters in multiple fonts with different noise and by automatically adjusting the angle of business card pictures to produce a plurality of test samples.
As a further refinement of the business-card-oriented employment relationship extraction method, in step 3 the three entities, namely the name, the work unit and the position, are identified with a named entity recognition method, and when the same text region contains two or more entities, it is split into single entities using a Chinese lexical analysis tool.
As a further refinement of the business-card-oriented employment relationship extraction method, the correction in step 4 is performed as follows:
(1) for the identified name, if the business card picture contains the pinyin of the name, the Chinese character with the same pinyin and the closest glyph is obtained from a Chinese character pinyin library and used for correction; if no pinyin is present, the Chinese character with the closest glyph is selected using a glyph similarity measure and used for correction;
(2) whether the identified work unit is a logo is judged from its location and font, and if it is a logo, a logo recognition algorithm is called to recognize and correct it; if it is not a logo but the card contains the English name, pinyin or address of the work unit, that information is used as input to a search engine interface, and the correct work unit name returned by the search is used for correction; if no such information is present, candidate characters are first obtained with a language model, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction;
(3) for the identified position, the position name with the smallest edit distance in a position dictionary is selected for correction; if the edit distance to every position name in the dictionary is larger than a preset threshold, the corrected work unit name and the position to be corrected are input into the language model together to obtain the most probable candidate characters, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction.
As a further refinement of the business-card-oriented employment relationship extraction method, in step 5, when there are several work units and positions, the work units and positions are paired according to how close they are located on the card; if a position has no work unit located adjacent to it, the identified logo is used as the work unit corresponding to that position.
A business-card-oriented employment relationship extraction system comprises:
a picture training and test set unit, used for storing pictures of Chinese characters in multiple fonts with different noise, and business card pictures whose angles have been automatically adjusted to produce a plurality of test samples;
a text knowledge base unit, used for storing a Chinese character pinyin library, a stroke order library, and dictionaries of positions and work unit names;
a text extraction unit, used for extracting the text in the business card picture, obtaining a text extraction result that includes text regions, and outputting the result to the entity recognition unit;
an entity recognition unit, used for identifying the three entities, namely the name, the work unit and the position, in the text extraction result; when the same text region contains two or more entities, it is split into single entities using a Chinese lexical analysis tool;
an entity correction unit, used for correcting those parts of the identified names, work units and positions whose confidence is lower than a preset value;
an employment relationship generation unit, used for generating a plurality of <name, work unit, position> triples and storing them in a database;
the entity correction unit comprises a name correction subunit, a work unit correction subunit and a position correction subunit:
the name correction subunit is used for correcting the identified name, when the business card picture contains the pinyin of the name, with the Chinese character that has the same pinyin and the closest glyph, obtained from the Chinese character pinyin library; when no pinyin is present, the Chinese character with the closest glyph is selected using a glyph similarity measure and used for correction;
the work unit correction subunit judges from the location and font of the identified work unit whether it is a logo, and if so calls a logo recognition algorithm to recognize and correct it; if it is not a logo but the card contains the English name, pinyin or address of the work unit, that information is used as input to a search engine interface, and the correct work unit name returned by the search is used for correction; if no such information is present, candidate characters are first obtained with a language model, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction;
the position correction subunit is used for selecting, for the identified position, the position name with the smallest edit distance in the position dictionary for correction; if the edit distance to every position name in the dictionary is larger than a preset threshold, the corrected work unit name and the position to be corrected are input into the language model together to obtain the most probable candidate characters, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction.
Compared with the prior art, the invention, by adopting the above technical scheme, has the following technical effects:
the method and the system automatically perform the following functions:
(1) identifying the three entities, namely the name, the work unit and the position, in the text extraction result;
(2) correcting wrongly recognized characters according to the characteristics of each entity;
(3) pairing multiple work units and positions to generate a plurality of correct <name, work unit, position> triples.
As a result, the accuracy of employment relationship extraction is improved, the employment relationships on business cards are entered and stored automatically, and the viewing rate and spread of electronic business cards are improved. The extracted employment relationships can then be used to manage and expand a personal network of contacts.
Drawings
FIG. 1 is a flow chart of the business-card-oriented employment relationship extraction method;
FIG. 2 is a block diagram of the business-card-oriented employment relationship extraction system;
FIG. 3 is a structural diagram of the text extraction unit;
FIG. 4 is a structural diagram of the entity correction unit.
Detailed Description
The technical scheme of the invention is explained in further detail below with reference to the drawings:
As shown in FIG. 1, a business-card-oriented employment relationship extraction method comprises the following steps:
step 1, obtaining a business card picture and preprocessing the picture;
step 2, extracting the text in the picture;
step 3, identifying the three entities, namely the name, the work unit and the position, in the extracted text regions;
step 4, correcting the name, work unit and position identified in step 3 when the text recognition confidence is lower than a threshold;
step 5, forming a plurality of <name, work unit, position> triples and storing them in a database.
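The confidence gate in step 4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Entity` record, the `correct_low_confidence` function, the per-kind `correctors` mapping and the default threshold of 0.8 are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entity:
    kind: str          # "name", "work_unit" or "position" (hypothetical labels)
    text: str          # recognized text
    confidence: float  # recognition confidence in [0, 1]


def correct_low_confidence(
    entities: List[Entity],
    correctors: Dict[str, Callable[[str], str]],
    threshold: float = 0.8,  # assumed value; the source only says "a threshold"
) -> List[Entity]:
    """Apply the kind-specific corrector only to entities below the threshold."""
    result = []
    for entity in entities:
        if entity.confidence < threshold:
            entity = replace(entity, text=correctors[entity.kind](entity.text))
        result.append(entity)
    return result
```

Entities recognized with high confidence pass through unchanged, so the more expensive correction steps run only where recognition is doubtful.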
The business card picture in step 1 can be obtained by photographing, collected from the network with a crawler, or provided by a user. The picture preprocessing comprises the following steps:
(1) if the picture contains several business cards, the picture is first split into individual business cards;
(2) binarization, noise smoothing, skew angle detection and correction, and similar processing are applied to the picture.
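As one possible illustration of the binarization step, the sketch below uses Otsu's method to pick a global threshold. The source does not say which binarization algorithm is used, so Otsu's method is an assumption made here, and the function names are hypothetical. The image is a plain list of rows of 8-bit gray values so that the example needs no third-party library.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 1 if it is brighter than the Otsu threshold, else 0."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p > t else 0 for p in row] for row in image]
```

Noise smoothing and skew correction would follow the same pattern of small, separately testable functions; they are omitted here to keep the sketch short.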
Extracting the text in the picture in step 2 comprises two stages: text detection and text recognition. Text detection can use a detection model based on image segmentation, such as DB, which judges whether each pixel belongs to a text target and how it is connected to the surrounding pixels, and then merges the results of adjacent pixels into a text box. Text recognition can use the CRNN recognition model, which consists of convolutional layers (CNN), recurrent layers (RNN) and a transcription layer (CTC).
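To make the role of the CTC transcription layer concrete, the following sketch shows greedy (best-path) CTC decoding: take the most probable label per frame, collapse consecutive repeats, then drop the blank label. The function names and the convention that label 0 is the blank are assumptions for illustration; a real CRNN would supply the per-frame scores.

```python
from itertools import groupby
from typing import List, Sequence


def argmax_per_frame(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring label index for each time step."""
    return [max(range(len(row)), key=row.__getitem__) for row in frame_scores]


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse consecutive duplicate labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]
```

The blank label is what lets the decoder keep genuinely doubled characters: two identical labels separated by a blank survive as two characters, while an unbroken run collapses to one.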
Before step 2, the method further includes automatically generating a picture training and test set with which the text extraction method is trained. The set is generated by producing pictures of 3,000 commonly used Chinese characters in common fonts, such as Song, Kai (regular script), Li (clerical script) and Hei, with different noise, and by automatically adjusting the angle of business card pictures to produce a plurality of test samples.
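A rough sketch of how such a set could be enumerated is given below. It only produces sample specifications (character, font, noise level, angle) and a simple salt-and-pepper noise function on a binary image; actual glyph rendering and rotation are left out. All names, the choice of salt-and-pepper noise and the parameter values are assumptions, since the source does not specify them.

```python
import random
from itertools import product
from typing import Dict, List, Sequence


def sample_specs(
    chars: Sequence[str],
    fonts: Sequence[str],
    noise_levels: Sequence[float],
    angles: Sequence[int],
) -> List[Dict[str, object]]:
    """Enumerate every (character, font, noise level, angle) combination."""
    return [
        {"char": c, "font": f, "noise": n, "angle": a}
        for c, f, n, a in product(chars, fonts, noise_levels, angles)
    ]


def add_salt_pepper(
    image: List[List[int]], ratio: float, rng: random.Random
) -> List[List[int]]:
    """Flip each pixel of a 0/1 image independently with probability `ratio`."""
    return [
        [1 - px if rng.random() < ratio else px for px in row]
        for row in image
    ]
```

Passing a seeded `random.Random` keeps the generated set reproducible between runs.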
In step 3, the three entities, namely the name, the work unit and the position, are identified with a named entity recognition method. When the same region contains two or more entities, the method further splits the region into single entities using a Chinese lexical analysis tool; for example, "CEO of Fujian Kogaku consult Co., Ltd" contains two entities, a work unit and a position. The lexical tool can be the Baidu LAC Chinese lexical analysis tool. Named entity recognition can use a three-layer model consisting of an embedding layer, a BiLSTM layer and a CRF decoding layer.
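Whatever tagger is used, its per-character output still has to be turned into entity strings. The sketch below assumes the tagger emits BIO tags (for example `B-ORG`, `I-ORG`, `O`); the tag scheme, the label names and the function name are assumptions for illustration and are not stated in the source.

```python
from typing import List, Sequence, Tuple


def bio_to_entities(
    chars: Sequence[str], tags: Sequence[str]
) -> List[Tuple[str, str]]:
    """Group characters into (kind, text) spans according to BIO tags.

    An "I-" tag that does not continue the current span is treated like "O":
    it closes the open span and is itself discarded.
    """
    entities: List[Tuple[str, str]] = []
    kind, buffer = None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if kind is not None:
                entities.append((kind, "".join(buffer)))
            kind, buffer = tag[2:], [ch]
        elif tag.startswith("I-") and kind == tag[2:]:
            buffer.append(ch)
        else:
            if kind is not None:
                entities.append((kind, "".join(buffer)))
            kind, buffer = None, []
    if kind is not None:
        entities.append((kind, "".join(buffer)))
    return entities
```

A region that mixes a work unit and a position therefore yields two separate spans, which is the splitting behavior the paragraph above describes.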
In step 4, the correction is performed as follows:
(1) For the identified name, if the business card contains the pinyin of the name, the Chinese character with the same pinyin and the closest glyph is obtained from the Chinese character pinyin library and used for correction; if no pinyin is present, the Chinese character with the closest glyph is selected using the glyph similarity measure and used for correction. The glyph similarity can be calculated from the edit distance between IDS (Ideographic Description Sequence) strings.
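The pinyin-constrained branch can be sketched as below: glyph distance is the Levenshtein distance between IDS strings, and candidates are restricted to characters whose pinyin matches the pinyin printed on the card. The tiny `TOY_IDS` and `TOY_PINYIN` tables are illustrative stand-ins for the real IDS data and the Chinese character pinyin library, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


# Illustrative toy tables only; a real system would load full libraries.
TOY_IDS: Dict[str, str] = {"李": "⿱木子", "季": "⿱禾子", "杏": "⿱木口", "里": "里"}
TOY_PINYIN: Dict[str, str] = {"李": "li", "季": "ji", "杏": "xing", "里": "li"}


def correct_by_pinyin(recognized: str, card_pinyin: str) -> str:
    """Among characters with the card's pinyin, return the closest glyph."""
    pool = [c for c, py in TOY_PINYIN.items() if py == card_pinyin]
    if not pool:
        return recognized  # nothing to correct with; keep the OCR result
    return min(pool, key=lambda c: edit_distance(TOY_IDS[recognized], TOY_IDS[c]))
```

Here a misread 季 on a card that prints the pinyin "li" is pulled back to 李, because 李 shares the pinyin and differs from 季 by a single IDS component.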
(2) Whether the identified work unit is a logo is judged from information such as its location and font. If it is a logo, a logo recognition algorithm is called to recognize and correct it. If it is not a logo but the card contains the English name, pinyin or address of the work unit, that information is used as input to a search engine interface, and the correct work unit name returned by the search is used for correction. If no such information is present, the most probable candidate characters are obtained with a language model, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction. The language model can be a BERT model, and the glyph similarity can be calculated based on IDS.
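The three-way decision above is essentially a fallback cascade. The sketch below shows only that control flow; the logo recognizer, the search call and the language-model corrector are passed in as callables because the source does not name concrete APIs for them. `WorkUnitRegion` and every parameter name are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class WorkUnitRegion:
    text: str                      # OCR result for the work unit
    is_logo: bool                  # decided upstream from location and font
    english: Optional[str] = None  # English name printed on the card, if any
    pinyin: Optional[str] = None
    address: Optional[str] = None


def correct_work_unit(
    region: WorkUnitRegion,
    recognize_logo: Callable[[WorkUnitRegion], str],
    search_name: Callable[[List[str]], str],
    lm_glyph_correct: Callable[[str], str],
) -> str:
    """Try logo recognition, then search with side information, then LM + glyph."""
    if region.is_logo:
        return recognize_logo(region)
    hints = [h for h in (region.english, region.pinyin, region.address) if h]
    if hints:
        return search_name(hints)
    return lm_glyph_correct(region.text)
```

Injecting the three back ends as parameters keeps the cascade testable with stubs and leaves the choice of logo model, search engine and language model open.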
(3) For the identified position, the position name with the smallest edit distance in the position dictionary is selected for correction. If the edit distance to every position name in the dictionary is larger than the threshold, the corrected work unit name and the position to be corrected are input into the language model together to obtain the most probable candidate characters, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction. The language model can be a BERT model, and the glyph similarity can be calculated based on IDS.
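A minimal sketch of the dictionary lookup with its threshold fallback follows. The function names, the sample dictionary and the threshold value are assumptions; the language-model step is represented by a caller-supplied `fallback` that receives the corrected work unit name and the uncorrected position.

```python
from typing import Callable, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def correct_position(
    recognized: str,
    dictionary: Sequence[str],
    threshold: int,
    work_unit: str,
    fallback: Callable[[str, str], str],
) -> str:
    """Use the nearest dictionary entry unless even that one is too far away."""
    best = min(dictionary, key=lambda title: edit_distance(recognized, title))
    if edit_distance(recognized, best) <= threshold:
        return best
    # Every entry exceeds the threshold: defer to the language-model path.
    return fallback(work_unit, recognized)
```

Checking only the nearest entry against the threshold is equivalent to checking all of them, since every other entry is at least as far away.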
In step 5, there may be several work units and positions, and the work units and positions are paired according to how close they are located on the card. If a position has no work unit located adjacent to it, the identified logo is used as the work unit corresponding to that position.
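One way to read "paired according to proximity" is nearest-neighbor matching on the centers of the text boxes, with a distance cutoff deciding what counts as adjacent. The sketch below follows that reading; the Euclidean metric, the `max_distance` cutoff and all names are assumptions not fixed by the source.

```python
from math import hypot
from typing import List, Optional, Tuple

Box = Tuple[str, Tuple[float, float]]  # (text, (x, y) center of its text box)


def pair_positions(
    positions: List[Box],
    work_units: List[Box],
    max_distance: float,
    logo_unit: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (work unit, position) pairs, falling back to the logo's unit."""
    pairs = []
    for pos_text, (px, py) in positions:
        nearest = min(
            work_units,
            key=lambda unit: hypot(unit[1][0] - px, unit[1][1] - py),
            default=None,
        )
        if nearest is not None and hypot(nearest[1][0] - px, nearest[1][1] - py) <= max_distance:
            pairs.append((nearest[0], pos_text))
        elif logo_unit is not None:
            pairs.append((logo_unit, pos_text))
    return pairs


def to_triples(name: str, pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Attach the card holder's name to each (work unit, position) pair."""
    return [(name, unit, pos) for unit, pos in pairs]
```

A position with neither a nearby work unit nor a recognized logo is simply left unpaired in this sketch.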
As shown in FIG. 2, a business-card-oriented employment relationship extraction system includes the following components:
Picture training and test set unit: contains picture data such as pictures of 3,000 commonly used Chinese characters in common fonts, such as Song, Kai (regular script), Li (clerical script) and Hei, with different noise, and test samples produced by automatically adjusting the angle of business card pictures.
Text knowledge base unit: contains a Chinese character pinyin library, a stroke order library, and dictionaries of positions, work unit names and the like.
Text extraction unit: extracts the text in the business card picture.
Entity recognition unit: identifies the three entities, namely the name, the work unit and the position, in the text extraction result. When the same region contains two or more entities, it is split into single entities using a Chinese lexical analysis tool, which can be the Baidu LAC Chinese lexical analysis tool. Named entity recognition can use a three-layer model consisting of an embedding layer, a BiLSTM layer and a CRF decoding layer.
Entity correction unit: corrects the parts of the identified names, work units and positions that have low confidence.
Employment relationship generation unit: generates a plurality of <name, work unit, position> triples and stores them in a database.
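The storage half of the employment relationship generation unit could look like the following sketch, which uses SQLite from the Python standard library. The source only says "a database", so SQLite, the table name `employment` and the uniqueness constraint are assumptions chosen for illustration.

```python
import sqlite3
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (name, work unit, position)


def store_triples(conn: sqlite3.Connection, triples: Iterable[Triple]) -> None:
    """Create the table if needed and insert triples, skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employment ("
        "name TEXT NOT NULL, work_unit TEXT NOT NULL, position TEXT NOT NULL, "
        "UNIQUE (name, work_unit, position))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO employment (name, work_unit, position) VALUES (?, ?, ?)",
        triples,
    )
    conn.commit()


def positions_of(conn: sqlite3.Connection, name: str) -> list:
    """Return all (work unit, position) rows stored for one person."""
    rows = conn.execute(
        "SELECT work_unit, position FROM employment WHERE name = ? ORDER BY work_unit, position",
        (name,),
    )
    return rows.fetchall()
```

The uniqueness constraint makes re-processing the same card idempotent, which matters when cards are collected repeatedly by a crawler.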
As shown in FIG. 3, the text extraction unit includes a detection subunit and a recognition subunit, which perform text detection and text recognition respectively. Text detection can use a detection model based on image segmentation, such as DB, which judges whether each pixel belongs to a text target and how it is connected to the surrounding pixels, and then merges the results of adjacent pixels into a text box. Text recognition can use the CRNN recognition model, which consists of convolutional layers (CNN), recurrent layers (RNN) and a transcription layer (CTC).
As shown in FIG. 4, the entity correction unit includes a name correction subunit, a work unit correction subunit and a position correction subunit:
(1) The name correction subunit corrects the identified name, when the business card contains the pinyin of the name, with the Chinese character that has the same pinyin and the closest glyph, obtained from the Chinese character pinyin library; when no pinyin is present, the Chinese character with the closest glyph is selected using the glyph similarity measure and used for correction. The glyph similarity can be calculated from the edit distance between IDS (Ideographic Description Sequence) strings.
(2) The work unit correction subunit judges from information such as the location and font of the identified work unit whether it is a logo, and if so calls a logo recognition algorithm to recognize and correct it. If it is not a logo but the card contains the English name, pinyin or address of the work unit, that information is used as input to a search engine interface, and the correct work unit name returned by the search is used for correction. If no such information is present, the most probable candidate characters are obtained with a language model, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction. The language model can be a BERT model, and the glyph similarity can be calculated based on IDS.
(3) The position correction subunit selects, for the identified position, the position name with the smallest edit distance in the position dictionary for correction. If the edit distance to every position name in the dictionary is larger than the threshold, the corrected work unit name and the position to be corrected are input into the language model together to obtain the most probable candidate characters, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction. The language model can be a BERT model, and the glyph similarity can be calculated based on IDS.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A business-card-oriented employment relationship extraction method, characterized by comprising the following steps:
step 1, obtaining a business card picture and preprocessing the business card picture;
step 2, extracting the text in the preprocessed business card picture to obtain text regions;
step 3, identifying three entities in the text regions, the three entities being the name, the work unit and the position;
step 4, correcting the name, the work unit and the position identified in step 3;
step 5, forming a plurality of triples expressing the employment relationship from the corrected name, work unit and position, and storing the triples in an electronic business card database, wherein each triple is <name, work unit, position>.
2. The business-card-oriented employment relationship extraction method according to claim 1, wherein the business card picture in step 1 is obtained by photographing, by a web crawler, or from a user;
the preprocessing comprises the following steps:
if the business card picture contains several business cards, the picture is first split into individual business cards, and binarization, noise smoothing, and skew angle detection and correction are then applied to each individual business card.
3. The business-card-oriented employment relationship extraction method according to claim 1, wherein the extraction in step 2 comprises text detection and text recognition.
4. The business-card-oriented employment relationship extraction method according to claim 1, wherein
in step 1, a picture training and test set is automatically generated for the preprocessed business card pictures;
in step 2, text extraction uses the business card pictures in the automatically generated picture training and test set, wherein the training and test set is generated automatically by producing pictures of Chinese characters in multiple fonts with different noise and by automatically adjusting the angle of business card pictures to produce a plurality of test samples.
5. The business-card-oriented employment relationship extraction method according to claim 1, wherein in step 3 the three entities, namely the name, the work unit and the position, are identified with a named entity recognition method, and when the same text region contains two or more entities, it is split into single entities using a Chinese lexical analysis tool.
6. The business-card-oriented employment relationship extraction method according to claim 1, wherein the correction in step 4 is performed as follows:
(1) for the identified name, if the business card picture contains the pinyin of the name, the Chinese character with the same pinyin and the closest glyph is obtained from a Chinese character pinyin library and used for correction; if no pinyin is present, the Chinese character with the closest glyph is selected using a glyph similarity measure and used for correction;
(2) whether the identified work unit is a logo is judged from its location and font, and if it is a logo, a logo recognition algorithm is called to recognize and correct it; if it is not a logo but the card contains the English name, pinyin or address of the work unit, that information is used as input to a search engine interface, and the correct work unit name returned by the search is used for correction; if no such information is present, candidate characters are first obtained with a language model, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction;
(3) for the identified position, the position name with the smallest edit distance in a position dictionary is selected for correction; if the edit distance to every position name in the dictionary is larger than a preset threshold, the corrected work unit name and the position to be corrected are input into the language model together to obtain the most probable candidate characters, and the candidate with the closest glyph is then selected using the glyph similarity measure and used for correction.
7. The person name card oriented occupational relationship extraction method according to claim 6, wherein in step 5, when a plurality of work units and positions are present, the work units and positions are paired according to their positional proximity on the card; and if a position has no adjacent work unit, the identified logo is used as the work unit corresponding to that position.
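The pairing rule of claim 7 can be sketched as below. The sketch assumes each entity carries the centre coordinates of its text area and that "adjacent" means within a distance limit; the Entity class, the max_distance parameter and the sample values are all hypothetical and are not specified by the claims.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass
class Entity:
    text: str
    x: float  # centre of the text area, in pixels (assumed representation)
    y: float


def pair_positions(positions: List[Entity],
                   work_units: List[Entity],
                   logo_unit: Optional[str],
                   max_distance: float) -> List[Tuple[Optional[str], str]]:
    """Pair every position with its nearest work unit; when no work unit
    lies within `max_distance`, use the work unit recognized from the logo."""
    pairs = []
    for pos in positions:
        nearest = min(work_units,
                      key=lambda u: hypot(u.x - pos.x, u.y - pos.y),
                      default=None)
        if (nearest is not None
                and hypot(nearest.x - pos.x, nearest.y - pos.y) <= max_distance):
            pairs.append((nearest.text, pos.text))
        else:
            pairs.append((logo_unit, pos.text))
    return pairs


# Illustrative layout: one position sits next to a work unit, one does not.
positions = [Entity("教授", 10, 20), Entity("顾问", 10, 200)]
work_units = [Entity("某大学", 10, 10)]
pairs = pair_positions(positions, work_units, logo_unit="某公司", max_distance=50)
```

Each resulting (work unit, position) pair can then be combined with the person name to form a triple.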
8. A person name card oriented occupational relationship extraction system, comprising:
a picture training and test set unit, used for storing person name card pictures that contain Chinese characters in various fonts and with different noise, and for automatically adjusting the angles of the person name card pictures to generate person name card pictures for a plurality of test samples;
a text knowledge base unit, used for storing a Chinese character pinyin library, a stroke order library, and dictionaries of positions and work unit names;
a character extraction unit, used for extracting the characters in a person name card picture, obtaining a character extraction result and outputting it to the entity recognition unit, wherein the character extraction result comprises text areas;
an entity recognition unit, used for identifying three types of entity, namely name, work unit and position, in the character extraction result; when the same text area contains two or more entities, a Chinese lexical analysis tool is used to split them into single entities;
an entity correction unit, used for correcting those parts of the identified names, work units and positions whose confidence is lower than a preset value;
an occupational relationship generating unit, used for generating a plurality of <person name, work unit, position> triples and storing the triples in a database;
wherein the entity correction unit comprises a name correction subunit, a work unit correction subunit and a position correction subunit:
the name correction subunit is used for correcting an identified name, when the corresponding name pinyin is present in the person name card picture, with the Chinese character obtained from the Chinese character pinyin library that has the same pinyin and the closest glyph; when no pinyin is present, the Chinese character with the closest glyph is selected by means of a glyph similarity measure for correction;
the work unit correction subunit judges from the position and font of an identified work unit whether it is a logo, and if it is a logo, calls a logo recognition algorithm to recognize and correct it; if the work unit is not a logo but English, pinyin or address information of the work unit is included, that information is used as input and a search engine interface is called to retrieve the correct name of the work unit for correction; if no such information is included, a language model is first used to obtain candidate characters, and the character with the closest glyph is then selected from among them by means of the glyph similarity measure for correction;
the position correction subunit is used for selecting, for an identified position, the position name with the minimum edit distance from the position dictionary library for correction; if the edit distance to the position names in the dictionary is greater than a preset threshold, the corrected work unit name and the position to be corrected are input together into the language model to obtain the most probable characters, and the character with the closest glyph is then selected from among them by means of the glyph similarity measure for correction.
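The behaviour of the name correction subunit can be sketched as follows. The claims do not define the glyph similarity measure; this sketch assumes it is computed over stroke-order codes (in the spirit of the stroke order library of the text knowledge base unit) using the similarity ratio of Python's difflib.SequenceMatcher. The tables PINYIN and STROKES, their entries and the function names are hypothetical toy data, not the libraries described in the claims.

```python
from difflib import SequenceMatcher
from typing import Optional

# Toy knowledge base (illustrative entries only).
# Stroke codes: 1 horizontal, 2 vertical, 3 left-falling, 4 dot, 5 turning.
PINYIN = {"伟": "wei", "维": "wei", "卫": "wei", "传": "chuan"}
STROKES = {"伟": "321152", "维": "55132411121", "卫": "521", "传": "321154"}


def glyph_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the stroke codes of two characters."""
    return SequenceMatcher(None, STROKES[a], STROKES[b]).ratio()


def correct_name_char(recognized: str, pinyin: Optional[str] = None) -> str:
    """Replace a low-confidence character with its closest-glyph candidate.

    With pinyin from the card, candidates are limited to characters having
    that pinyin; without it, every other known character is a candidate."""
    if pinyin is not None:
        candidates = [c for c, p in PINYIN.items() if p == pinyin]
    else:
        candidates = [c for c in STROKES if c != recognized]
    if not candidates:
        return recognized  # nothing to correct with
    return max(candidates, key=lambda c: glyph_similarity(recognized, c))


# The card shows pinyin "wei" but the recognizer produced a look-alike character.
fixed = correct_name_char("传", pinyin="wei")
```

Restricting the candidates by pinyin first keeps the glyph comparison small and prevents a visually similar character with the wrong reading from being chosen.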
CN202111544385.4A 2021-12-16 2021-12-16 Person name card oriented arbitrary relationship extraction method and system Pending CN114220112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111544385.4A CN114220112A (en) 2021-12-16 2021-12-16 Person name card oriented arbitrary relationship extraction method and system

Publications (1)

Publication Number Publication Date
CN114220112A (en) 2022-03-22

Family

ID=80703048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111544385.4A Pending CN114220112A (en) 2021-12-16 2021-12-16 Person name card oriented arbitrary relationship extraction method and system

Country Status (1)

Country Link
CN (1) CN114220112A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187997A (en) * 2022-07-13 2022-10-14 厦门理工学院 Zero-sample Chinese character recognition method based on key component analysis
CN115187997B (en) * 2022-07-13 2023-07-28 厦门理工学院 Zero-sample Chinese character recognition method based on key component analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination