CN114092700A

CN114092700A - Ancient character recognition method based on target detection and knowledge graph

Info

Publication number: CN114092700A
Application number: CN202111414456.9A
Authority: CN
Inventors: 徐昊; 李沿增; 吴垒; 史大千; 刁晓蕾
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2021-11-25
Filing date: 2021-11-25
Publication date: 2022-02-25
Anticipated expiration: 2041-11-25
Also published as: CN114092700B

Abstract

The invention provides an ancient character recognition method based on target detection and a knowledge graph, belonging to the technical field of image processing and recognition. The method comprises: performing component labeling and data preprocessing on ancient character picture data to expand an ancient character image data set; constructing an ancient character component recognition model that identifies the components contained in an ancient character picture and their position coordinates; constructing an ancient character component position relation recognition model from the identified components and their position coordinates, and obtaining the positional relation of the components in order to determine the character structure; constructing an ancient character knowledge graph; and inferring the character from the components and their positional relation by means of the ancient character knowledge graph. The class of an ancient character is thus inferred by identifying the components present in the ancient character image. Because it relies on target detection and knowledge graph reasoning, the method increases the number of recognizable classes and can classify more ancient characters.

Description

Ancient character recognition method based on target detection and knowledge graph

Technical Field

The invention belongs to the technical field of image processing and recognition, and particularly relates to an ancient character recognition method based on target detection and a knowledge graph.

Background

OCR (optical character recognition) technology has been widely used in office applications, covering letters, digits and Chinese characters. For example, Baidu character recognition can recognize text in multiple languages, such as Chinese, English and French, from photographs and screenshots. Although many companies have invested in character recognition and have made substantial contributions, the technology has not been fully applied to the recognition of ancient characters. At present, ancient characters and calligraphy from the Qin dynasty onward (seal script) can be identified through similarity calculation, and common handwritten Chinese characters can be recognized by neural network models. However, these prior-art methods are not suitable for ancient characters that predate seal script (the Qin dynasty), such as oracle bone script, bronze inscriptions and Warring States script. About 4,500 distinct characters are known from oracle bone script and bronze inscriptions, of which only a little over 2,000 have been deciphered. Each individual character occurs too rarely and has many variant forms, so the training set is too small for a neural network to extract character features effectively, and the model can hardly recognize ancient characters correctly. In addition, because there is so little training data, most ancient characters are difficult to classify.

Most of the data currently used for ancient character recognition consists of manually imitated (hand-copied) images, and methods built on it perform poorly on real ancient character rubbings. For example, the oracle bone big data platform of Anyang Normal University built an oracle character database based on HWOBC (handwritten oracle bone characters) that contains 83,245 character-level samples in 3,881 character categories, and trained a conventional deep learning classification network on it; this approach is not suitable for recognizing characters on rubbings. The University of Chinese Academy of Sciences proposed oracle character recognition by nearest neighbor classification with deep metric learning, which reaches an accuracy of 93.37% over 2,583 classes on a handwritten oracle glyph data set and 92.43% over 261 classes on an oracle handwriting data set; it is likewise not suitable for rubbings. For details see: zhangyong, Yangqing, Liu super, "Application of nearest neighbor classification method based on deep metric learning in Chinese character recognition", International Conference on Document Analysis and Recognition (ICDAR), 2019. The "two-stage recognition of oracle characters" proposed by Ritsumeikan University in Japan targets the recognition of copied glyphs; see: Meng Lin, "Two-stage recognition of oracle text", ICIAP, 2017. That work used 30 oracle character templates (rubbings) and 29 original oracle bone inscriptions (OBIs, about 576 characters) to measure the performance of the proposed method and reported an accuracy of 90%. In 2019, AlexNet was used to recognize oracle characters; 184 oracle characters could be recognized, with an accuracy of 92.3%.

The difficulty in solving the above problems is twofold. First, ancient character image data has a severe long-tail distribution: there are many classes (the data obtained so far covers more than 2,800 classes of oracle bone script and more than 2,000 classes of bronze inscriptions), few samples per class (some classes have only one sample), and the characters are difficult to imitate. This conflicts with the large number of samples that deep learning classification requires, so a general deep learning classifier can handle only part of the data and cannot effectively classify all 2,800+ classes. Second, ancient character images contain a large amount of noise and the data sets are of low quality. Existing image denoising methods do not handle ancient character images well, so a denoising method needs to be designed for the specific noise found in them.

Disclosure of Invention

The purpose of the invention is to infer the class of an ancient character, on the basis of target detection and a knowledge graph, by identifying the components in the ancient character image. A method based on target detection and knowledge graph reasoning increases the number of recognizable classes, so more ancient characters can be classified.

In order to achieve the purpose, the invention adopts the following technical scheme: an ancient character recognition method based on target detection and a knowledge graph is characterized by comprising the following steps:

step one, carrying out part marking and data preprocessing on ancient character picture data to expand an ancient character image data set

Collecting an ancient character picture, obtaining an ancient character sample image, traversing the ancient character sample image, carrying out part marking on the ancient characters in each ancient character sample image by using a marking frame, and taking corresponding marks as part classification labels;

preprocessing the data of the ancient character sample image marked with the part classification label to expand an ancient character image data set; the preprocessing method comprises the steps of picture size adjustment, color gamut transformation and picture turning;

step two, constructing an ancient character part recognition model, training the ancient character part recognition model by using the ancient character image data set expanded in the step one, detecting the ancient character part, and recognizing the part contained in the ancient character picture and the position coordinate of the part;

the backbone feature extraction network of the ancient character component recognition model is obtained by replacing the CSPDarknet53 network in the YOLOv4 algorithm with a MobileNet network; the enhanced feature extraction network consists, as in the YOLOv4 algorithm, of a spatial pyramid pooling network (SPP) and a path aggregation network (PANet); and a YOLO Head prediction network performs the prediction that recognizes the ancient character components;

step three, according to the position coordinates of the part and the part contained in the ancient character picture identified in the step two, constructing an ancient character part position relation identification model to obtain the position relation of the part so as to judge the character structure;

the character structures of the ancient character component position relation recognition model comprise a single-component character, an up-down structure, a left-right structure, an enclosing structure, a left-middle-right structure, an up-middle-down structure, a left-right-up-down structure and an up-down-left-right structure;

step four, constructing an ancient character knowledge graph;

step five, inferring the character from the components and their positional relation by means of the ancient character knowledge graph.

As a preferred embodiment of the present invention, the first step further comprises:

cutting an ancient character sample image marked with a component classification label to obtain a single character sample set, wherein the single character sample is a component image with a label, and each single character sample corresponds to a component;

preprocessing a single character sample, wherein the preprocessing method comprises picture size adjustment, color gamut conversion and picture turning;

after preprocessing, splicing and expanding the original data set of the single character sample by an image splicing method.
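Because every sample carries component labeling boxes, a left-right flip has to mirror the boxes together with the pixels. The following is a minimal sketch of that bookkeeping, assuming a grayscale image stored as a list of rows and boxes given as (x1, y1, x2, y2) edge coordinates; the function name `flip_left_right` is hypothetical and not taken from the invention.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2): upper-left and lower-right corners


def flip_left_right(image: List[List[int]], boxes: List[Box]) -> Tuple[List[List[int]], List[Box]]:
    """Mirror an image left to right and remap its component boxes."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # An edge at x moves to width - x, so the old right edge becomes the new left edge.
    new_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, new_boxes
```

Flipping twice restores both the image and the boxes, which is a convenient sanity check when building an augmentation pipeline.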

Further, according to the ancient character recognition method based on the target detection and the knowledge graph, when the part coordinates meet the following relation, the character structure is a left-right structure;

when the part coordinates satisfy the following relationship, the character structure is an up-down structure;

when the part coordinates satisfy the following relationship, the character structure is an enclosing structure;

when the part coordinates satisfy the following relationship, the character structure is a left-right-up-down structure;

when the part coordinates satisfy the following relationship, the character structure is an up-down left-right structure;

when the part coordinates satisfy the following relationship, the character structure is an upper, middle and lower structure;

when the part coordinates satisfy the following relationship, the character structure is a left-middle-right structure;

where $x_a$ and $y_a$ are the abscissa and ordinate of the center point of the first component; $x_b$ and $y_b$ are the abscissa and ordinate of the center point of the second component; $x_{a1}$ and $y_{a1}$ are the abscissa and ordinate of the upper-left corner of the first component; $x_{a2}$ and $y_{a2}$ are the abscissa and ordinate of the lower-right corner of the first component; $x_{b1}$ and $y_{b1}$ are the abscissa and ordinate of the upper-left corner of the second component; $x_{b2}$ and $y_{b2}$ are the abscissa and ordinate of the lower-right corner of the second component; $x_c$ and $y_c$ are the abscissa and ordinate of the center point of the third component; $x_{c1}$ and $y_{c1}$ are the abscissa and ordinate of the upper-left corner of the third component; and $x_{c2}$ and $y_{c2}$ are the abscissa and ordinate of the lower-right corner of the third component.
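To make the role of the center and corner coordinates concrete, the sketch below classifies two detected components from their boxes. The decision rules (containment for the enclosing structure, dominant center offset for the other two) are illustrative assumptions and are not the coordinate relations of the invention; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


def center(box: Box) -> Tuple[float, float]:
    """Center point (x, y) of a box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def contains(outer: Box, inner: Box) -> bool:
    """True when inner lies completely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def two_component_structure(a: Box, b: Box) -> str:
    """Assumed rules: containment first, then the larger center offset decides."""
    if contains(a, b) or contains(b, a):
        return "enclosing"
    xa, ya = center(a)
    xb, yb = center(b)
    if abs(xa - xb) >= abs(ya - yb):
        return "left-right"
    return "up-down"
```

For example, `two_component_structure((0, 0, 40, 100), (50, 0, 100, 100))` yields `"left-right"` under these assumed rules.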

Further, the process of constructing the ancient character knowledge graph is as follows:

(1) determining a data format: the data comprises the character, the components it contains, the number of components it contains, and the character structure;

(2) screening the data to ensure that each piece of data has the same attribute to obtain structured data which is easy to read by a computer;

(3) uniformly storing the screened data in a structured xlsx file, and defining entities and relations:

Relation name	Description of relation
Comprises (Contain)	Components contained in each character

(4) constructing the knowledge graph using the Java language

The ancient character knowledge graph ontology is defined by the function build_CRKG_ontology, and instances are constructed by the function build_instance. Defining the ontology comprises defining the character class and the component class, defining the relations between the classes, and defining the attributes owned by the classes in the ontology; constructing instances comprises constructing all character instances, constructing all component instances, and constructing the character-to-component relations.
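Steps (1) to (4) and the final inference step can be sketched as a lookup from (components, structure) to characters. This is a simplified Python stand-in rather than the Java implementation named above; the helper names `screen`, `build_index` and `infer` and the three sample records (written with modern characters) are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# One record per character: (character, components, component count, structure).
Record = Tuple[str, Tuple[str, ...], Optional[int], Optional[str]]
Key = Tuple[Tuple[str, ...], str]


def screen(records: List[Record]) -> List[Record]:
    """Keep only complete records whose component count matches the component list."""
    return [r for r in records if r[0] and r[1] and r[3] and r[2] == len(r[1])]


def build_index(records: List[Record]) -> Dict[Key, List[str]]:
    """Index characters by their sorted components and their structure."""
    index: Dict[Key, List[str]] = {}
    for character, components, _count, structure in records:
        key = (tuple(sorted(components)), structure)
        index.setdefault(key, []).append(character)
    return index


def infer(index: Dict[Key, List[str]], detected: List[str], structure: str) -> List[str]:
    """Return every character whose components and structure match the detection."""
    return index.get((tuple(sorted(detected)), structure), [])


RECORDS: List[Record] = [
    ("好", ("女", "子"), 2, "left-right"),
    ("从", ("人", "人"), 2, "left-right"),
    ("妄", ("亡", "女"), 2, "up-down"),
    ("?", ("女",), 2, None),  # incomplete record, removed by screen()
]
INDEX = build_index(screen(RECORDS))
```

Sorting the components instead of using a set keeps repeated components distinct, so a character made of the same component twice is still matched correctly.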

Further, the character structures in the ancient character component position relation recognition model further comprise: a left-right semi-enclosing structure, a left-right enclosing structure, a semi-enclosing upper-left-right structure and a triangular structure.

Through the above design, the invention brings the following beneficial effects: it removes a large amount of noise from ancient character images; it uses a target detection method that is more advanced than traditional detection methods; it recognizes the components of an ancient character instead of recognizing the whole character as traditional methods do; and it constructs a comprehensive ancient character knowledge graph that facilitates querying and reasoning about ancient characters. As a result, the number of recognizable classes increases while recognition accuracy is maintained.

In conclusion, the ancient character recognition method based on target detection and a knowledge graph provided by the invention increases the number of ancient character classes that can be recognized, builds a comprehensive ancient character knowledge base, helps preserve ancient characters, and facilitates the research and recognition work of archaeological researchers.

Drawings

The above and other features, advantages and aspects of the disclosed embodiments will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale:

FIG. 1 is a flow chart of a method for identifying ancient characters based on target detection and knowledge graph according to an embodiment of the present invention;

FIG. 2 is a visual display effect diagram of a knowledge graph constructed by a single ancient character according to an embodiment of the invention;

FIG. 3 is a block diagram showing the positional relationship of components in the embodiment of the present invention;

FIG. 4 is a structural diagram in which the positional relationship of the components is the left-right-up-down structure according to an embodiment of the present invention;

FIG. 5 is a structural diagram in which the positional relationship of the components is the up-down-left-right structure according to an embodiment of the present invention;

FIG. 6 shows a recognition result of component recognition based on target detection in the embodiment of the present invention.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to fig. 1, 2, 3, 4, 5, 6, and steps in the specification. While certain embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present invention. It should be understood that the drawings and the embodiments of the present invention are illustrative only and are not intended to limit the scope of the present invention.

The invention proposes an ancient character recognition method based on target detection and a knowledge graph. Recognition of ancient characters proceeds in two stages. The first stage identifies the components contained in an ancient character picture and the positional relation between them. A component (also called a part) is a character-forming unit that is made up of strokes and has the function of assembling characters; in the invention, components are the structural units that make up oracle characters, and they are divided into 199 classes, including components such as "then", "ten" and "mouth". The second stage reasons over the identified components by means of the knowledge graph to obtain the inferred character. The method specifically comprises the following steps:

Step one: performing component labeling and data preprocessing on the ancient character picture data to expand the ancient character image data set

In the ancient character picture data, each character has fewer than ten samples on average, and these also include variant forms. The method therefore expands the data set by applying random preprocessing to the ancient character pictures; the random preprocessing mainly comprises picture resizing, color gamut transformation, picture flipping and picture splicing. The components in each oracle picture are labeled with the labelImg image annotation software, that is, each component is enclosed in a labeling box and the corresponding label is used as the component classification label.

For picture splicing, the labeled oracle pictures are processed first: each component is cropped out of the labeled picture and stored as a single-character picture. Let width and height denote the width and height of a single-character picture, take its upper-left corner as the origin, and let the horizontal direction be the x axis and the vertical direction the y axis; the upper-left and lower-right corners of the single-character picture are then (0, 0) and (width, height). The resolution of the single-character picture is enlarged by a given factor using bilinear interpolation. Each oracle component picture cropped from a single oracle picture is enlarged by a random factor, and the individual components are then spliced together.

Picture flipping mainly means flipping the original picture left to right; picture resizing mainly means scaling the original picture; and color gamut transformation means changing the saturation and brightness of the original picture. Two to four preprocessed pictures are selected at random and spliced either at random positions or one by one according to a character structure.

In random splicing, a two-dimensional matrix with a fixed size of 600 × 600 px is used as the background picture. A random offset from the origin at the upper-left corner of the background is then generated, denoted randomx and randomy. The sub-matrix of the background at that position is replaced by the preprocessed component picture, and the label position is adjusted accordingly: the upper-left corner of the single-character picture changes from (0, 0) to (randomx, randomy), and the lower-right corner changes from (width, height) to (width + randomx, height + randomy). When the second and later components are placed, an overlap check is required; if a component overlaps a previously placed one, a new position is drawn. Because the background matrix is large, repeated re-randomization rarely becomes a performance problem.
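The random splicing procedure can be sketched as rejection sampling of offsets on the 600 × 600 background. The sketch below only computes the label boxes; copying pixels into the background matrix is omitted. The function names and the `max_tries` limit are assumptions, while `randomx` and `randomy` follow the notation used in the description.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
BACKGROUND = 600  # side length of the square background, in pixels


def overlaps(a: Box, b: Box) -> bool:
    """True when two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_components(sizes: List[Tuple[int, int]], rng: random.Random,
                     max_tries: int = 1000) -> List[Box]:
    """Draw a random offset for each (width, height); redraw on overlap."""
    placed: List[Box] = []
    for width, height in sizes:
        for _ in range(max_tries):
            randomx = rng.randint(0, BACKGROUND - width)
            randomy = rng.randint(0, BACKGROUND - height)
            # Label corners move from (0, 0)/(width, height) by the random offset.
            box = (randomx, randomy, randomx + width, randomy + height)
            if not any(overlaps(box, other) for other in placed):
                placed.append(box)
                break
        else:
            raise RuntimeError("could not place component without overlap")
    return placed
```

Passing a seeded `random.Random` instance makes a generated data set reproducible, which is useful when comparing training runs.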

When splicing according to a structure, take the left-right structure as an example. To simulate a real ancient character data set, the gap between the two pictures is also generated randomly: some components are separated by a gap (gap > 0), and some overlap slightly (gap < 0). When the components do not overlap, the new component is simply placed on the right; when they overlap, the matrix data of the overlapping region is combined by an exclusive-OR (XOR) operation to produce the overlapping effect. To compute the positions of the new component and of the labels, the height h1 of the original image and the height h2 of the new component are compared first. If h1 > h2, the new component is padded at the top and bottom so that its height matches h1; the top padding is called paddingtop. The new component is then spliced onto the original image. With originx the width of the original image and width and height the size of the new component, the upper-left corner of the new label becomes (originx + gap, paddingtop) and the lower-right corner becomes (originx + gap + width, paddingtop + height). If h1 < h2, the original image is padded at the top and bottom instead. If a label of the original image initially has upper-left corner (originx1, originy1) and lower-right corner (originx2, originy2), the updated label has upper-left corner (originx1, originy1 + paddingtop) and lower-right corner (originx2, originy2 + paddingtop). The original data set is expanded through these splicing transformations.
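The label arithmetic for a left-right splice can be written out as follows. This sketch handles the label boxes only (the XOR blending of overlapping pixels is not shown) and assumes that the padding is split evenly between top and bottom, so that paddingtop is half of the height difference; the function name `splice_left_right` is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def splice_left_right(originx: int, h1: int, labels: List[Box],
                      width: int, h2: int, gap: int) -> Tuple[List[Box], Box, Tuple[int, int]]:
    """Append a component of size (width, h2) to the right of an image of size (originx, h1).

    Returns the updated original labels, the new component's label and the
    size (width, height) of the spliced image.
    """
    if h1 >= h2:
        # Pad the new component vertically; original labels stay where they are.
        paddingtop = (h1 - h2) // 2
        updated = list(labels)
        new_label = (originx + gap, paddingtop, originx + gap + width, paddingtop + h2)
    else:
        # Pad the original image vertically; its labels shift down by paddingtop.
        paddingtop = (h2 - h1) // 2
        updated = [(x1, y1 + paddingtop, x2, y2 + paddingtop) for (x1, y1, x2, y2) in labels]
        new_label = (originx + gap, 0, originx + gap + width, h2)
    return updated, new_label, (originx + gap + width, max(h1, h2))
```

A negative `gap` moves the new label to the left, so its box overlaps the original image by that many pixels.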

Step two: constructing the ancient character component recognition model

In the second step, a neural network model is built and then trained on the data obtained in step one. Compared with SSD, RetinaNet, Fast R-CNN and RefineDet, the YOLOv4 neural network model performed well in all tests: it is more accurate and faster, produces fewer overlapping detections, and is sensitive to small targets and tolerant of noise. For the study of oracle bone script, YOLOv4 is therefore taken as the best base model. The invention provides a recognition network based on an improved YOLOv4 that further improves the accuracy of component recognition. The overall framework is divided into three parts:

The first part performs preliminary feature extraction with the backbone feature extraction network. The extracted features mainly include texture, color and shape, and the backbone yields three preliminary effective feature layers.

The second part, the enhanced feature extraction network, performs enhanced feature extraction, mainly by means of the SPP and PANet structures. The SPP structure applies max pooling with four kernels of different sizes, {1 × 1, 5 × 5, 9 × 9, 13 × 13}, and then concatenates the resulting feature maps, which greatly enlarges the receptive field and separates out the most significant contextual features. PANet, an instance segmentation algorithm proposed in 2018, is then used to strengthen feature extraction; its main characteristic is that it extracts features repeatedly. The three preliminary effective feature layers are fused to obtain three more effective feature layers.
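The SPP step can be illustrated on a single-channel feature map. The sketch below assumes stride-1 max pooling whose window is clipped at the borders, so each pooled map keeps the input size and the four maps can be stacked channel-wise; it uses plain Python lists rather than a deep learning framework, and the function names are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def max_pool_same(feature: Grid, k: int) -> Grid:
    """k x k max pooling with stride 1; the window is clipped at the borders."""
    h, w = len(feature), len(feature[0])
    r = k // 2
    return [
        [
            max(feature[i][j]
                for i in range(max(0, y - r), min(h, y + r + 1))
                for j in range(max(0, x - r), min(w, x + r + 1)))
            for x in range(w)
        ]
        for y in range(h)
    ]


def spp(feature: Grid, kernels: Sequence[int] = (1, 5, 9, 13)) -> List[Grid]:
    """Return one pooled map per kernel; stacking them mirrors channel concatenation."""
    return [max_pool_same(feature, k) for k in kernels]
```

With a 1 × 1 kernel the map passes through unchanged, while the 13 × 13 kernel lets a single strong activation influence positions up to six cells away, which is the enlarged receptive field the description refers to.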

The third part, the prediction network, obtains the prediction result from the more effective feature layers; prediction is performed mainly by the YOLO Head, which recognizes the ancient character components.

In the first part, a MobileNet-series network replaces CSPDarknet53 in YOLOv4 for feature extraction. As long as the backbone outputs three preliminary effective feature layers with the same shapes as those of CSPDarknet53, enhanced feature extraction can be applied to them, so a MobileNet-series network can be substituted into YOLOv4.

When constructing the preliminary effective feature layers, layers with the same output structure as CSPDarknet53 must be located in MobileNet, and the MobileNet feature layers with those shapes are passed to the enhanced feature extraction network. The following shapes are extracted: the first layer outputs 52 × 52 × 256, the second 26 × 26 × 512, and the third 13 × 13 × 1024. These three effective feature layers replace the effective feature layers of the YOLOv4 backbone CSPDarknet53 and are then used for further enhanced feature extraction.

In the YOLOv4 network structure, the 3 × 3 convolution blocks in the PANet of the enhanced feature extraction network are also replaced with MobileNet depthwise separable convolutions, which reduces the number of parameters from about 50 million to about 10 million.
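The saving from depthwise separable convolutions follows from a simple weight count. The sketch below compares one standard convolution with its depthwise separable counterpart, ignoring biases and batch normalization; the 512-to-1024 channel configuration is an illustrative assumption, not a layer taken from the invention.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


standard = conv_params(3, 512, 1024)                   # 4,718,592 weights
separable = depthwise_separable_params(3, 512, 1024)   # 528,896 weights
```

For this layer the separable form needs roughly one ninth of the weights, which is why applying the replacement throughout PANet lowers the total parameter count substantially.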

The improved YOLOv4 neural network model is then trained on the oracle images and used for prediction. The data set contains 13,106 oracle images in total, of which 9,174 form the training set and 3,932 the test set. There are 199 component classes, and 2,755 oracle character classes can be recognized. The recognition result consists mainly of the labels of the components in the input oracle image and the position coordinates of those components. As shown in FIG. 6, each label is displayed as the pinyin of the component recognized in the image; the label of each component class is the modern character that several ancient character experts have identified as corresponding to it. The position coordinates are the upper-left and lower-right corner coordinates of each component. The ancient character component position relation recognition model is built on the recognized component coordinates.

In the third step, the ancient character component position relation recognition model is constructed. The character structures defined by the invention mainly comprise the single-component character, the up-down structure, the left-right structure, the enclosing structure, the left-middle-right structure, the up-middle-down structure, the left-right-up-down structure and the up-down-left-right structure. A single-component character consists of one component alone, such as "middle". In the up-down structure the components are arranged one above the other, such as "li"; in the left-right structure they are arranged side by side, such as "play"; an example of the enclosing structure is "reverse direction"; in the left-middle-right structure three components are arranged from left to right, such as "shed"; and in the up-middle-down structure three components are arranged from top to bottom. In the left-right-up-down structure, such as "orange", the left side is formed by the component "wood" and the right side by a further component together with the component "mouth". In the up-down-left-right structure, two components arranged one above the other form the left side and a third component forms the right side; the example given involves the component "month".

FIG. 3 shows the single-component character, the up-down structure, the left-right structure, the enclosing structure, the left-middle-right structure, the up-middle-down structure, the left-right-up-down structure and the up-down-left-right structure; each rectangle in the figure represents one component. For ease of understanding, FIG. 4 shows the left-right-up-down structure and FIG. 5 shows the up-down-left-right structure, where A, B and C in FIGS. 4 and 5 denote component A, component B and component C. When the positional relation satisfies the left-right-up-down structure, component B and component C are arranged one above the other and both lie entirely to the right of component A. When the positional relation satisfies the up-down-left-right structure, component A and component B are arranged one above the other and both lie entirely to the left of component C.
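The two three-component layouts of FIGS. 4 and 5 can be told apart directly from the detected boxes. In the sketch below, "entirely to the right" is taken to mean that the boxes do not overlap horizontally, and "arranged one above the other" is taken to mean that the vertical center offset exceeds the horizontal one; both readings, and the function names, are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


def stacked(p: Box, q: Box) -> bool:
    """Assumed test for 'one above the other': vertical center offset dominates."""
    dx = abs((p[0] + p[2]) - (q[0] + q[2])) / 2.0
    dy = abs((p[1] + p[3]) - (q[1] + q[3])) / 2.0
    return dy > dx


def three_component_structure(a: Box, b: Box, c: Box) -> str:
    """Classify three boxes, given in any order, into one of the two mixed layouts."""
    boxes = sorted([a, b, c], key=lambda box: box[0])  # order by left edge
    left, rest = boxes[0], boxes[1:]
    if left[2] <= min(r[0] for r in rest) and stacked(rest[0], rest[1]):
        return "left-right-up-down"   # one component on the left, a stacked pair on the right
    right, rest = boxes[-1], boxes[:-1]
    if max(r[2] for r in rest) <= right[0] and stacked(rest[0], rest[1]):
        return "up-down-left-right"   # a stacked pair on the left, one component on the right
    return "other"
```

Three boxes laid out in a single row fall through both tests and are reported as `"other"`, leaving them to a separate left-middle-right rule.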

Based on the component labels and positions identified in step two, the structural relation of the components is determined by means of anchor points.

Two components are taken as an example below, where $x_a$ and $y_a$ are the abscissa and ordinate of the center point of the first component; $x_b$ and $y_b$ are the abscissa and ordinate of the center point of the second component; $x_{a1}$ and $y_{a1}$ are the abscissa and ordinate of the upper-left corner of the first component; $x_{a2}$ and $y_{a2}$ are the abscissa and ordinate of the lower-right corner of the first component; $x_{b1}$ and $y_{b1}$ are the abscissa and ordinate of the upper-left corner of the second component; and $x_{b2}$ and $y_{b2}$ are the abscissa and ordinate of the lower-right corner of the second component. The same notation extends to a third component: $x_c$ and $y_c$ are the abscissa and ordinate of its center point, $x_{c1}$ and $y_{c1}$ those of its upper-left corner, and $x_{c2}$ and $y_{c2}$ those of its lower-right corner. Several ways of determining the positional relation are listed below:

When only one component is detected, the output character structure is a single-component character;

when the part coordinates satisfy the following relationship, the character structure is a left-right structure;

Step four: constructing a comprehensive ancient character knowledge graph. Constructing the oracle character knowledge graph comprises the following steps:

(1) The data format is determined according to the query requirements. It mainly comprises the character, the components it contains, the number of components it contains, the character structure and so on. Because the complete data sheet is too large, only part of the oracle character-component correspondence table is shown below:

Character ID	Character	Number of components	Components	Character structure
jia_c_0030	Chinese corktree bark	2	Wood, can	Left-right structure
jia_c_0031	Duration of sleep	3	Wood mouth	Left-right-up-down structure
jia_c_0032	Woman	1	Woman	Single-component character
jia_c_0033	Nu (Nu)	2	For woman being in turn	Left-right structure
jia_c_0034		2	Woman	Left-right structure
jia_c_0035	Good taste	2	For woman	Left-right structure
jia_c_0036	Delusions	2	Death and woman	Up-down structure
jia_c_0037		3	Woman, man, mouth	Up-down-left-right structure
jia_c_0038		3	Woman	Left-middle-right structure
jia_c_0039	Human being	1	Human being	Single-component character
jia_c_0040	From	2	Human being	Left-right structure

(2) The data are then screened. Incomplete records, which generally means records with an incomplete structure or incomplete components, are deleted and the complete records are retained, so that every record has the same attributes. This yields structured data that is easy for a computer to read.
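The screening in step (2) can be sketched as a simple filter. This is an illustration under stated assumptions, not the method of the invention: the class names `CharacterRow` and `Screening` and the field names are hypothetical, and a record is assumed to be complete when no field is empty and the number of listed components equals the declared number of components.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of the character-component table (field names are illustrative). */
final class CharacterRow {
    final String id;
    final String character;
    final int componentCount;
    final List<String> components;
    final String structure;

    CharacterRow(String id, String character, int componentCount,
                 List<String> components, String structure) {
        this.id = id;
        this.character = character;
        this.componentCount = componentCount;
        this.components = components;
        this.structure = structure;
    }

    /** Assumed completeness test: no empty field, and the component list matches the count. */
    boolean isComplete() {
        return id != null && !id.isEmpty()
                && character != null && !character.isEmpty()
                && structure != null && !structure.isEmpty()
                && components != null
                && componentCount > 0
                && components.size() == componentCount;
    }
}

final class Screening {
    /** Returns only the complete rows, preserving their original order. */
    static List<CharacterRow> keepComplete(List<CharacterRow> rows) {
        List<CharacterRow> kept = new ArrayList<>();
        for (CharacterRow row : rows) {
            if (row.isComplete()) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

Under these assumptions, a row that declares two components but lists only one, or a row with an empty structure field, would be dropped before the data are stored.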

(3) The acquired data are processed uniformly and stored in a structured xlsx file, and the entities and relationships are defined. Several entities and their descriptions are listed below.

Several entities and descriptions thereof

Entity name	Description of entity
Character (Character_zh)	Glyph of the oracle-bone character
Component (radial_zh)	Component of an oracle-bone character
Structure (structure_zh)	Character structure of the oracle-bone character
Relationship name	Description of relationship
Contains (Contain)	The components contained in each character

(4) The knowledge graph is constructed using the Java language.

The creation process is briefly as follows. First, the character class and the component class of the oracle-bone script are created. Entities are then associated with literal data through the DatatypeProperty function, and entities are associated with other entities, for example characters with their components through the inclusion relationship, through the ObjectProperty function. Component entities are created through the build_radial_instance() function, and entity relationships are created through the build_relationships() function.

The ontology of the ancient character knowledge graph is defined by the function build_CRKG_ontology:

I. Define a character class and a component class. In the ontology model, two general classes, namely the character class and the component class, are defined first. All specific character subclasses in the character table data are then added under the character class, and "Character_eg", i.e. the English representation of the character, serves as the instance of each oracle-bone character class. Similarly, all specific component subclasses in the component table data are added under the component class, and "radial_eg", i.e. the English representation of the component, serves as the instance of each component.

II. Define the relationships between classes. A character contains components, so the relationship between a character and a component is defined as "contains".

III. Define the attributes owned by the classes in the ontology. These include "Character_eg", "Character_zh", "radial_all_num", "Structure_eg", "Structure_zh", "radial_zh" and "radial_eg", i.e. the English representation of the character, the Chinese representation of the character, the number of components contained, the English representation of the character structure, the Chinese representation of the character structure, the Chinese representation of the component, and the English representation of the component.

Instances are constructed by the function build_instance, as detailed below:

I. Construct all character instances, and add the attributes "Character_eg", "Character_zh", "radial_all_num", "Structure_eg" and "Structure_zh" to each character instance.

II. Construct all component instances, and add the attributes "radial_zh" and "radial_eg" to each component instance.

III. Construct the relationships between characters and components, establishing a relationship between each character instance and the component instances it contains.
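The ontology and instance construction described above can be sketched with plain Java collections. This is a simplified stand-in, not the implementation of the invention: it does not use an ontology library, the class names `ComponentEntity`, `CharacterEntity` and `CrkgSketch` are hypothetical, and only the attribute names are taken from the text. The "contains" relationship is modeled as a list so that a component occurring twice in a character is recorded twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Component instance with the attributes radial_eg and radial_zh. */
final class ComponentEntity {
    final String radialEg;
    final String radialZh;

    ComponentEntity(String radialEg, String radialZh) {
        this.radialEg = radialEg;
        this.radialZh = radialZh;
    }
}

/** Character instance with its attributes and its "contains" relationships. */
final class CharacterEntity {
    final String characterEg;
    final String characterZh;
    final String structureEg;
    final String structureZh;
    final List<ComponentEntity> contains = new ArrayList<>();

    CharacterEntity(String characterEg, String characterZh,
                    String structureEg, String structureZh) {
        this.characterEg = characterEg;
        this.characterZh = characterZh;
        this.structureEg = structureEg;
        this.structureZh = structureZh;
    }

    /** radial_all_num: number of components contained, counting repeats. */
    int radialAllNum() {
        return contains.size();
    }
}

/** In-memory graph keyed by the English names of the entities. */
final class CrkgSketch {
    final Map<String, ComponentEntity> components = new LinkedHashMap<>();
    final Map<String, CharacterEntity> characters = new LinkedHashMap<>();

    /** Adds a component instance, reusing an existing one with the same English name. */
    ComponentEntity addComponent(String eg, String zh) {
        return components.computeIfAbsent(eg, k -> new ComponentEntity(eg, zh));
    }

    /** Adds a character instance with its literal attributes. */
    CharacterEntity addCharacter(String eg, String zh, String structureEg, String structureZh) {
        CharacterEntity c = new CharacterEntity(eg, zh, structureEg, structureZh);
        characters.put(eg, c);
        return c;
    }

    /** Establishes one "contains" relationship between existing instances. */
    void addContains(String characterEg, String componentEg) {
        CharacterEntity c = characters.get(characterEg);
        ComponentEntity p = components.get(componentEg);
        if (c == null || p == null) {
            throw new IllegalArgumentException("unknown entity");
        }
        c.contains.add(p);
    }
}
```

As an illustrative use, a character made of two "human" components in a left-right structure would be added with one `addComponent` call, one `addCharacter` call, and two `addContains` calls, after which `radialAllNum()` returns 2.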

Step five: reasoning is performed on the constructed knowledge graph to obtain the final result. Specifically, the user inputs a picture; the neural network model constructed in step two (namely the ancient character part identification model) detects the parts contained in the picture and their position coordinates; the position relation of the parts is calculated from the position coordinates by the ancient character part position relation recognition model; and the knowledge graph then reasons over the identified parts and their position relation to obtain the modern Chinese character corresponding to the oracle-bone picture. For example, if the user inputs an oracle-bone picture of the character '3 mou', the parts contained in the picture are recognized as 'wood', 'wood' and 'day', and the coordinates of the upper-left and lower-right corners of the three parts are obtained. The character structure is calculated to be a top-bottom structure from these coordinates by the ancient character part position relation recognition model. Finally, from the parts 'wood', 'wood' and 'day' and the top-bottom structure, the knowledge graph constructed in step four infers that the modern Chinese character corresponding to the oracle-bone character is '3 mou'.
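The final lookup in step five can be sketched as a query keyed on the detected parts and the computed structure. The text does not specify how the knowledge graph is queried, so the following is an assumption-laden illustration: the class `InferenceSketch` and its methods are hypothetical, the parts are treated as an unordered multiset, and the character name "char_A" is a placeholder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup from (parts, structure) to candidate characters. */
final class InferenceSketch {
    private final Map<String, List<String>> index = new HashMap<>();

    /** Builds an order-independent key such as "day+wood+wood|top-bottom". */
    private static String key(List<String> components, String structure) {
        List<String> sorted = new ArrayList<>(components);
        Collections.sort(sorted);
        return String.join("+", sorted) + "|" + structure;
    }

    /** Registers a character together with its components and structure. */
    void addCharacter(String character, List<String> components, String structure) {
        index.computeIfAbsent(key(components, structure), k -> new ArrayList<>())
             .add(character);
    }

    /** Returns every character matching the detected parts and structure. */
    List<String> infer(List<String> detectedParts, String structure) {
        return index.getOrDefault(key(detectedParts, structure),
                                  Collections.<String>emptyList());
    }
}
```

With a character registered under the parts 'wood', 'wood' and 'day' and the top-bottom structure, a query with the same parts in any order and the same structure returns that character, while a query with only 'wood' and 'day' returns an empty list. Returning a list leaves room for several characters that share the same parts and structure.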

Claims

1. An ancient character recognition method based on target detection and a knowledge graph is characterized by comprising the following steps:

step four, constructing an ancient character knowledge graph;

2. The method for ancient character recognition based on target detection and knowledge-graph according to claim 1, wherein the first step further comprises:

3. The method of claim 1 for ancient character recognition based on object detection and knowledge-graph, wherein: when the part coordinates satisfy the following relationship, the character structure is a left-right structure;

4. The method of claim 1, wherein the process of constructing an ancient character knowledge graph is as follows:

Entity name	Description of entity
Character (Character_zh)	Character pattern of ancient characters
Component (radial_zh)	Parts for composing ancient characters
Structure (structure_zh)	Character structure of ancient characters
Relationship name	Description of relationship
Contains (Contain)	The components contained in each character

(4) The knowledge graph is constructed using the Java language.

5. The method of claim 1, wherein the ancient character structures in the ancient character part position relation recognition model further comprise: a left-right semi-enclosed structure, a left-right enclosed structure, a left-right-top semi-enclosed structure, and a triangular structure.