CN112015907A - Method and device for quickly constructing discipline knowledge graph and storage medium - Google Patents

Publication number
CN112015907A
Authority
CN
China
Prior art keywords
knowledge
knowledge points
points
correlation
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010833647.8A
Other languages
Chinese (zh)
Inventor
魏泽林
李雪
于丹
张帅
马壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Neusoft Education Technology Group Co ltd
Original Assignee
Dalian Neusoft Education Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Neusoft Education Technology Group Co ltd
Priority to CN202010833647.8A
Publication of CN112015907A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Abstract

The invention provides a method, a device and a storage medium for quickly constructing a discipline knowledge graph. The method comprises the following steps: constructing a relational database; acquiring the catalogue and index of an electronic-edition teaching material through optical character recognition, and constructing relational data from the catalogue and index; calculating the correlation between any two data items in the relational database based on the knowledge point names, and establishing a correlation relationship; classifying the correlation between knowledge points and nodes as belonging classification knowledge points or necessary knowledge points, and generating a knowledge graph; acquiring a low-dimensional vector representation of each knowledge point in the knowledge graph by a graph embedding method, calculating the cosine similarity of the vectors to obtain the similarity between knowledge points, maintaining the correlation relationship types according to that similarity, and synchronizing the data to a graph database. The invention can construct a discipline knowledge graph quickly and comprehensively without labeling a training set, and is broadly applicable.

Description

Method and device for quickly constructing discipline knowledge graph and storage medium
Technical Field
The invention relates to the technical field of data analysis and knowledge graph construction, and in particular to a method, a device and a storage medium for quickly constructing a discipline knowledge graph.
Background
Knowledge graphs are generally constructed in one of two ways: manually, or with entity extraction and relationship extraction algorithms based on semantic recognition.
Manual construction depends on human effort, so the process is slow and inefficient. Because the number of knowledge point entities is large, knowledge points are often omitted or the relations between them are labeled incorrectly; in addition, the number and comprehensiveness of the constructed entities are strongly influenced by human subjectivity.
Entity extraction and relationship extraction algorithms based on semantic recognition generally require labeled data, which is costly to produce, and their results are often inaccurate and still need manual modification. Specifically, such an algorithm first preprocesses the data: noise in the text resources is cleaned, the text is split into sections, paragraphs and sentences at punctuation marks, and word segmentation, lexical analysis and syntactic analysis are then performed. On the basis of this preprocessing, a semantic analysis tool is used to label some of the knowledge point entities and semantic relations, which serve as training samples. A knowledge graph is then constructed using entity extraction, semantic relation mining and knowledge fusion methods, and finally corrected manually. This approach works relatively well for building a graph of the characters in a novel, because a novel contains many sentences describing the relations between its characters. In higher-education science and engineering teaching materials, however, there is little text describing the relations between knowledge points, and the text is usually interspersed with many formulas and tables, so knowledge graphs built with natural language processing and semantic analysis are inaccurate and often need extensive manual modification.
In summary, none of the existing methods performs well for constructing knowledge graphs of discipline knowledge points.
Disclosure of Invention
In view of the technical problems that manual construction is inefficient, and that methods based on semantic recognition require a large amount of labeled training data and cannot predict accurately, a method, a device and a storage medium for rapidly constructing a discipline knowledge graph are provided.
The technical means adopted by the invention are as follows:
a rapid discipline knowledge graph construction method comprises the following steps:
constructing a relational database, wherein the relational database at least comprises four columns of knowledge point names, knowledge point types, belonging classification knowledge points and necessary knowledge points;
acquiring a catalogue and an index of an electronic edition teaching material through optical character recognition, and constructing relational data of the catalogue and the index in the electronic edition teaching material, wherein the catalogue of the electronic edition teaching material comprises courses, chapters, sections and corresponding page number information, and the index comprises a knowledge point name and the corresponding page number information;
calculating the correlation between any two data in a relational database based on the names of the knowledge points, and establishing a correlation relationship between the two data of which the correlation calculation result is higher than a threshold value;
acquiring the correlation between knowledge points and nodes from the relational data, marking the correlation between the knowledge points and the nodes as belonging classification knowledge points or necessary knowledge points, and importing the relational data in the relational database into a graph database to generate a knowledge graph containing the knowledge points and the relations between them;
acquiring low-dimensional vector expression of each knowledge point in the knowledge graph by a graph embedding method, calculating cosine similarity of vectors based on the low-dimensional vector expression, maintaining the relevant relation type according to the cosine similarity, determining the relevant relation type as a belonging classification knowledge point or a necessary knowledge point, and synchronizing data to a graph database.
Further, the knowledge point types are divided into: course, chapter, section and knowledge point;
the course corresponds to the name of the teaching material, chapters correspond to the first-level entries of the teaching material catalogue, sections correspond to the second-level entries of the catalogue, and knowledge points correspond to the knowledge points in the index.
Further, if the catalogue includes entries of the third level or deeper, those entries are listed as knowledge points.
Further, the obtaining of the catalog and the index knowledge points of the electronic edition teaching material through optical character recognition and the building of the relational data of the catalog and the index knowledge points in the electronic edition teaching material include:
scanning a teaching material catalog and an index into picture files, carrying out optical character recognition on the picture files, and importing chapters and sections in the catalog and knowledge points in the index into a relational database;
and matching the page position of the directory with the page position of the knowledge point in the index to establish the correlation between the knowledge point and the node.
Further, the scanning of the teaching material catalog and the index into a picture file, the optical character recognition of the picture file, and the importing of the chapter, section and knowledge point in the index in the catalog into the relational database comprises: if the index knowledge point name is the same name as the chapter or section in the directory, the index knowledge point is merged to the chapter or section of the same name.
Further, the calculating the correlation between any two relational data based on the knowledge point names includes:
and calculating the correlation of the knowledge point names by using a TF-IDF algorithm.
Further, acquiring low-dimensional vector expression of each knowledge point in the knowledge graph by a graph embedding method, and calculating cosine similarity of vectors based on the low-dimensional vector expression, wherein the method comprises the following steps:
extracting low-dimensional dense vectors of knowledge points by a graph embedding method aiming at the knowledge graph data to be used as low-dimensional vector expression;
and performing cosine similarity calculation on the low-dimensional vector expression to obtain the similarity between the knowledge points.
A discipline knowledge graph rapid construction device comprises:
the database construction unit is used for constructing a relational database, and data in the relational database at least comprises four dimensions of knowledge point names, knowledge point types, belonged classification knowledge points and necessary knowledge points;
the data construction unit is used for constructing relational data of catalogues and indexes in the electronic edition teaching materials, wherein the catalogues and the indexes of the electronic edition teaching materials are obtained through optical character recognition, the catalogues of the electronic edition teaching materials comprise courses, chapters, sections and corresponding page number information, and the indexes comprise knowledge point names and corresponding page number information;
the correlation establishing unit is used for establishing a correlation between two data of which the correlation calculation result is higher than a threshold value, wherein the correlation is calculated based on the name of the knowledge point;
the knowledge graph generating unit is used for acquiring the correlation between knowledge points and nodes from the relational data, classifying the correlation between the knowledge points and the nodes as belonging classification knowledge points or necessary knowledge points, and importing the data in the relational database into a graph database to generate a knowledge graph containing the knowledge points and the relations between them;
and the data maintenance unit is used for acquiring the low-dimensional vector representation of each knowledge point in the knowledge graph, calculating the similarity between knowledge points based on the low-dimensional vector representations, maintaining the type of each correlation relationship according to that similarity, determining the type as belonging classification knowledge point or necessary knowledge point, and synchronizing the data to the graph database.
A computer-readable storage medium having a set of computer instructions stored therein; the set of computer instructions, when executed by a processor, implement a method for rapid building of a discipline knowledge graph as described in any one of the above.
Compared with the prior art, the invention has the following advantages:
1. The invention provides a rapid knowledge graph construction method in which entities are constructed in a standardized way and relations are constructed comprehensively; it can quickly and accurately construct and maintain the knowledge graph of a discipline, and can assist teachers in building knowledge graphs quickly.
2. In establishing the knowledge graph among knowledge points, the invention builds the initial version of the knowledge graph from the catalogue and index of the teaching material in a standardized way. Because the catalogue and the index knowledge points are written in a standard form and their page numbers are accurate, the construction process is quick and convenient. The method also fits traditional teaching habits: on top of the tree structure of the teaching material catalogue it adds a network of relations between knowledge points and chapters, and the relevance of the knowledge points is defined more accurately through further maintenance. In addition, the method is broadly applicable: any teaching material that contains a catalogue and an index can be used to build a basic knowledge graph for further maintenance.
3. The invention uses the TF-IDF method to calculate the correlation of knowledge point names. Because the knowledge point names in teaching materials carry a certain amount of information, and knowledge points whose names share a keyword are likely to be related, this method can quickly find correlations between knowledge points.
4. The invention uses a graph embedding algorithm to predict links among knowledge points and push them iteratively, a maintenance mode that reduces problems such as incompletely attached knowledge point relations. The method calculates low-dimensional vectors of the knowledge points in the knowledge graph through a random walk algorithm; these vectors express features such as the structural position of the knowledge points in the knowledge graph, and the potential relations among knowledge points are mined through a collaborative filtering algorithm.
For the above reasons, the present invention can be widely popularized in the field of education.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of the discipline knowledge graph rapid construction method of the present invention.
FIG. 2a is a schematic diagram of the necessary knowledge points in the present invention.
FIG. 2b is a schematic diagram of the belonging classification knowledge points in the present invention.
FIG. 2c is a schematic diagram of a first combination relationship between necessary knowledge points and belonging classification knowledge points.
FIG. 2d is a schematic diagram of a second combination relationship between necessary knowledge points and belonging classification knowledge points.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in FIG. 1, the invention provides a rapid discipline knowledge graph construction method, which mainly comprises the following steps:
Step one: a relational database is constructed with four required columns. The first column is the knowledge point name, which is also the unique identifier of the knowledge point. The second column is the knowledge point type; the entity types are divided into four kinds by granularity: course, chapter, section and knowledge point, which correspond respectively to the name of the teaching material, the first-level catalogue entries, the second-level catalogue entries, and the knowledge points contained in the index (catalogue entries of the third level or deeper are all listed as knowledge points). The third column is the belonging classification knowledge points, which records the knowledge points that each knowledge point belongs to and thus captures the affiliation among knowledge points. The fourth column is the necessary knowledge points, which records the prerequisite knowledge points that must be mastered before a given knowledge point can be mastered and thus captures the logical progression of learning. Combining these two relations, belonging classification knowledge points and necessary knowledge points, expresses four different relations between knowledge points; see FIG. 2.
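The four-column table of step one can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for the relational database, the table and column names are hypothetical, and storing the two relation columns as comma-separated names is a simplification chosen for brevity.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not from the patent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS knowledge_point (
    name          TEXT PRIMARY KEY,  -- knowledge point name, also the unique identifier
    kp_type       TEXT NOT NULL
                  CHECK (kp_type IN ('course', 'chapter', 'section', 'knowledge_point')),
    belongs_to    TEXT,              -- belonging classification knowledge points (comma-separated)
    prerequisites TEXT               -- necessary (prerequisite) knowledge points (comma-separated)
)
"""

def create_db(path=":memory:"):
    """Open a database and create the knowledge point table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def add_point(conn, name, kp_type, belongs_to=(), prerequisites=()):
    """Insert one knowledge point together with its two relation columns."""
    conn.execute(
        "INSERT INTO knowledge_point VALUES (?, ?, ?, ?)",
        (name, kp_type, ",".join(belongs_to), ",".join(prerequisites)),
    )

# Illustrative rows only.
conn = create_db()
add_point(conn, "Machine Learning", "course")
add_point(conn, "Bayesian classifiers", "chapter", belongs_to=["Machine Learning"])
add_point(conn, "Bayes theorem", "knowledge_point", belongs_to=["Bayesian classifiers"])
```

Using the name as the primary key mirrors the statement that the knowledge point name is also its unique identifier; a production schema would more likely keep the relations in a separate link table.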
Step two: the method uses the catalogue and index of the teaching material as a structured data template, and constructs relational data from the catalogue and the index knowledge points in the electronic-edition teaching material. Specifically, the catalogue and index of the teaching material are scanned to obtain JPG files, and OCR (optical character recognition) is then performed on the images. The chapters and sections in the catalogue and the knowledge points in the index are imported into the relational database, where each knowledge point name is a knowledge point entity and the knowledge point types are labeled according to the entity types of step one; if an index knowledge point has the same name as a chapter or section in the catalogue, it is merged into that chapter or section. The page numbers in the catalogue are then matched against the page numbers of the knowledge points listed in the index, as follows: the affiliation of courses, chapters and sections is established from the catalogue structure, and if the page number of an index knowledge point falls within the page range of a section, a correlation relationship is established between the knowledge point and that section.
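The page-number matching in step two can be illustrated with a short sketch. It assumes that OCR and cleaning have already produced two lists, and that a section's page range runs from its own starting page up to the page before the next section starts; the function name and the sample entries are hypothetical.

```python
from bisect import bisect_right

def match_index_to_sections(sections, index_entries):
    """Attach each index knowledge point to the section whose page range contains it.

    sections: list of (section_name, start_page), sorted by start_page.
    index_entries: list of (knowledge_point_name, page).
    Returns a list of (knowledge_point_name, section_name) pairs.
    """
    starts = [start for _, start in sections]
    pairs = []
    for name, page in index_entries:
        # Position of the last section that starts at or before this page.
        pos = bisect_right(starts, page) - 1
        if pos >= 0:  # pages before the first section are left unmatched
            pairs.append((name, sections[pos][0]))
    return pairs

# Illustrative data only, not taken from any textbook.
sections = [("1.1 Introduction", 1), ("1.2 Basic terminology", 2), ("1.3 Hypothesis space", 4)]
index_entries = [("training set", 2), ("version space", 5), ("sample", 3)]
pairs = match_index_to_sections(sections, index_entries)
```

Binary search keeps the matching fast even for a long index, and the same function can be applied at chapter granularity by passing chapter start pages instead.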
Step three: the correlation between the names of the knowledge points in the index is calculated with the TF-IDF algorithm, and a correlation relationship is established between two knowledge points whose correlation is high. Because a knowledge point name carries a certain amount of information, knowledge points whose names share keywords are likely to be related (for example, Bayes' theorem and Bayes decision tree).
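One possible reading of step three is sketched below: each name is turned into a TF-IDF vector and two names are linked when the cosine similarity of their vectors exceeds a threshold. The smoothed IDF formula, the whitespace tokenization (Chinese names would need a word segmenter) and the threshold of 0.2 are all assumptions made for illustration; the patent does not specify them.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(names):
    """Build one sparse TF-IDF vector (term -> weight) per knowledge point name."""
    docs = [Counter(name.lower().split()) for name in names]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency per term
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in doc.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def related_pairs(names, threshold=0.2):
    """Return the name pairs whose TF-IDF cosine similarity exceeds the threshold."""
    vecs = tfidf_vectors(names)
    return [
        (a, b)
        for (a, va), (b, vb) in combinations(zip(names, vecs), 2)
        if cosine(va, vb) > threshold
    ]

names = ["Bayes theorem", "Bayes decision tree", "support vector machine"]
pairs = related_pairs(names)
```

Names that share no token score exactly zero, so only pairs with a common keyword can ever be linked, which matches the intuition stated in the step.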
Step four: the correlation relationships between knowledge points and nodes are manually classified as belonging classification knowledge points or necessary knowledge points through a knowledge graph maintenance system, which comprises a front-end website and a back-end database for maintaining and managing the knowledge points and the relations between them. The contents of the database are then imported by a program into a neo4j database to generate a knowledge graph containing the knowledge points and the relations among them; the neo4j database is the main carrier of the knowledge graph.
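The import from the relational database into neo4j could take roughly the following shape. The sketch only generates parameterized Cypher statements and does not connect to a database; the node label `KnowledgePoint` and the relationship types `BELONGS_TO` and `REQUIRES` are hypothetical names chosen here, since the patent does not give its graph schema.

```python
def to_cypher(rows):
    """Translate relational rows into (query, parameters) pairs.

    rows: iterable of (name, kp_type, belongs_to, prerequisites), where the
    last two fields are lists of knowledge point names.
    """
    for name, kp_type, belongs_to, prerequisites in rows:
        # One node per knowledge point; MERGE avoids duplicates on re-import.
        yield (
            "MERGE (k:KnowledgePoint {name: $name}) SET k.type = $type",
            {"name": name, "type": kp_type},
        )
        for rel_type, targets in (("BELONGS_TO", belongs_to), ("REQUIRES", prerequisites)):
            for target in targets:
                yield (
                    "MERGE (a:KnowledgePoint {name: $src}) "
                    "MERGE (b:KnowledgePoint {name: $dst}) "
                    f"MERGE (a)-[:{rel_type}]->(b)",
                    {"src": name, "dst": target},
                )

# Illustrative row only.
rows = [("Bayes theorem", "knowledge_point", ["Bayesian classifiers"], ["Probability"])]
statements = list(to_cypher(rows))
```

Each pair can be handed to whatever Cypher client is in use. Because every statement uses MERGE, re-running the import after later maintenance rounds adds only the new nodes and relations.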
Step five: the low-dimensional vector representation of the knowledge points is calculated using a graph embedding method. Specifically, a low-dimensional dense vector is extracted for each knowledge point from the established knowledge graph data by a graph embedding method; candidate algorithms include DeepWalk (DeepWalk: Online Learning of Social Representations), node2vec (node2vec: Scalable Feature Learning for Networks) and GCN (Graph Convolutional Networks for Text Classification). Here the node2vec algorithm is used.
Step six: using the knowledge point vectors obtained, the similarity between knowledge points is calculated as cosine similarity, which indicates how likely a potential relation between two knowledge points is. Similar knowledge points are recommended to maintenance personnel, usually teachers, who manually decide whether the relation type is belonging classification knowledge point or necessary knowledge point; the result is stored in the relational database, and the newly added data is synchronized to the neo4j database.
The solution according to the invention is further illustrated by the following specific application examples.
In this embodiment, the textbook Machine Learning by Zhou Zhihua is used as the teaching material to construct a knowledge graph of weak (undefined) relations. The specific steps are as follows:
1. A relational database is constructed with four required columns: the first column is the knowledge point name, which is also the unique identifier of the knowledge point; the second column is the knowledge point type, with the entity types divided into four kinds by granularity: course, chapter, section and knowledge point; the third column is the belonging classification knowledge points; the fourth column is the necessary knowledge points.
2. The catalogue and index are first scanned, and text is then extracted from the processed images using the open-source Python package cnocr. cnocr is a Python 3 package for Chinese OCR that ships with trained recognition models; the model used here is CRNN, with a recognition accuracy of about 98.7%. The recognized text is cleaned by a program written with the open-source Python package pandas: useless information appearing in the catalogue is removed and the data is arranged into a standard structure, the names of the chapters, sections and teaching material are extracted from the catalogue, and the catalogue is matched against the index by a program. If the page number of index knowledge point A is larger than the page number of the nth chapter B and smaller than the page number of the (n+1)th chapter, A is stored in the MySQL database as an associated knowledge point of B.
3. And calculating the correlation of the names of the knowledge points in the teaching material index by using a TF-IDF algorithm, and storing the relationship established among the knowledge points with high correlation into a database.
4. The structured knowledge graph data is modified and corrected through a front-end management page after logging in. In later use, for example in a campus setting, teachers can also manage the knowledge graph through this management page.
5. Information is extracted from the knowledge graph in the neo4j database in RDF format, and a 128-dimensional vector representation of each knowledge point is calculated with the node2vec algorithm. The node2vec algorithm generates random walk sequences under a walk strategy controlled by the hyper-parameters p and q. For a walk that has just moved from vertex t to the current vertex and is considering vertex x as its next step, the random walk probability is weighted by:
$$\alpha_{pq}(t, x) = \begin{cases} \dfrac{1}{p}, & d_{tx} = 0 \\ 1, & d_{tx} = 1 \\ \dfrac{1}{q}, & d_{tx} = 2 \end{cases}$$
where $d_{tx}$ is the shortest path distance between vertex t and vertex x.
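The random walk scheme is derived from the DeepWalk algorithm, and the vectors are then calculated with the Skip-Gram algorithm. The optimization function, in the form used by node2vec, is as follows:
$$\max_{f} \sum_{u \in V} \log \Pr\left(N_S(u) \mid f(u)\right)$$
where $V$ is the set of vertices, $f(u)$ is the vector of vertex u, and $N_S(u)$ is the set of vertices that appear near u in the sampled walks.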
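The biased walk controlled by p and q can be sketched in a few lines. This is a simplified illustration for an unweighted, undirected graph held in a dictionary, not the implementation used in the embodiment; the function name and the toy graph are hypothetical, and training the Skip-Gram model on the resulting walks is not shown.

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Generate one second-order biased random walk.

    graph: dict mapping each node to the set of its neighbours.
    p: return parameter (large p makes stepping back to the previous node less likely).
    q: in-out parameter (large q keeps the walk close to the previous node).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])
        if not neighbours:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in neighbours:
            if x == prev:             # distance 0 from the previous node
                weights.append(1.0 / p)
            elif x in graph[prev]:    # distance 1 from the previous node
                weights.append(1.0)
            else:                     # distance 2 from the previous node
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk

# Toy graph for illustration only.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
walk = node2vec_walk(graph, "a", length=10, p=0.5, q=2.0, rng=random.Random(0))
```

With p = q = 1 every neighbour gets the same weight and the procedure reduces to the uniform walk of DeepWalk.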
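6. The cosine similarity between the vectors $\mathbf{a}$ and $\mathbf{b}$ of each pair of knowledge points is calculated, and the result is used as the correlation between the knowledge points:
$$\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$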
A link is more likely to exist between two knowledge points with a higher degree of correlation. Knowledge point pairs that may be related are pushed to the management system, and the relations between them are corrected and maintained through the front-end management interface. The structured data import program is then used again to write the newly added relations between knowledge points into the neo4j database. In subsequent use, the steps from step 4 onward can be applied iteratively to further maintain the knowledge graph.
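The recommendation of candidate links from cosine similarity can be sketched as below. The function names, the `top_k` cut-off and the toy two-dimensional vectors are assumptions for illustration (the embodiment uses 128-dimensional node2vec vectors), and pairs that are already linked are skipped so that only new candidates reach the maintainer.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend_links(vectors, existing, top_k=5):
    """Rank unlinked knowledge point pairs by similarity, highest first.

    vectors: dict mapping knowledge point name -> list of floats.
    existing: set of frozensets, one per pair that is already linked.
    Returns up to top_k tuples of (similarity, name_a, name_b).
    """
    scored = []
    for a, b in combinations(sorted(vectors), 2):
        if frozenset((a, b)) in existing:
            continue
        scored.append((cosine_similarity(vectors[a], vectors[b]), a, b))
    scored.sort(reverse=True)
    return scored[:top_k]

# Toy vectors for illustration only.
vectors = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0]}
existing = {frozenset(("A", "C"))}
candidates = recommend_links(vectors, existing)
```

The maintainer still decides whether each suggested pair is a belonging relation, a prerequisite relation, or no relation at all; the score only orders the review queue.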
Finally, a knowledge graph of the machine learning discipline with 590 entities and 1026 relations was constructed.
Corresponding to the above method for rapidly constructing a discipline knowledge graph, the application also provides a device for rapidly constructing a discipline knowledge graph, comprising:
the database construction unit is used for constructing a relational database, and data in the relational database at least comprises four dimensions of knowledge point names, knowledge point types, belonged classification knowledge points and necessary knowledge points;
the data construction unit is used for constructing relational data of catalogues and indexes in the electronic edition teaching materials, wherein the catalogues and the indexes of the electronic edition teaching materials are obtained through optical character recognition, the catalogues of the electronic edition teaching materials comprise courses, chapters, sections and corresponding page number information, and the indexes comprise knowledge point names and corresponding page number information;
the correlation establishing unit is used for establishing a correlation between two data of which the correlation calculation result is higher than a threshold value, wherein the correlation is calculated based on the name of the knowledge point;
the knowledge graph generating unit is used for acquiring the correlation between knowledge points and nodes from the relational data, classifying the correlation between the knowledge points and the nodes as belonging classification knowledge points or necessary knowledge points, and importing the data in the relational database into a graph database to generate a knowledge graph containing the knowledge points and the relations between them;
and the data maintenance unit is used for acquiring the low-dimensional vector representation of each knowledge point in the knowledge graph, calculating the similarity between knowledge points based on the low-dimensional vector representations, maintaining the type of each correlation relationship according to that similarity, determining the type as belonging classification knowledge point or necessary knowledge point, and synchronizing the data to the graph database.
The present application also discloses a computer-readable storage medium having a set of computer instructions stored therein; the set of computer instructions, when executed by a processor, implements the rapid discipline knowledge graph construction method described in any one of the above embodiments.
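The similarity calculation of the data maintenance step, as such instructions might carry it out, can be illustrated with a minimal sketch. The vectors below are hand-written placeholders standing in for the output of a graph embedding method, and the 0.9 threshold and the function names are hypothetical. How a similarity score changes a relationship type is not detailed in this passage, so the sketch only lists candidate pairs for review.

```python
import math
from itertools import combinations

# Hypothetical low-dimensional vectors, standing in for graph-embedding output.
EMBEDDINGS = {
    "array": [0.9, 0.1, 0.0],
    "sequential list": [0.8, 0.2, 0.1],
    "binary tree": [0.0, 0.3, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(embeddings, threshold=0.9):
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(embeddings), 2):
        score = cosine_similarity(embeddings[a], embeddings[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


for a, b, score in similar_pairs(EMBEDDINGS):
    print(f"{a} ~ {b}: {score:.3f}")
```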
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program code.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A rapid discipline knowledge graph construction method is characterized by comprising the following steps:
constructing a relational database, wherein the relational database at least comprises four columns of knowledge point names, knowledge point types, belonging classification knowledge points and necessary knowledge points;
acquiring a catalogue and an index of an electronic edition teaching material through optical character recognition, and constructing relational data of the catalogue and the index in the electronic edition teaching material, wherein the catalogue of the electronic edition teaching material comprises courses, chapters, sections and corresponding page number information, and the index comprises a knowledge point name and the corresponding page number information;
calculating the correlation between any two data items in the relational database based on the knowledge point names, and establishing a correlation relationship between any two data items whose calculated correlation is higher than a threshold value;
acquiring the correlation between knowledge points and nodes from the relational data, marking the correlation between the knowledge points and the nodes as belonging classification knowledge points or necessary knowledge points, and importing the relational data in the relational database into a graph database to generate a knowledge graph containing the knowledge points and the relationships between the knowledge points;
acquiring the low-dimensional vector expression of each knowledge point in the knowledge graph by a graph embedding method, calculating the cosine similarity of the vectors based on the low-dimensional vector expression, maintaining the correlation relationship type according to the cosine similarity, determining the correlation relationship type as belonging classification knowledge point or necessary knowledge point, and synchronizing the data to the graph database.
2. The rapid discipline knowledge graph construction method according to claim 1, wherein the knowledge point types are divided into: course, chapter, section, and knowledge point;
the courses correspond to the names of the teaching materials, the chapters correspond to the first-level entries in the teaching material catalogue, the sections correspond to the second-level entries in the teaching material catalogue, and the knowledge points correspond to the entries in the index.
3. The method of claim 2, wherein if the catalogue contains three or more levels, the catalogue entries at the third level and below are listed as knowledge points.
4. The rapid discipline knowledge graph building method according to claim 1, wherein the obtaining of the catalogues and the index knowledge points of the electronic edition textbook through the optical character recognition and the building of the relational data of the catalogues and the index knowledge points in the electronic edition textbook comprise:
scanning the teaching material catalogue and index into picture files, carrying out optical character recognition on the picture files, and importing the chapters and sections in the catalogue and the knowledge points in the index into a relational database;
and matching the page numbers of the catalogue entries with the page numbers of the knowledge points in the index to establish the correlation between the knowledge points and the nodes.
5. The method for rapidly building a discipline knowledge graph according to claim 4, wherein the scanning of the teaching material catalogue and index into picture files, the optical character recognition of the picture files, and the importing of the chapters and sections in the catalogue and the knowledge points in the index into a relational database comprise: if the name of a knowledge point in the index is the same as the name of a chapter or section in the catalogue, merging the index knowledge point into the chapter or section of the same name.
6. The discipline knowledge graph rapid construction method according to claim 1, wherein the calculating the correlation between any two relational data based on the knowledge point names comprises:
and calculating the correlation of the knowledge point names by using a TF-IDF algorithm.
7. The fast discipline knowledge graph construction method according to claim 1, wherein the obtaining of the low-dimensional vector expression of each knowledge point in the knowledge graph through the graph embedding method, and the calculating of the cosine similarity of the vectors based on the low-dimensional vector expression comprise:
extracting low-dimensional dense vectors of knowledge points by a graph embedding method aiming at the knowledge graph data to be used as low-dimensional vector expression;
and performing cosine similarity calculation on the low-dimensional vector expression to obtain the similarity between the knowledge points.
8. A device for quickly constructing a discipline knowledge graph is characterized by comprising:
a database construction unit, used for constructing a relational database, wherein data in the relational database comprise at least four dimensions: knowledge point names, knowledge point types, belonging classification knowledge points, and necessary knowledge points;
a data construction unit, used for constructing relational data of the catalogue and the index of an electronic edition teaching material, wherein the catalogue and the index are obtained through optical character recognition, the catalogue comprises courses, chapters, sections and corresponding page number information, and the index comprises knowledge point names and corresponding page number information;
a correlation establishing unit, used for establishing a correlation relationship between any two data items whose calculated correlation is higher than a threshold value, wherein the correlation is calculated based on the knowledge point names;
a knowledge graph generating unit, used for acquiring the correlation between the knowledge points and the nodes from the relational data, marking the correlation between the knowledge points and the nodes as belonging classification knowledge points or necessary knowledge points, and importing the data in the relational database into a graph database to generate a knowledge graph containing the knowledge points and the relationships between the knowledge points;
and a data maintenance unit, used for acquiring the low-dimensional vector expression of each knowledge point in the knowledge graph, calculating the similarity between the knowledge points based on the low-dimensional vector expression, maintaining the correlation relationship type according to the similarity between the knowledge points, determining the type as belonging classification knowledge point or necessary knowledge point, and synchronizing the data to the graph database.
9. A computer-readable storage medium having a set of computer instructions stored therein; the set of computer instructions, when executed by a processor, implements the rapid discipline knowledge graph construction method of any one of claims 1-7.
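Claim 6 names the TF-IDF algorithm for the correlation of knowledge point names. The following is a minimal sketch under stated assumptions: names are tokenised on whitespace (Chinese names would need a word segmenter), the inverse document frequency is smoothed, the correlation is taken as the cosine of the two TF-IDF vectors, and the 0.3 threshold is arbitrary. None of these choices, nor the function names, come from the application.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(names):
    """Build one sparse TF-IDF vector (term -> weight) per knowledge point name."""
    docs = [Counter(name.lower().split()) for name in names]
    n_docs = len(docs)
    # Number of names in which each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in doc.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def correlated_pairs(names, threshold=0.3):
    """Return the name pairs whose TF-IDF cosine exceeds the threshold."""
    vectors = tfidf_vectors(names)
    return [
        (names[i], names[j])
        for i, j in combinations(range(len(names)), 2)
        if cosine(vectors[i], vectors[j]) > threshold
    ]


# Hypothetical knowledge point names.
NAMES = ["binary tree", "binary search tree", "linked list", "hash table"]
print(correlated_pairs(NAMES))
```

Names that share no token score zero under this scheme, so only pairs with overlapping words can exceed the threshold.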
CN202010833647.8A 2020-08-18 2020-08-18 Method and device for quickly constructing discipline knowledge graph and storage medium Pending CN112015907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010833647.8A CN112015907A (en) 2020-08-18 2020-08-18 Method and device for quickly constructing discipline knowledge graph and storage medium

Publications (1)

Publication Number Publication Date
CN112015907A (en) 2020-12-01

Family

ID=73504958

Country Status (1)

Country Link
CN (1) CN112015907A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287327A (en) * 2019-07-03 2019-09-27 中山大学 Path adapted information map automatic generation method based on teaching material catalogue and Directed Hypergraph
CN110941723A (en) * 2019-11-18 2020-03-31 广东宜学通教育科技有限公司 Method, system and storage medium for constructing knowledge graph
CN111046194A (en) * 2019-12-31 2020-04-21 重庆和贯科技有限公司 Method for constructing multi-mode teaching knowledge graph
CN111339318A (en) * 2020-02-29 2020-06-26 西安理工大学 University computer basic knowledge graph construction method based on deep learning
CN111382843A (en) * 2020-03-06 2020-07-07 浙江网商银行股份有限公司 Method and device for establishing upstream and downstream relation recognition model of enterprise and relation mining
CN111475629A (en) * 2020-03-31 2020-07-31 渤海大学 Knowledge graph construction method and system for math tutoring question-answering system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312472A (en) * 2021-05-20 2021-08-27 北京黑岩方碑网络科技有限公司 Intelligent collaborative knowledge map recording and displaying system
CN116912867A (en) * 2023-09-13 2023-10-20 之江实验室 Teaching material structure extraction method and device combining automatic labeling and recall completion
CN116912867B (en) * 2023-09-13 2023-12-29 之江实验室 Teaching material structure extraction method and device combining automatic labeling and recall completion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 116000 room 206, no.8-9, software garden road, Ganjingzi District, Dalian City, Liaoning Province

Applicant after: Neusoft Education Technology Group Co.,Ltd.

Address before: 116000 room 206, no.8-9, software garden road, Ganjingzi District, Dalian City, Liaoning Province

Applicant before: Dalian Neusoft Education Technology Group Co.,Ltd.