Disclosure of Invention
The main object of the invention is to provide a retrieval method, a retrieval device, retrieval equipment and a storage medium based on an English knowledge graph, so as to improve the effectiveness of retrieval.
In order to achieve the above object, the present invention provides a retrieval method based on an English knowledge graph, which comprises the following steps:
acquiring English knowledge information to be retrieved;
extracting reference keyword information in the English knowledge information to be retrieved, and querying corresponding query suggestion information carrying individual category information in a preset index file according to the reference keyword information;
querying associated reference retrieval English knowledge information in a target file library based on the Web Ontology Language (OWL) according to the query suggestion information, wherein the reference retrieval English knowledge information comprises wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information, and the associated reference retrieval English knowledge information comprises reference retrieval English knowledge information having a derivation relation, a compound relation, an inclusion relation or a key relation among the individual category information;
grouping the reference retrieval English knowledge information according to the inquired wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information;
and sorting the grouped reference retrieval English knowledge information, and generating target retrieval English knowledge graph information according to the sorted reference retrieval English knowledge information.
Preferably, before acquiring the english knowledge information to be retrieved, the method further includes:
acquiring current English knowledge information, and judging the character length of the current English knowledge information;
when the length of the characters reaches a preset threshold value, carrying out English dependency grammar analysis on the current English knowledge information, determining a grammar structure of the current English knowledge information according to an analysis result, carrying out keyword division according to the grammar structure, and obtaining current keyword information according to a division result;
extracting attribute information in the current keyword information, and judging whether the current keyword information is at least one of a wiki entry, an ontology class name, a member alias and a resource entry according to the attribute information;
and when the current keyword information belongs to at least one of a wiki entry, an ontology class name, a member alias and a resource entry, taking the current keyword information as the English knowledge information to be retrieved.
Preferably, the querying of associated reference retrieval English knowledge information in the target file library based on the Web Ontology Language according to the query suggestion information includes:
searching application scene information in a preset area according to the individual category information in the query suggestion information;
searching, in the target file library and according to the application scene information, individual information and related instance information that have a relationship with the individual category information;
judging the type information of the instance information, extracting the data attribute information in the instance information according to the type information, and taking the data attribute information as the associated reference retrieval English knowledge information.
Preferably, before querying the associated reference retrieval English knowledge information in the target file library based on the Web Ontology Language according to the query suggestion information, the method further comprises:
extracting user information in English knowledge information to be retrieved, wherein the user information comprises weak knowledge point information, grade information, a learning basis and learning outline information of a user;
determining difficulty level information of English subject knowledge to be retrieved according to the grade information and the learning basis;
and filtering a preset file library according to the weak knowledge point information, the difficulty level information and the learning outline information, and taking the filtered file library as the target file library.
Preferably, before extracting reference keyword information in the english knowledge information to be retrieved and querying corresponding query suggestion information carrying individual category information in a preset index file according to the reference keyword information, the method further includes:
acquiring historical English knowledge information and associated resource information, and extracting historical keyword information in the historical English knowledge information and the associated resource information;
and storing the historical keyword information as triple information in a TDB triple database, establishing index information according to the triple information, and using the indexed triple information as the preset index file.
Preferably, before the step of sorting the grouped reference retrieval English knowledge information and generating the target retrieval English knowledge graph information according to the sorted reference retrieval English knowledge information, the method further includes:
acquiring current user behavior information, wherein the current user behavior information comprises preset examination points, difficulty degrees, grade information, version information, updating time information and preset examination paper information;
setting preset weight values for the preset examination points, the difficulty level, the grade information, the version information, the updating time information and the preset examination paper information;
generating a sorting network model by a relevancy sorting algorithm according to the preset examination point, the difficulty level, the grade information, the version information, the updating time information and the preset examination paper information after the weight value is set;
the step of sorting the grouped reference retrieval English knowledge information and generating target retrieval English knowledge graph information according to the sorted reference retrieval English knowledge information comprises:
sorting the grouped reference retrieval English knowledge information through the sorting network model, and generating target retrieval English knowledge graph information according to the sorted reference retrieval English knowledge information.
Preferably, after the grouped reference retrieval English knowledge information is sorted and the target retrieval English knowledge graph information is generated according to the sorted reference retrieval English knowledge information, the method further includes:
sorting the grouped reference retrieval English knowledge information to obtain the sorted reference retrieval English knowledge information;
and acquiring preset white list and preset black list information, filtering the sorted reference retrieval English knowledge information according to the preset white list and preset black list information by using a bloom filter, and generating target retrieval English knowledge graph information from the filtered reference retrieval English knowledge information.
In order to achieve the above object, the present invention also provides a retrieval device based on an English knowledge graph, comprising:
an acquisition module, used for acquiring English knowledge information to be retrieved;
an extraction module, used for extracting reference keyword information in the English knowledge information to be retrieved and querying corresponding query suggestion information carrying individual category information in a preset index file according to the reference keyword information;
a query module, used for querying associated reference retrieval English knowledge information in a target file library based on the Web Ontology Language (OWL) according to the query suggestion information, wherein the reference retrieval English knowledge information comprises wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information, and the associated reference retrieval English knowledge information comprises reference retrieval English knowledge information having a derivation relation, a compound relation, an inclusion relation or a key relation among the individual category information;
a grouping module, used for grouping the reference retrieval English knowledge information according to the queried wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information;
and a sorting module, used for sorting the grouped reference retrieval English knowledge information and generating target retrieval English knowledge graph information according to the sorted reference retrieval English knowledge information.
In addition, in order to achieve the above object, the present invention also provides retrieval equipment based on an English knowledge graph, comprising: a memory, a processor, and a retrieval program based on the English knowledge graph that is stored in the memory and can run on the processor, wherein the retrieval program is configured to implement the steps of the retrieval method based on the English knowledge graph as described above.
In addition, in order to achieve the above object, the present invention further provides a storage medium, wherein the storage medium stores a retrieval program based on an English knowledge graph, and the retrieval program, when executed by a processor, implements the steps of the retrieval method based on the English knowledge graph as described above.
The retrieval method based on the English knowledge graph acquires English knowledge information to be retrieved; extracts reference keyword information in the English knowledge information to be retrieved, and queries corresponding query suggestion information carrying individual category information in a preset index file according to the reference keyword information; queries associated reference retrieval English knowledge information in a target file library based on the Web Ontology Language (OWL) according to the query suggestion information, wherein the reference retrieval English knowledge information comprises wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information, and the associated reference retrieval English knowledge information comprises reference retrieval English knowledge information having a derivation relation, a compound relation, an inclusion relation or a key relation among the individual category information; groups the reference retrieval English knowledge information according to the queried wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information; and sorts the grouped reference retrieval English knowledge information and generates target retrieval English knowledge graph information according to the sorted reference retrieval English knowledge information. In this way the user's retrieval content is efficiently associated with knowledge-network retrieval based on the OWL file library, and the accuracy and effectiveness of retrieval are greatly improved.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the apparatus may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may comprise a display screen (Display) and an input unit such as keys; optionally, the user interface 1003 may also comprise a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration of the apparatus shown in fig. 1 is not intended to be limiting of the apparatus and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and a retrieval program based on an english knowledge graph.
In the device shown in fig. 1, the network interface 1004 is mainly used for connecting an external network and performing data communication with other network devices; the user interface 1003 is mainly used for connecting user equipment and performing data communication with the equipment; the device calls a retrieval program based on the english knowledge graph stored in the memory 1005 through the processor 1001 and executes the implementation method of retrieval based on the english knowledge graph provided by the embodiment of the invention.
Based on the hardware structure, the embodiment of the retrieval method based on the English knowledge graph is provided.
Referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of the retrieval method based on the English knowledge graph according to the present invention.
In a first embodiment, the retrieval method based on the English knowledge graph includes the following steps:
Step S10, acquiring English knowledge information to be retrieved.
It should be noted that the execution subject of the embodiment is the retrieval platform, and may also be a server for performing data processing, which is not limited in the embodiment.
In this embodiment, the English knowledge information to be retrieved includes keyword information to be retrieved, a retrieval sentence, and the like, for example, present continuous tense grammar for the second grade of junior middle school.
Step S20, extracting reference keyword information in the English knowledge information to be retrieved, and querying corresponding query suggestion information carrying individual category information in a preset index file according to the reference keyword information, wherein the individual category information includes at least one of grammar, sentence, phrase, vocabulary, question, common error, and multimedia file.
It should be noted that the individual categories of the English subject include grammar, sentence, phrase, vocabulary, question, common error, multimedia file and the like. Each individual category contains a number of individuals, such as a particular word, and each individual has a number of attributes, such as phonetic symbol, part of speech, usage, example sentences, grade, related books, whether the word is a CET-4 or CET-6 word, and the number of times it has appeared as an examination point. The preset index file is a preset correspondence between keyword information and individual category information, so that the corresponding individual category information can be queried according to the English knowledge information to be retrieved that is input by the user.
In the present embodiment, the form of the query can be flexibly determined through Lucene query operators; four forms are adopted, as shown in fig. 3:
1. an AND query operator, where the keywords are in an AND relationship, in the form +word1 +word2;
2. an OR query operator, where the keywords are in an OR relationship, in the form word1 word2;
3. a NOT query operator, used when a certain keyword is to be excluded, in the form +word1 -word2;
4. a LIKE query operator, i.e. a fuzzy query, in the form word~.
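The four query forms can be illustrated with a minimal sketch that only assembles query strings in Lucene's classic query syntax. The function name build_query and the mode labels are hypothetical and are not part of the embodiment; the sketch does not call Lucene itself.

```python
def build_query(mode, word1, word2=None):
    """Return a Lucene classic-syntax query string for one of the four forms.

    mode is a hypothetical label: "and", "or", "not" or "like".
    """
    if mode == "and":
        # Both keywords are required.
        return f"+{word1} +{word2}"
    if mode == "or":
        # Either keyword may match (Lucene's default operator is OR).
        return f"{word1} {word2}"
    if mode == "not":
        # word1 is required, word2 is excluded.
        return f"+{word1} -{word2}"
    if mode == "like":
        # Trailing tilde requests a fuzzy match.
        return f"{word1}~"
    raise ValueError(f"unknown mode: {mode}")
```

For instance, build_query("not", "tense", "passive") yields the string +tense -passive, which would then be handed to whatever query parser the platform uses.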
Step S30, querying associated reference retrieval English knowledge information in a target file library based on the Web Ontology Language (OWL) according to the query suggestion information, where the reference retrieval English knowledge information includes wiki vocabulary information, fixed collocation information, example sentences and their translation information, resource information, and knowledge point information, and where the associated reference retrieval English knowledge information includes reference retrieval English knowledge information having a derivation relationship, a compound relationship, an inclusion relationship, or a key relationship among the individual category information, and may also include other relationships, which is not limited in this embodiment.
It will be appreciated that, taking English vocabulary as an example, the associated reference retrieval English knowledge information covers cases such as: two words having a derivation relationship or a compound relationship, a word and a sentence having an inclusion relationship, and a word and a question having an examined-word or key-word relationship.
It should be noted that the wiki vocabulary information is entry information retrieved from wiki vocabulary entries, and the resource information includes product information such as test papers and training video data. The query is executed by the user inputting an English knowledge point. Fig. 4 shows a query result: for the input English knowledge point, the output includes the wiki vocabulary information, fixed collocation information, example sentences and their translation information, resource information, and knowledge point information of that knowledge point.
In the present embodiment, as shown in fig. 5, a rectangle represents a record of string type in the ontology, an ellipse represents an individual in the ontology, a solid arrow represents an object attribute relationship in the ontology, and a dotted arrow represents a data attribute relationship in the ontology.
Step S40, grouping the reference retrieval English knowledge information according to the queried wiki vocabulary information, resource information and keyword information.
In order to display the search result in the form of Extensible Markup Language (XML), the search result may be organized in a manner of constructing different virtual subgraphs according to different core words.
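A minimal sketch of this grouping and XML organization is given below. It assumes each retrieved record is a (core word, kind, text) tuple; the element names results, subgraph, group and item are hypothetical and are not prescribed by the embodiment.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def group_results(results):
    """Group (core_word, kind, text) records by core word, then by kind."""
    groups = defaultdict(lambda: defaultdict(list))
    for core_word, kind, text in results:
        groups[core_word][kind].append(text)
    return groups


def to_xml(groups):
    """Serialize the groups as one <subgraph> element per core word."""
    root = ET.Element("results")
    for core_word, kinds in groups.items():
        sub = ET.SubElement(root, "subgraph", {"core": core_word})
        for kind, texts in kinds.items():
            group_el = ET.SubElement(sub, "group", {"type": kind})
            for text in texts:
                ET.SubElement(group_el, "item").text = text
    return ET.tostring(root, encoding="unicode")
```

Each subgraph element plays the role of one virtual subgraph built around a core word; the kinds inside it would correspond to groups such as fixed collocations or example sentences.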
Step S50, sorting the grouped reference retrieval English knowledge information, and generating target retrieval English knowledge graph information according to the sorted reference retrieval English knowledge information.
It should be noted that the grouped reference retrieval English knowledge information can be sorted in two ways. The first sorts by the Lucene score. The second sorts by an association sorting algorithm, which adds weighted calculations for common examination points, difficulty level, grade, version, update time and major examination papers. The algorithm forms a network with knowledge points as vertices; the more complex the network relationships of a knowledge point, the harder that knowledge point is for the user to master.
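As a simplified illustration of the weighted part of the second way, the sketch below ranks items by a weighted sum of the six factors. It is not the sorting network model itself: the weight values, the feature names and the assumption that each feature is normalized to the range 0 to 1 are all hypothetical.

```python
# Hypothetical preset weights for the six factors named in the text.
WEIGHTS = {
    "exam_point": 0.3,
    "difficulty": 0.2,
    "grade": 0.2,
    "version": 0.1,
    "update_time": 0.1,
    "exam_paper": 0.1,
}


def score(item, weights=WEIGHTS):
    """Weighted sum of an item's features; missing features count as 0."""
    return sum(w * item["features"].get(name, 0.0) for name, w in weights.items())


def rank(items, weights=WEIGHTS):
    """Return the items sorted from highest to lowest score."""
    return sorted(items, key=lambda it: score(it, weights), reverse=True)
```

A fuller implementation would also take the structure of the knowledge-point network into account, for example the number of relationships attached to each vertex.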
In this embodiment, the retrieval platform is provided with a plurality of preset processing modules, for example the preprocessing module, filtering module, retrieval module, sorting module and intervention module shown in fig. 6. The preprocessing module is configured to accurately convert the content input by the user into a machine-recognizable retrieval language. The filtering module is configured to cache common retrieval results by using a caching technique, and to filter the retrieval results while avoiding two costly access patterns, namely full-table scanning and full-index scanning, so as to improve retrieval speed. The retrieval module is configured to establish a Lucene index, store the ontology file by using a TDB triple database (TDB), and search the index and the ontology file by using SPARQL. The sorting module is configured to sort the preliminary retrieval results according to an improved association sorting algorithm without manual intervention. The intervention module is configured to intervene based on the white list and black list principle; the black list and white list data are stored in a database, and a bloom filter algorithm can be adopted for filtering on the premise of ensuring a certain accuracy.
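The intervention step can be sketched with a small bloom filter, as below. The bit-array size, the number of hash functions, the use of SHA-256 and the rule that the white list overrides the black list are assumptions made for illustration; the embodiment does not specify them.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def intervene(results, blacklist_filter, whitelist):
    """Keep white-listed results; drop results the bloom filter reports as black-listed."""
    return [r for r in results if r in whitelist or r not in blacklist_filter]
```

Because a bloom filter can report false positives, a result that is not actually black-listed may occasionally be dropped, which is the accuracy trade-off mentioned above.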
This embodiment implements a learning mode in which the system actively pushes resources to the learner according to user characteristics, which improves the learner's search quality and makes the search results more efficient, comprehensive, detailed, accurate and clear. The system recommends a personalized learning path and learning resources for the student, thereby effectively promoting personalized education. As shown in the retrieval flow diagram of fig. 7, an index is established first. LARQ has three usage modes: character (string) indexing, subject resource indexing and graph indexing; here the corresponding index is established from the Web Ontology Language (OWL) file by using the character indexing of LARQ. A SPARQL query is then carried out, and the corresponding results are obtained according to the SPARQL query statement. The results are then grouped, that is, organized for XML output by constructing different virtual subgraphs according to different core words. Sorting and intervention follow, in which illegal content or retrieval results that do not comply with platform policy are removed. Finally, the retrieval results that have undergone grouping, sorting and manual intervention are output in the corresponding XML form.
This embodiment implements a knowledge graph system for an English domain ontology. Compared with the results a learner obtains from a search engine, the retrieved results do not need to be summarized by the user and are comprehensive, detailed, accurate and clear, which improves search quality.
The retrieval method based on the English knowledge graph acquires English knowledge information to be retrieved; extracts reference keyword information in the English knowledge information to be retrieved, and queries corresponding query suggestion information carrying individual category information in a preset index file according to the reference keyword information; queries associated reference retrieval English knowledge information in a target file library based on the Web Ontology Language (OWL) according to the query suggestion information, wherein the reference retrieval English knowledge information comprises wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information, and the associated reference retrieval English knowledge information comprises reference retrieval English knowledge information having a derivation relation, a compound relation, an inclusion relation or a key relation among the individual category information; groups the reference retrieval English knowledge information according to the queried wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information; and sorts the grouped reference retrieval English knowledge information and generates target retrieval English knowledge graph information according to the sorted reference retrieval English knowledge information. In this way the user's retrieval content is efficiently associated with knowledge-network retrieval based on the OWL file library, and the accuracy and effectiveness of retrieval are greatly improved.
In an embodiment, as shown in fig. 8, a second embodiment of the retrieval method based on the English knowledge graph according to the present invention is proposed based on the first embodiment, and before the step S10, the method further includes:
acquiring current English knowledge information, and judging the character length of the current English knowledge information.
And when the character length reaches a preset threshold value, carrying out English dependency grammar analysis on the current English knowledge information, determining a grammar structure of the current English knowledge information according to an analysis result, carrying out keyword division according to the grammar structure, and obtaining current keyword information according to a division result.
And extracting attribute information in the current keyword information, and judging whether the current keyword information is at least one of a wiki entry, an ontology class name, a member alias and a resource entry according to the attribute information.
And when the current keyword information belongs to at least one of a wiki entry, an ontology class name, a member alias and a resource entry, taking the current keyword information as the English knowledge information to be retrieved.
In this embodiment, the English knowledge information to be retrieved is preprocessed. To improve the accuracy of the preprocessing, any of the following four approaches, or a combination of them, may be adopted:
1) recording the request content of each user, and generating a request frequency table;
2) giving related retrieval suggestions for request content in which the user may have made an error;
3) if the retrieval content is long, performing syntactic analysis to obtain the key retrieval topic;
4) if the retrieval content is a wiki entry, an ontology class name, a member alias or a resource entry, directly performing retrieval of the corresponding type, and expanding the retrieval content with the ontology as the entry point.
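Approaches 1), 3) and 4) can be sketched as follows. The lookup sets in KNOWN_TERMS, the default length threshold of 30 characters and the function names are hypothetical; the syntactic analysis itself is only flagged here, not performed.

```python
from collections import Counter

# Approach 1): request frequency table, keyed by raw request content.
request_counts = Counter()

# Hypothetical lookup sets standing in for the wiki, ontology and resource catalogs.
KNOWN_TERMS = {
    "wiki_entry": {"present continuous"},
    "ontology_class": {"grammar", "vocabulary"},
    "member_alias": {"progressive tense"},
    "resource_entry": {"unit 3 test paper"},
}


def classify(keyword):
    """Approach 4): return every entry type that the keyword belongs to."""
    key = keyword.strip().lower()
    return [kind for kind, terms in KNOWN_TERMS.items() if key in terms]


def preprocess(request, length_threshold=30):
    """Record the request and decide how it should be handled."""
    request_counts[request] += 1
    return {
        # Approach 3): long requests are sent to syntactic analysis.
        "needs_parsing": len(request) >= length_threshold,
        "types": classify(request),
    }
```

A request whose types list is non-empty can be used directly as the English knowledge information to be retrieved.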
In one embodiment, the step S30 includes:
step S301, searching application scene information in a preset area according to the individual category information in the query suggestion information.
It should be noted that, the application scenario information is preset, and the application scenario is divided according to the actual needs of the user, for example, if the user (junior school students and senior high school students) inputs a certain vocabulary, in general, the user wants to obtain the chinese-english interpretation of the vocabulary, the fixed collocation or usage related to the vocabulary, the usage of the vocabulary in the sentence (giving example sentences and their interpretations), and the grammar related to the vocabulary.
Step S302, searching individual information and related instance information which are related to the individual category information in the target file library according to the application scene information.
It is understood that the instance information includes information such as sentences and resources related to the individual category information.
Step S303, judging the type information of the instance information, extracting the data attribute information in the instance information according to the type information, and taking the data attribute information as the associated reference retrieval English knowledge information.
In order to realize resource retrieval in the ontology, feature extraction is performed on data attribute information in the instance information which can be acquired, and the extracted features are used as associated reference retrieval english knowledge information, so that data retrieval is performed more comprehensively.
The specific SPARQL query statement comprises the following query steps:
1) judging the type of the query word according to the query word;
2) finding individuals establishing a relationship with the query word;
3) determining the type of the individuals;
4) the data attribute values of these individuals are found.
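The four query steps can be sketched over a small in-memory list of triples, as below. This stands in for the SPARQL query against the ontology file; the sample triples and the property names derivedFrom, contains, meaning and text are hypothetical.

```python
# Hypothetical sample triples: (subject, predicate, object).
TRIPLES = [
    ("take", "rdf:type", "Vocabulary"),
    ("take_off", "rdf:type", "Phrase"),
    ("take_off", "derivedFrom", "take"),
    ("take_off", "meaning", "to remove; to leave the ground"),
    ("s1", "rdf:type", "Sentence"),
    ("s1", "contains", "take"),
    ("s1", "text", "Please take a seat."),
]
OBJECT_PROPERTIES = {"derivedFrom", "contains"}  # link individuals to individuals
DATA_PROPERTIES = {"meaning", "text"}            # link individuals to literal values


def type_of(individual):
    """Steps 1 and 3: look up the type of a query word or individual."""
    for s, p, o in TRIPLES:
        if s == individual and p == "rdf:type":
            return o
    return None


def related_individuals(word):
    """Step 2: find individuals that have an object-property link to the word."""
    return sorted({s for s, p, o in TRIPLES if p in OBJECT_PROPERTIES and o == word})


def data_values(individual):
    """Step 4: collect the data attribute values of an individual."""
    return {p: o for s, p, o in TRIPLES if s == individual and p in DATA_PROPERTIES}


def query(word):
    result = {"type": type_of(word), "related": {}}
    for ind in related_individuals(word):
        result["related"][ind] = {"type": type_of(ind), "data": data_values(ind)}
    return result
```

Calling query("take") on the sample data returns the type Vocabulary together with the related phrase and sentence individuals and their data attribute values.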
In an embodiment, before the step S30, the method further includes:
extracting user information in English knowledge information to be retrieved, wherein the user information comprises weak knowledge point information, grade information, a learning basis and learning outline information of a user; determining difficulty level information of English subject knowledge to be retrieved according to the grade information and the learning basis; and filtering a preset file library according to the weak knowledge point information, the difficulty level information and the learning outline information, and taking the filtered file library as the target file library.
In this embodiment, before a retrieval result is obtained, content in the file library that is irrelevant to the retrieval needs to be filtered out. Resource filtering should follow three principles: the pushed products should genuinely target the student's weak points; the difficulty level of the pushed products should suit the student's grade and learning basis; and the pushed products should meet the requirements of the learning outline. In this way, retrieval information that meets the user's needs is retrieved based on the user's learning situation.
According to the scheme provided by the embodiment, the user information in the English knowledge information to be retrieved is extracted, the current ontology base is filtered according to the user information, and the target information is retrieved from the filtered ontology base, so that the retrieval efficiency is improved.
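A minimal sketch of this filtering is given below. The 1 to 5 difficulty scale, the grade-to-difficulty table, the labels weak, average and strong, and the record field names are all hypothetical choices made for illustration.

```python
def difficulty_for(grade, basis):
    """Map grade and learning basis to a maximum difficulty level (assumed 1-5 scale)."""
    base = {7: 2, 8: 3, 9: 4}.get(grade, 3)
    adjust = {"weak": -1, "average": 0, "strong": 1}.get(basis, 0)
    return max(1, min(5, base + adjust))


def filter_library(library, user):
    """Keep documents that target a weak point, fit the difficulty, and are in the outline."""
    max_level = difficulty_for(user["grade"], user["basis"])
    return [
        doc
        for doc in library
        if doc["knowledge_point"] in user["weak_points"]
        and doc["difficulty"] <= max_level
        and doc["knowledge_point"] in user["outline"]
    ]
```

The list returned by filter_library would serve as the target file library for the subsequent query step.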
In an embodiment, as shown in fig. 9, a third embodiment of the retrieval method based on the English knowledge graph according to the present invention is proposed based on the first embodiment or the second embodiment, and in this embodiment, the method further includes, before step S20:
acquiring historical English knowledge information and associated resource information, and extracting historical keyword information in the historical English knowledge information and the associated resource information;
and storing the historical keyword information as triple information in a TDB triple database, establishing index information according to the triple information, and using the indexed triple information as the preset index file.
In this embodiment, TDB is used to store the ontology file, and the triples in the ontology are converted and a Lucene index is established so that LARQ retrieval can be performed. LARQ has multiple usage modes, including character (string) indexing, subject resource indexing and graph indexing; the corresponding index is established from the OWL file by using the character indexing of LARQ, so that retrieval of the ontology is realized and retrieval efficiency is improved.
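The idea of indexing the text held in triples can be sketched in plain Python, as below. This is a stand-in only: it does not use TDB, Lucene or LARQ, and the function names and whitespace tokenization are assumptions.

```python
from collections import defaultdict


def build_index(triples):
    """Build an inverted index from lowercase tokens of literal objects to subjects."""
    index = defaultdict(set)
    for subject, _predicate, literal in triples:
        for token in literal.lower().split():
            # Strip common trailing punctuation so "seat." matches "seat".
            index[token.strip(".,;!?")].add(subject)
    return index


def search(index, keyword):
    """Return the subjects whose literal text contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))
```

The subjects returned by search can then be used as entry points for a structured query over the stored triples.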
In a specific implementation, the preset index file may be an English ontology. To construct the English ontology, source data are first imported in batches; the sources include an Oxford phrasal verb dictionary, an Oxford English idiom dictionary, an Oxford English collocation dictionary, the fourth edition of the Oxford advanced English-Chinese dictionary, a usage dictionary, a concise English dictionary, a zhao real English grammar, an English national corpus, a sentence library, and the like. English dependency parsing and the Hadoop distributed computing framework are used to analyze sentences automatically and obtain common binary grammatical structures, and each structure itself and its binary word relationship are used as tag attributes. For example, for the sentence "My dog also likes eating sausage", the parsing result yields poss(dog, My), nsubj(likes, dog), advmod(likes, also), root(ROOT, likes), xcomp(likes, eating) and dobj(eating, sausage), and the following may be used as tag keywords: poss, (dog, My), nsubj, (likes, dog), advmod, (likes, also), root, (ROOT, likes), xcomp, (likes, eating), dobj, (eating, sausage).
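As an illustration of turning a dependency parse into tag keywords, the following sketch assumes the parser emits relations as text in the form relation(head, dependent). The function names and the regular expression are hypothetical, and no particular parser is implied.

```python
import re

# Matches one relation of the assumed form: name(head, dependent)
RELATION = re.compile(r"(\w+)\(\s*([^,\s()]+)\s*,\s*([^,\s()]+)\s*\)")


def parse_relations(parse_output):
    """Return a list of (relation, head, dependent) tuples."""
    return RELATION.findall(parse_output)


def tag_keywords(relations):
    """Flatten relations into tags: the structure name, then its word pair."""
    tags = []
    for name, head, dependent in relations:
        tags.append(name)
        tags.append((head, dependent))
    return tags


parse_output = ("poss(dog, My) nsubj(likes, dog) advmod(likes, also) "
                "root(ROOT, likes) xcomp(likes, eating) dobj(eating, sausage)")
relations = parse_relations(parse_output)
tags = tag_keywords(relations)
```

Each sentence thus contributes both the grammatical structure names and the binary word pairs, which can be stored as tag attributes of the sentence.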
During the import, tag keywords are extracted according to the annotation standard of the knowledge ontology, and the resources are imported into the database in batches. The keywords of all entries are then traversed, segmented into words and indexed. Finally, data mining, searching and extraction are performed on the index library to discover the relationships among entries. As shown in Fig. 10, the keywords extracted from dictionaries, sentence libraries, language books and the like are imported in batches, classified by individual category into words, phrases, sentence library entries, collocation library entries, knowledge points and the like, and managed through indexes established according to a wiki model.
In a specific implementation, the files in the index library correspond one-to-one to the entries of Wikipedia. The file attributes include the file name, keywords, content, uniform resource locator, time information, state information and the like, and other attribute information can also be included, so that the file attributes can be flexibly adjusted according to actual requirements.
It should be noted that the associations between entries are established on the keyword indexes. If the entries are taken as nodes and the indexes as connecting edges, a complex graph structure, that is, the knowledge graph information, is formed. As shown in Fig. 11, a relationship graph among the entries can be established according to the associated indexes.
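The construction of a graph from entries and shared keyword indexes can be sketched as below. The entry names and keyword sets are made-up sample data, and linking two entries whenever they share any keyword is an assumption about how "association" is decided.

```python
from collections import defaultdict
from itertools import combinations


def build_entry_graph(entry_keywords):
    """Connect two entries with an undirected edge when they share a keyword."""
    keyword_to_entries = defaultdict(set)
    for entry, keywords in entry_keywords.items():
        for keyword in keywords:
            keyword_to_entries[keyword].add(entry)

    graph = {entry: set() for entry in entry_keywords}
    for entries in keyword_to_entries.values():
        for a, b in combinations(sorted(entries), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


entry_keywords = {
    "run": {"run", "verb"},
    "run out of": {"run", "phrase"},
    "runner": {"runner", "noun"},
}
graph = build_entry_graph(entry_keywords)
```

Here "run" and "run out of" become neighbours through the shared keyword "run", while "runner" stays isolated until an index links it to another entry.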
In an embodiment, before the step S50, the method further includes:
Step S501, obtaining current user behavior information, wherein the current user behavior information comprises preset examination points, difficulty degrees, grade information, version information, updating time information and preset examination paper information.
It should be noted that the preset examination points are common examination points, and the preset examination paper information is large-scale examination paper information, such as end-of-term examinations and the like.
Step S502, setting preset weight values for the preset examination points, the difficulty level, the grade information, the version information, the updating time information and the preset examination paper information.
Step S503, generating a sorting network model through a relevancy sorting algorithm from the preset examination points, the difficulty level, the grade information, the version information, the updating time information and the preset examination paper information for which the weight values have been set.
In this embodiment, the groups can be sorted in two ways. In the first way, the groups are sorted according to the Lucene scores. In the second way, the groups are sorted according to an improved association sorting algorithm, in which weights for common examination points, difficulty, grade, version, updating time and large-scale examination papers are added to the calculation. The algorithm takes the knowledge points as vertexes to form a network; the more complex the network relationships of a knowledge point, the harder that knowledge point is for the user to master. Sorting in this way improves the effectiveness of the retrieval result.
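The weighted sorting can be pictured with the following sketch. The weight values, the normalization of each feature to the range 0 to 1, the use of vertex degree as the measure of network complexity, and the choice to rank more complex knowledge points higher are all assumptions made for illustration; the embodiment does not give the formula.

```python
# Hypothetical preset weights for the six factors named in the embodiment.
WEIGHTS = {
    "exam_point": 0.3,
    "difficulty": 0.2,
    "grade": 0.2,
    "version": 0.1,
    "update_time": 0.1,
    "exam_paper": 0.1,
}


def relevance_score(features, degree, max_degree, network_weight=0.5):
    """Weighted feature sum plus a term for the vertex's network complexity."""
    base = sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    complexity = degree / max_degree if max_degree else 0.0
    return base + network_weight * complexity


def rank(items):
    """Sort items by descending relevance score."""
    max_degree = max((item["degree"] for item in items), default=0)
    return sorted(
        items,
        key=lambda item: relevance_score(item["features"], item["degree"], max_degree),
        reverse=True,
    )


items = [
    {"id": "C", "features": {}, "degree": 0},
    {"id": "B", "features": {name: 0.5 for name in WEIGHTS}, "degree": 4},
    {"id": "A", "features": {name: 1.0 for name in WEIGHTS}, "degree": 1},
]
ranked_ids = [item["id"] for item in rank(items)]
```

In this sample, item A ranks first on its feature values, item B follows because its many connections partly offset its lower features, and item C ranks last.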
In one embodiment, the step S50 includes:
and sorting the grouped reference retrieval English knowledge information through the sorting network model, and generating target retrieval English knowledge map information according to the sorted reference retrieval English knowledge information.
In an embodiment, after the step S50, the method further includes:
sorting the grouped reference retrieval English knowledge information to obtain the sorted reference retrieval English knowledge information;
and acquiring preset white list and preset black list information, filtering the sorted reference retrieval English knowledge information according to the preset white list and preset black list information and a bloom filter, and generating target retrieval English knowledge map information from the filtered reference retrieval English knowledge information.
On the basis of the established subject-domain ontology, Lucene is used to establish the index and TDB is used to store the ontology file. Results are then retrieved by using SPARQL query statements, grouped, sorted and subjected to intervention, and the retrieval results are output in a corresponding XML organization form. The intervention is based on the white list and black list principles: the black list and white list data are stored in a relational database, and a Bloom filter algorithm is used to accelerate the lookup while a certain accuracy is still ensured.
According to the scheme provided by the embodiment, the search result is filtered again in a black list and white list mode, the search result which is illegal or not in accordance with the policy regulation is removed, and the search result is displayed in an XML organization form, so that the effectiveness of the search result is improved.
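The black list and white list intervention with a Bloom filter can be sketched as follows. The filter size, the number of hash functions, the use of SHA-256, and the rule that white-listed results are always kept are assumptions; plain Python sets stand in for the lists held in the relational database.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def apply_intervention(results, blacklist, whitelist):
    """Drop black-listed results; white-listed results are always kept."""
    bloom = BloomFilter()
    for item in blacklist:
        bloom.add(item)
    kept = []
    for result in results:
        if result in whitelist:
            kept.append(result)
        elif result in bloom and result in blacklist:
            # The Bloom filter is a cheap pre-check; the exact list confirms.
            continue
        else:
            kept.append(result)
    return kept


kept = apply_intervention(
    ["doc1", "doc2", "doc3"],
    blacklist={"doc2", "doc3"},
    whitelist={"doc3"},
)
```

Because a Bloom filter never reports a stored item as absent, most results skip the exact black list lookup entirely, and only the rare positives need confirmation against the stored list.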
The invention further provides a retrieval device based on the English knowledge graph.
Referring to Fig. 12, Fig. 12 is a schematic diagram of the functional modules of a first embodiment of the retrieval device based on the English knowledge graph according to the present invention.
In a first embodiment of the retrieval apparatus based on english knowledge graph of the present invention, the retrieval apparatus based on english knowledge graph comprises:
The obtaining module 10 is configured to obtain English knowledge information to be retrieved.
It should be noted that the execution subject of the embodiment is the retrieval platform, and may also be a server for performing data processing, which is not limited in the embodiment.
In this embodiment, the English knowledge information to be retrieved includes keyword information to be retrieved, a retrieval sentence and the like, for example, "the gravity acceleration of junior middle school second grade".
The extracting module 20 is configured to extract reference keyword information in the English knowledge information to be retrieved, and query corresponding query suggestion information carrying individual category information in a preset index file according to the reference keyword information, where the individual category information includes at least one of a grammar, a sentence, a phrase, a vocabulary, a question, a common error, and a multimedia file.
It should be noted that the individual categories of the English subject include grammar, sentences, phrases, vocabulary, questions, common errors, multimedia files and the like. Each category contains a plurality of individuals, for example a word, and each individual has a plurality of attributes, such as phonetic symbols, part of speech, usage, example sentences, grade, related books, whether the word is a CET-4/CET-6 word, and the number of times it has appeared as an examination point. The preset index file is a correspondence between preset keyword information and individual category information, so that the corresponding individual category information can be queried according to the English knowledge information to be retrieved that is input by the user.
In the present embodiment, the form of the query can be flexibly determined through Lucene query operators. As shown in Fig. 3, four forms are adopted:
1. AND query operator: the keywords are in an AND relation, written in the form +word1 +word2;
2. OR query operator: the keywords are in an OR relation, written in the form word1 word2;
3. NOT query operator: a certain keyword is excluded, written in the form +word1 -word2;
4. LIKE query operator: a fuzzy query, written in the form word~.
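The four query forms can be produced by small helper functions, sketched below. The helpers only build query strings and assume the syntax of the Lucene classic query parser, in which a fuzzy term is written with a trailing tilde; the function names are hypothetical and no Lucene call is made.

```python
def and_query(*words):
    """All keywords required: +word1 +word2"""
    return " ".join(f"+{w}" for w in words)


def or_query(*words):
    """Any keyword may match: word1 word2"""
    return " ".join(words)


def not_query(required, excluded):
    """Require one keyword and exclude another: +word1 -word2"""
    return f"+{required} -{excluded}"


def fuzzy_query(word):
    """Fuzzy match on a single term: word~"""
    return f"{word}~"


queries = [
    and_query("take", "off"),
    or_query("take", "off"),
    not_query("take", "off"),
    fuzzy_query("colour"),
]
```

The resulting strings would then be handed to whichever query parser the retrieval platform uses.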
The query module 30 is configured to query, according to the query suggestion information, associated reference retrieval English knowledge information in a target file library based on the Web Ontology Language (OWL), where the reference retrieval English knowledge information includes wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information, and knowledge point information, and the associated reference retrieval English knowledge information includes reference retrieval English knowledge information having a derivation relationship, a compound relationship, an inclusion relationship, or a key relationship among individual category information. Other relationships may also be included, which is not limited in this embodiment.
It will be appreciated that the associated reference retrieval English knowledge information includes, for example: two words having a derivation relationship or a compound relationship; a word and a sentence having an inclusion relationship; a word and a question having an examined-word or key-word relationship; and the like.
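A lookup of associated individuals over these relationship types might look like the sketch below. The triples and identifiers are invented sample data, and treating every relationship as traversable in both directions is an assumption.

```python
# The four relationship types named in the embodiment.
RELATION_TYPES = {"derivation", "compound", "inclusion", "key"}


def find_associated(triples, individual):
    """Group the individuals related to `individual` by relationship type."""
    found = {}
    for subject, relation, obj in triples:
        if relation not in RELATION_TYPES:
            continue
        if subject == individual:
            found.setdefault(relation, []).append(obj)
        elif obj == individual:
            found.setdefault(relation, []).append(subject)
    return found


triples = [
    ("happy", "derivation", "happiness"),    # word derives another word
    ("sun", "compound", "sunflower"),        # word is part of a compound
    ("sentence:1", "inclusion", "happy"),    # sentence contains the word
    ("question:7", "key", "happy"),          # question examines the word
    ("happy", "synonym", "glad"),            # not one of the four types
]
associated = find_associated(triples, "happy")
```

For the word "happy" this returns its derived word, the sentence that contains it and the question that examines it, while the unrelated compound pair and the synonym triple are ignored.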
It should be noted that the wiki vocabulary information is entry information found in wiki entries, and the resource information includes product information such as test papers and training video data. The query is executed by the user inputting an English knowledge point, and the query result is displayed as shown in Fig. 4. The input is an English knowledge point; the output includes the wiki entry information of the current English knowledge point, namely a specific explanation of the knowledge point, together with the resources and the keywords related to the English knowledge point.
In the present embodiment, as shown in Fig. 5, a rectangle represents a record of string type in the ontology, an ellipse represents an individual in the ontology, a solid arrow represents an object attribute relationship in the ontology, and a dotted arrow represents a data attribute relationship in the ontology.
The grouping module 40 is configured to group the reference retrieval English knowledge information according to the queried wiki vocabulary information, resource information, and keyword information.
In order to display the search result in the form of Extensible Markup Language (XML), the search result may be organized in a manner of constructing different virtual subgraphs according to different core words.
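Organizing grouped results around a core word and serializing them as XML can be sketched with the Python standard library. The element and attribute names (`subgraph`, `group`, `item`, `core`, `name`) are hypothetical; the embodiment does not define the XML schema.

```python
import xml.etree.ElementTree as ET


def results_to_xml(core_word, grouped):
    """Serialize one virtual subgraph: a core word with its grouped results."""
    root = ET.Element("subgraph", {"core": core_word})
    for group_name, items in grouped.items():
        group_el = ET.SubElement(root, "group", {"name": group_name})
        for item in items:
            ET.SubElement(group_el, "item").text = item
    return ET.tostring(root, encoding="unicode")


grouped = {
    "wiki": ["run (verb)"],
    "resource": ["Unit 3 test paper"],
}
xml_text = results_to_xml("run", grouped)
```

One such subgraph would be produced per core word, so a query that matches several core words yields several sibling subgraphs in the output.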
The sorting module 50 is configured to sort the grouped reference retrieval English knowledge information and generate target retrieval English knowledge map information according to the sorted reference retrieval English knowledge information.
It should be noted that the grouped reference retrieval English knowledge information can be sorted in two ways. The first way sorts according to the Lucene score. The second way sorts according to an association sorting algorithm, in which weights for common examination points, difficulty, grade, version, updating time and large-scale examination papers are added to the calculation. The algorithm takes the knowledge points as vertexes to form a network; the more complex the network relationships of a knowledge point, the harder that knowledge point is for the user to master.
In this embodiment, the retrieval platform is provided with a plurality of preset processing modules, for example, the preprocessing module, filtering module, retrieving module, sorting module and intervention module shown in Fig. 6. The preprocessing module is configured to accurately convert the content input by the user into a machine-recognizable retrieval language. The filtering module is configured to cache common retrieval results by using a caching technique and to filter the retrieval results while avoiding two approaches, namely full-table scanning and full-index scanning, so as to improve the retrieval speed. The retrieving module is configured to establish a Lucene index, store the ontology file in TDB, and search the index and the ontology file by using SPARQL. The sorting module is configured to sort the preliminary retrieval results according to the improved association sorting algorithm without manual intervention. The intervention module is configured to intervene based on the white list and black list principles; the black list and white list data are stored in a database, and a Bloom filter algorithm can be adopted for filtering on the premise that a certain accuracy is ensured.
This embodiment implements a learning mode in which the system actively pushes resources to the learner according to user characteristics, which improves the search quality of the learner and makes the search results more efficient, comprehensive, detailed, accurate and clear. The system recommends a personalized learning path and learning resources for the student, thereby effectively promoting the personalized education of the learner. As shown in the retrieval flow diagram of Fig. 7, an index is first established. LARQ has three usage modes: string index, subject resource index and graph index; the corresponding index is established from the Web Ontology Language (OWL) file by using the string index of LARQ. A SPARQL query is then performed, and the corresponding results are obtained according to the SPARQL query statements. The results are then grouped and organized for XML output, that is, different virtual subgraphs are constructed according to different core words. Sorting and intervention follow, in which illegal content and retrieval results that do not comply with platform policy are removed. Finally, the retrieval results that have been grouped, sorted and manually intervened are output in the corresponding XML organization form.
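The order of the stages in this flow can be summarized in a short sketch. Every function here is a simplified placeholder with hypothetical record fields (`id`, `category`, `keywords`, `score`); real retrieval would go through the index and SPARQL, and the final XML serialization is omitted.

```python
def preprocess(raw_query):
    """Normalize user input into lowercase search tokens."""
    return raw_query.strip().lower().split()


def retrieve(tokens, corpus):
    """Return records sharing at least one keyword with the query tokens."""
    return [rec for rec in corpus if set(tokens) & rec["keywords"]]


def group(records):
    """Group records by category (wiki, resource, ...)."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["category"], []).append(rec)
    return groups


def sort_groups(groups):
    """Sort each group by descending score."""
    return {name: sorted(recs, key=lambda r: r["score"], reverse=True)
            for name, recs in groups.items()}


def intervene(groups, blacklist):
    """Remove black-listed records from every group."""
    return {name: [r for r in recs if r["id"] not in blacklist]
            for name, recs in groups.items()}


def run_pipeline(raw_query, corpus, blacklist):
    tokens = preprocess(raw_query)
    return intervene(sort_groups(group(retrieve(tokens, corpus))), blacklist)


corpus = [
    {"id": "w1", "category": "wiki", "keywords": {"passive", "voice"}, "score": 0.9},
    {"id": "r1", "category": "resource", "keywords": {"passive"}, "score": 0.4},
    {"id": "r2", "category": "resource", "keywords": {"voice"}, "score": 0.8},
    {"id": "r3", "category": "resource", "keywords": {"tense"}, "score": 0.7},
]
result = run_pipeline("  Passive Voice ", corpus, blacklist={"r1"})
result_ids = {name: [r["id"] for r in recs] for name, recs in result.items()}
```

Keeping each stage as a separate function mirrors the module split described for the platform, so that any stage can be replaced without touching the others.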
This embodiment implements a knowledge graph system for the English domain ontology. Compared with the results a learner obtains from a search engine, the retrieved results do not need to be summarized by the user and are comprehensive, detailed, accurate and clear, so the search quality is improved.
According to the above scheme, English knowledge information to be retrieved is obtained; reference keyword information in the English knowledge information to be retrieved is extracted, and corresponding query suggestion information carrying individual category information is queried in a preset index file according to the reference keyword information; associated reference retrieval English knowledge information is queried, according to the query suggestion information, in a target file library based on the Web Ontology Language (OWL), wherein the reference retrieval English knowledge information comprises wiki vocabulary information, fixed collocation information, example sentences and translation information thereof, resource information and knowledge point information, and the associated reference retrieval English knowledge information comprises reference retrieval English knowledge information having a derivation relationship, a compound relationship, an inclusion relationship or a key relationship among individual category information; the reference retrieval English knowledge information is grouped according to the queried wiki vocabulary information, resource information and keyword information; and the grouped reference retrieval English knowledge information is sorted, and target retrieval English knowledge map information is generated according to the sorted reference retrieval English knowledge information. The retrieval content of the user is thereby efficiently associated with knowledge network retrieval over the OWL-based file library, and the retrieval accuracy and effectiveness are greatly improved.
In addition, in order to achieve the above object, the present invention also provides retrieval equipment based on an English knowledge graph, the equipment including: a memory, a processor, and a retrieval program based on the English knowledge graph that is stored on the memory and executable on the processor, wherein the retrieval program is configured to implement the steps of the retrieval method based on the English knowledge graph described above.
Furthermore, an embodiment of the present invention further provides a storage medium storing a retrieval program based on the English knowledge graph, wherein the retrieval program, when executed by a processor, implements the steps of the retrieval method based on the English knowledge graph described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a computer-readable storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, and includes several instructions for enabling an intelligent terminal (which may be a mobile phone, a computer, a terminal, an air conditioner, or a network terminal) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.