CN110276079B - Word stock establishment method, information retrieval method and corresponding system - Google Patents

Word stock establishment method, information retrieval method and corresponding system

Info

Publication number
CN110276079B
Authority
CN
China
Prior art keywords
word
vocabulary
search
library
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910568339.4A
Other languages
Chinese (zh)
Other versions
CN110276079A (en)
Inventor
谷晓佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN201910568339.4A
Publication of CN110276079A
Application granted
Publication of CN110276079B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention discloses a word stock building method, an information retrieval method and corresponding systems. The word stock building method comprises the following steps: acquiring the associated words of each word according to each word in a dictionary library and the specific explanation corresponding to that word; for each word, storing the word and its associated words as a word group in a pre-established word sense library; and storing the classification logic relations between each word and its associated words in a pre-established classification association library. Because the associated words of each word are stored in the word sense library and the classification logic relations among the words are stored in the classification association library, a search term can be expanded into associated search terms when information retrieval is performed. Retrieval according to the associated search terms yields more comprehensive results than retrieval according to the original search term alone.

Description

Word stock establishment method, information retrieval method and corresponding system
Technical Field
The embodiments of the invention relate to the technical field of information retrieval, and in particular to a word stock building method, an information retrieval method and corresponding systems.
Background
Currently, a common information retrieval method is that a user inputs a search term (also called a keyword), and a search engine retrieves according to that search term and returns the retrieval results. For a keyword search, the search engine can return highly targeted results and, in most cases, can directly give an explanation of or an answer to the input search term.
However, the results obtained from a single input search term generally have limited breadth and therefore cannot provide a good basis for judgment and decision support.
Disclosure of Invention
Therefore, the embodiments of the invention provide a word stock building method, an information retrieval method and corresponding systems, to solve the prior-art problem that retrieval results are severely limited because only a single search term is used.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
According to a first aspect of the embodiments of the present invention, there is provided a word stock building method, including:
S1, acquiring the associated words of each word according to each word in a dictionary library and the specific explanation corresponding to that word;
S2, for each word, storing the word and its associated words as a word group in a pre-established word sense library;
and S3, storing the classification logic relations between each word and its associated words in a pre-established classification association library.
Further, the step S1 specifically includes:
S11, collecting each word in the dictionary library and the specific explanation corresponding to each word;
S12, segmenting the specific explanation of each word into individual segments, and obtaining the associated words of each word according to the logical relations expressed in the segments, wherein the associated words comprise the near-synonyms, synonyms, antonyms, hypernyms and hyponyms of the word;
correspondingly, the step S2 specifically includes:
for each word, storing the word together with its near-synonyms, synonyms, antonyms, hypernyms and hyponyms as one word group in the word sense library, and selecting one word of the group as the meta-word.
Further, the classification logic relations include synonym, near-synonym, antonym, hypernym, hyponym and keyword.
Further, after the step S2, the method further includes:
a. collecting corpus materials from outside the dictionary library, and segmenting the corpus materials with a Chinese word segmentation method to obtain a plurality of segments;
b. for each segment, accessing the word sense library, and if the segment is not in any word group of the word sense library, executing step a or c;
c. verifying the segment, and updating the word sense library either by incorporating the segment into an existing word group or by creating a new word group with the segment as its meta-word.
According to a second aspect of the embodiments of the present invention, there is provided an information retrieval method, including:
S1', querying the associated search terms of an input first search term from a word sense library or a classification association library;
S2', retrieving according to the associated search terms of the first search term to obtain corresponding retrieval results; or retrieving according to an input second search term to obtain corresponding retrieval results;
wherein the second search term is a search term selected from the associated search terms of the first search term, and the word sense library and the classification association library are established by the word stock building method described above.
Further, the step S1' specifically includes:
according to the first search term, looking up the synonyms, hypernyms and hyponyms of the first search term in the word sense library or the classification association library as the associated search terms of the first search term.
Further, the method further comprises the following steps:
taking the first search term as the original search term, and presenting the first search term and its associated search terms in a tree structure;
and presenting the classification logic relations between the first search term and its associated search terms in the form of a report.
According to a third aspect of the embodiments of the present invention, there is provided a word stock building system, including:
an acquisition module, used for acquiring the associated words of each word according to each word in a dictionary library and the specific explanation corresponding to that word;
a first storage module, used for storing, for each word, the word and its associated words as a word group in a pre-established word sense library;
and a second storage module, used for storing the classification logic relations between each word and its associated words in a pre-established classification association library.
According to a fourth aspect of the embodiments of the present invention, there is provided an information retrieval system, including:
a query module, used for querying the associated search terms of an input first search term from the word sense library or the classification association library;
a retrieval module, used for retrieving according to the associated search terms of the first search term to obtain corresponding retrieval results, or for retrieving according to an input second search term to obtain corresponding retrieval results;
wherein the second search term is a search term selected from the associated search terms of the first search term, and the word sense library and the classification association library are established by the word stock building method described above.
According to a fifth aspect of the embodiments of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the word stock building method or the information retrieval method.
The embodiments of the invention have the following advantages: according to the words and specific explanations of the dictionary library, the associated words of each word are stored in the word sense library and the classification logic relations among the words are stored in the classification association library. When information retrieval is performed, the search term is expanded into associated search terms and retrieval is carried out according to them, so that the retrieval results are more comprehensive than those obtained from the original search term alone.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only, and that other drawings can be derived from the drawings provided without inventive effort.
The structures, proportions, sizes, etc. shown in the present specification are given only for the purposes of illustration and description and are not intended to limit the scope of the invention, which is defined by the claims. Any structural modification, change in proportion or adjustment of size that does not affect the efficacy or the achievable purpose of the present invention still falls within the scope of the technical disclosure.
FIG. 1 is a flowchart of a word stock establishment method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a word sense library creation method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for updating a word sense library according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method for information retrieval according to an embodiment of the present invention;
FIG. 5 is a block diagram illustrating a thesaurus creation system according to one embodiment of the present invention;
FIG. 6 is a block diagram of an information retrieval system connection according to one embodiment of the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description, which illustrates certain specific embodiments, but not all embodiments, of the invention. All other embodiments that can be obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to fig. 1, a word stock building method according to an embodiment of the present invention is provided, including: S1, acquiring the associated words of each word according to each word in a dictionary library and the specific explanation corresponding to that word; S2, for each word, storing the word and its associated words as a word group in a pre-established word sense library; and S3, storing the classification logic relations between each word and its associated words in a pre-established classification association library.
In this embodiment, the associated words of each word are derived from each word of the dictionary library and its specific explanation, each word and its associated words are stored in the word sense library, and the classification logic relations among the words are stored in the classification association library. When information retrieval is performed, the associated words of the original search term can be found in the word sense library and the classification association library, so that the original search term is expanded into associated search terms. Retrieval is then performed according to the associated search terms, and the retrieval results are broader than those obtained from the original search term alone.
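The two stores produced by steps S2 and S3 can be pictured with a minimal Python sketch. The names `WordStock`, `sense_library`, `association_library` and `build_word_stock` are illustrative assumptions, not terms from the patent, and using the headword as the meta-word is an arbitrary choice made only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class WordStock:
    # Word sense library: meta-word -> set of words in its word group.
    sense_library: dict = field(default_factory=dict)
    # Classification association library: (category, original word, target word).
    association_library: list = field(default_factory=list)


def build_word_stock(associations):
    """associations: {word: [(category, associated_word), ...]}, the output of S1."""
    stock = WordStock()
    for word, related in associations.items():
        # S2: the word and its associated words form one word group.
        # Assumption: the headword itself is used as the meta-word.
        stock.sense_library[word] = {word} | {w for _, w in related}
        # S3: each classification logic relation is kept as its own record.
        for category, target in related:
            stock.association_library.append((category, word, target))
    return stock


stock = build_word_stock({
    "natural science": [("hypernym", "science"), ("hyponym", "physics")],
})
print(stock.sense_library)
print(stock.association_library)
```

Keeping the groups and the typed relations in separate structures mirrors the split between the word sense library and the classification association library; a real system would presumably use persistent storage rather than in-memory containers.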
Referring to fig. 2, in one embodiment of the present invention, the step S1 specifically includes: S11, collecting each word in the dictionary library and the specific explanation corresponding to each word; S12, segmenting the specific explanation of each word, and obtaining the associated words of each word according to the logical relations expressed in the segments, wherein the associated words comprise the near-synonyms, synonyms, antonyms, hypernyms and hyponyms of the word.
Correspondingly, the step S2 specifically includes: for each word, storing the word together with its near-synonyms, synonyms, antonyms, hypernyms and hyponyms as one word group in the word sense library, and selecting one word of the group as the meta-word.
Specifically, the word sense library is established by collecting each word and its specific explanation from dictionary libraries such as the Xinhua Dictionary and the Modern Chinese Dictionary. For each word, a Chinese word segmentation component segments the specific explanation of the word, and the segmentation results are filtered. Manual checking and correction can be added to this process, so as to form a reliable basic segmentation list and a reliable word filter list, yielding {word = [segment 1, segment 2, …, segment n]}. Within the specific explanation of a word, the association between the explanation and the word is identified from specific cue words in the explanation, and the set of associated words of each word is compiled. The set covers relation types such as hypernyms, hyponyms, synonyms/near-synonyms and antonyms.
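The segmentation and filtering step can be sketched as follows. This is a toy illustration on English text: `segment` is a stand-in for a real Chinese word segmentation component, and the contents of `FILTER_LIST` are invented for the example, since the patent does not specify its word filter list.

```python
import re

# Hypothetical word filter list (stop words); not taken from the patent.
FILTER_LIST = {"the", "of", "a", "an", "and", "in", "that", "to", "is"}


def segment(explanation):
    """Toy stand-in for a Chinese word segmentation component."""
    return re.findall(r"[a-z]+", explanation.lower())


def segment_explanations(dictionary):
    """Map each headword to the filtered, de-duplicated segments of its explanation."""
    result = {}
    for word, explanation in dictionary.items():
        kept = []
        for token in segment(explanation):
            if token not in FILTER_LIST and token not in kept:
                kept.append(token)
        result[word] = kept
    return result


segments = segment_explanations({
    "natural science": "The science of studying various substances and phenomena in nature.",
})
print(segments)
```

The result has the {word = [segment 1, segment 2, …]} shape described above; the manual checking step would operate on this output.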
After the associated words of each word are found, the word and its associated words are stored in the word sense library as a word group, one word of the group is selected as the meta-word of the group, and the classification logic relation between every two words in the group is stored in the classification association library. The classification logic relation between two words is one of synonym, near-synonym, antonym, hypernym, hyponym and keyword.
The classification logic relations are expressed as a list of records: [{category=specific relation, original word=word 1, target word=word 2}, {…}, …], where the specific relation is one of hypernym, hyponym, synonym, antonym, and the like.
When the category is hypernym, the record states that the hypernym of word 1 is word 2, that is, word 2 is a hypernym of word 1; when the category is hyponym, the record states that the hyponym of word 1 is word 2, that is, word 2 is a hyponym of word 1; when the category is synonym, the record states that the synonym of word 1 is word 2, that is, word 2 is a synonym of word 1; when the category is antonym, the record states that the antonym of word 1 is word 2, that is, word 2 is an antonym of word 1.
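A relation record of this shape can be sketched in Python as a small named tuple. The type name `Relation`, the helper names and the `INVERSE` table are assumptions for illustration; the patent does not describe deriving inverse records, and the table only encodes the ordinary lexical fact that hypernym and hyponym mirror each other while synonym and antonym are symmetric.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    category: str  # "hypernym", "hyponym", "synonym" or "antonym"
    source: str    # original word (word 1)
    target: str    # target word (word 2)


# Assumption: hypernym/hyponym are inverses; synonym/antonym are symmetric.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "synonym": "synonym",
    "antonym": "antonym",
}


def describe(rel):
    """Spell out how a record is read: the <category> of word 1 is word 2."""
    return f"the {rel.category} of '{rel.source}' is '{rel.target}'"


def invert(rel):
    """Derive the record seen from the target word's side."""
    return Relation(INVERSE[rel.category], rel.target, rel.source)


rel = Relation("hypernym", "natural law", "law")
print(describe(rel))
print(describe(invert(rel)))
```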
The methods and processes for establishing the word sense library, the classification association library and the keyword library in the embodiments of the present invention are described below with several specific examples. Since the Xinhua Dictionary and the Modern Chinese Dictionary mainly provide word explanations, these dictionaries are mainly used to obtain the hypernyms and hyponyms of a word, while synonyms and antonyms are obtained from cue words such as "also called" and "not". The examples are as follows:
Example one: "Natural law: the law that exists inside objective things in nature, also called the law of nature."
Acquiring hypernyms: the hypernym association is parsed as {category=hypernym, original word=natural law, target word=law}, meaning that the hypernym of "natural law" is "law".
Acquiring synonyms/near-synonyms: the text following cue words such as "also called", "also called … or", "is", "that is" and "also known as" can be roughly cut out as a synonym. In the example "Natural law: the law that exists inside objective things in nature, also called the law of nature.", the synonym association is parsed as {category=synonym, original word=natural law, target word=law of nature}, meaning that a synonym of "natural law" is "law of nature".
The classification logic relations obtained in this example are: [{category=synonym, original word=natural law, target word=law of nature}, {category=hypernym, original word=natural law, target word=law}]; keywords: {natural law=[nature, objective things, law]}, that is, [{category=keyword, original word=natural law, target word=nature}, {category=keyword, original word=natural law, target word=objective things}, {category=keyword, original word=natural law, target word=law}], meaning that the keywords of "natural law" include nature, objective things and law.
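Cue-word extraction of synonyms can be sketched with a regular expression. This is a rough illustration on an English rendering of Example one; the cue list `SYNONYM_CUES`, the function name and the rule of cutting at the next punctuation mark are assumptions, and a real implementation would work on segmented Chinese text and include a confirmation step.

```python
import re

# Assumed English stand-ins for the cue words mentioned in the example.
SYNONYM_CUES = ("also called", "also known as")


def extract_synonyms(word, explanation):
    """Return synonym records for the text that follows each cue word."""
    relations = []
    for cue in SYNONYM_CUES:
        # Capture what follows the cue, up to the next punctuation mark,
        # skipping an optional leading article.
        pattern = re.escape(cue) + r"\s+(?:the\s+)?([^,.;]+)"
        for match in re.finditer(pattern, explanation):
            relations.append({
                "category": "synonym",
                "source": word,
                "target": match.group(1).strip(),
            })
    return relations


rels = extract_synonyms(
    "natural law",
    "The law that exists inside objective things in nature, also called the law of nature.",
)
print(rels)
```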
Example two: "Natural science: the science of studying various substances and phenomena in nature. Including physics, chemistry, zoology, botany, mineralogy, physiology, mathematics, etc."
Acquiring hyponyms: the items listed after "including" in the explanation, namely physics, chemistry, zoology, botany, mineralogy, physiology, mathematics and so on, are hyponyms of the word "natural science". The hyponym associations are parsed as [{category=hyponym, original word=natural science, target word=physics}, {category=hyponym, original word=natural science, target word=chemistry}, {category=hyponym, original word=natural science, target word=zoology}, {category=hyponym, original word=natural science, target word=botany}, {category=hyponym, original word=natural science, target word=mineralogy}, {category=hyponym, original word=natural science, target word=physiology}, {category=hyponym, original word=natural science, target word=mathematics}].
The classification logic relations obtained in this example are: [{category=hyponym, original word=natural science, target word=physics}, {category=hyponym, original word=natural science, target word=chemistry}, {category=hyponym, original word=natural science, target word=zoology}, {category=hyponym, original word=natural science, target word=botany}, {category=hyponym, original word=natural science, target word=mineralogy}, {category=hyponym, original word=natural science, target word=physiology}, {category=hyponym, original word=natural science, target word=mathematics}, {category=hypernym, original word=natural science, target word=science}]; keywords: {natural science=[nature, substance, phenomenon, science]}.
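The hyponym extraction of Example two can be sketched in the same style. The cue word "including", the comma-splitting rule and the function name `extract_hyponyms` are assumptions made for this English rendering, not the patent's actual parsing rules.

```python
import re


def extract_hyponyms(word, explanation):
    """Treat the comma-separated list after 'including' as hyponyms of `word`."""
    match = re.search(r"[Ii]ncluding\s+([^.]+)", explanation)
    if not match:
        return []
    items = [item.strip() for item in match.group(1).split(",")]
    # Drop empty items and the trailing "etc".
    return [
        {"category": "hyponym", "source": word, "target": item}
        for item in items
        if item and item != "etc"
    ]


text = (
    "The science of studying various substances and phenomena in nature. "
    "Including physics, chemistry, zoology, botany, mineralogy, physiology, mathematics, etc."
)
hyponyms = extract_hyponyms("natural science", text)
print([r["target"] for r in hyponyms])
```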
Acquiring antonyms: the text following negation cue words such as "not" and "non-" can be roughly cut out as a candidate antonym relation, which is then further confirmed. Example: "Artificial: non-natural, as in artificial fiber, artificial ice, artificial earth satellite." The antonym association is parsed as {category=antonym, original word=artificial, target word=natural}, meaning that the antonym of "artificial" is "natural".
Synonyms/near-synonyms and antonyms of a word can also be obtained by querying synonym and antonym dictionaries or by corpus logic analysis; reading the synonym and antonym dictionaries yields the basic word associations. Synonym/near-synonym example: [{category=synonym, original word=happy, target word=glad}, {category=synonym, original word=happy, target word=joyful}]. Antonym example: [{category=antonym, original word=happy, target word=sad}, {category=antonym, original word=happy, target word=upset}].
Corpus logic analysis mainly splits sentences at logical connectives to obtain words with the same or opposite meaning. To obtain opposite meanings, sentences with the same or a similar antecedent clause are compared: one has a continuation result and the other a contrast result, and the two results are roughly judged to be opposite. Example: "Although this bridge has been built for many years, it is still very firm. This bridge has been built for many years, and appears somewhat loose." In "Although this bridge has been built for many years, it is still very firm.", the clause after the comma is the contrast result, from which the word "firm" is extracted; in "This bridge has been built for many years, and appears somewhat loose.", the clause after the comma is the continuation result, from which the word "loose" is extracted. The antonym association is parsed as {category=antonym, original word=firm, target word=loose}, meaning that the antonym of "firm" is "loose".
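The comparison of a contrast result with a continuation result can be sketched as below. This is a deliberately crude heuristic on English text: splitting at the first comma, detecting contrast by a leading "although", and taking the last word of each result clause are all assumptions for illustration, and the output is only a candidate pair that would still need confirmation.

```python
def split_clause(sentence):
    """Split a sentence into (premise, result, is_contrast) at the first comma."""
    premise, _, result = sentence.rstrip(".").partition(",")
    is_contrast = premise.lower().startswith("although ")
    if is_contrast:
        premise = premise[len("although "):]
    return premise.strip().lower(), result.strip().lower(), is_contrast


def candidate_antonym(sentence_a, sentence_b):
    """Pair the result words of two sentences that share a premise."""
    premise_a, result_a, contrast_a = split_clause(sentence_a)
    premise_b, result_b, contrast_b = split_clause(sentence_b)
    # Require the same premise, one contrast result and one continuation result.
    if premise_a != premise_b or contrast_a == contrast_b:
        return None
    if not result_a or not result_b:
        return None
    # Toy heuristic: the descriptive word is the last word of each result clause.
    return {
        "category": "antonym",
        "source": result_a.split()[-1],
        "target": result_b.split()[-1],
    }


pair = candidate_antonym(
    "Although this bridge has been built for many years, it is still very firm.",
    "This bridge has been built for many years, and it appears somewhat loose.",
)
print(pair)
```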
Through the above logical analysis of various dictionaries and corpora, each word and its associated words (synonyms, near-synonyms, hypernyms, hyponyms, etc.) are obtained. Each word and its associated words form a word group, one word of the group is selected as the meta-word, each word group is stored in the word sense library, and the classification logic relations of each word are stored in the classification association library. The related keywords of each word can also be stored in a keyword library, through which the related keywords of each word can be queried.
Referring to fig. 3, after the step S2, the method further includes: a. collecting corpus materials from outside the dictionary library, and segmenting the corpus materials with a Chinese word segmentation method to obtain a plurality of segments; b. for each segment, accessing the word sense library, and if the segment is not in any word group of the word sense library, executing step a or c; c. verifying the segment, and updating the word sense library either by incorporating the segment into an existing word group or by creating a new word group with the segment as its meta-word.
The data sources used to establish the word sense library and the classification association library are mainly various dictionaries, so libraries built from dictionaries alone are not comprehensive enough. They are therefore supplemented and updated with corpus materials from outside the dictionary library.
Specifically, a website corpus can be collected or otherwise obtained, and the corpus materials are segmented with a Chinese word segmentation method to obtain segments. For each segment, the word sense library is accessed; if the segment is in a word group of the word sense library, the meta-word of that group is read and the classification association library is then accessed to obtain the classification associations of the segment.
If the segment is not in any word group, the segment is further split, and the resulting parts are queried in the word sense library and the classification association library, so that they serve as auxiliary input for manual verification.
All segments that are not in any word group are manually checked, corrected and verified. If a segment can be included in an existing word group of the word sense library, for example because its associated words belong to a certain word group, the segment is incorporated into that group and the group's meta-word is obtained, from which the classification associations of the segment are obtained. If the segment does not belong to any word group, a new word group is created with the segment as its meta-word, the new group is put into the word sense library, and the meta-word is incorporated into the classification association library, so that classification associations of the segment are obtained.
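The update flow of steps b and c can be sketched as follows. The function name `update_sense_library` and the `verify` callback are hypothetical; the callback stands in for the manual checking described above and returns the meta-word of an existing group that the segment should join, or `None` if a new group is needed.

```python
def update_sense_library(sense_library, segments, verify):
    """Update {meta_word: set_of_words} with segments found in outside corpora."""
    for seg in segments:
        # Step b: skip segments that already belong to some word group.
        if any(seg in group for group in sense_library.values()):
            continue
        # Step c: verification decides where the segment goes.
        meta = verify(seg)
        if meta in sense_library:
            sense_library[meta].add(seg)   # join an existing word group
        else:
            sense_library[seg] = {seg}     # new group, segment is its meta-word
    return sense_library


library = {"happy": {"happy", "glad"}}
update_sense_library(
    library,
    ["glad", "joyful", "bridge"],
    # Toy verifier: "joyful" is judged to belong to the "happy" group.
    verify=lambda seg: "happy" if seg == "joyful" else None,
)
print(library)
```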
Referring to fig. 4, there is provided an information retrieval method according to an embodiment of the present invention, including: S1', querying the associated search terms of an input first search term from a word sense library or a classification association library; S2', retrieving according to the associated search terms of the first search term to obtain corresponding retrieval results; or retrieving according to an input second search term to obtain corresponding retrieval results; wherein the second search term is a search term selected from the associated search terms of the first search term, and the word sense library and the classification association library are established by the word stock building method.
In one embodiment of the present invention, the step S1' specifically includes: according to the first search term, looking up the synonyms, hypernyms and hyponyms of the first search term in the word sense library or the classification association library as the associated search terms of the first search term.
The above embodiments establish a word sense library, a classification association library and a keyword library. When information retrieval is performed, the associated search terms of the input first search term (mainly its synonyms/near-synonyms, hypernyms and hyponyms) are queried from the word sense library or the classification association library. Retrieval is then performed according to the associated search terms of the first search term, and the results are more comprehensive than those obtained with the first search term alone. Alternatively, some of the associated search terms can be selected as needed and used for retrieval, so that the retrieval is more targeted.
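Query expansion of this kind can be sketched in a few lines. The tuple layout `(category, original word, target word)`, the function names and the substring-based `retrieve` are assumptions for illustration only; a real system would call an actual search engine.

```python
def expand_query(term, association_library,
                 categories=("synonym", "hypernym", "hyponym")):
    """Collect the associated search terms of `term` for the given categories."""
    expanded = []
    for category, source, target in association_library:
        if source == term and category in categories and target not in expanded:
            expanded.append(target)
    return expanded


def retrieve(terms, documents):
    """Toy search: return documents that contain at least one of the terms."""
    return [doc for doc in documents if any(t in doc for t in terms)]


association_library = [
    ("hypernym", "natural science", "science"),
    ("hyponym", "natural science", "physics"),
    ("hyponym", "natural science", "chemistry"),
    ("antonym", "artificial", "natural"),
]
documents = ["physics lecture notes", "chemistry lab manual", "cooking recipes"]

associated = expand_query("natural science", association_library)
print(retrieve(["natural science"], documents))  # original search term only
print(retrieve(associated, documents))           # expanded search terms
```

Passing a subset of `associated` to `retrieve` corresponds to the second branch of S2', where a second search term is selected from the associated search terms.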
In one embodiment of the present invention, the method further comprises: taking the first search term as the original search term, and presenting the first search term and its associated search terms in a tree structure; and presenting the classification logic relations between the first search term and its associated search terms in the form of a report.
Specifically, according to the first search term input by the user, the associated search terms of the first search term, namely its synonyms/near-synonyms, hypernyms, hyponyms, antonyms and the like, are queried from the word sense library, and the first search term and its associated search terms are presented in a tree structure. The classification logic relations between the first search term and its associated search terms are queried from the classification association library and presented in the form of a report. In other words, the information associated with the first search term is presented to the user, who can use it as a reference when performing the search.
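One possible text rendering of the tree structure and the report is sketched below. The indentation-based tree, the tab-separated report and the function names are presentation choices assumed for this example; the patent does not prescribe a concrete layout.

```python
def render_tree(term, association_library):
    """Render the search term, its relation categories and associated terms as a tree."""
    by_category = {}
    for category, source, target in association_library:
        if source == term:
            by_category.setdefault(category, []).append(target)
    lines = [term]
    for category, targets in by_category.items():
        lines.append(f"  {category}")
        lines.extend(f"    {target}" for target in targets)
    return "\n".join(lines)


def render_report(term, association_library):
    """Render the classification logic relations of the term as a tab-separated table."""
    rows = [
        f"{category}\t{source}\t{target}"
        for category, source, target in association_library
        if source == term
    ]
    return "\n".join(["category\toriginal word\ttarget word"] + rows)


relations = [
    ("hypernym", "natural science", "science"),
    ("hyponym", "natural science", "physics"),
    ("hyponym", "natural science", "chemistry"),
]
tree = render_tree("natural science", relations)
report = render_report("natural science", relations)
print(tree)
print(report)
```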
Referring to Fig. 5, a word stock building system according to an embodiment of the present invention includes an obtaining module 51, a first saving module 52 and a second saving module 53.
The obtaining module 51 is configured to obtain the associated vocabulary of each vocabulary word according to each vocabulary word in the dictionary database and the specific explanation corresponding to it.
The first saving module 52 is configured to save, for each vocabulary word, the word and its associated vocabulary as a vocabulary group in a pre-established word sense library.
The second saving module 53 is configured to save the classification logic relationship between each vocabulary and the associated vocabulary of the vocabulary in a pre-established classification association library.
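The three modules can be sketched as three functions. This is a rough illustration under stated assumptions: the English cue phrases in `CUE_PATTERNS` stand in for the "logic characterization relations" found in dictionary explanations, and the data layouts and function names are hypothetical. The patent targets Chinese dictionaries and does not disclose concrete patterns.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical cue phrases that signal a relation inside an explanation.
CUE_PATTERNS = {
    "synonym": re.compile(r"same as (\w+)"),
    "hypernym": re.compile(r"a kind of (\w+)"),
    "antonym": re.compile(r"opposite of (\w+)"),
}

Associated = Dict[str, Dict[str, List[str]]]


def obtain_associated(dictionary: Dict[str, str]) -> Associated:
    """Obtaining module: derive associated vocabulary from each explanation."""
    return {
        word: {relation: pattern.findall(explanation.lower())
               for relation, pattern in CUE_PATTERNS.items()}
        for word, explanation in dictionary.items()
    }


def save_word_groups(associated: Associated) -> List[dict]:
    """First saving module: store each word and its associated vocabulary as a group."""
    word_sense_library = []
    for word, relations in associated.items():
        members = {word}
        for related in relations.values():
            members.update(related)
        # Assumption: the headword itself serves as the group's meta vocabulary.
        word_sense_library.append({"meta": word, "members": sorted(members)})
    return word_sense_library


def save_classification(associated: Associated) -> List[Tuple[str, str, str]]:
    """Second saving module: store (word, relation, associated word) triples."""
    return [(word, relation, other)
            for word, relations in associated.items()
            for relation, others in relations.items()
            for other in others]
```

Keeping the word groups and the relation triples in separate structures mirrors the split between the word sense library and the classification association library.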
The word stock building system provided by the embodiment of the present invention corresponds to the word stock building method provided by the foregoing embodiment, and technical features of the word stock building system provided by the embodiment may refer to relevant technical features of the word stock building method in the foregoing embodiment, which are not described herein again.
Referring to fig. 6, an information retrieval system of one embodiment of the present invention is provided, comprising a query module 61 and a retrieval module 62.
The query module 61 is configured to query, according to the input first search term, the associated search terms of the first search term from the word sense library or the classification association library.
The retrieval module 62 is configured to retrieve according to the associated search terms of the first search term to obtain a corresponding retrieval result, or to retrieve according to an input second search term to obtain a corresponding retrieval result.
The second search term is a search term selected from the associated search terms of the first search term, and the word sense library and the classification associated library are established based on the word library establishment method described in each embodiment.
The information retrieval system provided by the embodiment of the present invention corresponds to the information retrieval method provided by the foregoing embodiment, and technical features of the information retrieval system provided by the present embodiment may refer to relevant technical features of the information retrieval method in the foregoing embodiment, which are not described herein again.
In one embodiment of the present invention, there is also provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements a word stock building method or an information retrieval method.
According to the word stock establishment method, the information retrieval method and the corresponding systems provided by the present invention, the associated vocabulary of each vocabulary word is stored in the word sense library according to the vocabulary words of the dictionary database and their specific explanations, and the classification logic relationships among the words are stored in the classification association library. When information retrieval is performed, the associated search terms of a search term can be found in the word sense library and the classification association library, so that the search term is expanded into its associated search terms; retrieval is then performed according to the associated search terms, so that the retrieval result is more comprehensive and extends the initial result. In addition, the associated search terms and the classification logic relationships between the search term and its associated search terms are presented to the user for reference, providing a basis for the user's judgment and decision support.
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Such modifications or improvements, made without departing from the spirit of the invention, are intended to fall within the scope of the invention as claimed.

Claims (8)

1. A method for creating a word stock, comprising:
S1, acquiring the associated vocabulary of each vocabulary word according to each vocabulary word in a dictionary database and the specific explanation corresponding to it;
S2, for each vocabulary word, storing the word and its associated vocabulary as a vocabulary group in a pre-established word sense library;
S3, storing the classification logic relationships between each vocabulary word and its associated vocabulary in a pre-established classification association library;
the step S1 specifically includes:
S11, collecting each vocabulary word in the dictionary database and the specific explanation corresponding to it;
S12, performing word segmentation on the specific explanation of each vocabulary word, and obtaining the associated vocabulary of each vocabulary word according to the logical characterization relations among the segmented words, wherein the associated vocabulary comprises the near-synonyms, synonyms, antonyms, hypernyms and hyponyms of each vocabulary word;
correspondingly, the step S2 specifically includes:
for each vocabulary word, storing the word and its near-synonyms, synonyms, antonyms, hypernyms and hyponyms as a vocabulary group in the word sense library, and selecting one of the words as the meta vocabulary (primary word) of the group;
the step S2 further includes:
a. collecting corpus materials outside the dictionary database, and segmenting the corpus materials by using a Chinese word segmentation method to obtain a plurality of segmented words;
b. for each segmented word, accessing the word sense library, and if the segmented word is not in any vocabulary group of the word sense library, executing step a or c;
c. verifying the segmented word, and updating the word sense library by incorporating the segmented word into an existing vocabulary group or by creating a new vocabulary group, the segmented word being taken as the meta vocabulary of the newly created vocabulary group.
2. The method of claim 1, wherein the classification logic relationships include synonym, near-synonym, antonym, hypernym, hyponym and keyword relationships.
3. An information retrieval method, comprising:
S1', querying the associated search terms of a first search term from a word sense library or a classification association library according to the input first search term;
S2', retrieving according to the associated search terms of the first search term to obtain a corresponding retrieval result; or retrieving according to an input second search term to obtain a corresponding retrieval result;
wherein the second term is a term selected from the associated terms of the first term, and the word sense library and the classification association library are established based on the word library establishment method according to any one of claims 1-2.
4. The information retrieval method as recited in claim 3, wherein the step S1' specifically includes:
according to the first search term, searching the word sense library or the classification association library for the synonyms, hypernyms and hyponyms of the first search term to serve as the associated search terms of the first search term.
5. The information retrieval method as recited in claim 4, further comprising:
taking the first search term as the original search term, and presenting the first search term and its associated search terms in a tree structure;
and presenting the classification logic relationships between the first search term and the associated search terms in the form of a report.
6. A word stock building system employing the word stock building method according to any one of claims 1 to 2, characterized by comprising:
the acquisition module is used for acquiring the associated vocabulary of each vocabulary according to each vocabulary and the specific explanation corresponding to each vocabulary in the dictionary database;
the first storage module is used for storing each vocabulary and the associated vocabulary of the vocabulary as a vocabulary group in a pre-established word sense library;
and the second storage module is used for storing the classification logic relation between each vocabulary and the associated vocabulary of the vocabulary in a pre-established classification association library.
7. An information retrieval system, comprising:
the query module is used for querying the associated search term of the first search term from the word sense library or the classification associated library according to the input first search term;
the retrieval module is used for retrieving according to the associated search terms of the first search term to obtain a corresponding retrieval result, or for retrieving according to an input second search term to obtain a corresponding retrieval result;
wherein the second term is a term selected from the associated terms of the first term, and the word sense library and the classification association library are established based on the word library establishment method according to any one of claims 1-2.
8. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the word stock building method of any one of claims 1 to 2 or the information retrieval method of any one of claims 3 to 5.
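As an illustration only, steps a to c of claim 1 (extending the word sense library from corpus material outside the dictionary) might be sketched as follows. The claim's wording for step b is ambiguous, so this sketch assumes one reading: segmented words already in a vocabulary group are skipped, and unknown ones proceed to step c. The `segment`, `verify` and `find_group` callables are hypothetical placeholders. Whitespace splitting stands in for a real Chinese word segmenter, and the patent does not say how verification or group matching is done.

```python
from typing import Callable, Iterable, List, Optional

Group = dict  # {"meta": str, "members": List[str]}


def update_word_sense_library(
    corpus: Iterable[str],
    library: List[Group],
    segment: Callable[[str], List[str]] = str.split,
    verify: Optional[Callable[[str], bool]] = None,
    find_group: Optional[Callable[[str, List[Group]], Optional[Group]]] = None,
) -> List[Group]:
    """Extend the word sense library with unseen words found in corpus text."""
    for text in corpus:
        # Step a: segment the corpus material into words.
        for token in segment(text):
            # Step b: skip words that already belong to some vocabulary group.
            if any(token in group["members"] for group in library):
                continue
            # Step c: verify the word (hypothetical check, e.g. manual review).
            if verify is not None and not verify(token):
                continue
            target = find_group(token, library) if find_group else None
            if target is not None:
                # Incorporate the word into an existing vocabulary group.
                target["members"].append(token)
            else:
                # Create a new group with the word as its meta vocabulary.
                library.append({"meta": token, "members": [token]})
    return library
```

Passing a `find_group` callable decides whether a new word joins an existing group; without one, every unseen verified word starts its own group.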
CN201910568339.4A 2019-06-27 2019-06-27 Word stock establishment method, information retrieval method and corresponding system Active CN110276079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910568339.4A CN110276079B (en) 2019-06-27 2019-06-27 Word stock establishment method, information retrieval method and corresponding system

Publications (2)

Publication Number Publication Date
CN110276079A CN110276079A (en) 2019-09-24
CN110276079B true CN110276079B (en) 2023-05-26

Family

ID=67962399

Country Status (1)

Country Link
CN (1) CN110276079B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051898A (en) * 2019-12-27 2021-06-29 北京阿博茨科技有限公司 Word meaning accumulation and word segmentation method, tool and system for structured data searched by natural language
CN113515585A (en) * 2020-04-10 2021-10-19 中国石油化工股份有限公司 Construction method, retrieval method and system of special lexicon in dangerous chemical safety field
CN113407668B (en) * 2021-06-11 2022-10-11 武夷学院 Data processing method and device for cognitive association capacity training

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000222410A (en) * 1999-01-28 2000-08-11 Matsushita Electric Ind Co Ltd Thesaurus retrieving device and thesaurus retrieval system
TWI290684B (en) * 2003-05-09 2007-12-01 Webgenie Information Ltd Incremental thesaurus construction method
US20120124084A1 (en) * 2010-11-06 2012-05-17 Ning Zhu Method to semantically search domain name by utilizing hyponym, hypernym, troponym, entailment and coordinate term
CN108959314A (en) * 2017-05-24 2018-12-07 西安科技大市场创新云服务股份有限公司 A kind of semantic retrieving method and device

Also Published As

Publication number Publication date
CN110276079A (en) 2019-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant