CN107169124B - Query method of bilingual double-solution dictionary - Google Patents

Query method of bilingual double-solution dictionary

Info

Publication number
CN107169124B
Authority
CN
China
Prior art keywords
language
database
entries
double
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710399683.6A
Other languages
Chinese (zh)
Other versions
CN107169124A (en
Inventor
范剑淼
孔祥顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Haidi Digital Publishing Technology Co ltd
Original Assignee
Shanghai Haidi Digital Publishing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Haidi Digital Publishing Technology Co ltd filed Critical Shanghai Haidi Digital Publishing Technology Co ltd
Priority to CN201710399683.6A priority Critical patent/CN107169124B/en
Publication of CN107169124A publication Critical patent/CN107169124A/en
Application granted granted Critical
Publication of CN107169124B publication Critical patent/CN107169124B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a query method for a bilingual double-solution dictionary, comprising the following steps: constructing an auxiliary database containing mappings between first-language entries and second-language entries; building a reverse full-text retrieval database containing mappings between the first language and second-language entries from the word segmentation results of the first-language paraphrases of the double-solution dictionary; generating, from the auxiliary database and the reverse full-text retrieval database, a comprehensive index database that takes the first language as the search word and returns second-language entries of the target double-solution dictionary as results; and, with a first-language search word, looking up the corresponding second-language entry mappings in the comprehensive index database and returning the second-language entries corresponding to the search word. By introducing the auxiliary database, the double-solution dictionary can be searched in the first language: the best-matching second-language entry for the first-language search keyword is output, and the remaining mapped second-language entries are offered as candidate entries according to the mapping relation, which improves the user experience.

Description

Query method of bilingual double-solution dictionary
Technical Field
The invention relates to a query method for an electronic bilingual double-solution dictionary, and in particular to a method for querying such a dictionary in Chinese.
Background
An English-Chinese double-solution dictionary is a dictionary whose headwords are English and whose headwords are explained in both English and Chinese: the English paraphrase comes first, followed by the Chinese paraphrase. It is a good reference tool for learning English and suits beginner and intermediate learners. The English paraphrase lets the user understand a word as it is used in the original language; when the user cannot understand the English in the paraphrase, the Chinese paraphrase helps the user grasp the word more quickly. Once the user's English is good enough to understand the paraphrases entirely in English, the English paraphrase alone can be used.
At present, English-Chinese double-solution dictionaries are searched by English headword. A user who wants to find an English entry starting from a Chinese character or term must first translate the Chinese into English and then search in English. This is cumbersome, and a wrong translation can introduce ambiguity, which greatly reduces the effectiveness of the search.
In the dictionary market where Chinese is the mother tongue, users prefer to search an English-Chinese double-solution dictionary in Chinese. A two-way English-Chinese dictionary does support Chinese retrieval, but its content is limited: it offers only Chinese-English and English-Chinese paraphrases and does not provide the double paraphrase of English headwords, which is less helpful for learning. Some English learners therefore prefer to learn with an English-Chinese double-solution dictionary.
At present, no electronic dictionary on the market can search an English-Chinese double-solution dictionary in Chinese. The Chinese paraphrases in a typical English-Chinese double-solution dictionary are descriptive text or translations of the English descriptive paraphrases, so the Chinese they contain maps directly to English headwords only to a limited extent. If queries rely solely on a reverse index of the Chinese paraphrases, the user experience is poor: an entry may not be found at all, few entries may be returned, or the entries returned may not be the target entry.
Other double-solution dictionaries currently on the market share the problems of English-Chinese double-solution dictionaries: Chinese queries cannot be carried out, the information returned is limited, and query accuracy and coverage need improvement. For a user whose everyday (native) language is the first language, querying a bilingual dictionary that contains the first language but whose default query language is the other language calls for an effective and fast retrieval method, so that queries can be made in the first language and query accuracy and coverage are improved.
Disclosure of Invention
In view of the above shortcomings, the present invention provides a query method for a bilingual double-solution dictionary that enables a user whose everyday language is a first language to quickly query a double-solution dictionary in the first and second languages whose default query language is the second language. Specifically, an auxiliary database of bilingual mapping relations is introduced and combined with the first-language paraphrase data of the double-solution dictionary to form a comprehensive index database. The user queries the double-solution dictionary in the first language and receives a plurality of related second-language entries, which improves the user experience of searching the double-solution dictionary in the first language and greatly improves the accuracy and coverage of the candidate words returned.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
a method for constructing a double-solution dictionary comprehensive index database comprises the following steps:
constructing an auxiliary database containing a mapping relation between entries of a first language and entries of a second language;
establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary;
and generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database.
According to one aspect of the present invention, the auxiliary database is a database containing mapping relationships between entries in a first language and entries in a second language, and includes one or more of a semantic knowledge base, a dictionary database, and a corpus.
According to one aspect of the present invention, the creating a reverse full-text search database according to the word segmentation result of the first language definition of the double-solution dictionary comprises: and establishing a reverse full-text retrieval database according to the mapping of the first language and the second language vocabulary entry obtained by the first language paraphrase word segmentation processing, wherein the first language paraphrase word segmentation processing mode comprises one or more of natural language segmentation, unitary segmentation and binary segmentation.
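The three segmentation modes named above can be illustrated with a minimal sketch. It assumes that "unitary" and "binary" segmentation mean character unigrams and character bigrams, which the text does not spell out, and that natural segmentation of a space-delimited language is a plain whitespace split. The function names are hypothetical, and punctuation handling is left out.

```python
def natural_segments(text: str) -> list[str]:
    """Natural segmentation for a space-delimited language such as English."""
    return text.split()


def unigram_segments(text: str) -> list[str]:
    """Unitary segmentation (assumed): each non-space character is one token."""
    return [ch for ch in text if not ch.isspace()]


def bigram_segments(text: str) -> list[str]:
    """Binary segmentation (assumed): each pair of adjacent characters is one token."""
    chars = unigram_segments(text)
    return [a + b for a, b in zip(chars, chars[1:])]


print(unigram_segments("苹果树"))
print(bigram_segments("苹果树"))
print(natural_segments("an apple tree"))
```

Combining unigrams and bigrams lets both a single character and a two-character word in a paraphrase act as search keys, at the cost of a larger index.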
According to one aspect of the invention, the method for constructing the double-solution dictionary comprehensive index database comprises the following steps: recording the mapping relation from the first language to the second-language entries in the auxiliary database as a weight W1; recording the mapping relation from the first language to the second-language entries in the reverse full-text retrieval database as a weight W2; the weight W1 of the auxiliary database and the weight W2 of the reverse full-text retrieval database are given in advance and have different values.
A query method of a double-solution dictionary comprises the following steps:
constructing an auxiliary database containing a mapping relation between entries of a first language and entries of a second language;
establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary;
generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database;
and inquiring the corresponding second language entry mapping in the comprehensive index database by taking the first language as a search word, and returning the second language entry item corresponding to the search word.
According to one aspect of the present invention, the auxiliary database is a database containing mapping relationships between entries in a first language and entries in a second language, and includes one or more of a semantic knowledge base, a dictionary database, and a corpus.
According to one aspect of the present invention, the creating a reverse full-text search database according to the word segmentation result of the first language definition of the double-solution dictionary comprises: and establishing a reverse full-text retrieval database according to the mapping of the first language and the second language vocabulary entry obtained by the first language paraphrase word segmentation processing, wherein the first language paraphrase word segmentation processing mode comprises one or more of natural language segmentation, unitary segmentation and binary segmentation.
A query method of a double-solution dictionary comprises the following steps:
constructing an auxiliary database containing mapping relations between the first language entries and the second language entries, and recording the mapping relations between the first language entries and the second language entries as weights W1;
establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary, and recording the mapping relation between the first language and the second language vocabulary entries as a weight W2;
generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database;
and inquiring the corresponding second language entry mapping in the comprehensive index database by taking the first language as a search word, and returning the second language entry item corresponding to the search word.
According to one aspect of the present invention, the auxiliary database is a database containing mapping relationships between entries in a first language and entries in a second language, and includes one or more of a semantic knowledge base, a dictionary database, and a corpus.
According to one aspect of the present invention, the creating a reverse full-text search database according to the word segmentation result of the first language definition of the double-solution dictionary comprises: and establishing a reverse full-text retrieval database according to the mapping of the first language and the second language vocabulary entry obtained by the first language paraphrase word segmentation processing, wherein the first language paraphrase word segmentation processing mode comprises one or more of natural language segmentation, unitary segmentation and binary segmentation. The natural language segmentation is specifically to perform word segmentation according to default or self-contained word segmentation modes of different languages.
According to one aspect of the invention, the query method of the double-solution dictionary comprises the following steps: retrieving the second-language entry mappings corresponding to the first-language search word from the reverse full-text retrieval database and the auxiliary database, and then either computing weights for the retrieved mapping results and returning the entries by weight, or returning the corresponding second-language entries according to the entry ordering.
According to one aspect of the invention, the weight W1 of the auxiliary database is higher than the weight W2 of the reverse full-text retrieval database.
According to an alternative aspect of the invention, the weight W1 of the auxiliary database is lower than the weight W2 of the reverse full-text retrieval database.
According to one aspect of the invention, the query method of the double-solution dictionary comprises the following steps: when the auxiliary database is from one of the semantic knowledge base, the dictionary database and the corpus, returning corresponding second language vocabulary items according to the weight; when a plurality of database sources exist, the mapping relation weights of the first language entries and the second language entries with intersection are added for calculation, and then the corresponding second language entries are returned according to the weights.
A query method of a double-solution dictionary comprises the following steps:
constructing an auxiliary database containing a mapping relation between entries of a first language and entries of a second language;
using the first language as a search word, inquiring corresponding second language vocabulary entry mapping in the auxiliary database, and returning a second language vocabulary entry of the double-solution dictionary corresponding to the search word;
the auxiliary database is a database containing mapping relations between entries in a first language and entries in a second language, and comprises one or more of a semantic knowledge base, a dictionary database and a corpus.
The implementation of the invention has the advantages that:
According to the technical scheme, the first-language paraphrases of the double-solution dictionary are segmented and indexed, and one or more of a semantic knowledge base, a dictionary database, and a corpus are introduced as an auxiliary database that maps first-language terms to second-language entries. Candidate second-language entries are output by weight or by entry ordering, which greatly improves the accuracy and coverage of word selection when a user queries the double-solution dictionary in the first language. When the comprehensive index database is not constructed, it is sufficient to look up the mapping between the first language and the second-language entries in the auxiliary database and return the corresponding second-language entries of the double-solution dictionary, which still provides first-language retrieval of the double-solution dictionary. By introducing one or more databases that map first-language search words to second-language entries as auxiliary databases, the best-matching second-language entry for a first-language search keyword is output, and the remaining mapped second-language entries are offered as candidate entries for the user's query, which improves the experience of looking up words in the double-solution dictionary in the first language.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a method for constructing a double-solution dictionary comprehensive index database according to a first embodiment of the present invention;
Fig. 2 is a schematic diagram of a method for constructing a double-solution dictionary comprehensive index database according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of a double-solution dictionary query method according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of a double-solution dictionary query method according to a fourth embodiment of the present invention;
Fig. 5 is a schematic diagram of a double-solution dictionary query method according to a fifth embodiment of the present invention;
Fig. 6 is a block diagram of a comprehensive index database according to a sixth embodiment of the present invention;
Fig. 7 is a schematic diagram of a Chinese query method of an English-Chinese double-solution dictionary according to a seventh embodiment of the present invention;
Fig. 8 is a diagram illustrating a query example according to an eighth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
As shown in Fig. 1, when a user whose everyday language is a first language queries a bilingual double-solution dictionary in the first and second languages whose default query language is the second language, a comprehensive index database for the double-solution dictionary is needed. The method for constructing the comprehensive index database of the double-solution dictionary comprises the following steps:
step S11: constructing an auxiliary database containing a mapping relation between entries of a first language and entries of a second language;
the auxiliary database is a database containing the mapping relation between the entries of the first language and the entries of the second language, and comprises one or more of a semantic knowledge base, a dictionary database and a corpus.
Step S12: establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary;
and establishing a reverse full-text retrieval database according to the mapping of the first language vocabulary entry and the second language vocabulary entry obtained by the processing of the first language paraphrase word segmentation of the double-solution dictionary.
The word segmentation processing mode of the first language paraphrase of the double-solution dictionary comprises the following steps: one or more of natural language segmentation, unitary segmentation and binary segmentation.
Natural language segmentation means segmenting words according to the default or built-in word boundaries of each language. For example, English text is already delimited into words, so it can be segmented accurately and quickly, and the automatic segmentation result can be used directly.
Step S13: and generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database.
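Steps S11 to S13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary is modelled as a plain mapping from second-language headword to its first-language paraphrase, the segmentation is character unigrams plus bigrams, and the auxiliary database is a mapping from first-language term to a set of second-language words. Restricting results to headwords of the target dictionary is an interpretation of "takes the second language vocabulary entry of the target double-solution dictionary as a result". All function names and sample data are hypothetical.

```python
from collections import defaultdict


def segment(paraphrase: str) -> set[str]:
    """Character unigrams plus bigrams of a first-language paraphrase (assumed mode)."""
    chars = [ch for ch in paraphrase if not ch.isspace()]
    return set(chars) | {a + b for a, b in zip(chars, chars[1:])}


def build_reverse_index(dictionary: dict[str, str]) -> dict[str, set[str]]:
    """Step S12: map each paraphrase token to the headwords whose paraphrase contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for headword, paraphrase in dictionary.items():
        for token in segment(paraphrase):
            index[token].add(headword)
    return dict(index)


def build_comprehensive_index(auxiliary, reverse_index, headwords):
    """Step S13: merge both sources, keeping only headwords of the target dictionary."""
    allowed = set(headwords)
    combined: dict[str, set[str]] = defaultdict(set)
    for source in (auxiliary, reverse_index):
        for term, targets in source.items():
            kept = set(targets) & allowed
            if kept:
                combined[term] |= kept
    return dict(combined)


# Hypothetical sample data: headword -> first-language (Chinese) paraphrase.
dictionary = {"apple": "苹果", "orchard": "果园"}
# Step S11: hypothetical auxiliary database, first-language term -> second-language words.
auxiliary = {"苹果": {"apple", "pome"}, "水果": {"fruit"}}

reverse_index = build_reverse_index(dictionary)
comprehensive = build_comprehensive_index(auxiliary, reverse_index, dictionary.keys())
print(comprehensive)
```

In this toy data the auxiliary word "pome" and the term "水果" drop out because neither leads to a headword of the target dictionary.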
Example two
As shown in Fig. 2, when a user whose everyday language is Chinese queries an English-Chinese double-solution dictionary whose default query language is English, a comprehensive index database for the double-solution dictionary is needed. The method for constructing the comprehensive index database of the double-solution dictionary comprises the following steps:
Step S21: an auxiliary database is introduced, and the mapping relation from Chinese to foreign language entries in the auxiliary database is recorded as a weight W1.
The auxiliary database is a database containing mapping relations between Chinese and foreign language entries, and comprises one or more of a semantic knowledge base, a dictionary database and a corpus.
The Chinese in the auxiliary database comprises word segmentation results of the double-solution dictionary Chinese paraphrases.
The corpus may be a parallel corpus, a translation corpus, or another corpus that carries a semantic mapping relation between Chinese and the foreign language.
The dictionary database provides semantic mappings between Chinese and the foreign language drawn from one or more Chinese-foreign dictionaries, or from a combination of Chinese-foreign and foreign-Chinese dictionaries, such as a Chinese-English dictionary and an English-Chinese dictionary, or a Chinese-German dictionary and a German-Chinese dictionary.
The semantic knowledge base, the dictionary database and the corpus comprise semantic mapping relations between Chinese words and foreign language entries.
Step S22: establishing a reverse full-text retrieval database according to the Chinese meaning word segmentation result of the double-solution dictionary, and recording the mapping relation from Chinese to foreign language vocabulary entry as weight W2;
the word segmentation processing mode of the Chinese paraphrase in the double-solution dictionary comprises the following steps: one or more of natural language segmentation, unitary segmentation and binary segmentation.
Natural language segmentation means segmenting words according to the default or built-in word boundaries of each language. For example, English text is already delimited into words, so it can be segmented accurately and quickly, and the automatic segmentation result can be used directly.
The weights W1 and W2 are given in advance; that is, each mapping between a Chinese term and a foreign language entry is assigned a weight value in advance.
Step S23: and generating a comprehensive index database which takes Chinese as a search word and takes foreign language entries of the target double-solution dictionary as results according to the auxiliary database and the reverse full-text search database.
The mapping relation weight of the Chinese and foreign language entries in the reverse full text retrieval database and the auxiliary database is given in advance.
In one practical application, the weight W1 of the auxiliary database is higher than the weight W2 of the reverse full-text retrieval database.
In another practical application, the weight W1 of the auxiliary database is lower than the weight W2 of the reverse full-text retrieval database.
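The role of the pre-given weights can be sketched as follows. The sketch simplifies the scheme to one weight per source (W1 for every auxiliary mapping, W2 for every reverse full-text mapping), whereas the text allows a value per mapping. The numeric values are hypothetical, and adding W1 and W2 when a mapping appears in both sources is an assumption; the text does not say how such overlaps are combined.

```python
W1 = 2.0  # hypothetical weight for auxiliary-database mappings
W2 = 1.0  # hypothetical weight for reverse full-text mappings


def build_weighted_index(auxiliary, reverse_index, w1=W1, w2=W2):
    """Return {search term: {headword: score}} from the two weighted sources."""
    scored: dict[str, dict[str, float]] = {}
    for source, weight in ((auxiliary, w1), (reverse_index, w2)):
        for term, headwords in source.items():
            bucket = scored.setdefault(term, {})
            for headword in headwords:
                # Assumption: a mapping found in both sources scores w1 + w2.
                bucket[headword] = bucket.get(headword, 0.0) + weight
    return scored


# Hypothetical sample data: Chinese term -> English headwords.
auxiliary = {"苹果": {"apple"}}
reverse_index = {"苹果": {"apple", "cider"}}
print(build_weighted_index(auxiliary, reverse_index))
```

Passing `w1=1.0, w2=2.0` models the alternative application in which the reverse full-text retrieval database outweighs the auxiliary database.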
Example three
As shown in Fig. 3, when a user whose everyday language is a first language queries a bilingual double-solution dictionary in the first and second languages whose default query language is the second language, a query method for the bilingual double-solution dictionary is needed. The query method of the bilingual double-solution dictionary comprises the following steps:
step S31: constructing an auxiliary database containing a mapping relation between entries of a first language and entries of a second language;
the auxiliary database is a database containing the mapping relation between the entries of the first language and the entries of the second language, and comprises one or more of a semantic knowledge base, a dictionary database and a corpus.
Step S32: establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary;
and establishing a reverse full-text retrieval database according to the mapping of the first language vocabulary entry and the second language vocabulary entry obtained by the processing of the first language paraphrase word segmentation of the double-solution dictionary.
The word segmentation processing mode of the first language paraphrase of the double-solution dictionary comprises the following steps: one or more of natural language segmentation, unitary segmentation and binary segmentation.
Natural language segmentation means segmenting words according to the default or built-in word boundaries of each language. For example, English text is already delimited into words, so it can be segmented accurately and quickly, and the automatic segmentation result can be used directly.
Step S33: generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database;
step S34: and inquiring the corresponding second language entry mapping in the comprehensive index database by taking the first language as a search word, and returning the second language entry item corresponding to the search word.
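Step S34 amounts to a lookup in the comprehensive index followed by fetching the full double-solution entries. The following is a minimal sketch, assuming the comprehensive index is a mapping from first-language search word to a set of second-language headwords and the dictionary maps each headword to its full entry text. The function name and sample entries are hypothetical, and the alphabetical order is only a placeholder for whatever ordering an implementation chooses.

```python
def query(term, comprehensive_index, dictionary):
    """Return (headword, full entry) pairs for a first-language search word."""
    headwords = comprehensive_index.get(term.strip(), set())
    # Alphabetical order is a placeholder; it is not prescribed by the source.
    return [(hw, dictionary[hw]) for hw in sorted(headwords) if hw in dictionary]


# Hypothetical sample data.
dictionary = {
    "apple": "a round fruit with firm flesh; 苹果",
    "orchard": "a piece of land where fruit trees are grown; 果园",
}
comprehensive_index = {"苹果": {"apple"}, "果": {"apple", "orchard"}}

for headword, entry in query("果", comprehensive_index, dictionary):
    print(headword, "->", entry)
```

A search word that is absent from the index simply yields an empty list, which a caller could treat as "no entry found".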
Example four
As shown in Fig. 4, a query method of a double-solution dictionary includes the following steps:
step S41: constructing an auxiliary database containing mapping relations between the first language entries and the second language entries, and recording the mapping relations between the first language entries and the second language entries as weights W1;
the auxiliary database is a database containing the mapping relation between the entries of the first language and the entries of the second language, and comprises one or more of a semantic knowledge base, a dictionary database and a corpus.
Step S42: establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary, and recording the mapping relation between the first language and the second language vocabulary entries as a weight W2;
and establishing a reverse full-text retrieval database according to the mapping of the first language vocabulary entry and the second language vocabulary entry obtained by the processing of the first language paraphrase word segmentation of the double-solution dictionary.
The word segmentation processing mode of the first language paraphrase of the double-solution dictionary comprises the following steps: one or more of natural language segmentation, unitary segmentation and binary segmentation.
Natural language segmentation means segmenting words according to the default or built-in word boundaries of each language. For example, English text is already delimited into words, so it can be segmented accurately and quickly, and the automatic segmentation result can be used directly.
Step S43: generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database;
step S44: and inquiring the corresponding second language entry mapping in the comprehensive index database by taking the first language as a search word, and returning the second language entry item corresponding to the search word.
In one implementation, the second language entry mappings corresponding to the first language search word are first retrieved from both the reverse full-text retrieval database and the auxiliary database, and the retrieved mapping results are then weighted and returned in order of weight.
Alternatively, after the mappings are retrieved from the two databases, the corresponding second language entries are returned according to the entry ordering mode.
When the retrieved mapping results are returned by weight, the weight W1 of the auxiliary database may be higher than the weight W2 of the reverse full-text retrieval database.
Alternatively, the weight W1 of the auxiliary database may be lower than the weight W2 of the reverse full-text retrieval database.
When the retrieved mapping results are returned by weight and the auxiliary database comes from only one of the semantic knowledge base, the dictionary database and the corpus, the corresponding second language entries are returned according to their weights. When the auxiliary database comes from several sources, the weights of the mappings between the first language and the second language entries that the sources have in common are added, and the corresponding second language entries are then returned according to the summed weights.
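The weighting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the placeholder terms (`term_a`, `entry_x`), the numeric weights, and the choice to treat W1 and W2 as multipliers applied per database are all hypothetical.

```python
from collections import defaultdict

def merge_auxiliary_sources(sources):
    """Add up the weights of mappings that occur in more than one source."""
    merged = defaultdict(lambda: defaultdict(float))
    for source in sources:
        for term, entries in source.items():
            for entry, weight in entries.items():
                merged[term][entry] += weight
    return merged

def ranked_entries(term, auxiliary, fulltext, w1=2.0, w2=1.0):
    """Score each second language entry and return the entries by weight.

    Assumption: w1 scales auxiliary-database mappings and w2 scales
    reverse full-text mappings; the source only says both are preset.
    """
    scores = defaultdict(float)
    for entry, weight in auxiliary.get(term, {}).items():
        scores[entry] += w1 * weight
    for entry, weight in fulltext.get(term, {}).items():
        scores[entry] += w2 * weight
    # Highest weight first; ties are broken alphabetically.
    return sorted(scores, key=lambda e: (-scores[e], e))

# Hypothetical sample data: two auxiliary sources and one reverse index.
semantic_kb = {"term_a": {"entry_x": 1.0, "entry_y": 1.0}}
dictionary_db = {"term_a": {"entry_x": 1.0}}
auxiliary = merge_auxiliary_sources([semantic_kb, dictionary_db])
fulltext = {"term_a": {"entry_y": 1.0, "entry_z": 1.0}}

aux_favoured = ranked_entries("term_a", auxiliary, fulltext, w1=2.0, w2=1.0)
fulltext_favoured = ranked_entries("term_a", auxiliary, fulltext, w1=1.0, w2=3.0)
```

Swapping the relative size of `w1` and `w2` changes which database dominates the ranking, which corresponds to the two alternatives (W1 higher than W2, or W1 lower than W2) described above.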
Example five
As shown in fig. 5, when a user whose customary language is a first language queries a double-solution dictionary of a second language and the first language whose default query language is the second language, a query method of the bilingual double-solution dictionary is needed. The query method includes the following steps:
Step S51: construct an auxiliary database containing the mapping relation between the first language and the second language entries.
The auxiliary database contains the mapping relation between the first language and the second language entries, and comprises one or more of a semantic knowledge base, a dictionary database and a corpus.
Step S52: using the first language as the search word, query the auxiliary database for the corresponding second language entry mapping, and return the second language entries of the double-solution dictionary corresponding to the search word.
Example six
As shown in fig. 6, when a user whose customary language is Chinese queries an English-Chinese double-solution dictionary whose default query language is English, a comprehensive index database needs to be constructed from the reverse full-text retrieval database of the dictionary and an auxiliary database. The construction method is as follows:
Step S61: introduce an auxiliary database, and record the mapping from Chinese to English entries in the auxiliary database as a weight W1.
Step S62: establish a reverse full-text retrieval database according to the word segmentation result of the Chinese paraphrases of the English-Chinese double-solution dictionary, and record the mapping from Chinese to English entries as a weight W2.
Step S63: generate, from the auxiliary database and the reverse full-text retrieval database, a comprehensive index database that takes Chinese as the search word and the English entries of the target English-Chinese double-solution dictionary as the result.
Referring to fig. 6, the comprehensive index database includes the reverse full-text retrieval database and the auxiliary database, each of which contains a mapping relation between Chinese and English entries. The mapping from Chinese to English entries can be retrieved by search methods such as B-tree search.
Every English entry to which Chinese is mapped in the auxiliary database has a corresponding English entry in the English-Chinese double-solution dictionary.
The databases used in all embodiments of the present invention are preferably simple databases, such as key-value databases, comprising a main database and an index database. The main database stores the data contents. Data index items for the Chinese search words are established in the index database, and each Chinese index item points to the corresponding data content in the main database. For a database with a B-tree structure, the index database is searched first, which reduces the search depth and improves search efficiency.
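The split between a main database and an index database can be illustrated with the following Python sketch. It is only an approximation under assumptions: the class and method names are hypothetical, the sample records are invented for illustration, and a sorted key list searched with `bisect` stands in for the B-tree, since the Python standard library has no B-tree type.

```python
import bisect

class KeyValueStore:
    """Main database plus a sorted index of search words."""

    def __init__(self):
        self.main = {}    # main database: record id -> data content
        self._keys = []   # index database: sorted search words
        self._ids = []    # record ids for each search word, parallel to _keys

    def put(self, search_word, record_id, content):
        """Store the content and index it under the search word."""
        self.main[record_id] = content
        pos = bisect.bisect_left(self._keys, search_word)
        if pos < len(self._keys) and self._keys[pos] == search_word:
            self._ids[pos].append(record_id)
        else:
            self._keys.insert(pos, search_word)
            self._ids.insert(pos, [record_id])

    def get(self, search_word):
        """Search the index first, then follow the ids into the main database."""
        pos = bisect.bisect_left(self._keys, search_word)
        if pos < len(self._keys) and self._keys[pos] == search_word:
            return [self.main[record_id] for record_id in self._ids[pos]]
        return []

# Hypothetical sample records, not taken from the patent.
store = KeyValueStore()
store.put("中国", 1, "China")
store.put("中国", 2, "Chinese")
store.put("改变", 3, "change")
hits = store.get("中国")
```

The lookup touches only the sorted index until a matching search word is found, which mirrors the idea of searching the index database before the main database.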
Example seven
As shown in fig. 7, when a user whose customary language is Chinese queries an English-Chinese double-solution dictionary whose default query language is English, a method for querying the dictionary in Chinese is needed. It includes the following steps:
Step S71: introduce an auxiliary database, and record the mapping from Chinese to English entries in the auxiliary database as a weight W1.
Step S72: establish a reverse full-text retrieval database according to the word segmentation result of the Chinese paraphrases of the English-Chinese double-solution dictionary, and record the mapping from Chinese to English entries as a weight W2.
The Chinese paraphrases are segmented by one or more of natural language segmentation, unitary (unigram) segmentation and binary (bigram) segmentation. The reverse full-text retrieval database is built from the Chinese terms obtained by segmentation and contains the mapping relation between Chinese and English entries.
Natural language segmentation means segmenting words according to the default or built-in word boundaries of each language. English, for example, carries its own word boundaries, so it can be segmented accurately and quickly and the automatic segmentation result can be used directly.
For example, the Chinese paraphrase of an example sentence under the entry [change] in an English-Chinese double-solution dictionary contains the Chinese phrase meaning "proposal for law change". The phrase is segmented by natural language segmentation, by unitary segmentation and by binary segmentation; the segmentation results are merged into a list of Chinese terms without duplicates; a mapping is established between each Chinese term in the list and the English entry; and a weight is preset for each mapping, for example according to word frequency.
Optionally, the Chinese terms obtained by segmenting the Chinese paraphrases are filtered to remove terms that are unlikely to be queried, such as "rule" and "mention".
The segmentation result is very large. To find the relevant mappings quickly in a large amount of data, the data structure preferably adopts a B-tree structure, and index items are established.
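A rough Python sketch of step S72 follows. It covers only unitary (unigram) and binary (bigram) segmentation; natural language segmentation of Chinese would need a language-specific segmenter that is not shown here. The function names, the sample dictionary and the filtered term are hypothetical, and counting occurrences is just one possible reading of "weight according to word frequency".

```python
from collections import defaultdict

def unigrams(text):
    """Unitary segmentation: one term per character."""
    return [ch for ch in text if not ch.isspace()]

def bigrams(text):
    """Binary segmentation: one term per pair of adjacent characters."""
    chars = unigrams(text)
    return [a + b for a, b in zip(chars, chars[1:])]

def build_reverse_index(dictionary, stop_terms=frozenset()):
    """Map each Chinese term to the English entries whose paraphrases contain it.

    The weight of a mapping is the number of times the term occurs in the
    paraphrases of that entry (an assumed word-frequency weighting).
    """
    index = defaultdict(lambda: defaultdict(int))
    for entry, paraphrases in dictionary.items():
        for paraphrase in paraphrases:
            for term in unigrams(paraphrase) + bigrams(paraphrase):
                if term not in stop_terms:   # drop terms unlikely to be queried
                    index[term][entry] += 1
    return index

# Hypothetical sample: English entries with their Chinese paraphrases.
sample_dictionary = {
    "change": ["改变计划", "改变"],
    "plan": ["计划"],
}
reverse_index = build_reverse_index(sample_dictionary, stop_terms={"变计"})
```

Because the index is keyed by Chinese term, a Chinese search word can be looked up directly and the English entries read off with their weights.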
Step S73: generate, from the auxiliary database and the reverse full-text retrieval database, a comprehensive index database that takes Chinese as the search word and the English entries of the target English-Chinese double-solution dictionary as the result.
Step S74: using Chinese as the search word, query the comprehensive index database for the corresponding English entry mapping, and return the English entries of the English-Chinese double-solution dictionary corresponding to the search word.
In step S74, the mapped English entries are preferably returned in order of weight.
The auxiliary database contains the mapping relation between Chinese and English entries. It adopts one or more of a semantic knowledge base, a dictionary database, a corpus and the like, and the weights of the Chinese-to-English mappings in these databases are preset.
Preferably, the Chinese terms in the auxiliary database include the Chinese terms obtained by segmenting the Chinese paraphrases of the English-Chinese double-solution dictionary.
The corpus is a parallel corpus, a translation corpus or another corpus having a mapping relation between Chinese and English entries, and the corpus data is derived from one or more corpora.
The dictionary database provides semantic mappings between Chinese and English drawn from one or more Chinese-English dictionaries or bidirectional English-Chinese dictionaries, or from a combination of several Chinese-English dictionaries and English-Chinese dictionaries.
When the auxiliary database comes from only one of a semantic knowledge base, a dictionary database or a corpus, the corresponding English entries are preferably returned according to their weights. When it comes from several sources among the semantic knowledge base, the dictionary database and the corpus, the weights of the Chinese-to-English mappings that the sources have in common are preferably added, and the corresponding English entries are then returned according to the summed weights.
When the auxiliary database is derived from a Chinese-English dictionary and an English-Chinese dictionary, the weights of the mappings that appear both in the Chinese-to-English mapping of the Chinese-English dictionary and in the reverse mapping of the English-Chinese dictionary are preferably added.
A Chinese query against an auxiliary database derived from a Chinese-English dictionary and an English-Chinese dictionary proceeds as follows:
1) Look up the mapping relation in the Chinese-English dictionary:
Search for the search word in the B-tree to obtain the mapping from Chinese to English entries. The English entries mapped to the Chinese search word [change] are:
Meta, vary, remodel, transmute, transform, unwill, shunt, alter, change, alteration.
2) Look up the reverse mapping in the English-Chinese dictionary. The English entries reverse-mapped to [change] are:
change, alter, transform, unwill, alteration, vary.
3) Add the weights of the mappings from [change] to change, alter, transform, alteration, unwill, vary and so on, which appear in both dictionaries, to complete the weight calculation for the corresponding mappings of the auxiliary database.
4) Add the weights of the mappings from [change] to English entries obtained from the auxiliary database to the weights of the English entries mapped to [change] in the reverse full-text retrieval database, sort by weight, and return the English entries change, alter, transform, alteration, unwill, vary and so on in order of weight.
If the search word has no mapped foreign language entries in the reverse full-text retrieval database, the mapped foreign language entries of the auxiliary database are returned.
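The weight addition in steps 1) to 3) can be traced with a short Python sketch. The two entry lists are the ones given in this example; the unit weight of 1 per mapping is an assumption made only for illustration, since the patent states that the weights are preset without giving values.

```python
from collections import Counter

# Entries mapped to the Chinese search word [change] in the Chinese-English dictionary.
forward_entries = ["Meta", "vary", "remodel", "transmute", "transform",
                   "unwill", "shunt", "alter", "change", "alteration"]
# Entries reverse-mapped to [change] in the English-Chinese dictionary.
reverse_entries = ["change", "alter", "transform", "unwill", "alteration", "vary"]

# Assumed unit weight per mapping; entries found in both dictionaries
# therefore accumulate a weight of 2.
weights = Counter()
for entry in forward_entries:
    weights[entry] += 1
for entry in reverse_entries:
    weights[entry] += 1

# Entries in order of decreasing weight (ties keep their first-seen order).
ranked = sorted(weights, key=lambda entry: -weights[entry])
```

Under these assumed weights, the six entries that occur in both dictionaries rank ahead of the four that occur only in the Chinese-English dictionary. Step 4) would then add the reverse full-text weights in the same way before the final sort.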
Example eight
Querying the English-Chinese double-solution dictionary with the Chinese search word [standing will]:
1) Look up the mapping relation in auxiliary databases such as a corpus, a knowledge base and a dictionary database:
Search for the search word in the B-tree to obtain the mapping from Chinese to English entries. The English entries mapped to [standing will] are: aspire, purpose, will.
2) Look up the mapping relation in the reverse full-text retrieval database:
Search the B-tree for English entries mapped to the search word [standing will]; no corresponding entry mapping is found.
3) Return the English entries aspire, purpose and will, which are mapped by the auxiliary database and present in the English-Chinese double-solution dictionary, according to weight or according to the entry ordering mode.
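The fallback in this example can be sketched as below. The function name, the numeric weights and the use of the English gloss "standing will" as a stand-in key for the Chinese search word are hypothetical, and alphabetical order is only an assumed interpretation of the "entry ordering mode".

```python
def lookup(term, fulltext_index, auxiliary_db, order="weight"):
    """Return English entries for a search word, with auxiliary-only fallback."""
    scores = dict(auxiliary_db.get(term, {}))
    # If the reverse full-text index has no mapping for the term, the loop
    # below does nothing and only the auxiliary mappings are returned.
    for entry, weight in fulltext_index.get(term, {}).items():
        scores[entry] = scores.get(entry, 0) + weight
    if order == "entry":
        return sorted(scores)            # assumed: alphabetical entry order
    return sorted(scores, key=lambda entry: (-scores[entry], entry))

# Hypothetical weights; the key is an English stand-in for the Chinese search word.
auxiliary_db = {"standing will": {"aspire": 3, "purpose": 2, "will": 1}}
fulltext_index = {}   # no mapping found for this search word

by_weight = lookup("standing will", fulltext_index, auxiliary_db)
by_entry = lookup("standing will", fulltext_index, auxiliary_db, order="entry")
```

With an empty reverse full-text index, both calls return only the three entries supplied by the auxiliary database.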
As shown in fig. 8, when a Chinese search is performed in the English-Chinese double-solution dictionary, the Chinese search word [China] is input. The English entries mapped in the reverse full-text retrieval database and the English entries mapped in the auxiliary database are retrieved; the weights of the retrieved auxiliary database mappings between the Chinese search word and the English entries are calculated and added to the weights of the corresponding mappings in the reverse full-text retrieval database; and the corresponding English entries are preferably returned in order of weight.
The English entry China has the highest weight for the Chinese search word [China]. The other English entries, including Chinese, Cathay, kung fu, Chink, CN, China town and so on, are returned in order of weight, and clicking an English entry opens its details.
In particular, the Chinese-to-English mapping results may be returned by weight or in other orders, such as the order in which the entries are arranged.
The present invention is suitable not only for English-Chinese double-solution dictionaries but also for German-Chinese, Japanese-Chinese, Russian-Chinese, Korean-Chinese, French-Chinese, Spanish-Chinese, Italian-Chinese and other double-solution dictionaries pairing another language with Chinese, which are not illustrated one by one here.
The invention has the following advantages:
According to the technical scheme, the first language paraphrases of the double-solution dictionary are segmented and used to build a database, and one or more of a semantic knowledge base, a dictionary database and a corpus are introduced as an auxiliary database that maps the first language to second language entries. Candidate second language entries are output by weight or according to the entry ordering mode, which greatly improves the accuracy and coverage of word selection when a user queries the double-solution dictionary in the first language. When no comprehensive index database is constructed, it is sufficient to call the mapping relation between the first language and the second language entries in the auxiliary database and return the corresponding second language entries in the double-solution dictionary, which realizes first language retrieval of the double-solution dictionary. By introducing one or more databases that map first language search words to second language entries as auxiliary databases, the best-matching entries for a first language search keyword are output, and the remaining mapped entries are offered as candidate entries for the user's query, which improves the user experience of looking up words in the double-solution dictionary in the first language.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention disclosed herein are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (7)

1. A method for constructing a double-solution dictionary comprehensive index database is characterized by comprising the following steps of:
constructing an auxiliary database containing a mapping relation between entries of a first language and entries of a second language;
establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary, comprising the following steps: establishing a reverse full-text retrieval database according to mapping of a first language and a second language vocabulary entry obtained by processing a first language paraphrase word, wherein the first language paraphrase word processing mode comprises one or more of natural language segmentation, unitary segmentation and binary segmentation;
recording the mapping relation of the first language to the second language entries in the auxiliary database as a weight W1; recording the mapping relation of the first language to the second language entries of the reverse full-text retrieval database as a weight W2; the weight W1 of the auxiliary database and the weight W2 of the reverse full-text search database are given in advance and have different weight values;
and generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database.
2. The method for constructing a double-solution dictionary comprehensive index database according to claim 1, wherein the auxiliary database is a database containing mapping relationships between entries of a first language and entries of a second language, and includes one or more of a semantic knowledge base, a dictionary database, and a corpus.
3. A query method of a double-solution dictionary is characterized by comprising the following steps:
constructing an auxiliary database containing the mapping relation between the first language and the second language vocabulary entries, and recording the mapping relation between the first language and the second language vocabulary entries in the auxiliary database as a weight W1;
establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary, comprising the following steps: establishing a reverse full-text retrieval database according to mapping of a first language and a second language vocabulary entry obtained by processing a first language paraphrase word, wherein the first language paraphrase word processing mode comprises one or more of natural language segmentation, unitary segmentation and binary segmentation;
recording the mapping relation between the first language and the second language entries in the reverse full-text retrieval database as a weight W2;
generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database;
and inquiring the corresponding second language entry mapping in the comprehensive index database by taking the first language as a search word, and returning the second language entry item corresponding to the search word.
4. A query method of a double-solution dictionary is characterized by comprising the following steps:
constructing an auxiliary database containing the mapping relation between the first language and the second language vocabulary entries, and recording the mapping relation between the first language and the second language vocabulary entries in the auxiliary database as a weight W1;
establishing a reverse full-text retrieval database containing the mapping relation between the first language and the second language vocabulary entries according to the word segmentation result of the first language paraphrase of the double-solution dictionary, and recording the mapping relation between the first language and the second language vocabulary entries in the reverse full-text retrieval database as a weight W2;
generating a comprehensive index database which takes the first language as a search word and takes the second language vocabulary entry of the target double-solution dictionary as a result according to the auxiliary database and the reverse full-text search database;
and inquiring the corresponding second language entry mapping in the comprehensive index database by taking the first language as a search word, and returning the second language entry item corresponding to the search word.
5. The query method of the double-solution dictionary according to claim 4, wherein the query method comprises: retrieving the mappings of the second language entries corresponding to the first language in the reverse full-text retrieval database and the auxiliary database, and then either weighting the retrieved mapping results and returning them by weight, or returning the corresponding second language entries according to the entry ordering mode.
6. The query method of the double-solution dictionary according to claim 5, wherein the weight W1 of the auxiliary database and the weight W2 of the reverse full-text retrieval database are given in advance and have different values.
7. The method for querying a double-solution dictionary according to one of claims 4 to 6, wherein the method for querying a double-solution dictionary comprises: when the auxiliary database is from one of the semantic knowledge base, the dictionary database and the corpus, returning corresponding second language vocabulary items according to the weight; when a plurality of database sources exist, the mapping relation weights of the first language entries and the second language entries with intersection are added for calculation, and then the corresponding second language entries are returned according to the weights.
CN201710399683.6A 2017-05-31 2017-05-31 Query method of bilingual double-solution dictionary Active CN107169124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710399683.6A CN107169124B (en) 2017-05-31 2017-05-31 Query method of bilingual double-solution dictionary

Publications (2)

Publication Number Publication Date
CN107169124A CN107169124A (en) 2017-09-15
CN107169124B true CN107169124B (en) 2020-10-02

Family

ID=59822082

Country Status (1)

Country Link
CN (1) CN107169124B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345694B (en) * 2018-03-19 2021-09-03 华北电力大学(保定) Document retrieval method and system based on theme database
CN110909128B (en) * 2019-11-08 2023-08-11 土巴兔集团股份有限公司 Method, equipment and storage medium for carrying out data query by using root list

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101425058A (en) * 2007-10-31 2009-05-06 英业达股份有限公司 Generation system of first language inverse-checking thesaurus and method thereof
JP2012022637A (en) * 2010-07-16 2012-02-02 Sharp Corp Information retrieval apparatus, control program, computer readable recording medium with control program recorded thereon, control method of information retrieval apparatus, and data structure of retrieval history data
CN102654866A (en) * 2011-03-02 2012-09-05 北京百度网讯科技有限公司 Method and device for establishing example sentence index and method and device for indexing example sentences
CN106294639A (en) * 2016-08-01 2017-01-04 金陵科技学院 Method is analyzed across the newly property the created anticipation of language patent based on semantic

Also Published As

Publication number Publication date
CN107169124A (en) 2017-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 201207 Shanghai Jing Pudong New Area Free Trade Zone Road No. 351 Building No. 2 room A661-07

Applicant after: SHANGHAI HAIDI DIGITAL PUBLISHING TECHNOLOGY Co.,Ltd.

Address before: 201207 Shanghai Jing Pudong New Area Free Trade Zone Road No. 351 Building No. 2 room A661-07

Applicant before: SHANGHAI MINGSHU DIGIT PUBLICATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant