CN107480195B - Indexing relation-based ancient literature unified logic retrieval method - Google Patents

Indexing relation-based ancient literature unified logic retrieval method

Info

Publication number
CN107480195B
CN107480195B CN201710574556.5A CN201710574556A CN107480195B CN 107480195 B CN107480195 B CN 107480195B CN 201710574556 A CN201710574556 A CN 201710574556A CN 107480195 B CN107480195 B CN 107480195B
Authority
CN
China
Prior art keywords
sentence
index table
difference
indicates
same
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710574556.5A
Other languages
Chinese (zh)
Other versions
CN107480195A (en)
Inventor
邵玉斌
朱小妮
杨美菊
王逍翔
曹云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201710574556.5A
Publication of CN107480195A
Application granted
Publication of CN107480195B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an index relation-based unified logic retrieval method for ancient literature, in particular to a method that extracts any logical relations contained in a text string entered for text retrieval and combines them logically, and belongs to the technical field of literature retrieval. The method comprises six technical steps: constructing an index system; counting the number of occurrences of sentences with a fixed sentence length; establishing corresponding rules for the logical relations; extracting the logical relations contained in the input text string; combining the rules; and displaying and outputting the result. By interpreting the logical relations in a query, the method can meet different retrieval requirements of users and greatly improves the user experience.

Description

Indexing relation-based ancient literature unified logic retrieval method
Technical Field
The invention relates to an indexing-relation-based ancient literature uniform logic retrieval method, and belongs to the technical field of literature retrieval.
Background
Ancient literature stores a massive amount of information. The problem is how to obtain information that meets users' needs through reasonable and fast retrieval, and how to study the different objects of ancient literature automatically by computer in order to discover changes and acquire valuable knowledge. Because languages differ greatly across countries and cultures, a retrieval method designed specifically for ancient Chinese documents is very important and lays a foundation for knowledge discovery.
Existing patents related to retrieval mostly focus on the rapid retrieval of information on the Internet, and there is little research on the retrieval of ancient documents. For example, application publication No. CN 105989030 A discloses a text retrieval method and device in which the text input by the user is segmented into words, each keyword is displayed, and the user then selects keywords to search. That method only retrieves information on the Internet quickly; it cannot retrieve ancient documents quickly, nor can it support research and analysis of their various objects.
As a further example, application publication No. CN 105354325 A discloses a document retrieval and analysis system with three modules: a basic retrieval module, which searches a structured database; an extended retrieval module, which searches according to the user's request in combination with natural language processing; and a multi-source integrated retrieval module, which integrates multiple data sources of a patent database and performs cross-database retrieval for the user. Although that patent outputs results that better match the user's requirements through multi-aspect association, it cannot perform statistical analysis of the various objects of ancient documents.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an index relation-based unified logic retrieval method for ancient literature, which mainly solves the retrieval problem of ancient documents and lays a foundation for knowledge discovery.
The technical scheme adopted by the invention is as follows: an index relation-based unified logic retrieval method for ancient literature, comprising the following steps:
1) constructing an index system:
reading a text;
establishing a first index table, wherein the first index table comprises a document number and a document name corresponding to the document number;
establishing a second index table, wherein the second index table comprises different characters in all documents and the documents in which the characters appear;
establishing a third index table, wherein the third index table comprises all different characters in each document and the positions of the characters;
writing the first index table, the second index table and the third index table into an index file for storage;
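Step 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `build_index` and `save_index`, the JSON file layout, and the use of zero-based character offsets as positions are all hypothetical; only the `DocID_n` numbering style is taken from Table 1 of the description.

```python
import json
from collections import defaultdict


def build_index(documents):
    """Build the three index tables from a list of (name, text) pairs.

    Positions are zero-based character offsets (an assumption).
    """
    first = {}                    # document number -> document name
    second = defaultdict(set)     # character -> documents containing it
    third = {}                    # document number -> {character: [positions]}
    for number, (name, text) in enumerate(documents):
        doc_id = f"DocID_{number}"
        first[doc_id] = name
        positions = defaultdict(list)
        for pos, ch in enumerate(text):
            second[ch].add(doc_id)
            positions[ch].append(pos)
        third[doc_id] = dict(positions)
    # Sets are converted to sorted lists so the tables can be serialised.
    second_sorted = {ch: sorted(docs) for ch, docs in second.items()}
    return first, second_sorted, third


def save_index(path, first, second, third):
    """Write the three tables into a single index file (hypothetical layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"first": first, "second": second, "third": third},
                  handle, ensure_ascii=False)


first, second, third = build_index([("a.txt", "ab。a"), ("b.txt", "b!")])
print(first)
print(second["b"])
print(third["DocID_0"]["a"])
```

Keeping per-document positions in the third table is what later allows sentence boundaries to be read directly from the positions of the punctuation marks.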
2) counting the occurrence times of sentences with fixed sentence length:
reading a third index table;
because the period, question mark and exclamation mark indicate the pause at the end of a sentence, the index information of the period, question mark and exclamation mark in each document can be obtained by reading the third index table and is denoted A, B and C respectively, where: A [a1, a2, a3, ······, an], B [b1, b2, b3, ······, bn], C [c1, c2, c3, ······, cn], with a1<a2<a3<······<an, b1<b2<b3<······<bn, c1<c2<c3<······<cn, and the values (a1 ··· an), (b1 ··· bn), (c1 ··· cn) mutually distinct; A, B and C represent the periods, question marks and exclamation marks respectively, a1 to an are the positions at which periods appear in the third index table, b1 to bn are the positions at which question marks appear in the third index table, and c1 to cn are the positions at which exclamation marks appear in the third index table;
the already sorted A, B and C are merged, defining sets D and E:
A and B are merged first: each sequence keeps a position pointer, and the two pointers advance through the two lists. The first elements a1 and b1 are compared; if a1<b1, then D [a1, b1] and both pointers advance by one position; a2 and b2 are then compared; if b2<a2, then D [a1, b1, b2] and the pointer of the sequence holding the smaller value advances by one position, so that b3 and a2 are compared next. The values are arranged in ascending order in this way until all numbers in A and B have been taken out. The numbers in C are then compared with the numbers in D by the same principle and stored in set E, so that A, B and C are merged into a set E arranged in ascending order;
set E [e1, e2, e3, ······, en], where e1<e2<e3<······<en; set F is defined as: F [e2-e1, e3-e2, e4-e3, ······, en-e(n-1)];
counting the occurrence times of the same numerical values in the set F;
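Step 2) can be sketched as follows. The sketch uses the standard two-pointer merge of sorted lists, in which only the smaller head element is appended at each comparison; this is one reading of the merge described above, not the patent's own code, and the names `merge_sorted` and `sentence_length_counts` are hypothetical.

```python
from collections import Counter


def merge_sorted(left, right):
    """Merge two ascending lists into one ascending list (two-pointer merge)."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged


def sentence_length_counts(periods, questions, exclamations):
    """Return E (merged positions), F (adjacent differences) and the counts of F."""
    d = merge_sorted(periods, questions)        # set D: A merged with B
    e = merge_sorted(d, exclamations)           # set E: D merged with C
    f = [e[k + 1] - e[k] for k in range(len(e) - 1)]
    return e, f, Counter(f)


e, f, counts = sentence_length_counts([4, 12], [8], [20])
print(e)       # merged positions of sentence-ending marks
print(f)       # sentence lengths between consecutive marks
print(counts)  # how often each sentence length occurs
```

Because A, B and C are already sorted, the merge runs in time linear in the number of punctuation marks, which avoids re-sorting the combined positions.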
3) establishing a corresponding rule for the logic relation:
establishing the intersection: for the character x and the character y, let the interval set of x be:
{x1∈{a1<x1<b1},x2∈{a2<x2<b2},x3∈{a3<x3<b3},······,xn∈{an<xn<bn}}
and the interval set of y be:
{y1∈{c1<y1<d1},y2∈{c2<y2<d2},y3∈{c3<y3<d3},······,yn∈{cn<yn<dn}}
let a2=c2, b2=d2; a3=c3, b3=d3; a5=c5, b5=d5;
then x∩y={{a2<x<b2},{a3<x<b3},{a5<x<b5}}
or x∩y={{c2<y<d2},{c3<y<d3},{c5<y<d5}};
intersection of the intersection: given the established intersection,
z∈{y2-x2,y3-x3,y5-x5} and y2-x2=y5-x5=c, where z represents the set of position differences of the characters within the same interval,
x∩y={{a2<x<b2},{a3<x<b3},{a5<x<b5}}∩{z∈{y2-x2,y5-x5}},x∈{a<x<b}∩y∈{c<y<d}∩{b-a=c};
difference set 1: knowing the established intersection, then
{x1∈{a1<x1<b1},x2∈{a2<x2<b2},x3∈{a3<x3<b3},······,xn∈{an<xn<bn}}- x∩y={{a2<x2<b2},{a3<x3<b3},{a5<x5<b5}}= {x1∈{a1<x1<b1},x4∈{a4<x4<b4},x6∈{a6<x6<b6},······,xn∈{an<xn<bn}};
Difference set 2: knowing the established intersection, then
{y1∈{c1<y1<d1},y2∈{c2<y2<d2},y3∈{c3<y3<d3},······,yn∈{cn<yn<dn}}- x∩y={{c2<y2<d2},{c3<y3<d3},{c5<y5<d5}}= {y1∈{c1<y1<d1},y4∈{c4<y4<d4},y6∈{c6<y6<d6},······,yn∈{cn<yn<dn}};
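The rules of step 3) can be illustrated with a small sketch. It assumes that each interval set is stored as a dictionary mapping a sentence interval `(start, end)` to the position of the character inside that sentence, with one position per sentence; this representation and the function names are assumptions made for illustration only.

```python
def intersection(x_intervals, y_intervals):
    """Intervals (sentences) that contain both x and y, with both positions."""
    return {iv: (x_intervals[iv], y_intervals[iv])
            for iv in x_intervals if iv in y_intervals}


def intersection_with_difference(x_intervals, y_intervals, c):
    """'Intersection of the intersection': shared intervals where y - x == c."""
    shared = intersection(x_intervals, y_intervals)
    return {iv: (x, y) for iv, (x, y) in shared.items() if y - x == c}


def difference(x_intervals, y_intervals):
    """Difference set: intervals that contain x but not y."""
    return {iv: pos for iv, pos in x_intervals.items() if iv not in y_intervals}


# Toy data: three sentences contain x, three contain y, two contain both.
x_sets = {(0, 10): 3, (10, 25): 12, (25, 40): 30}
y_sets = {(10, 25): 20, (25, 40): 33, (40, 50): 45}

print(intersection(x_sets, y_sets))
print(intersection_with_difference(x_sets, y_sets, 3))
print(difference(x_sets, y_sets))   # difference set 1
print(difference(y_sets, x_sets))   # difference set 2
```

Matching on identical interval bounds mirrors the condition a2=c2, b2=d2 and so on: two characters share a sentence exactly when their enclosing intervals coincide.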
4) Extracting the logic relation contained in the input text string:
from steps 2) and 3):
x∧y, which indicates that x and y both exist in the same sentence;
[formula image not reproduced]
indicates that x is present and y is not in the same sentence;
[formula image not reproduced]
indicates that y is present and x is not in the same sentence;
yi-xi=p, which means that the difference between y and x in the same sentence is a constant p;
yi-xi>p, which means that the difference between y and x in the same sentence is greater than a constant p;
yi-xi<p, which means that the difference between y and x in the same sentence is less than a constant p;
bi-ai=Q, which means that the sentence length equals a constant Q;
bi-ai>Q, which means that the sentence length is greater than a constant Q;
bi-ai<Q, which means that the sentence length is less than a constant Q;
5) combining the rules:
[formula image not reproduced]
indicates that within the same sentence both x and y are present and z is not;
(yi-xi)=P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is Q;
(yi-xi)=P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is greater than Q;
(yi-xi)=P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is less than Q;
(yi-xi)>P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is Q;
(yi-xi)>P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is greater than Q;
(yi-xi)>P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is less than Q;
(yi-xi)<P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is Q;
(yi-xi)<P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is greater than Q;
(yi-xi)<P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is less than Q;
6) displaying and outputting the result:
the logical relations are extracted according to step 4) and combined according to step 5), the index tables built in step 1) are queried, and the query result is displayed.
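Steps 4) to 6) can be sketched as composable predicates evaluated per sentence. The sketch below is an illustration under assumptions: the `Sentence` record, the helper names (`present`, `absent`, `gap`, `length`, `all_of`, `search`) and the toy data are hypothetical, each character is assumed to occur at most once per sentence, and the syntax in which a user writes a query is not modelled because the description does not specify it.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Sentence:
    start: int                          # a_i: position of the preceding end mark
    end: int                            # b_i: position of this sentence's end mark
    positions: Dict[str, int] = field(default_factory=dict)  # character -> position


Rule = Callable[[Sentence], bool]


def present(ch: str) -> Rule:
    return lambda s: ch in s.positions


def absent(ch: str) -> Rule:
    return lambda s: ch not in s.positions


def gap(x: str, y: str, compare, p: int) -> Rule:
    """y_i - x_i compared with the constant p (operator.eq, .gt or .lt)."""
    def rule(s: Sentence) -> bool:
        if x not in s.positions or y not in s.positions:
            return False
        return compare(s.positions[y] - s.positions[x], p)
    return rule


def length(compare, q: int) -> Rule:
    """b_i - a_i compared with the constant Q."""
    return lambda s: compare(s.end - s.start, q)


def all_of(*rules: Rule) -> Rule:
    """Combine rules with logical AND (step 5)."""
    return lambda s: all(rule(s) for rule in rules)


def search(sentences: List[Sentence], rule: Rule) -> List[Sentence]:
    """Query the sentences and return the matches for display (step 6)."""
    return [s for s in sentences if rule(s)]


sentences = [
    Sentence(0, 10, {"x": 2, "y": 5}),
    Sentence(10, 30, {"x": 12, "y": 15, "z": 20}),
    Sentence(30, 38, {"x": 33}),
]

# x and y present, z absent
for hit in search(sentences, all_of(present("x"), present("y"), absent("z"))):
    print(hit.start, hit.end, hit.positions)

# (y - x) = 3 and sentence length > 15
for hit in search(sentences, all_of(gap("x", "y", operator.eq, 3),
                                    length(operator.gt, 15))):
    print(hit.start, hit.end, hit.positions)
```

Representing each primitive relation as a function makes the nine combinations listed in step 5) instances of a single `all_of` call with different comparison operators.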
The beneficial effects of the invention are as follows: the invention provides an efficient and reasonable calculation method for the retrieval of ancient documents. It not only supports retrieval of ancient Chinese documents but can also collect certain statistics automatically, and by defining rules for logical relations and combining those rules it lays a foundation for knowledge discovery.
Drawings
FIG. 1 is a flow chart of constructing the index system in the present invention;
FIG. 2 is a general flow chart of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
Example 1: as shown in fig. 1 and 2, a method for unified logical search of ancient documents based on index relationship includes the following steps:
1) constructing an index system:
reading a text;
establishing a first index table, wherein the first index table comprises a document number and a document name corresponding to the document number;
establishing a second index table, wherein the second index table comprises different characters in all documents and the documents in which the characters appear;
establishing a third index table, wherein the third index table comprises all different characters in each document and the positions of the characters;
writing the first index table, the second index table and the third index table into an index file for storage;
2) counting the occurrence times of sentences with fixed sentence length:
reading a third index table;
because the period, question mark and exclamation mark indicate the pause at the end of a sentence, the index information of the period, question mark and exclamation mark in each document can be obtained by reading the third index table and is denoted A, B and C respectively, where: A [a1, a2, a3, ······, an], B [b1, b2, b3, ······, bn], C [c1, c2, c3, ······, cn], with a1<a2<a3<······<an, b1<b2<b3<······<bn, c1<c2<c3<······<cn, and the values (a1 ··· an), (b1 ··· bn), (c1 ··· cn) mutually distinct; A, B and C represent the periods, question marks and exclamation marks respectively, a1 to an are the positions at which periods appear in the third index table, b1 to bn are the positions at which question marks appear in the third index table, and c1 to cn are the positions at which exclamation marks appear in the third index table;
the already sorted A, B and C are merged, defining sets D and E:
A and B are merged first: each sequence keeps a position pointer, and the two pointers advance through the two lists. The first elements a1 and b1 are compared; if a1<b1, then D [a1, b1] and both pointers advance by one position; a2 and b2 are then compared; if b2<a2, then D [a1, b1, b2] and the pointer of the sequence holding the smaller value advances by one position, so that b3 and a2 are compared next. The values are arranged in ascending order in this way until all numbers in A and B have been taken out. The numbers in C are then compared with the numbers in D by the same principle and stored in set E, so that A, B and C are merged into a set E arranged in ascending order;
set E [e1, e2, e3, ······, en], where e1<e2<e3<······<en; set F is defined as: F [e2-e1, e3-e2, e4-e3, ······, en-e(n-1)];
counting the occurrence times of the same numerical values in the set F;
3) establishing a corresponding rule for the logic relation:
establishing intersection, and for the character x and the character y, setting the interval set of x as follows:
{x1∈{a1<x1<b1},x2∈{a2<x2<b2},x3∈{a3<x3<b3},······,xn∈{an<xn<bn}}
and the interval set of y be:
{y1∈{c1<y1<d1},y2∈{c2<y2<d2},y3∈{c3<y3<d3},······,yn∈{cn<yn<dn}}
let a2=c2, b2=d2; a3=c3, b3=d3; a5=c5, b5=d5;
then x∩y={{a2<x<b2},{a3<x<b3},{a5<x<b5}}
or x∩y={{c2<y<d2},{c3<y<d3},{c5<y<d5}};
intersection of the intersection: given the established intersection,
z∈{y2-x2,y3-x3,y5-x5} and y2-x2=y5-x5=c, where z represents the set of position differences of the characters within the same interval,
x∩y={{a2<x<b2},{a3<x<b3},{a5<x<b5}}∩{z∈{y2-x2,y5-x5}}, x∈{a<x<b}∩y∈{c<y<d}∩{b-a=c};
difference set 1: knowing the established intersection, then
{x1∈{a1<x1<b1},x2∈{a2<x2<b2},x3∈{a3<x3<b3},······,xn∈{an<xn<bn}}- x∩y={{a2<x2<b2},{a3<x3<b3},{a5<x5<b5}}= {x1∈{a1<x1<b1},x4∈{a4<x4<b4},x6∈{a6<x6<b6},······,xn∈{an<xn<bn}};
Difference set 2: knowing the established intersection, then
{y1∈{c1<y1<d1},y2∈{c2<y2<d2},y3∈{c3<y3<d3},······,yn∈{cn<yn<dn}}- x∩y={{c2<y2<d2},{c3<y3<d3},{c5<y5<d5}}= {y1∈{c1<y1<d1},y4∈{c4<y4<d4},y6∈{c6<y6<d6},······,yn∈{cn<yn<dn}};
4) Extracting the logic relation contained in the input text string:
from steps 2) and 3):
x∧y, which indicates that x and y both exist in the same sentence;
[formula image not reproduced]
indicates that x is present and y is not in the same sentence;
[formula image not reproduced]
indicates that y is present and x is not in the same sentence;
yi-xi=p, which means that the difference between y and x in the same sentence is a constant p;
yi-xi>p, which means that the difference between y and x in the same sentence is greater than a constant p;
yi-xi<p, which means that the difference between y and x in the same sentence is less than a constant p;
bi-ai=Q, which means that the sentence length equals a constant Q;
bi-ai>Q, which means that the sentence length is greater than a constant Q;
bi-ai<Q, which means that the sentence length is less than a constant Q;
5) combining the rules:
[formula image not reproduced]
indicates that within the same sentence both x and y are present and z is not;
(yi-xi)=P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is Q;
(yi-xi)=P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is greater than Q;
(yi-xi)=P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is less than Q;
(yi-xi)>P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is Q;
(yi-xi)>P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is greater than Q;
(yi-xi)>P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is less than Q;
(yi-xi)<P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is Q;
(yi-xi)<P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is greater than Q;
(yi-xi)<P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is less than Q;
6) displaying and outputting the result:
the logical relations are extracted according to step 4) and combined according to step 5), the index tables built in step 1) are queried, and the query result is displayed.
For example: as shown in FIG. 1, taking the Four Great Classical Novels as an example, an index system is constructed. The text is read and a first index table is established, which contains the document numbers and the document name corresponding to each document number. The first index table is shown in Table 1:
Table 1:
Document number    Document name
DocID_0            Romance of the Three Kingdoms.txt
DocID_1            Water Margin.txt
DocID_2            Dream of Red Mansions.txt
DocID_3            Journey to the West.txt
……                 ……
Based on the first index table, a second index table is established by the method of Example 1. The second index table contains the distinct characters in all documents and the documents in which they appear, and the number of occurrences of each character can also be counted. Part of the second index table is shown in Table 2:
Table 2:
[Table 2 image not reproduced]
Based on the second index table, a third index table is established by the method of Example 1. For the document Dream of Red Mansions, the third index table contains all the distinct characters and the positions of those characters in the document, as shown in Table 3:
Table 3:
[Table 3 image not reproduced]
The number of occurrences of sentences with a fixed sentence length is counted by the method of Example 1. The index information of the period, question mark and exclamation mark is obtained by reading the third index table, as shown in Table 4:
Table 4:
[Table 4 image not reproduced]
The positions of the punctuation marks, 34390 in total, are sorted in ascending order by the method of Example 1, giving Table 5:
Table 5:
[Table 5 image not reproduced]
The sentence lengths are calculated by the method of Example 1, as shown in Table 6:
Table 6:
[Table 6 image not reproduced]
The number of occurrences of each sentence length is counted by the method of Example 1, as shown in Table 7:
Table 7:
[Table 7 image not reproduced]
Using the method of Example 1, corresponding rules are established for the logical relations: the intersection, the intersection of the intersection, difference set 1 and difference set 2. The interval set of the word "because" is shown in Table 8, the interval set of the word "so" is shown in Table 9, and the interval set of sentences that contain both "because" and "so" is shown in Table 10:
Table 8:
Interval             Position of "because"
[122314,122338]      [122317]
[123276,123335]      [123307]
[253308,253339]      [253331]
[255769,255802]      [255784]
……                   ……
[91142,91171]        [91158]
Table 9:
Interval             Position of "so"
[101437,101480]      [101471]
[108878,108926]      [108918]
[111389,111416]      [111372]
[255769,255802]      [255794]
……                   ……
[99754,99836]        [99829]
Table 10:
Intersection interval    Position of "because"    Position of "so"
[255769,255802]          [255784]                 [255794]
[398953,399003]          [398956]                 [398996]
[459956,460018]          [459964]                 [459988]
[515751,515794]          [515755]                 [515780]
[66039,66085]            [66001]                  [66028]
[749675,749746]          [749685]                 [749697]
[91142,91171]            [91158]                  [91166]
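How a row of Table 10 follows from Tables 8 and 9 can be checked with a short sketch. It uses only the rows printed in Tables 8 and 9 (the elided rows are not guessed), so it reproduces only the one Table 10 row whose interval appears in both printed excerpts. The dictionary representation and variable names are assumptions for illustration.

```python
# Printed rows of Table 8: sentence interval -> position of "because"
because = {
    (122314, 122338): 122317,
    (123276, 123335): 123307,
    (253308, 253339): 253331,
    (255769, 255802): 255784,
    (91142, 91171): 91158,
}

# Printed rows of Table 9: sentence interval -> position of "so"
so = {
    (101437, 101480): 101471,
    (108878, 108926): 108918,
    (111389, 111416): 111372,
    (255769, 255802): 255794,
    (99754, 99836): 99829,
}

# Intersection: intervals that occur in both tables.
shared = {iv: (because[iv], so[iv]) for iv in because if iv in so}

# Position difference y_i - x_i and sentence length b_i - a_i for each shared interval.
for (start, end), (b_pos, s_pos) in shared.items():
    print((start, end), b_pos, s_pos, s_pos - b_pos, end - start)
```

For the shared interval [255769,255802] the two positions are 255784 and 255794, which matches the first row of Table 10; the position difference is 10 and the interval length is 33.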
Using the method of Example 1, the logical relations contained in the input text string are extracted, as shown in Table 11:
Table 11:
[Table 11 image not reproduced]
The rules are combined as in Example 1, as shown in Table 12:
Table 12:
[Table 12 image not reproduced]
As in Example 1, the query result is displayed and output; the results that match the query are shown in Table 13:
Table 13:
[Table 13 image not reproduced]
With reference to Table 13, the input character string is split into substrings that satisfy the rules, and the results that satisfy the conditions are finally displayed and output.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (1)

1. A method for unified logical retrieval of ancient documents based on index relationship is characterized in that: the method comprises the following steps:
1) constructing an index system:
reading a text;
establishing a first index table, wherein the first index table comprises a document number and a document name corresponding to the document number;
establishing a second index table, wherein the second index table comprises different characters in all documents and the documents in which the characters appear;
establishing a third index table, wherein the third index table comprises all different characters in each document and the positions of the characters;
writing the first index table, the second index table and the third index table into an index file for storage;
2) counting the occurrence times of sentences with fixed sentence length:
reading a third index table;
because the period, question mark and exclamation mark indicate the pause at the end of a sentence, the index information of the period, question mark and exclamation mark in each document can be obtained by reading the third index table and is denoted A, B and C respectively, where: A [a1, a2, a3, ……, an], B [b1, b2, b3, ……, bn], C [c1, c2, c3, ……, cn], with a1<a2<a3<……<an, b1<b2<b3<……<bn, c1<c2<c3<……<cn, and the values (a1 … an), (b1 … bn), (c1 … cn) mutually distinct; A, B and C represent the periods, question marks and exclamation marks respectively, a1 to an are the positions at which periods appear in the third index table, b1 to bn are the positions at which question marks appear in the third index table, and c1 to cn are the positions at which exclamation marks appear in the third index table;
the already sorted A, B and C are merged, defining sets D and E:
A and B are merged first: each sequence keeps a position pointer, and the two pointers advance through the two lists. The first elements a1 and b1 are compared; if a1<b1, then D [a1, b1] and both pointers advance by one position; a2 and b2 are then compared; if b2<a2, then D [a1, b1, b2] and the pointer of the sequence holding the smaller value advances by one position, so that b3 and a2 are compared next. The values are arranged in ascending order in this way until all numbers in A and B have been taken out. The numbers in C are then compared with the numbers in D by the same principle by which A and B were merged into D, and stored in set E, so that A, B and C are merged into a set E arranged in ascending order;
set E [e1, e2, e3, ……, en], where e1<e2<e3<……<en; set F is defined as: F [e2-e1, e3-e2, e4-e3, ……, en-e(n-1)];
counting the occurrence times of the same numerical values in the set F;
3) establishing a corresponding rule for the logic relation:
establishing intersection, and for the character x and the character y, setting the interval set of x as follows:
{x1∈{a1<x1<b1},x2∈{a2<x2<b2},x3∈{a3<x3<b3},……,xn∈{an<xn<bn}}
wherein the set of intervals for y:
{y1∈{c1<y1<d1},y2∈{c2<y2<d2},y3∈{c3<y3<d3},……,yn∈{cn<yn<dn}}
let a2=c2, b2=d2; a3=c3, b3=d3; a5=c5, b5=d5;
then x∩y={{a2<x<b2},{a3<x<b3},{a5<x<b5}}
or x∩y={{c2<y<d2},{c3<y<d3},{c5<y<d5}};
intersection of the intersection: given the established intersection,
z∈{y2-x2,y3-x3,y5-x5} and y2-x2=y5-x5=c, where z represents the set of position differences of the characters within the same interval,
x∩y={{a2<x<b2},{a3<x<b3},{a5<x<b5}}∩{z∈{y2-x2,y5-x5}},x∈{a<x<b}∩y∈{c<y<d}∩{b-a=c};
difference set 1: knowing the established intersection, then
{x1∈{a1<x1<b1},x2∈{a2<x2<b2},x3∈{a3<x3<b3},……,xn∈{an<xn<bn}}-x∩y={{a2<x2<b2},{a3<x3<b3},{a5<x5<b5}}={x1∈{a1<x1<b1},x4∈{a4<x4<b4},x6∈{a6<x6<b6},……,xn∈{an<xn<bn}};
Difference set 2: knowing the established intersection, then
{y1∈{c1<y1<d1},y2∈{c2<y2<d2},y3∈{c3<y3<d3},……,yn∈{cn<yn<dn}}-x∩y={{c2<y2<d2},{c3<y3<d3},{c5<y5<d5}}={y1∈{c1<y1<d1},y4∈{c4<y4<d4},y6∈{c6<y6<d6},……,yn∈{cn<yn<dn}};
4) Extracting the logic relation contained in the input text string:
from steps 2) and 3):
x∧y, which indicates that x and y both exist in the same sentence;
[formula image not reproduced]
indicates that x is present and y is not in the same sentence;
[formula image not reproduced]
indicates that y is present and x is not in the same sentence;
yi-xi=p, which means that the difference between y and x in the same sentence is a constant p;
yi-xi>p, which means that the difference between y and x in the same sentence is greater than a constant p;
yi-xi<p, which means that the difference between y and x in the same sentence is less than a constant p;
bi-ai=Q, which means that the sentence length equals a constant Q;
bi-ai>Q, which means that the sentence length is greater than a constant Q;
bi-ai<Q, which means that the sentence length is less than a constant Q;
5) combining the rules:
[formula image not reproduced]
indicates that within the same sentence both x and y are present and z is not;
(yi-xi)=P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is Q;
(yi-xi)=P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is greater than Q;
(yi-xi)=P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is P and the sentence length is less than Q;
(yi-xi)>P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is Q;
(yi-xi)>P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is greater than Q;
(yi-xi)>P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is greater than P and the sentence length is less than Q;
(yi-xi)<P∧(bi-ai)=Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is Q;
(yi-xi)<P∧(bi-ai)>Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is greater than Q;
(yi-xi)<P∧(bi-ai)<Q, which indicates that in the same sentence, the difference between y and x is less than P and the sentence length is less than Q;
6) displaying and outputting the result:
the logical relations are extracted according to step 4) and combined according to step 5), the index tables built in step 1) are queried, and the query result is displayed.
CN201710574556.5A 2017-07-14 2017-07-14 Indexing relation-based ancient literature unified logic retrieval method Active CN107480195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710574556.5A CN107480195B (en) 2017-07-14 2017-07-14 Indexing relation-based ancient literature unified logic retrieval method

Publications (2)

Publication Number Publication Date
CN107480195A (en) 2017-12-15
CN107480195B (en) 2020-07-10

Family

ID=60596512

Country Status (1)

Country Link
CN (1) CN107480195B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5129074A (en) * 1988-09-22 1992-07-07 Hitachi Vlsi Engineering Corporation Data string storage device and method of storing and retrieving data strings
CN102033891A (en) * 2009-09-29 2011-04-27 高德软件有限公司 Retrieval method for Chinese information, retrieval engine for Chinese information and embedded terminal
CN102810096A (en) * 2011-06-02 2012-12-05 阿里巴巴集团控股有限公司 Retrieval method and device based on separate character indexing system
CN102819569A (en) * 2012-07-18 2012-12-12 中国科学院软件研究所 Matching method for data in distributed interactive simulation system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
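Research and Implementation of a Full-Text Retrieval System Based on Single-Chinese-Character Indexing; Xi Min (席敏); China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》); 2012-03-15 (No. 3); pp. I138-2666, main text *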

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant