CN107480195A - Unified logical retrieval method for ancient documents based on index relations - Google Patents

Unified logical retrieval method for ancient documents based on index relations

Publication number
CN107480195A
Authority
CN
China
Prior art keywords
sentence
difference
represent
index table
same
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710574556.5A
Other languages
Chinese (zh)
Other versions
CN107480195B (en)
Inventor
邵玉斌
朱小妮
杨美菊
王逍翔
曹云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201710574556.5A
Publication of CN107480195A
Application granted
Publication of CN107480195B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a unified logical retrieval method for ancient documents based on index relations. Specifically, it extracts whatever logical relations are contained in the text string entered for a text search and combines them logically. It belongs to the technical field of literature retrieval. The method comprises six technical steps: building an index system; counting the number of occurrences of sentences of each fixed length; establishing rules corresponding to the logical relations; extracting the logical relations contained in the input text string; combining the rules; and displaying the results. Because it works from an understanding of logical relations, the method can satisfy users' differing retrieval needs and greatly improve the user experience.

Description

Unified logical retrieval method for ancient documents based on index relations
Technical field
The present invention relates to a unified logical retrieval method for ancient documents based on index relations, and belongs to the technical field of literature retrieval.
Background
Ancient document data is a store of massive amounts of information. The challenges are how to obtain the information users need through reasonable, fast retrieval, and how to use computers to study the different objects in ancient documents automatically, discover changes, and thereby obtain valuable knowledge. Because languages differ greatly across the cultures of different countries, retrieval designed specifically for Chinese ancient documents is essential and lays a foundation for knowledge discovery.
Existing patents on retrieval mostly concern fast retrieval of information on the Internet, and research on the retrieval of ancient documents is comparatively scarce. For example, application publication number CN 105989030 A proposes a text retrieval method and device. In that patent, the text entered by the user is segmented into words, each keyword is displayed, and the user then selects among the keywords to run the search. It only achieves fast retrieval of information on the Internet, and supports neither fast retrieval of ancient documents nor research and analysis of the various objects in them.
As another example, application publication number CN 105354325 A proposes a literature retrieval and analysis system. That patent provides a basic retrieval module, which searches a structured database; an extended retrieval module, which searches according to the user's request combined with natural language processing; and a multi-source aggregated search module, which integrates multiple data sources of the patent database and supports cross-searching by the user. Although that patent uses association across many aspects to output results that better match the user's requirements, it cannot perform statistical research and analysis on the various objects in ancient documents.
Summary of the invention
The technical problem to be solved by the present invention is to provide a unified logical retrieval method for ancient documents based on index relations. It is mainly used to solve the retrieval problem for ancient documents and lays a foundation for knowledge discovery.
The technical solution adopted by the present invention is a unified logical retrieval method for ancient documents based on index relations, comprising the following steps:
1) Build the index system:
Read the text;
Establish the first index table, which contains the document numbers and the document title corresponding to each document number;
Establish the second index table, which contains the distinct characters in all documents and the documents in which each character appears;
Establish the third index table, which contains all distinct characters and the positions of each character in each document;
Write the first, second and third index tables to the index file and save it;
2) Count the number of occurrences of sentences of each fixed length:
Read the third index table;
Because the full stop, question mark and exclamation mark mark the pause at the end of a sentence, reading the third index table yields the index information of the full stops, question marks and exclamation marks in each document, denoted A, B and C respectively, where A[a1, a2, a3, …, an], B[b1, b2, b3, …, bn], C[c1, c2, c3, …, cn], with A: a1 < a2 < a3 < … < an, B: b1 < b2 < b3 < … < bn, C: c1 < c2 < c3 < … < cn, and the values (a1 … an), (b1 … bn), (c1 … cn) are mutually unequal. A, B and C represent the punctuation marks full stop, question mark and exclamation mark respectively; a1 to an are the positions at which the full stop occurs in the third index table, b1 to bn are the positions at which the question mark occurs in the third index table, and c1 to cn are the positions at which the exclamation mark occurs in the third index table;
Merge the sorted A, B and C, defining the sets D and E:
First merge A and B. Each sequence maintains a position pointer, and the two pointers move backward through the two lists together. Take the first elements a1 and b1 of the two sequences and compare them; if a1 < b1, then D[a1, b1], and each pointer moves back by one. Take a2 and compare it with b2; if b2 < a2, then D[a1, b1, b2], and the pointer of the array holding the smaller value moves back by one, so that b3 is compared with a2. The elements are ordered from smallest to largest in this way until all the numbers in the two sequences A and B have been taken. The numbers in sequence C are then compared with the numbers in sequence D according to the same principle and stored in the set E, so that A, B and C are merged into a single set E arranged in ascending order;
The set E[e1, e2, e3, …, en], where E: e1 < e2 < e3 < … < en. Define the set F as F[e2 − e1, e3 − e2, e4 − e3, …, en − e(n−1)];
Count the number of times each identical value occurs in the set F;
3) Establish the rules corresponding to the logical relations:
Intersection: for a character x and a character y, the interval set of x is:
{x1 ∈ {a1 < x1 < b1}, x2 ∈ {a2 < x2 < b2}, x3 ∈ {a3 < x3 < b3}, …, xn ∈ {an < xn < bn}}
and the interval set of y is:
{y1 ∈ {c1 < y1 < d1}, y2 ∈ {c2 < y2 < d2}, y3 ∈ {c3 < y3 < d3}, …, yn ∈ {cn < yn < dn}}
Suppose a2 = c2, b2 = d2; a3 = c3, b3 = d3; a5 = c5, b5 = d5.
Then x ∩ y = {{a2 < x < b2}, {a3 < x < b3}, {a5 < x < b5}}
or x ∩ y = {{c2 < y < d2}, {c3 < y < d3}, {c5 < y < d5}};
Intersection of the intersection: given the intersection established above,
z ∈ {y2 − x2, y3 − x3, y5 − x5} and y2 − x2 = y5 − x5 = c, where z denotes the set of differences between the characters within the same interval,
x ∩ y = {{a2 < x < b2}, {a3 < x < b3}, {a5 < x < b5}} ∩ {z ∈ {y2 − x2, y5 − x5}}, x ∈ {a < x < b} ∩ y ∈ {c < y < d} ∩ {b − a = c};
Difference set 1: given the intersection established above,
{x1 ∈ {a1 < x1 < b1}, x2 ∈ {a2 < x2 < b2}, x3 ∈ {a3 < x3 < b3}, …, xn ∈ {an < xn < bn}} − x ∩ y, where x ∩ y = {{a2 < x2 < b2}, {a3 < x3 < b3}, {a5 < x5 < b5}}, gives {x1 ∈ {a1 < x1 < b1}, x4 ∈ {a4 < x4 < b4}, x6 ∈ {a6 < x6 < b6}, …, xn ∈ {an < xn < bn}};
Difference set 2: given the intersection established above,
{y1 ∈ {c1 < y1 < d1}, y2 ∈ {c2 < y2 < d2}, y3 ∈ {c3 < y3 < d3}, …, yn ∈ {cn < yn < dn}} − x ∩ y, where x ∩ y = {{c2 < y2 < d2}, {c3 < y3 < d3}, {c5 < y5 < d5}}, gives {y1 ∈ {c1 < y1 < d1}, y4 ∈ {c4 < y4 < d4}, y6 ∈ {c6 < y6 < d6}, …, yn ∈ {cn < yn < dn}};
4) Extract the logical relations contained in the input text string:
From steps 2) and 3):
x ∧ y means that both x and y occur in the same sentence;
x ∧ ¬y means that x occurs and y does not occur in the same sentence;
y ∧ ¬x means that y occurs and x does not occur in the same sentence;
yi − xi = p means that the difference between y and x in the same sentence is a constant p;
yi − xi > p means that the difference between y and x in the same sentence is greater than a constant p;
yi − xi < p means that the difference between y and x in the same sentence is less than a constant p;
bi − ai = Q means that the sentence length is equal to a constant Q;
bi − ai > Q means that the sentence length is greater than a constant Q;
bi − ai < Q means that the sentence length is less than a constant Q;
5) Combine the rules:
x ∧ y ∧ ¬z means that both x and y occur and z does not occur in the same sentence;
(yi − xi) = p ∧ (bi − ai) = Q means that, in the same sentence, the difference between y and x is p and the sentence length is Q;
(yi − xi) = p ∧ (bi − ai) > Q means that, in the same sentence, the difference between y and x is p and the sentence length is greater than Q;
(yi − xi) = p ∧ (bi − ai) < Q means that, in the same sentence, the difference between y and x is p and the sentence length is less than Q;
(yi − xi) > p ∧ (bi − ai) = Q means that, in the same sentence, the difference between y and x is greater than p and the sentence length is Q;
(yi − xi) > p ∧ (bi − ai) > Q means that, in the same sentence, the difference between y and x is greater than p and the sentence length is greater than Q;
(yi − xi) > p ∧ (bi − ai) < Q means that, in the same sentence, the difference between y and x is greater than p and the sentence length is less than Q;
(yi − xi) < p ∧ (bi − ai) = Q means that, in the same sentence, the difference between y and x is less than p and the sentence length is Q;
(yi − xi) < p ∧ (bi − ai) > Q means that, in the same sentence, the difference between y and x is less than p and the sentence length is greater than Q;
(yi − xi) < p ∧ (bi − ai) < Q means that, in the same sentence, the difference between y and x is less than p and the sentence length is less than Q;
6) Display the results:
Extract the logical relations present according to step 4), combine the logical relations according to step 5), query the index tables built in step 1), and display the query results.
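The merging and counting in step 2) can be illustrated with a short sketch. It is not the patent's own implementation: it uses the standard two-pointer merge of sorted lists, the function names are hypothetical, and the punctuation positions are made-up values chosen only to keep the example small.

```python
from collections import Counter


def merge_sorted(left, right):
    """Standard two-pointer merge of two ascending lists (assumed technique)."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # one of these two tails is always empty
    merged.extend(right[j:])
    return merged


def sentence_length_counts(periods, questions, exclamations):
    """Build E (all sentence-end positions), F (gaps) and the count of each gap."""
    d = merge_sorted(periods, questions)               # set D = A merged with B
    e = merge_sorted(d, exclamations)                  # set E = D merged with C
    f = [e[k + 1] - e[k] for k in range(len(e) - 1)]   # set F = consecutive gaps
    return e, f, Counter(f)


# Hypothetical positions of full stops (A), question marks (B), exclamation marks (C).
A = [5, 20, 41]
B = [12, 33]
C = [27]
E, F, counts = sentence_length_counts(A, B, C)
print(E)       # merged positions in ascending order
print(F)       # gaps between consecutive sentence-end positions
print(counts)  # how often each gap value occurs
```

Each value in F is the distance between two consecutive sentence-end marks, so the counter gives the number of sentences observed for each length.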
The beneficial effects of the invention are as follows. For the retrieval problem of ancient documents, the invention proposes an efficient and reasonable computation method. It not only provides document retrieval for classical Chinese texts but can also compile certain statistics automatically. By defining rules for the logical relations and combining those rules, it lays a foundation for knowledge discovery.
Brief description of the drawings
Fig. 1 is the flow chart for building the index system of the invention;
Fig. 2 is the overall flow chart of the invention.
Detailed description
The invention is further described below with reference to the drawings and specific embodiments.
Embodiment 1: As shown in Figures 1 and 2, a unified logical retrieval method for ancient documents based on index relations comprises steps 1) to 6) exactly as set out above. In step 2), the statement that (a1 … an), (b1 … bn), (c1 … cn) are mutually unequal means that no two of the values these letters represent are equal.
Illustration: As shown in Fig. 1, taking the Four Great Classical Novels as an example, the index system is built. The text is read and the first index table is established, containing the document numbers and the document title corresponding to each document number. The first index table is shown in Table 1:
Table 1:
Document number    Document title
DocID_0            The Romance of the Three Kingdoms.txt
DocID_1            Water Margin.txt
DocID_2            A Dream of Red Mansions.txt
DocID_3            Journey to the West.txt
……                 ……
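The three index tables of step 1) could be built along the following lines. This is a minimal sketch rather than the patent's implementation: the function name, the in-memory dictionary layout and the 0-based character positions are assumptions, the two short sample texts are made up, and writing the tables to an index file is left out.

```python
from collections import defaultdict


def build_indexes(documents):
    """documents: list of (title, text) pairs. Returns the three index tables."""
    first_index = {}                                       # document number -> title
    second_index = defaultdict(set)                        # character -> document numbers
    third_index = defaultdict(lambda: defaultdict(list))   # character -> document -> positions
    for n, (title, text) in enumerate(documents):
        doc_id = f"DocID_{n}"          # numbering style taken from Table 1
        first_index[doc_id] = title
        for pos, ch in enumerate(text):
            second_index[ch].add(doc_id)
            third_index[ch][doc_id].append(pos)            # positions stay in ascending order
    return first_index, second_index, third_index


# Hypothetical miniature corpus, used only to show the shape of the tables.
docs = [("doc_a.txt", "天地玄黄。"), ("doc_b.txt", "天下大势。")]
first, second, third = build_indexes(docs)
print(first)
print(sorted(second["天"]))
print(third["。"]["DocID_0"])
```

Because positions are appended while scanning each text from left to right, the position lists for the full stop, question mark and exclamation mark come out already sorted, which is what the merge in step 2) relies on.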
From the first index table above, the second index table is established using the method of Embodiment 1. The second index table contains the distinct characters in all documents and the documents in which each character appears, and can also count how many times each character occurs. Taking a portion of the characters, the second index table is shown in Table 2:
Table 2:
From the second index table above, the third index table is established using the method of Embodiment 1. The third index table here contains all distinct characters in the document A Dream of Red Mansions and the positions of each character, as shown in Table 3:
Table 3:
Using the method of Embodiment 1, the number of occurrences of sentences of each fixed length is counted. The index information of the full stops, question marks and exclamation marks is obtained by reading the third index table, as in Table 4:
Table 4:
Using the method of Embodiment 1, the positions of the punctuation marks are sorted in ascending order. There are 34390 positions in total; sorting them gives Table 5:
Table 5:
Using the method of Embodiment 1, the sentence lengths are computed, as shown in Table 6:
Table 6:
Using the method of Embodiment 1, the number of occurrences of each fixed sentence length is counted, as shown in Table 7:
Table 7:
Using the method of Embodiment 1, the rules corresponding to the logical relations are established: the intersection, the intersection of the intersection, difference set 1 and difference set 2. The interval set of the character "because" is shown in Table 8, the interval set of the character "so" is shown in Table 9, and the interval set in which both "because" and "so" occur in the same sentence is shown in Table 10:
Table 8:
Interval            Position of "because"
[122314,122338]     [122317]
[123276,123335]     [123307]
[253308,253339]     [253331]
[255769,255802]     [255784]
……                  ……
[91142,91171]       [91158]
Table 9:
Interval            Position of "so"
[101437,101480]     [101471]
[108878,108926]     [108918]
[111389,111416]     [111372]
[255769,255802]     [255794]
……                  ……
[99754,99836]       [99829]
Table 10:
Intersection interval    Position of "because"    Position of "so"
[255769,255802]          [255784]                 [255794]
[398953,399003]          [398956]                 [398996]
[459956,460018]          [459964]                 [459988]
[515751,515794]          [515755]                 [515780]
[66039,66085]            [66001]                  [66028]
[749675,749746]          [749685]                 [749697]
[91142,91171]            [91158]                  [91166]
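The intersection and difference-set rules can be tried on the rows printed in Tables 8 and 9. The sketch below is illustrative only: it assumes each sentence interval is keyed by its (start, end) pair and holds a single position per character, the function names are hypothetical, and the rows hidden behind "……" in the tables are not included, so the result covers only the printed rows.

```python
# Sentence interval -> character position, copied from the printed rows of Tables 8 and 9.
because = {
    (122314, 122338): 122317,
    (123276, 123335): 123307,
    (253308, 253339): 253331,
    (255769, 255802): 255784,
    (91142, 91171): 91158,
}
so = {
    (101437, 101480): 101471,
    (108878, 108926): 108918,
    (111389, 111416): 111372,
    (255769, 255802): 255794,
    (99754, 99836): 99829,
}


def intersect(x, y):
    """Intervals that contain both characters, with the two positions (x ∩ y)."""
    return {iv: (x[iv], y[iv]) for iv in x if iv in y}


def difference(x, y):
    """Intervals that contain the first character but not the second."""
    return {iv: pos for iv, pos in x.items() if iv not in y}


both = intersect(because, so)
only_because = difference(because, so)   # difference set 1
only_so = difference(so, because)        # difference set 2
gaps = {iv: y_pos - x_pos for iv, (x_pos, y_pos) in both.items()}  # the z values
print(both)
print(gaps)
```

Among the printed rows only the interval [255769,255802] appears in both tables, which matches the first row of Table 10; the gap between the two characters in that sentence is 255794 − 255784 = 10.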
Using the method of Embodiment 1, the logical relations contained in the input text string are extracted, as shown in Table 11:
Table 11:
Using the method of Embodiment 1, the rules are combined, as in Table 12:
Table 12:
Using the method of Embodiment 1, the query results are output for display; according to Table 13, the results that satisfy the query are displayed.
Table 13:
With reference to Table 13, the input character string is split into legal substrings, and finally the results that satisfy the conditions are displayed.
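A combined rule of the form (yi − xi) op p ∧ (bi − ai) op Q from step 5) can be evaluated as a simple filter. The following is a sketch under stated assumptions: the source does not give the syntax of the input string, so the `match` function and its parameters are hypothetical, and the rows are four of the intersection rows from Table 10.

```python
import operator

# Comparison symbols used in steps 4) and 5), mapped to Python operators.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def match(rows, gap_op=None, p=None, len_op=None, q=None):
    """Keep rows (a, b, x_pos, y_pos) where (y - x) gap_op p and (b - a) len_op q.

    A condition whose operator is None is skipped.
    """
    hits = []
    for a, b, x_pos, y_pos in rows:
        if gap_op is not None and not OPS[gap_op](y_pos - x_pos, p):
            continue
        if len_op is not None and not OPS[len_op](b - a, q):
            continue
        hits.append((a, b, x_pos, y_pos))
    return hits


# (interval start, interval end, position of "because", position of "so") from Table 10.
rows = [
    (255769, 255802, 255784, 255794),  # gap 10, sentence length 33
    (398953, 399003, 398956, 398996),  # gap 40, sentence length 50
    (459956, 460018, 459964, 459988),  # gap 24, sentence length 62
    (91142, 91171, 91158, 91166),      # gap 8, sentence length 29
]

far_and_short = match(rows, ">", 9, "<", 60)    # (y - x) > 9  and (b - a) < 60
close_and_exact = match(rows, "<", 20, "=", 33) # (y - x) < 20 and (b - a) = 33
print(far_and_short)
print(close_and_exact)
```

Passing the operator as a symbol keeps one function for all nine combinations listed in step 5) instead of writing a separate branch for each.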
The specific embodiments of the invention have been explained in detail above with reference to the drawings, but the invention is not limited to these embodiments. Within the knowledge possessed by a person of ordinary skill in the art, various changes can also be made without departing from the spirit of the invention.

Claims (1)

  1. A unified logical retrieval method for ancient documents based on index relations, characterized in that it comprises the following steps:
    1) Build the index system:
    Read the text;
    Establish the first index table, which contains the document numbers and the document title corresponding to each document number;
    Establish the second index table, which contains the distinct characters in all documents and the documents in which each character appears;
    Establish the third index table, which contains all distinct characters and the positions of each character in each document;
    Write the first, second and third index tables to the index file and save it;
    2) Count the number of occurrences of sentences of each fixed length:
    Read the third index table;
    Because the full stop, question mark and exclamation mark mark the pause at the end of a sentence, reading the third index table yields the index information of the full stops, question marks and exclamation marks in each document, denoted A, B and C respectively, where A[a1, a2, a3, …, an], B[b1, b2, b3, …, bn], C[c1, c2, c3, …, cn], with A: a1 < a2 < a3 < … < an, B: b1 < b2 < b3 < … < bn, C: c1 < c2 < c3 < … < cn, and the values (a1 … an), (b1 … bn), (c1 … cn) are mutually unequal; A, B and C represent the punctuation marks full stop, question mark and exclamation mark respectively; a1 to an are the positions at which the full stop occurs in the third index table, b1 to bn are the positions at which the question mark occurs in the third index table, and c1 to cn are the positions at which the exclamation mark occurs in the third index table;
    Merge the sorted A, B and C, defining the sets D and E:
    First merge A and B; each sequence maintains a position pointer, and the two pointers move backward through the two lists together; take the first elements a1 and b1 of the two sequences and compare them; if a1 < b1, then D[a1, b1], and each pointer moves back by one; take a2 and compare it with b2; if b2 < a2, then D[a1, b1, b2], and the pointer of the array holding the smaller value moves back by one, so that b3 is compared with a2; the elements are ordered from smallest to largest in this way until all the numbers in the two sequences A and B have been taken; the numbers in sequence C are then compared with the numbers in sequence D according to the same principle and stored in the set E, so that A, B and C are merged into a single set E arranged in ascending order;
    The set E[e1, e2, e3, …, en], where E: e1 < e2 < e3 < … < en; define the set F as F[e2 − e1, e3 − e2, e4 − e3, …, en − e(n−1)];
    Count the number of times each identical value occurs in the set F;
    3) Establish the rules corresponding to the logical relations:
    Intersection: for a character x and a character y, the interval set of x is:
    {x1 ∈ {a1 < x1 < b1}, x2 ∈ {a2 < x2 < b2}, x3 ∈ {a3 < x3 < b3}, …, xn ∈ {an < xn < bn}}
    and the interval set of y is:
    {y1 ∈ {c1 < y1 < d1}, y2 ∈ {c2 < y2 < d2}, y3 ∈ {c3 < y3 < d3}, …, yn ∈ {cn < yn < dn}}
    Suppose a2 = c2, b2 = d2; a3 = c3, b3 = d3; a5 = c5, b5 = d5;
    Then x ∩ y = {{a2 < x < b2}, {a3 < x < b3}, {a5 < x < b5}}
    or x ∩ y = {{c2 < y < d2}, {c3 < y < d3}, {c5 < y < d5}};
    Intersection of the intersection: given the intersection established above,
    z ∈ {y2 − x2, y3 − x3, y5 − x5} and y2 − x2 = y5 − x5 = c, where z denotes the set of differences between the characters within the same interval,
    x ∩ y = {{a2 < x < b2}, {a3 < x < b3}, {a5 < x < b5}} ∩ {z ∈ {y2 − x2, y5 − x5}}, x ∈ {a < x < b} ∩ y ∈ {c < y < d} ∩ {b − a = c};
    Difference set 1: given the intersection established above,
    {x1 ∈ {a1 < x1 < b1}, x2 ∈ {a2 < x2 < b2}, x3 ∈ {a3 < x3 < b3}, …, xn ∈ {an < xn < bn}} − x ∩ y, where x ∩ y = {{a2 < x2 < b2}, {a3 < x3 < b3}, {a5 < x5 < b5}}, gives {x1 ∈ {a1 < x1 < b1}, x4 ∈ {a4 < x4 < b4}, x6 ∈ {a6 < x6 < b6}, …, xn ∈ {an < xn < bn}};
    Difference set 2: given the intersection established above,
    {y1 ∈ {c1 < y1 < d1}, y2 ∈ {c2 < y2 < d2}, y3 ∈ {c3 < y3 < d3}, …, yn ∈ {cn < yn < dn}} − x ∩ y, where x ∩ y = {{c2 < y2 < d2}, {c3 < y3 < d3}, {c5 < y5 < d5}}, gives {y1 ∈ {c1 < y1 < d1}, y4 ∈ {c4 < y4 < d4}, y6 ∈ {c6 < y6 < d6}, …, yn ∈ {cn < yn < dn}};
    4) Extract the logical relations contained in the input text string:
    From steps 2) and 3):
    x ∧ y means that both x and y occur in the same sentence;
    x ∧ ¬y means that x occurs and y does not occur in the same sentence;
    y ∧ ¬x means that y occurs and x does not occur in the same sentence;
    yi − xi = p means that the difference between y and x in the same sentence is a constant p;
    yi − xi > p means that the difference between y and x in the same sentence is greater than a constant p;
    yi − xi < p means that the difference between y and x in the same sentence is less than a constant p;
    bi − ai = Q means that the sentence length is equal to a constant Q;
    bi − ai > Q means that the sentence length is greater than a constant Q;
    bi − ai < Q means that the sentence length is less than a constant Q;
    5) Combine the rules:
    x ∧ y ∧ ¬z means that both x and y occur and z does not occur in the same sentence;
    (yi − xi) = p ∧ (bi − ai) = Q means that, in the same sentence, the difference between y and x is p and the sentence length is Q;
    (yi − xi) = p ∧ (bi − ai) > Q means that, in the same sentence, the difference between y and x is p and the sentence length is greater than Q;
    (yi − xi) = p ∧ (bi − ai) < Q means that, in the same sentence, the difference between y and x is p and the sentence length is less than Q;
    (yi − xi) > p ∧ (bi − ai) = Q means that, in the same sentence, the difference between y and x is greater than p and the sentence length is Q;
    (yi − xi) > p ∧ (bi − ai) > Q means that, in the same sentence, the difference between y and x is greater than p and the sentence length is greater than Q;
    (yi − xi) > p ∧ (bi − ai) < Q means that, in the same sentence, the difference between y and x is greater than p and the sentence length is less than Q;
    (yi − xi) < p ∧ (bi − ai) = Q means that, in the same sentence, the difference between y and x is less than p and the sentence length is Q;
    (yi − xi) < p ∧ (bi − ai) > Q means that, in the same sentence, the difference between y and x is less than p and the sentence length is greater than Q;
    (yi − xi) < p ∧ (bi − ai) < Q means that, in the same sentence, the difference between y and x is less than p and the sentence length is less than Q;
    6) Display the results:
    Extract the logical relations present according to step 4), combine the logical relations according to step 5), query the index tables built in step 1), and display the query results.
CN201710574556.5A 2017-07-14 2017-07-14 Indexing relation-based ancient literature unified logic retrieval method Active CN107480195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710574556.5A CN107480195B (en) 2017-07-14 2017-07-14 Indexing relation-based ancient literature unified logic retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710574556.5A CN107480195B (en) 2017-07-14 2017-07-14 Indexing relation-based ancient literature unified logic retrieval method

Publications (2)

Publication Number Publication Date
CN107480195A true CN107480195A (en) 2017-12-15
CN107480195B CN107480195B (en) 2020-07-10

Family

ID=60596512

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5129074A (en) * 1988-09-22 1992-07-07 Hitachi Vlsi Engineering Corporation Data string storage device and method of storing and retrieving data strings
CN102033891A (en) * 2009-09-29 2011-04-27 高德软件有限公司 Retrieval method for Chinese information, retrieval engine for Chinese information and embedded terminal
CN102810096A (en) * 2011-06-02 2012-12-05 阿里巴巴集团控股有限公司 Retrieval method and device based on separate character indexing system
CN102819569A (en) * 2012-07-18 2012-12-12 中国科学院软件研究所 Matching method for data in distributed interactive simulation system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
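席敏 (Xi Min): "基于单汉字索引的全文检索系统的研究与实现" [Research and Implementation of a Full-Text Retrieval System Based on Single-Chinese-Character Indexing], 《中国优秀硕士学位论文全文数据库》 [China Master's Theses Full-text Database] *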

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant