CN107480195A - A unified logical retrieval method for ancient documents based on index relations - Google Patents
- Publication number: CN107480195A (application number CN201710574556.5A)
- Authority: CN (China)
- Prior art keywords: sentence, difference, represent, index table, same
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a unified logical retrieval method for ancient documents based on index relations. Specifically, it extracts whatever logical relations are contained in a text string entered for full-text retrieval and combines them logically. It belongs to the technical field of literature retrieval. The invention comprises six technical steps: building an index system; counting the occurrences of sentences of each fixed length; establishing rules for logical relations; extracting the logical relations contained in an input text string; combining those rules; and displaying the results. Because the method interprets queries in terms of logical relations, it can meet users' different retrieval requirements and greatly improves the user experience.
Description
Technical field
The present invention relates to a unified logical retrieval method for ancient documents based on index relations and belongs to the technical field of literature retrieval.
Background art
Ancient documents store a massive amount of information. The challenge is how to retrieve the information users need quickly and sensibly, and how to use computers to study the different objects in ancient documents automatically, discover changes in them, and thereby obtain valuable knowledge. Because languages differ greatly across countries and cultures, a retrieval method designed specifically for ancient Chinese documents is essential and lays the foundation for knowledge discovery.
Existing retrieval patents mostly focus on fast retrieval of information on the Internet, and research on retrieving ancient documents is comparatively scarce. For example, application publication CN 105989030 A proposes a text retrieval method and device: it segments the text entered by the user into words, displays each keyword, and then lets the user select the keywords to search for. It only achieves fast retrieval of Internet information; it can neither retrieve ancient documents quickly nor research and analyze their various objects.
Similarly, application publication CN 105354325 A proposes a literature retrieval and analysis system. It provides a basic retrieval module that searches a structured database; an extended retrieval module that searches according to the user's request combined with natural language processing; and a multi-source aggregated retrieval module that integrates multiple data sources and lets the user cross-search patent databases. Although this patent combines results from several sources to output results that better meet user requirements, its design cannot perform statistical research and analysis on the various objects in ancient documents.
Summary of the invention
The technical problem to be solved by the present invention is to provide a unified logical retrieval method for ancient documents based on index relations, mainly intended to solve the problem of retrieving ancient documents and to lay a solid foundation for knowledge discovery.
The technical solution adopted by the present invention is a unified logical retrieval method for ancient documents based on index relations, comprising the following steps:
1) Build the index system:
Read the text;
Build the first index table, which records each document number and the corresponding document title;
Build the second index table, which records every distinct character in all documents and the documents in which each character appears;
Build the third index table, which records every distinct character and its positions in each document;
Write the first, second and third index tables to an index file and save it;
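A minimal Python sketch of step 1), assuming the corpus fits in memory as a mapping from document title to text. The patent does not fix a storage layout, so the dictionary shapes, the JSON file format and the function names `build_index_tables` and `save_index` are hypothetical; positions are 0-based character offsets.

```python
import json
from collections import defaultdict


def build_index_tables(docs):
    """Build the three index tables from {title: text} (hypothetical layout).

    table1: document number -> document title
    table2: character -> sorted document numbers containing it
    table3: document number -> {character -> ascending 0-based positions}
    """
    table1, table2, table3 = {}, defaultdict(set), {}
    for doc_no, (title, text) in enumerate(docs.items()):
        table1[doc_no] = title
        positions = defaultdict(list)
        for pos, ch in enumerate(text):
            positions[ch].append(pos)  # enumerate yields positions in ascending order
            table2[ch].add(doc_no)
        table3[doc_no] = dict(positions)
    return table1, {ch: sorted(nos) for ch, nos in table2.items()}, table3


def save_index(path, table1, table2, table3):
    """Write all three tables to one JSON index file (JSON turns int keys into strings)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"table1": table1, "table2": table2, "table3": table3}, f, ensure_ascii=False)


t1, t2, t3 = build_index_tables({"a.txt": "因故。", "b.txt": "故！"})
print(t2["故"], t3[0]["。"])  # [0, 1] [2]
```

Keeping the per-document positions in the third table is what later lets sentence boundaries and character distances be computed without rescanning the raw text.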
2) Count the number of occurrences of sentences of each fixed length:
Read the third index table;
Because full stops, question marks and exclamation marks mark the end of a sentence, reading the third index table yields the index information (positions) of the full stops, question marks and exclamation marks in each document, recorded as A, B and C respectively, where:

$$A = [a_1, a_2, a_3, \ldots, a_n],\quad B = [b_1, b_2, b_3, \ldots, b_n],\quad C = [c_1, c_2, c_3, \ldots, c_n]$$

$$a_1 < a_2 < a_3 < \cdots < a_n,\quad b_1 < b_2 < b_3 < \cdots < b_n,\quad c_1 < c_2 < c_3 < \cdots < c_n$$

and the values $a_1, \ldots, a_n$, $b_1, \ldots, b_n$ and $c_1, \ldots, c_n$ are pairwise distinct. A, B and C denote the full stop, question mark and exclamation mark respectively: $a_1$ to $a_n$ are the positions at which full stops occur in the third index table, $b_1$ to $b_n$ are the positions of question marks, and $c_1$ to $c_n$ are the positions of exclamation marks;
Merge the sorted lists A, B and C, defining the sets D and E:
First merge A and B. Each sequence keeps a position pointer, and the pointers move forward through their lists. Compare the current heads of the two sequences, starting with $a_1$ and $b_1$, append the smaller value to D, and advance only the pointer of the sequence it came from. For example, if $a_1 < b_1 < b_2 < a_2$, D grows as $[a_1]$, $[a_1, b_1]$, $[a_1, b_1, b_2]$, and the next comparison is between $a_2$ and $b_3$. Continue in ascending order until all numbers in A and B have been taken. Then merge the numbers in C with those in D by the same principle and store the result in set E, so that A, B and C are merged into a single set E in ascending order;
Let $E = [e_1, e_2, e_3, \ldots, e_n]$ with $e_1 < e_2 < e_3 < \cdots < e_n$, and define the set F as

$$F = [e_2 - e_1,\ e_3 - e_2,\ e_4 - e_3,\ \ldots,\ e_n - e_{n-1}]$$

Each element of F is the distance between two consecutive sentence-ending marks, that is, the length of one sentence;
Count the number of times each identical value occurs in F;
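The merge and counting in step 2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the helper names are hypothetical, A, B and C are plain ascending lists of punctuation positions for one document, and, following the definition of F above, the text before the first sentence-ending mark $e_1$ is not counted as a sentence.

```python
from collections import Counter


def merge_sorted(p, q):
    """Two-pointer merge of two ascending lists into one ascending list."""
    i = j = 0
    merged = []
    while i < len(p) and j < len(q):
        # Append the smaller head and advance only that list's pointer.
        if p[i] < q[j]:
            merged.append(p[i])
            i += 1
        else:
            merged.append(q[j])
            j += 1
    merged.extend(p[i:])  # at most one of these two remainders is non-empty
    merged.extend(q[j:])
    return merged


def sentence_length_counts(A, B, C):
    """Return E, F and how often each sentence length occurs in F."""
    D = merge_sorted(A, B)  # full stops + question marks
    E = merge_sorted(D, C)  # + exclamation marks
    F = [E[k + 1] - E[k] for k in range(len(E) - 1)]  # gaps between consecutive end marks
    return E, F, Counter(F)


E, F, counts = sentence_length_counts([3, 10], [6], [13])
print(E, F, counts[3])  # [3, 6, 10, 13] [3, 4, 3] 2
```

The standard library's `heapq.merge(A, B, C)` would produce the same ordering in one call; the explicit two-pointer version is shown because it mirrors the procedure described above.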
3) Establish rules for the logical relations:
Intersection: for a character x and a character y, where $a_i, b_i$ and $c_i, d_i$ denote the boundaries of the sentence intervals in which x and y respectively occur, the interval set of x is

$$\{x_1 \in \{a_1 < x_1 < b_1\},\ x_2 \in \{a_2 < x_2 < b_2\},\ x_3 \in \{a_3 < x_3 < b_3\},\ \ldots,\ x_n \in \{a_n < x_n < b_n\}\}$$

and the interval set of y is

$$\{y_1 \in \{c_1 < y_1 < d_1\},\ y_2 \in \{c_2 < y_2 < d_2\},\ y_3 \in \{c_3 < y_3 < d_3\},\ \ldots,\ y_n \in \{c_n < y_n < d_n\}\}$$

Suppose $a_2 = c_2, b_2 = d_2$; $a_3 = c_3, b_3 = d_3$; $a_5 = c_5, b_5 = d_5$. Then

$$x \cap y = \{\{a_2 < x < b_2\},\ \{a_3 < x < b_3\},\ \{a_5 < x < b_5\}\}$$

or, equivalently,

$$x \cap y = \{\{c_2 < y < d_2\},\ \{c_3 < y < d_3\},\ \{c_5 < y < d_5\}\};$$

Intersection of intersections: given the intersection established above, let

$$z \in \{y_2 - x_2,\ y_3 - x_3,\ y_5 - x_5\},\qquad y_2 - x_2 = y_5 - x_5 = c,$$

where z denotes the set of distances between the two characters within the same interval. Then

$$x \cap y = \{\{a_2 < x < b_2\},\ \{a_3 < x < b_3\},\ \{a_5 < x < b_5\}\} \cap \{z \in \{y_2 - x_2,\ y_5 - x_5\}\},$$

$$x \in \{a < x < b\} \cap y \in \{c < y < d\} \cap \{b - a = c\};$$

Difference set 1: given the intersection established above,

$$\{x_1 \in \{a_1 < x_1 < b_1\},\ x_2 \in \{a_2 < x_2 < b_2\},\ x_3 \in \{a_3 < x_3 < b_3\},\ \ldots,\ x_n \in \{a_n < x_n < b_n\}\} - \{\{a_2 < x_2 < b_2\},\ \{a_3 < x_3 < b_3\},\ \{a_5 < x_5 < b_5\}\}$$

$$= \{x_1 \in \{a_1 < x_1 < b_1\},\ x_4 \in \{a_4 < x_4 < b_4\},\ x_6 \in \{a_6 < x_6 < b_6\},\ \ldots,\ x_n \in \{a_n < x_n < b_n\}\},$$

that is, the sentences that contain x but not y;
Difference set 2: given the intersection established above,

$$\{y_1 \in \{c_1 < y_1 < d_1\},\ y_2 \in \{c_2 < y_2 < d_2\},\ y_3 \in \{c_3 < y_3 < d_3\},\ \ldots,\ y_n \in \{c_n < y_n < d_n\}\} - \{\{c_2 < y_2 < d_2\},\ \{c_3 < y_3 < d_3\},\ \{c_5 < y_5 < d_5\}\}$$

$$= \{y_1 \in \{c_1 < y_1 < d_1\},\ y_4 \in \{c_4 < y_4 < d_4\},\ y_6 \in \{c_6 < y_6 < d_6\},\ \ldots,\ y_n \in \{c_n < y_n < d_n\}\},$$

that is, the sentences that contain y but not x;
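A possible Python sketch of the set operations in step 3). Assumptions not taken from the patent: sentence intervals are the open intervals between consecutive entries of E from step 2); a character's interval set is stored as a dictionary from interval to the character's positions inside it; and, because a character may occur more than once in a sentence, the distance test keeps an interval if any pair of occurrences satisfies $y - x = c$. All function names are hypothetical.

```python
from bisect import bisect_left


def interval_of(pos, ends):
    """Return the open interval (e_k, e_{k+1}) that contains pos, or None."""
    k = bisect_left(ends, pos)  # first index with ends[k] >= pos
    if 0 < k < len(ends) and ends[k] != pos:
        return (ends[k - 1], ends[k])
    return None  # before the first end mark, after the last, or on a mark itself


def interval_set(positions, ends):
    """Map each sentence interval to the character's positions inside it."""
    result = {}
    for pos in positions:
        iv = interval_of(pos, ends)
        if iv is not None:
            result.setdefault(iv, []).append(pos)
    return result


def intersection(xs, ys):
    """x ∩ y: intervals that contain both characters."""
    return set(xs) & set(ys)


def difference(xs, ys):
    """Difference set: intervals that contain the first character but not the second."""
    return set(xs) - set(ys)


def intersection_with_distance(xs, ys, c):
    """Intersection of intersections: keep intervals where some y lies c positions after some x."""
    return {iv for iv in intersection(xs, ys)
            if any(y - x == c for x in xs[iv] for y in ys[iv])}


ends = [3, 6, 10, 13]
xs = interval_set([4, 7, 11], ends)
ys = interval_set([5, 12], ends)
print(sorted(intersection(xs, ys)), difference(xs, ys))  # [(3, 6), (10, 13)] {(6, 10)}
```

Difference set 2 is simply `difference(ys, xs)`; representing intervals as tuples lets Python's built-in set operators express the intersection and difference directly.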
4) Extract the logical relations contained in the input text string:
From steps 2) and 3):
$x \wedge y$: the same sentence contains both x and y;
$x \wedge \neg y$: the same sentence contains x but not y;
$y \wedge \neg x$: the same sentence contains y but not x;
$y_i - x_i = p$: within the same sentence, the distance between y and x equals a constant p;
$y_i - x_i > p$: within the same sentence, the distance between y and x is greater than a constant p;
$y_i - x_i < p$: within the same sentence, the distance between y and x is less than a constant p;
$b_i - a_i = Q$: the length of a sentence equals a constant Q;
$b_i - a_i > Q$: the length of a sentence is greater than a constant Q;
$b_i - a_i < Q$: the length of a sentence is less than a constant Q;
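The atomic relations of step 4) can be written as predicates over one sentence. This Python sketch assumes a hypothetical sentence record `(a, b, positions)`, where `a` and `b` are the sentence's bounding end marks and `positions` maps each character to its ascending positions inside the sentence; the distance uses the first occurrence of each character, which is only one possible reading when a character repeats. The function names are illustrative.

```python
import operator


def contains(x, y):
    """x ∧ y: the sentence contains both x and y."""
    return lambda s: x in s[2] and y in s[2]


def contains_without(x, y):
    """x ∧ ¬y: the sentence contains x but not y (swap the arguments for y ∧ ¬x)."""
    return lambda s: x in s[2] and y not in s[2]


def distance_is(x, y, op, p):
    """(y_i - x_i) op p, with op one of operator.eq / operator.gt / operator.lt."""
    def pred(s):
        pos = s[2]
        return x in pos and y in pos and op(pos[y][0] - pos[x][0], p)
    return pred


def length_is(op, q):
    """(b_i - a_i) op Q: compare the sentence length with Q."""
    return lambda s: op(s[1] - s[0], q)


sentence = (3, 10, {"因": [4], "故": [7]})
print(contains("因", "故")(sentence), distance_is("因", "故", operator.eq, 3)(sentence))  # True True
```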
5) Combine the rules:
$x \wedge y \wedge \neg z$: the same sentence contains x and y but not z;
$(y_i - x_i) = p \wedge (b_i - a_i) = Q$: within the same sentence, the distance between y and x is p and the sentence length is Q;
$(y_i - x_i) = p \wedge (b_i - a_i) > Q$: within the same sentence, the distance between y and x is p and the sentence length is greater than Q;
$(y_i - x_i) = p \wedge (b_i - a_i) < Q$: within the same sentence, the distance between y and x is p and the sentence length is less than Q;
$(y_i - x_i) > p \wedge (b_i - a_i) = Q$: within the same sentence, the distance between y and x is greater than p and the sentence length is Q;
$(y_i - x_i) > p \wedge (b_i - a_i) > Q$: within the same sentence, the distance between y and x is greater than p and the sentence length is greater than Q;
$(y_i - x_i) > p \wedge (b_i - a_i) < Q$: within the same sentence, the distance between y and x is greater than p and the sentence length is less than Q;
$(y_i - x_i) < p \wedge (b_i - a_i) = Q$: within the same sentence, the distance between y and x is less than p and the sentence length is Q;
$(y_i - x_i) < p \wedge (b_i - a_i) > Q$: within the same sentence, the distance between y and x is less than p and the sentence length is greater than Q;
$(y_i - x_i) < p \wedge (b_i - a_i) < Q$: within the same sentence, the distance between y and x is less than p and the sentence length is less than Q;
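One way to realize the combinations of step 5) in Python is to parameterize the two comparisons and generate all nine pairs with `itertools.product`. As in a per-sentence reading of step 4), the sentence record `(a, b, positions)` is an assumption, the names `OPS`, `combined_rule`, `x_and_y_not_z` and `search` are hypothetical, and distances use each character's first occurrence.

```python
import operator
from itertools import product

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def combined_rule(x, y, op1, p, op2, q):
    """(y_i - x_i) op1 p ∧ (b_i - a_i) op2 Q for a sentence (a, b, positions)."""
    def pred(sentence):
        a, b, pos = sentence
        if x not in pos or y not in pos:
            return False
        return OPS[op1](pos[y][0] - pos[x][0], p) and OPS[op2](b - a, q)
    return pred


def x_and_y_not_z(x, y, z):
    """x ∧ y ∧ ¬z: the sentence contains x and y but not z."""
    return lambda s: x in s[2] and y in s[2] and z not in s[2]


def search(sentences, rule):
    """Return the sentences that satisfy a combined rule."""
    return [s for s in sentences if rule(s)]


NINE_COMBINATIONS = list(product(OPS, repeat=2))  # ("=", "="), ("=", ">"), ..., ("<", "<")

sentences = [
    (3, 10, {"因": [4], "故": [7]}),
    (10, 30, {"因": [12], "故": [20], "也": [25]}),
    (30, 35, {"因": [31]}),
]
print(search(sentences, combined_rule("因", "故", ">", 3, ">", 10)))  # only the second sentence
```

Building each combined rule as a closure keeps the query a plain function, so further conditions could be conjoined the same way.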
6) Display the results:
Extract the logical relations according to step 4), combine them according to step 5), query the index tables built in step 1), and display the query results.
The beneficial effects of the present invention are as follows: addressing the retrieval of ancient documents, the invention proposes an efficient and reasonable computational method that not only solves the retrieval of ancient Chinese texts but can also count certain information automatically. By defining rules for logical relations and combining these rules, it lays a solid foundation for knowledge discovery.
Brief description of the drawings
Fig. 1 is a flowchart of building the index system in the present invention;
Fig. 2 is the overall flowchart of the present invention.
Embodiment
The present invention is further described below with reference to the drawings and specific embodiments.
Embodiment 1: As shown in Figs. 1 and 2, a unified logical retrieval method for ancient documents based on index relations comprises the following steps:
1) Build the index system:
Read the text;
Build the first index table, which records each document number and the corresponding document title;
Build the second index table, which records every distinct character in all documents and the documents in which each character appears;
Build the third index table, which records every distinct character and its positions in each document;
Write the first, second and third index tables to an index file and save it;
2) Count the number of occurrences of sentences of each fixed length:
Read the third index table;
Because full stops, question marks and exclamation marks mark the end of a sentence, reading the third index table yields the index information (positions) of the full stops, question marks and exclamation marks in each document, recorded as A, B and C respectively, where:

$$A = [a_1, a_2, a_3, \ldots, a_n],\quad B = [b_1, b_2, b_3, \ldots, b_n],\quad C = [c_1, c_2, c_3, \ldots, c_n]$$

$$a_1 < a_2 < a_3 < \cdots < a_n,\quad b_1 < b_2 < b_3 < \cdots < b_n,\quad c_1 < c_2 < c_3 < \cdots < c_n$$

and the values $a_1, \ldots, a_n$, $b_1, \ldots, b_n$ and $c_1, \ldots, c_n$ are pairwise distinct (that is, none of the values represented by these letters are equal). A, B and C denote the full stop, question mark and exclamation mark respectively: $a_1$ to $a_n$ are the positions at which full stops occur in the third index table, $b_1$ to $b_n$ are the positions of question marks, and $c_1$ to $c_n$ are the positions of exclamation marks;
Merge the sorted lists A, B and C, defining the sets D and E:
First merge A and B. Each sequence keeps a position pointer, and the pointers move forward through their lists. Compare the current heads of the two sequences, starting with $a_1$ and $b_1$, append the smaller value to D, and advance only the pointer of the sequence it came from. For example, if $a_1 < b_1 < b_2 < a_2$, D grows as $[a_1]$, $[a_1, b_1]$, $[a_1, b_1, b_2]$, and the next comparison is between $a_2$ and $b_3$. Continue in ascending order until all numbers in A and B have been taken. Then merge the numbers in C with those in D by the same principle and store the result in set E, so that A, B and C are merged into a single set E in ascending order;
Let $E = [e_1, e_2, e_3, \ldots, e_n]$ with $e_1 < e_2 < e_3 < \cdots < e_n$, and define the set F as

$$F = [e_2 - e_1,\ e_3 - e_2,\ e_4 - e_3,\ \ldots,\ e_n - e_{n-1}]$$

Each element of F is the distance between two consecutive sentence-ending marks, that is, the length of one sentence;
Count the number of times each identical value occurs in F;
3) Establish rules for the logical relations:
Intersection: for a character x and a character y, where $a_i, b_i$ and $c_i, d_i$ denote the boundaries of the sentence intervals in which x and y respectively occur, the interval set of x is

$$\{x_1 \in \{a_1 < x_1 < b_1\},\ x_2 \in \{a_2 < x_2 < b_2\},\ x_3 \in \{a_3 < x_3 < b_3\},\ \ldots,\ x_n \in \{a_n < x_n < b_n\}\}$$

and the interval set of y is

$$\{y_1 \in \{c_1 < y_1 < d_1\},\ y_2 \in \{c_2 < y_2 < d_2\},\ y_3 \in \{c_3 < y_3 < d_3\},\ \ldots,\ y_n \in \{c_n < y_n < d_n\}\}$$

Suppose $a_2 = c_2, b_2 = d_2$; $a_3 = c_3, b_3 = d_3$; $a_5 = c_5, b_5 = d_5$. Then

$$x \cap y = \{\{a_2 < x < b_2\},\ \{a_3 < x < b_3\},\ \{a_5 < x < b_5\}\}$$

or, equivalently,

$$x \cap y = \{\{c_2 < y < d_2\},\ \{c_3 < y < d_3\},\ \{c_5 < y < d_5\}\};$$

Intersection of intersections: given the intersection established above, let

$$z \in \{y_2 - x_2,\ y_3 - x_3,\ y_5 - x_5\},\qquad y_2 - x_2 = y_5 - x_5 = c,$$

where z denotes the set of distances between the two characters within the same interval. Then

$$x \cap y = \{\{a_2 < x < b_2\},\ \{a_3 < x < b_3\},\ \{a_5 < x < b_5\}\} \cap \{z \in \{y_2 - x_2,\ y_5 - x_5\}\},$$

$$x \in \{a < x < b\} \cap y \in \{c < y < d\} \cap \{b - a = c\};$$

Difference set 1: given the intersection established above,

$$\{x_1 \in \{a_1 < x_1 < b_1\},\ x_2 \in \{a_2 < x_2 < b_2\},\ x_3 \in \{a_3 < x_3 < b_3\},\ \ldots,\ x_n \in \{a_n < x_n < b_n\}\} - \{\{a_2 < x_2 < b_2\},\ \{a_3 < x_3 < b_3\},\ \{a_5 < x_5 < b_5\}\}$$

$$= \{x_1 \in \{a_1 < x_1 < b_1\},\ x_4 \in \{a_4 < x_4 < b_4\},\ x_6 \in \{a_6 < x_6 < b_6\},\ \ldots,\ x_n \in \{a_n < x_n < b_n\}\},$$

that is, the sentences that contain x but not y;
Difference set 2: given the intersection established above,

$$\{y_1 \in \{c_1 < y_1 < d_1\},\ y_2 \in \{c_2 < y_2 < d_2\},\ y_3 \in \{c_3 < y_3 < d_3\},\ \ldots,\ y_n \in \{c_n < y_n < d_n\}\} - \{\{c_2 < y_2 < d_2\},\ \{c_3 < y_3 < d_3\},\ \{c_5 < y_5 < d_5\}\}$$

$$= \{y_1 \in \{c_1 < y_1 < d_1\},\ y_4 \in \{c_4 < y_4 < d_4\},\ y_6 \in \{c_6 < y_6 < d_6\},\ \ldots,\ y_n \in \{c_n < y_n < d_n\}\},$$

that is, the sentences that contain y but not x;
4) Extract the logical relations contained in the input text string:
From steps 2) and 3):
$x \wedge y$: the same sentence contains both x and y;
$x \wedge \neg y$: the same sentence contains x but not y;
$y \wedge \neg x$: the same sentence contains y but not x;
$y_i - x_i = p$: within the same sentence, the distance between y and x equals a constant p;
$y_i - x_i > p$: within the same sentence, the distance between y and x is greater than a constant p;
$y_i - x_i < p$: within the same sentence, the distance between y and x is less than a constant p;
$b_i - a_i = Q$: the length of a sentence equals a constant Q;
$b_i - a_i > Q$: the length of a sentence is greater than a constant Q;
$b_i - a_i < Q$: the length of a sentence is less than a constant Q;
5) Combine the rules:
$x \wedge y \wedge \neg z$: the same sentence contains x and y but not z;
$(y_i - x_i) = p \wedge (b_i - a_i) = Q$: within the same sentence, the distance between y and x is p and the sentence length is Q;
$(y_i - x_i) = p \wedge (b_i - a_i) > Q$: within the same sentence, the distance between y and x is p and the sentence length is greater than Q;
$(y_i - x_i) = p \wedge (b_i - a_i) < Q$: within the same sentence, the distance between y and x is p and the sentence length is less than Q;
$(y_i - x_i) > p \wedge (b_i - a_i) = Q$: within the same sentence, the distance between y and x is greater than p and the sentence length is Q;
$(y_i - x_i) > p \wedge (b_i - a_i) > Q$: within the same sentence, the distance between y and x is greater than p and the sentence length is greater than Q;
$(y_i - x_i) > p \wedge (b_i - a_i) < Q$: within the same sentence, the distance between y and x is greater than p and the sentence length is less than Q;
$(y_i - x_i) < p \wedge (b_i - a_i) = Q$: within the same sentence, the distance between y and x is less than p and the sentence length is Q;
$(y_i - x_i) < p \wedge (b_i - a_i) > Q$: within the same sentence, the distance between y and x is less than p and the sentence length is greater than Q;
$(y_i - x_i) < p \wedge (b_i - a_i) < Q$: within the same sentence, the distance between y and x is less than p and the sentence length is less than Q;
6) Display the results:
Extract the logical relations according to step 4), combine them according to step 5), query the index tables built in step 1), and display the query results.
Example: As shown in Fig. 1, taking the Four Great Classical Novels as an example, the index system is built: the text is read and the first index table is built, which records each document number and the corresponding document title. The first index table is shown in Table 1:
Table 1:
Document number | Document title |
---|---|
DocID_0 | The Romance of the Three Kingdoms.txt |
DocID_1 | Water Margin.txt |
DocID_2 | A Dream of Red Mansions.txt |
DocID_3 | Journey to the West.txt |
…… | …… |
From the first index table above, the second index table is built using the method of Embodiment 1. The second index table records every distinct character in all documents and the documents in which it appears, and it also records how many times each character occurs. Part of the second index table is shown in Table 2:
Table 2:
From the second index table above, the third index table is built using the method of Embodiment 1. It records every distinct character in the document A Dream of Red Mansions together with the positions of each character, as shown in Table 3:
Table 3:
The number of occurrences of sentences of each fixed length is counted using the method of Embodiment 1. Reading the third index table yields the index information of the full stops, question marks and exclamation marks, as shown in Table 4:
Table 4:
Using the method of Embodiment 1, the positions of the punctuation marks are sorted in ascending order. There are 34,390 positions in total; sorting them gives Table 5:
Table 5:
The sentence lengths are computed using the method of Embodiment 1, as shown in Table 6:
Table 6:
The number of occurrences of each fixed sentence length is counted using the method of Embodiment 1, as shown in Table 7:
Table 7:
Using the method of Embodiment 1, rules are established for the logical relations: intersection, intersection of intersections, difference set 1 and difference set 2. The interval set of the character "because" is shown in Table 8, the interval set of the character "so" is shown in Table 9, and the interval set in which the same sentence contains both "because" and "so" is shown in Table 10:
Table 8:
Interval | Position of "because" |
---|---|
[122314,122338] | [122317] |
[123276,123335] | [123307] |
[253308,253339] | [253331] |
[255769,255802] | [255784] |
…… | …… |
[91142,91171] | [91158] |
Table 9:
Interval | Position of "so" |
---|---|
[101437,101480] | [101471] |
[108878,108926] | [108918] |
[111389,111416] | [111372] |
[255769,255802] | [255794] |
…… | …… |
[99754,99836] | [99829] |
Table 10:
Common interval | Position of "because" | Position of "so" |
---|---|---|
[255769,255802] | [255784] | [255794] |
[398953,399003] | [398956] | [398996] |
[459956,460018] | [459964] | [459988] |
[515751,515794] | [515755] | [515780] |
[66039,66085] | [66001] | [66028] |
[749675,749746] | [749685] | [749697] |
[91142,91171] | [91158] | [91166] |
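As a small check of the intersection step, the following Python snippet intersects only the rows actually printed in Tables 8 and 9. The rows elided with …… are not reproduced, so it recovers only part of Table 10; the variable names are illustrative.

```python
# Printed rows of Table 8 ("because") and Table 9 ("so"): interval -> position.
because = {
    (122314, 122338): 122317, (123276, 123335): 123307, (253308, 253339): 253331,
    (255769, 255802): 255784, (91142, 91171): 91158,
}
so = {
    (101437, 101480): 101471, (108878, 108926): 108918, (111389, 111416): 111372,
    (255769, 255802): 255794, (99754, 99836): 99829,
}

# Intervals present in both tables, with both positions and their distance y - x.
common = sorted(because.keys() & so.keys())
rows = [(iv, because[iv], so[iv], so[iv] - because[iv]) for iv in common]
print(rows)  # [((255769, 255802), 255784, 255794, 10)]
```

The single surviving row matches the first row of Table 10, where "so" follows "because" by 10 positions within the same sentence.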
Using the method of Embodiment 1, the logical relations contained in the input text string are extracted, as shown in Table 11:
Table 11:
Using the method of Embodiment 1, the rules are combined, as shown in Table 12:
Table 12:
Using the method of Embodiment 1, the query results are displayed: the results that satisfy the query are shown according to Table 13.
Table 13:
As Table 13 shows, the input character string is split into valid substrings, and the results that satisfy the conditions are then displayed.
The embodiments of the present invention have been described in detail above with reference to the drawings, but the present invention is not limited to these embodiments; within the knowledge of those of ordinary skill in the art, various changes can be made without departing from the concept of the present invention.
Claims (1)
- 1. A unified logical retrieval method for ancient documents based on index relations, characterized in that it comprises the following steps: 1) building the index system: reading the text; building a first index table, which records each document number and the corresponding document title; building a second index table, which records every distinct character in all documents and the documents in which each character appears; building a third index table, which records every distinct character and its positions in each document; and writing the first, second and third index tables to an index file and saving it; 2) counting the number of occurrences of sentences of each fixed length: reading the third index table; because full stops, question marks and exclamation marks mark the end of a sentence, reading the third index table yields the index information of the full stops, question marks and exclamation marks in each document, recorded as $A = [a_1, a_2, a_3, \ldots, a_n]$, $B = [b_1, b_2, b_3, \ldots, b_n]$ and $C = [c_1, c_2, c_3, \ldots, c_n]$ respectively, where $a_1 < a_2 < a_3 < \cdots < a_n$, $b_1 < b_2 < b_3 < \cdots < b_n$, $c_1 < c_2 < c_3 < \cdots < c_n$ and all of these values are pairwise distinct; A, B and C denote the full stop, question mark and exclamation mark respectively, and $a_1$ to $a_n$, $b_1$ to $b_n$ and $c_1$ to $c_n$ are the positions at which full stops, question marks and exclamation marks respectively occur in the third index table; merging the sorted lists A, B and C and defining the sets D and E: first merging A and B, with each sequence keeping a position pointer that moves forward through its list, comparing the current heads of the two sequences (starting with $a_1$ and $b_1$), appending the smaller value to D and advancing only the pointer of the sequence it came from (for example, if $a_1 < b_1 < b_2 < a_2$, D becomes $[a_1, b_1, b_2]$ and the next comparison is between $a_2$ and $b_3$), in ascending order until all numbers in A and B have been taken; then merging the numbers in C with those in D by the same principle and storing the result in set E, so that A, B and C are merged into a single set E in ascending order; letting $E = [e_1, e_2, e_3, \ldots, e_n]$ with $e_1 < e_2 < e_3 < \cdots < e_n$ and defining the set $F = [e_2 - e_1, e_3 - e_2, e_4 - e_3, \ldots, e_n - e_{n-1}]$; and counting the number of times each identical value occurs in F; 3) establishing rules for the logical relations: intersection: for a character x and a character y, the interval set of x is $\{x_1 \in \{a_1 < x_1 < b_1\}, x_2 \in \{a_2 < x_2 < b_2\}, x_3 \in \{a_3 < x_3 < b_3\}, \ldots, x_n \in \{a_n < x_n < b_n\}\}$ and the interval set of y is $\{y_1 \in \{c_1 < y_1 < d_1\}, y_2 \in \{c_2 < y_2 < d_2\}, y_3 \in \{c_3 < y_3 < d_3\}, \ldots, y_n \in \{c_n < y_n < d_n\}\}$; if $a_2 = c_2, b_2 = d_2$; $a_3 = c_3, b_3 = d_3$; $a_5 = c_5, b_5 = d_5$, then $x \cap y = \{\{a_2 < x < b_2\}, \{a_3 < x < b_3\}, \{a_5 < x < b_5\}\}$, or equivalently $x \cap y = \{\{c_2 < y < d_2\}, \{c_3 < y < d_3\}, \{c_5 < y < d_5\}\}$; intersection of intersections: given the established intersection, $z \in \{y_2 - x_2, y_3 - x_3, y_5 - x_5\}$ with $y_2 - x_2 = y_5 - x_5 = c$, where z denotes the set of distances between the characters within the same interval, and $x \cap y = \{\{a_2 < x < b_2\}, \{a_3 < x < b_3\}, \{a_5 < x < b_5\}\} \cap \{z \in \{y_2 - x_2, y_5 - x_5\}\}$, $x \in \{a < x < b\} \cap y \in \{c < y < d\} \cap \{b - a = c\}$; difference set 1: given the established intersection, $\{x_1 \in \{a_1 < x_1 < b_1\}, x_2 \in \{a_2 < x_2 < b_2\}, x_3 \in \{a_3 < x_3 < b_3\}, \ldots, x_n \in \{a_n < x_n < b_n\}\} - \{\{a_2 < x_2 < b_2\}, \{a_3 < x_3 < b_3\}, \{a_5 < x_5 < b_5\}\} = \{x_1 \in \{a_1 < x_1 < b_1\}, x_4 \in \{a_4 < x_4 < b_4\}, x_6 \in \{a_6 < x_6 < b_6\}, \ldots, x_n \in \{a_n < x_n < b_n\}\}$; difference set 2: given the established intersection, $\{y_1 \in \{c_1 < y_1 < d_1\}, y_2 \in \{c_2 < y_2 < d_2\}, y_3 \in \{c_3 < y_3 < d_3\}, \ldots, y_n \in \{c_n < y_n < d_n\}\} - \{\{c_2 < y_2 < d_2\}, \{c_3 < y_3 < d_3\}, \{c_5 < y_5 < d_5\}\} = \{y_1 \in \{c_1 < y_1 < d_1\}, y_4 \in \{c_4 < y_4 < d_4\}, y_6 \in \{c_6 < y_6 < d_6\}, \ldots, y_n \in \{c_n < y_n < d_n\}\}$; 4) extracting the logical relations contained in the input text string, where, from steps 2) and 3), $x \wedge y$ means that the same sentence contains both x and y; $x \wedge \neg y$ means that the same sentence contains x but not y; $y \wedge \neg x$ means that the same sentence contains y but not x; $y_i - x_i = p$, $y_i - x_i > p$ and $y_i - x_i < p$ mean that, within the same sentence, the distance between y and x is equal to, greater than or less than a constant p; and $b_i - a_i = Q$, $b_i - a_i > Q$ and $b_i - a_i < Q$ mean that the length of a sentence is equal to, greater than or less than a constant Q; 5) combining the rules: $x \wedge y \wedge \neg z$ means that the same sentence contains x and y but not z; and $(y_i - x_i) = p \wedge (b_i - a_i) = Q$, $(y_i - x_i) = p \wedge (b_i - a_i) > Q$, $(y_i - x_i) = p \wedge (b_i - a_i) < Q$, $(y_i - x_i) > p \wedge (b_i - a_i) = Q$, $(y_i - x_i) > p \wedge (b_i - a_i) > Q$, $(y_i - x_i) > p \wedge (b_i - a_i) < Q$, $(y_i - x_i) < p \wedge (b_i - a_i) = Q$, $(y_i - x_i) < p \wedge (b_i - a_i) > Q$ and $(y_i - x_i) < p \wedge (b_i - a_i) < Q$ each mean that, within the same sentence, the distance between y and x stands in the indicated relation (equal to, greater than or less than) to p and the sentence length stands in the indicated relation to Q; 6) displaying the results: extracting the logical relations according to step 4), combining them according to step 5), querying the index tables built in step 1), and displaying the query results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710574556.5A CN107480195B (en) | 2017-07-14 | 2017-07-14 | Indexing relation-based ancient literature unified logic retrieval method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710574556.5A CN107480195B (en) | 2017-07-14 | 2017-07-14 | Indexing relation-based ancient literature unified logic retrieval method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107480195A true CN107480195A (en) | 2017-12-15 |
CN107480195B CN107480195B (en) | 2020-07-10 |
Family
ID=60596512
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710574556.5A Active CN107480195B (en) | 2017-07-14 | 2017-07-14 | Indexing relation-based ancient literature unified logic retrieval method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107480195B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5129074A (en) * | 1988-09-22 | 1992-07-07 | Hitachi Vlsi Engineering Corporation | Data string storage device and method of storing and retrieving data strings |
CN102033891A (en) * | 2009-09-29 | 2011-04-27 | 高德软件有限公司 | Retrieval method for Chinese information, retrieval engine for Chinese information and embedded terminal |
CN102810096A (en) * | 2011-06-02 | 2012-12-05 | 阿里巴巴集团控股有限公司 | Retrieval method and device based on separate character indexing system |
CN102819569A (en) * | 2012-07-18 | 2012-12-12 | 中国科学院软件研究所 | Matching method for data in distributed interactive simulation system |
2017-07-14: application CN201710574556.5A filed in CN (CN107480195B, status: active)
Non-Patent Citations (1)
Title |
---|
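席敏 (Xi Min): "基于单汉字索引的全文检索系统的研究与实现" (Research and Implementation of a Full-Text Retrieval System Based on Single-Chinese-Character Indexing), 《中国优秀硕士学位论文全文数据库》 (China Master's Theses Full-text Database) * |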
Also Published As
Publication number | Publication date |
---|---|
CN107480195B (en) | 2020-07-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107180045B (en) | Method for extracting geographic entity relation contained in internet text | |
Pal et al. | An approach to automatic text summarization using WordNet | |
Abello et al. | Computational folkloristics | |
US7870113B2 (en) | System and method for organizing data | |
CN110674252A (en) | High-precision semantic search system for judicial domain | |
Fu et al. | Automatic record linkage of individuals and households in historical census data | |
CN1158460A (en) | Multiple languages automatic classifying and searching method | |
Nualart et al. | How we draw texts: a review of approaches to text visualization and exploration | |
CN106897437B (en) | High-order rule multi-classification method and system of knowledge system | |
CN104778201A (en) | Multi-query result combination-based prior art retrieval method | |
Ma et al. | Matching descriptions to spatial entities using a siamese hierarchical attention network | |
CN107391690B (en) | Method for processing document information | |
Petrus | Soft and hard clustering for abstract scientific paper in Indonesian | |
Miotto et al. | Supporting the Curation of Biological Databases with Reusable Text Mining | |
CN103699542A (en) | Natural gas and pipe technical standard ontology base establishment method | |
Modoni et al. | A semantic framework for graph-based enterprise search | |
Kim et al. | Network of institutions, source journals, and keywords on COVID-19 by Korean authors based on the Web of Science Core Collection in January 2021 | |
CN107480195A (en) | A kind of ancient documents uniform logical search method based on index relative | |
Wang et al. | Normalized Storage Model Construction and Query Optimization of Book Multi-Source Heterogeneous Massive Data | |
van Hooland et al. | Cleaning data with OpenRefine | |
Hagedorn et al. | Bearing a bag-of-tales: An open corpus of annotated folktales for reproducible research | |
Berenguer et al. | Word embeddings for retrieving tabular data from research publications | |
Liu et al. | Knowledge Engineering Research Topic Mining Based on Co-word Analysis. | |
Laurent et al. | Scalable Fuzzy Algorithms for Data Management and Analysis: Methods and Design | |
Ingle | Processing of unstructured data for information extraction |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |