CN102682033A - Method for querying words by matching binary characteristic values - Google Patents


Info

Publication number: CN102682033A
Authority: CN (China)
Prior art keywords: character, value, frequency, unit, matching
Legal status: Pending
Application number: CN2011100653004A
Other languages: Chinese (zh)
Inventor: 张华恩
Current assignee: Mitac International Corp
Original assignee: Mitac International Corp
Priority date: 2011-03-17
Filing date: 2011-03-17
Publication date: 2012-09-19
2011-03-17: Application filed by Mitac International Corp
2011-03-17: Priority to CN2011100653004A
2012-09-19: Publication of CN102682033A
Current status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for querying words by matching binary characteristic values, used to query text in a database that stores character combinations. The method comprises: dividing the characters in the database into X units such that the total character frequency of each unit is balanced, the X units corresponding respectively to the bits of an X-bit binary code; converting each character combination in the database into an X-bit comparison characteristic value; receiving query text and converting it into an X-bit query characteristic value; and comparing the query characteristic value with the comparison characteristic values one by one to obtain the matching comparison characteristic values. Because binary characteristic values can be compared quickly, the method enables fast search.

Description

Method for querying text by matching binary characteristic values
[Technical Field]
The present invention relates to a text search method, and in particular to a method for querying text by matching binary characteristic values.
[Background Art]
Existing text search methods usually obtain search results by string comparison. For example, mainland China patent application No. 98103003.3 discloses a method for searching a database according to a query, comprising the steps of: (a) providing a database string; (b) providing a query string; (c) identifying the multigrams present in both the query string and the database string; (d) providing a cost for each identified multigram; (e) positioning the query string relative to each database string; (f) matching the multigrams present in the query string and in each database string, and comparing the costs to provide a numerical indication of the similarity between the query string and each database string; (g) realigning the query string to reduce the cost, by examining the boundaries present in the matching scheme; (h) repeating the matching and realignment a predetermined number of times or until the matching cost no longer increases; and (i) repeating steps (c) to (h) for each database string to identify the database strings most similar to the query string.
However, string comparison is slow: searching for a query string in a huge database often makes the user wait a long time, so fast search cannot be achieved.
[Summary of the Invention]
The main object of the present invention is to provide a method for querying text by matching binary characteristic values that shortens search time.
The present invention provides a method for querying text by matching binary characteristic values, used to query text in a database that stores character combinations. The method comprises the following steps:
(1) dividing the characters in the database into X units such that the total character frequency of each unit is balanced;
(2) mapping the X units respectively to the X bit positions of an X-bit binary code;
(3) converting each character combination in the database into an X-bit comparison characteristic value;
(4) receiving query text and converting it into an X-bit query characteristic value;
(5) comparing the query characteristic value with the comparison characteristic values one by one to obtain the matching comparison characteristic values.
In particular, X is 128.
In particular, balancing the total character frequency of each unit specifically comprises:
(1-1) counting the number N of characters in the database;
(1-2) counting the total number M of occurrences of all characters in the database, and defining the unit balance number $W = M/X$; the total character frequency of each unit lies within an error range around W.
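Steps (1-1) and (1-2) amount to simple counting. The Python sketch below is one way to compute N, M and W, assuming the database is available as a list of character-combination strings and reading N as the number of distinct characters (the worked example in the description, with 6,000 characters and 256,000 occurrences, suggests this reading). The function names and the 1% default tolerance, borrowed from the example error range given for step 102, are illustrative choices rather than part of the source.

```python
from collections import Counter


def balance_stats(combinations, x=128):
    """Return (N, M, W) for a list of character-combination strings.

    N: number of distinct characters (step 1-1, as interpreted here)
    M: total occurrences of all characters (step 1-2)
    W: unit balance number M / X (step 1-2)
    """
    freq = Counter(ch for combo in combinations for ch in combo)
    n_chars = len(freq)
    m_total = sum(freq.values())
    return n_chars, m_total, m_total / x


def within_tolerance(unit_total, w, tol=0.01):
    """True if a unit's total frequency lies within W * (1 +/- tol)."""
    return abs(unit_total - w) <= tol * w


# Tiny hypothetical database with X = 2: a, b, c occur twice; d, e once.
print(balance_stats(["abc", "abd", "ce"], x=2))  # (5, 8, 4.0)
```

With the worked example's figures (M = 256,000, X = 128) the same formula gives W = 2000.0.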
In particular, step (1-2) is followed by:
(1-3) counting the number of occurrences of each character in the database, denoting the frequency of a single character $F_n$, and sorting the characters in descending order of frequency, so that the highest frequency is $F_1$ and the lowest is $F_K$;
(1-4) determining n from $F_1 + \cdots + F_{n-1} < W < F_1 + \cdots + F_n$, and obtaining the threshold value $P = F_1 + \cdots + F_{n-1}$;
(1-5) adding P to each of $F_n, \ldots, F_K$ in turn to obtain candidate total character frequencies for the current unit, and selecting the optimal frequency $F_m$ for which the total character frequency of the current unit is closest to W, where m is one of $n, \ldots, K$;
(1-6) assigning to the current unit the characters corresponding to $F_1, \ldots, F_{n-1}$ and $F_m$, and defining the remaining $K - n$ characters as ungrouped characters;
(1-7) determining whether $K - n$ equals 0;
(1-8) if $K - n$ is not 0, setting $K = K - n$ and returning to step (1-3).
In particular, if $K - n$ equals 0, step (2) is executed.
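Steps (1-3) to (1-8) describe a greedy grouping loop. The Python sketch below implements that loop under assumptions the text does not settle: frequencies arrive as a dictionary from character to count; equality on the right-hand side of the step (1-4) inequality is treated as a hit; ties in step (1-5) go to the higher-frequency candidate; and once the ungrouped characters together do not exceed W they form a final unit, because no n satisfying step (1-4) exists then. Nothing in the loop forces exactly X units, so the resulting unit count would need checking against X. The name partition_units is hypothetical.

```python
def partition_units(freqs, w):
    """Greedy grouping sketch of steps (1-3) to (1-8).

    freqs: dict mapping character -> occurrence count
    w: unit balance number W
    Returns a list of units, each a list of characters.
    """
    # Step (1-3): order characters by descending frequency (F1 >= ... >= FK).
    # Removing characters keeps this order, so one sort is enough.
    remaining = sorted(freqs, key=lambda c: freqs[c], reverse=True)
    units = []
    while remaining:  # step (1-7): loop until K - n == 0
        f = [freqs[c] for c in remaining]
        if sum(f) <= w:
            # Assumed rule: leftovers that cannot exceed W form the last unit.
            units.append(remaining)
            break
        # Step (1-4): advance while F1 + ... + Fn stays below W.
        # Afterwards n is the 0-based index of Fn and prefix equals P.
        prefix, n = 0, 0
        while prefix + f[n] < w:
            prefix += f[n]
            n += 1
        # Step (1-5): pick Fm (m >= n) so that P + Fm is closest to W.
        m = min(range(n, len(f)), key=lambda i: abs(prefix + f[i] - w))
        # Step (1-6): the unit holds F1..F(n-1) plus Fm.
        units.append(remaining[:n] + [remaining[m]])
        # Step (1-8): the other K - n characters stay ungrouped.
        remaining = [c for i, c in enumerate(remaining) if i >= n and i != m]
    return units


# Frequencies from the worked example; c7 and c8 (300, 200) are made-up
# stand-ins for F7 and F8, which the description does not list.
example = {"c1": 1000, "c2": 900, "c3": 800, "c4": 700, "c5": 600,
           "c6": 500, "c7": 300, "c8": 200, "c9": 100}
print(partition_units(example, 2000))
# [['c1', 'c2', 'c9'], ['c3', 'c4', 'c6'], ['c5', 'c7', 'c8']]
```

With W = 2000 the first two units come out as {F1, F2, F9} and {F3, F4, F6}, matching the worked example in the detailed description; the third unit only reflects the truncated, partly hypothetical input.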
In particular, step (5) may yield several matching comparison characteristic values, and each matching comparison characteristic value corresponds to matching character combinations.
In particular, step (5) is followed by step (6): performing string comparison between the query text and the matching character combinations to obtain an exact query result.
Compared with the prior art, the present invention compares the query characteristic value with the comparison characteristic values one by one; because binary characteristic values can be compared quickly, search is fast.
[Brief Description of the Drawings]
Fig. 1 is a flowchart of the method for querying text by matching binary characteristic values.
Fig. 2 is a detailed flowchart of step 10 in Fig. 1.
[Detailed Description]
Referring to Fig. 1, the present invention provides a method for querying text by matching binary characteristic values, used to query text in a database that stores character combinations. The method comprises the following steps:
Step 10: divide the characters in the database into X units such that the total character frequency of each unit is balanced. In this embodiment, X is 128.
Step 20: map the X units respectively to the X bit positions of an X-bit binary code.
Step 30: convert each character combination in the database into an X-bit comparison characteristic value. When a character combination contains characters from several different units, the corresponding bits of the comparison characteristic value are each set to 1.
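Steps 30 and 40 set one bit per unit that occurs in the text. A minimal Python sketch, assuming a hypothetical lookup table unit_of from each character to its 1-based unit number and storing unit k in bit k - 1 of a Python integer; the source numbers bits from 1 but does not fix a bit order, so this mapping is a choice. Skipping characters missing from the table is also an assumption.

```python
def characteristic_value(text, unit_of, x=128):
    """Return the X-bit characteristic value of text (steps 30 and 40).

    unit_of maps a character to its 1-based unit number; unit k sets
    bit k - 1 of the integer (an assumed bit order).
    """
    value = 0
    for ch in text:
        unit = unit_of.get(ch)
        if unit is None:
            continue  # assumption: unknown characters contribute no bit
        if not 1 <= unit <= x:
            raise ValueError(f"unit {unit} outside 1..{x}")
        value |= 1 << (unit - 1)  # a repeated unit just sets the same bit again
    return value


# Hypothetical table: "a" and "c" in unit 1, "b" in unit 78.
table = {"a": 1, "b": 78, "c": 1}
v = characteristic_value("abc", table)
print([i + 1 for i in range(128) if v >> i & 1])  # [1, 78]
```

The same function serves for step 40, since the query text is encoded exactly like a stored combination.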
Step 40: receive query text and convert it into an X-bit query characteristic value. When the query text contains characters from several different units, the corresponding bits of the query characteristic value are each set to 1.
Step 50: compare the query characteristic value with the comparison characteristic values one by one to obtain the matching comparison characteristic values. There may be several matches. For example, if three different bits are set to 1 in the query characteristic value, the matches can include comparison characteristic values in which exactly three different bits are set to 1, at the same positions as the set bits of the query characteristic value; they can also include comparison characteristic values in which more than three different bits are set to 1, three of which are at the same positions as the set bits of the query characteristic value. Moreover, one matching comparison characteristic value can correspond to several matching character combinations; for example, identical comparison characteristic values may come from character combinations formed from different characters of the same units.
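Both match cases in step 50 reduce to one test: every bit set in the query characteristic value must also be set in the comparison characteristic value, and extra bits are allowed. A Python sketch of that test on integers; bits is a hypothetical helper that builds a value from 1-based bit positions. With fixed-width machine words, a 128-bit value could be checked with two 64-bit AND-and-compare operations, which is the kind of cheap operation the method relies on.

```python
def bits(*positions):
    """Hypothetical helper: value with the given 1-based bit positions set."""
    value = 0
    for p in positions:
        value |= 1 << (p - 1)
    return value


def matches(query_value, comparison_value):
    """Step 50: all query bits must be present; extra bits are allowed."""
    return (query_value & comparison_value) == query_value


query = bits(5, 56, 118)
print(matches(query, bits(5, 56, 118)))     # True: exactly the same three bits
print(matches(query, bits(5, 9, 56, 118)))  # True: one extra bit (9)
print(matches(query, bits(5, 56)))          # False: bit 118 missing
```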
Step 60: perform string comparison between the query text and the matching character combinations to obtain an exact query result.
Referring to Fig. 2, step 10 specifically comprises the following steps:
Step 101: count the number N of characters in the database.
Step 102: count the total number M of occurrences of all characters in the database, and define the unit balance number $W = M/X$; the total character frequency of each unit lies within an error range around W, which can be set as needed, for example ±1%.
Step 102 is followed by:
Step 103: count the number of occurrences of each character in the database, denote the frequency of a single character $F_n$, and sort the characters in descending order of frequency, so that the highest frequency is $F_1$ and the lowest is $F_K$. When computing the characters of unit 1, $K = N$; when computing unit 2, K is N minus the number of characters already grouped into unit 1. For example, if unit 1 contains 3 characters, then $K = N - 3$ when computing unit 2, and if unit 2 contains 5 characters, then $K = N - 3 - 5$ when computing unit 3; and so on.
Step 104: determine n from $F_1 + \cdots + F_{n-1} < W < F_1 + \cdots + F_n$, and obtain the threshold value $P = F_1 + \cdots + F_{n-1}$.
Step 105: add P to each of $F_n, \ldots, F_K$ in turn to obtain candidate total character frequencies for the current unit, and select the optimal frequency $F_m$ for which the total character frequency of the current unit is closest to W, where m is one of $n, \ldots, K$.
Step 106: assign to the current unit the characters corresponding to $F_1, \ldots, F_{n-1}$ and $F_m$, and define the remaining $K - n$ characters as ungrouped characters.
Step 107: determine whether $K - n$ equals 0. If not, go to step 108; otherwise, go to step 20.
Step 108: set $K = K - n$ and return to step 103.
For example, suppose the database contains 6,000 characters, X is 128, and all characters occur 256,000 times in total, so the unit balance number W is 2,000 and the 6,000 characters must be divided into 128 units. First the frequency of each character is counted and the characters are sorted, giving for instance $F_1 = 1000$, $F_2 = 900$, $F_3 = 800$, $F_4 = 700$, $F_5 = 600$, $F_6 = 500$, ..., $F_9 = 100$, ..., $F_{6000} = 5$. The calculation then gives the total frequency of unit 1 as $F_1 + F_2 + F_9$, so unit 1 contains the characters corresponding to $F_1$, $F_2$ and $F_9$; the total frequency of unit 2 is $F_3 + F_4 + F_6$, so unit 2 contains the characters corresponding to $F_3$, $F_4$ and $F_6$; units 3 to 128 are obtained in the same way. The characters of unit 1 correspond to bit 1 of the 128-bit binary characteristic value, the characters of unit 2 to bit 2, and so on, up to the characters of unit 128, which correspond to bit 128. If a character combination contains a character of unit 1, bit 1 of its 128-bit binary characteristic value is set to 1; if it contains a character of unit 78, bit 78 is set to 1; in this way the 128-bit binary characteristic value of each character combination is obtained.
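The numbers in this example can be checked directly against steps 102 to 106. Because $P + F_9$ hits W exactly, no other candidate $F_m$ can come closer in step 105, so the choice of $F_9$ does not depend on $F_7$ and $F_8$, which the example leaves unstated; the same holds for $F_6$ in unit 2. For unit 2 the labels keep the original ranking, after $F_1$, $F_2$ and $F_9$ have been removed.

$$
\begin{aligned}
W &= \frac{M}{X} = \frac{256\,000}{128} = 2000\\[4pt]
\text{Unit 1:}\quad F_1 + F_2 &= 1900 < W < F_1 + F_2 + F_3 = 2700 \;\Rightarrow\; P = 1900\\
P + F_9 &= 1900 + 100 = 2000 = W \;\Rightarrow\; \text{unit 1} = \{F_1, F_2, F_9\}\\[4pt]
\text{Unit 2:}\quad F_3 + F_4 &= 1500 < W < F_3 + F_4 + F_5 = 2100 \;\Rightarrow\; P = 1500\\
P + F_6 &= 1500 + 500 = 2000 = W \;\Rightarrow\; \text{unit 2} = \{F_3, F_4, F_6\}
\end{aligned}
$$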
As a further example, suppose the user wants to query "肯德基" (KFC), and the grouping above places "肯" in unit 5, "德" in unit 56 and "基" in unit 118. The binary characteristic value of "肯德基" is then the 128-bit value with bits 5, 56 and 118 set to 1. Suppose the bitwise comparison of binary characteristic values returns 500 comparison characteristic values with bits 5, 56 and 118 set to 1. These 500 results may also include character combinations that do not contain "肯德基" but contain other characters of units 5, 56 and 118; for example, the three characters "电", "影" and "院" of "电影院" (cinema) might happen to lie in units 5, 56 and 118 respectively. A subsequent exact string comparison leaves 50 character combinations that contain "肯德基". If the database holds 10,000 character combinations and comparing 300 strings takes as long as comparing 10,000 binary characteristic values, then the binary characteristic value method takes roughly the time of 300 + 500 string comparisons (the 500 candidates still require exact string comparison) instead of 10,000, which enables fast search.
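The two stages in this example, a cheap bitwise filter (step 50) followed by exact string comparison on the survivors only (step 60), can be sketched in Python as below. The unit table is hypothetical: it places "肯" and "电" in unit 5, "德" and "影" in unit 56, and "基" and "院" in unit 118 as the example does, plus two made-up entries. Modelling the exact comparison as substring containment is an assumption; the source says only "string comparison".

```python
def characteristic_value(text, unit_of):
    """Set bit (unit - 1) for every character's 1-based unit number."""
    value = 0
    for ch in text:
        value |= 1 << (unit_of[ch] - 1)  # assumes every character is in the table
    return value


def build_index(combinations, unit_of):
    """Step 30: precompute a comparison value for each stored combination."""
    return [(characteristic_value(c, unit_of), c) for c in combinations]


def search(query, index, unit_of):
    """Steps 40-60: bitwise filter, then exact check on the candidates only."""
    q = characteristic_value(query, unit_of)
    candidates = [c for v, c in index if (q & v) == q]
    hits = [c for c in candidates if query in c]  # assumed form of step 60
    return candidates, hits


# Hypothetical unit table following the example (plus 北 and 京).
unit_of = {"肯": 5, "电": 5, "德": 56, "影": 56, "基": 118, "院": 118,
           "北": 3, "京": 9}
index = build_index(["肯德基", "北京肯德基", "电影院", "德基"], unit_of)
candidates, hits = search("肯德基", index, unit_of)
print(candidates)  # ['肯德基', '北京肯德基', '电影院']
print(hits)        # ['肯德基', '北京肯德基']
```

Here "电影院" has the same characteristic value as "肯德基", passes the bitwise filter, and is removed only by the string check, which is why step 60 follows step 50.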

Claims (7)

1. A method for querying text by matching binary characteristic values, used to query text in a database that stores character combinations, characterized in that the method comprises the following steps:
(1) dividing the characters in the database into X units such that the total character frequency of each unit is balanced;
(2) mapping the X units respectively to the X bit positions of an X-bit binary code;
(3) converting each character combination in the database into an X-bit comparison characteristic value;
(4) receiving query text and converting it into an X-bit query characteristic value;
(5) comparing the query characteristic value with the comparison characteristic values one by one to obtain the matching comparison characteristic values.
2. The method for querying text by matching binary characteristic values according to claim 1, characterized in that X is 128.
3. The method for querying text by matching binary characteristic values according to claim 2, characterized in that balancing the total character frequency of each unit specifically comprises:
(1-1) counting the number N of characters in the database;
(1-2) counting the total number M of occurrences of all characters in the database, and defining the unit balance number $W = M/X$; the total character frequency of each unit lies within an error range around W.
4. The method for querying text by matching binary characteristic values according to claim 3, characterized in that step (1-2) is followed by:
(1-3) counting the number of occurrences of each character in the database, denoting the frequency of a single character $F_n$, and sorting the characters in descending order of frequency, so that the highest frequency is $F_1$ and the lowest is $F_K$;
(1-4) determining n from $F_1 + \cdots + F_{n-1} < W < F_1 + \cdots + F_n$, and obtaining the threshold value $P = F_1 + \cdots + F_{n-1}$;
(1-5) adding P to each of $F_n, \ldots, F_K$ in turn to obtain candidate total character frequencies for the current unit, and selecting the optimal frequency $F_m$ for which the total character frequency of the current unit is closest to W, where m is one of $n, \ldots, K$;
(1-6) assigning to the current unit the characters corresponding to $F_1, \ldots, F_{n-1}$ and $F_m$, and defining the remaining $K - n$ characters as ungrouped characters;
(1-7) determining whether $K - n$ equals 0;
(1-8) if $K - n$ is not 0, setting $K = K - n$ and returning to step (1-3).
5. The method for querying text by matching binary characteristic values according to claim 4, characterized in that if $K - n$ equals 0, step (2) is executed.
6. The method for querying text by matching binary characteristic values according to claim 5, characterized in that step (5) yields several matching comparison characteristic values, and each matching comparison characteristic value corresponds to matching character combinations.
7. The method for querying text by matching binary characteristic values according to claim 6, characterized in that step (5) is followed by step (6): performing string comparison between the query text and the matching character combinations to obtain an exact query result.

Priority Applications (1)

Application number | Priority date | Filing date | Title
CN2011100653004A (CN102682033A) | 2011-03-17 | 2011-03-17 | Method for querying words by matching binary characteristic values

Applications Claiming Priority (1)

Application number | Priority date | Filing date | Title
CN2011100653004A (CN102682033A) | 2011-03-17 | 2011-03-17 | Method for querying words by matching binary characteristic values

Publications (1)

Publication number | Publication date
CN102682033A | 2012-09-19

Family

ID=46813979

Family Applications (1)

Application number | Status | Publication | Priority date | Filing date | Title
CN2011100653004A | Pending | CN102682033A | 2011-03-17 | 2011-03-17 | Method for querying words by matching binary characteristic values

Country Status (1)

Country | Link
CN (1) | CN102682033A

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101488127A * | 2005-01-17 | 2009-07-22 | 徐文新 | Bit mark character string retrieval technique
US20070189613A1 * | 2006-02-16 | 2007-08-16 | Fujitsu Limited | Word search apparatus, word search method, and recording medium
CN101030216A * | 2007-04-02 | 2007-09-05 | 丁光耀 | Method for matching text string based on parameter characteristics

Non-Patent Citations (1)

Title
刘功申: "基于字频的单模式匹配算法" ("A single-pattern matching algorithm based on character frequency"), 《电子学报》 (Acta Electronica Sinica) *

Cited By (5)

Publication number | Priority date | Publication date | Assignee | Title
CN104505819A * | 2014-11-20 | 2015-04-08 | 许继电气股份有限公司 (XJ Electric Co., Ltd.) | Non-sorting searching method and MMC voltage-sharing method based on the same
CN104505819B * | 2014-11-20 | 2017-03-01 | 许继电气股份有限公司 (XJ Electric Co., Ltd.) | Non-sorting lookup method and MMC voltage-equalizing method based on the method
CN105022808A * | 2015-06-29 | 2015-11-04 | 程文举 | Binary constant value interval matching method
CN106886566A * | 2017-01-12 | 2017-06-23 | 北京航空航天大学 (Beihang University) | Method and device for modifying a compressed file
CN106886566B * | 2017-01-12 | 2019-11-15 | 北京航空航天大学 (Beihang University) | Method and device for modifying a compressed file

Similar Documents

Publication Publication Date Title
CN104731792B (en) The method and system of data base consistency(-tance) method of calibration and system, location database difference
CN107239468B (en) Task node management method and device
MX2020006627A (en) Managing concrete mix design catalogs.
CN103632250A (en) Quick sales order sorting, grouping and screening method
CN103514201A (en) Method and device for querying data in non-relational database
CN107016019B (en) Database index creation method and device
CN108108436B (en) Data storage method and device, storage medium and electronic equipment
CN105354314A (en) Data migration method and device
CN103678408A (en) Method and device for inquiring data
CN103902544A (en) Data processing method and system
CN105843895A (en) EhCache-based data querying and synchronizing method, device and system
CN102682033A (en) Method for querying words by matching binary characteristic values
CN104794130B (en) Relation query method and device between a kind of table
CN101369278A (en) Approximate adaptation method and apparatus
CN102546089A (en) Method and device for implementing cycle redundancy check (CRC) code
CN108304460B (en) Improved database positioning method and system
CN105573843A (en) Data processing method and system
CN108073641B (en) Method and device for querying data table
CN103942196A (en) Method, device and system for data inquiry
CN105847508B (en) A kind of storage method of telephone number, recognition methods and device
RU2010109431A (en) DATA TRANSMISSION METHOD
CN110377681A (en) A kind of method, apparatus of data query, readable storage medium storing program for executing and electronic equipment
Shahanaghi et al. Scheduling and balancing assembly lines with the task deterioration effect
CN105224414A (en) For realizing method of calibration and the device of the code of business task
CN104850716B (en) The optimal case system of selection of design Clustering Model is accessed based on distributed photovoltaic

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120919