CN101714166A - Method and system for testing performance of large-scale multi-keyword precise matching algorithm - Google Patents
Abstract
The invention provides a system for testing performance of a large-scale multi-keyword precise matching algorithm. The system comprises a test data generating module and a keyword set preprocessing performance test module, wherein the test data generating module specifically comprises a random keyword generating sub-module, a random text data generating sub-module and a sub-module for generating a text to be matched; and the keyword set preprocessing performance test module specifically comprises a matching algorithm preprocessing interface calling sub-module and a test information generating sub-module. The method and the system solve the problems of interface standards and interoperation access between different network information security devices, realize cooperative work and linkage between the network information security devices and finally realize seamless integration of the network information security devices, and can test the performance indexes of various multi-keyword precise matching algorithms.
Description
Technical field
The present invention relates to the field of performance testing for computer data processing, and in particular to a performance test method for large-scale multi-keyword precise matching algorithms.
Background art
Multi-keyword matching, also called multi-pattern matching, is one of the fundamental problems in computer science. The problem it must solve is to determine, quickly and accurately, the positions at which any of a set of patterns occurs in a text under test or in network content. Multi-pattern matching is applied very widely: besides its extensive use in network security, for example in firewalls, intrusion detection and prevention, virus detection, and web content filtering, it has spread to other disciplines and fields such as information management, web search engines, and gene sequence detection in bioinformatics. Research and development of multi-keyword matching and its related technology therefore has considerable scientific and practical significance and has attracted the attention of both academia and industry.
Many classic multi-keyword matching algorithms already exist: the Wu-Manber algorithm, which is based on skipping; the Aho-Corasick algorithm and the AC-BM algorithm, which are based on finite-state automata; the SBOM algorithm, which is based on factor matching; and so on. In recent years, as applications have demanded ever larger keyword sets and ever higher processing speed, many improved multi-keyword matching algorithms have been proposed. With so many algorithms, what are the criteria for evaluating their performance? A multi-keyword matching algorithm generally comprises two stages: a preprocessing stage and a search stage. The preprocessing stage processes the keyword set, and the work it performs differs from algorithm to algorithm: the Wu-Manber algorithm mainly builds three tables (a shift table, a hash table, and a prefix table), whereas the Aho-Corasick algorithm builds a finite-state automaton. Preprocessing needs to be performed only once; once the keyword set is fixed, its result no longer changes. The main performance criteria for the preprocessing stage are therefore preprocessing time and storage space occupied. The search stage matches the input text or real-time data, and the speed at which it processes that input, that is, the matching speed of the algorithm, is the main criterion for this stage. In general, then, the performance criteria of a multi-keyword matching algorithm are matching speed, preprocessing time, and storage space occupied.
Among existing multi-keyword precise matching algorithms, some have good matching speed but consume storage space that grows exponentially as the number of keywords increases, such as the Aho-Corasick algorithm. Some consume an acceptable amount of storage but have a long preprocessing time, which becomes unacceptable as the number of keywords keeps growing, such as the SBOM algorithm. Some have good preprocessing time, space usage, and matching speed but have a worst case in which matching speed becomes very low, such as the Wu-Manber algorithm. These are only general, qualitative assessments; none of them is a quantitative performance evaluation or comparison. Different applications place different requirements on the time and space characteristics of a matching algorithm, and most applications weigh both together when selecting the most suitable algorithm. When choosing a matching algorithm for an application, or when examining a newly improved algorithm, how should the performance of the algorithms be compared? How can it be established that one algorithm is better than the others? Up to now there has been no unified test and evaluation method.
Summary of the invention
(1) Technical problem to be solved
The object of the present invention is to overcome the deficiencies of the prior art by providing a unified performance test method and system for large-scale multi-keyword precise matching algorithms, capable of testing the performance indices of various multi-keyword precise matching algorithms.
(2) Technical solution
In view of the above problem, the present invention proposes a performance test system for a large-scale multi-keyword precise matching algorithm, the system comprising the following modules:
F1: a test data generation module, which specifically comprises:
F11: a random keyword generation submodule, used to generate a random keyword set;
F12: a random text data generation submodule, used to generate random text data;
F13: a to-be-matched text generation submodule, used to insert the keyword set into the text data to produce the text to be matched;
F2: a keyword set preprocessing performance test module, which specifically comprises:
F21: a matching algorithm preprocessing interface calling submodule, used to call the preprocessing interface of the matching algorithm through the generic matching algorithm calling interface;
F22: a test information generation submodule, used to take the keyword set as the input file, execute preprocessing to generate the keyword-related data structure, and collect the key information of the algorithm's results, the key information comprising the preprocessing time and the maximum memory occupied by the data structure generated from the keywords;
The system further comprises the following module:
F3: a search performance test module of the matching algorithm, which specifically comprises:
F31: a matching algorithm search program interface calling submodule, used to call the search program interface of the matching algorithm under test through the generic matching algorithm calling interface;
F32: a search program scanning submodule, used to take the data structure generated after module F2 finishes as input, execute the search program of the matching algorithm under test, and scan the text file to be matched;
F33: a test information generation submodule, used to record the number of each keyword that occurs in the text to be matched and the position at which it occurs, save this information to the output result file, and at the same time record the search time and the maximum memory used during the search;
The system further comprises the following module:
F4: a search result verification and statistical report generation module, which specifically comprises:
a statistics generation submodule, used, after the processing of modules F2 and F3, to take the expected result data and the actual test result data as input and compare them to verify the correctness of the algorithm, and then to take the performance information produced by modules F2 and F3 together as input, compile statistics, and output the test results.
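The generic matching algorithm calling interface referred to by F21 and F31 is not specified in detail in the text. The following is a minimal sketch, assuming a two-method interface in Python; the names `Matcher`, `preprocess`, `search`, `NaiveMatcher`, and `run` are hypothetical and are not taken from the patent. The naive matcher exists only to exercise the interface and is not one of the algorithms discussed above.

```python
from typing import Any, List, Protocol, Tuple


class Matcher(Protocol):
    """Hypothetical generic calling interface: one preprocessing entry, one search entry."""

    def preprocess(self, keywords: List[bytes]) -> Any: ...

    def search(self, structure: Any, text: bytes) -> List[Tuple[int, int]]: ...


class NaiveMatcher:
    """Trivial reference matcher used only to demonstrate the interface."""

    def preprocess(self, keywords):
        # The "keyword-related data structure" is just the numbered keyword list.
        return list(enumerate(keywords))

    def search(self, structure, text):
        hits = []
        for number, keyword in structure:
            pos = text.find(keyword)
            while pos != -1:
                hits.append((number, pos))  # (keyword number, position in text)
                pos = text.find(keyword, pos + 1)
        # Report hits in text order, ties broken by keyword number.
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def run(matcher: Matcher, keywords, text):
    """Drive any matcher through the two stages via the common interface."""
    structure = matcher.preprocess(keywords)
    return matcher.search(structure, text)
```

Because the test modules only see `preprocess` and `search`, an algorithm under test could be swapped in without changing the measurement code, which is the property the platform relies on.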
(3) Beneficial effects
With the performance test system of the present invention, keyword sets and texts to be matched with different characteristics can be generated and used to test different large-scale multi-keyword matching algorithms. Because the invention establishes a unified architecture platform on which any multi-keyword precise matching algorithm can be tested, the performance, design, and efficiency of the various algorithms can be evaluated quantitatively in a fair and reasonable manner.
Brief description of the drawings
Fig. 1 is a framework diagram of the evaluation and test platform of the present invention;
Fig. 2 is a schematic structural diagram of the test data generation module of the present invention;
Fig. 3 is a model diagram of the matching algorithm test module of the present invention.
Embodiments
The performance test method and system for large-scale multi-keyword precise matching algorithms proposed by the present invention are described below with reference to the drawings and embodiments. The following embodiments only illustrate the present invention and do not limit it. A person of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the invention, so all equivalent technical solutions also fall within the scope of the invention, and the scope of patent protection is defined by the claims.
Fig. 1 shows the framework of the generic evaluation and test platform for multi-keyword precise matching algorithms. The platform consists of two parts: a test data generation part and a matching algorithm performance test part. The test data generation part consists of three submodules and generates the keyword set and the text data to be matched. The matching algorithm performance test part comprises three submodules and one generic matching algorithm calling interface.
The present invention is a unified evaluation and test platform for multi-keyword matching algorithms used in text data or network content analysis. Implementation comprises two steps. The first is the test data generation stage, which produces the keyword set and the text data to be matched. The second is the matching algorithm test stage, which executes the matching algorithm through the generic interface, tests the performance indices of its preprocessing stage and search stage, and derives the actual performance of each matching algorithm from the measured indices. The content of each stage is described in detail below.
The first stage is test data generation. As shown in Fig. 2, this module consists of three submodules: a keyword set generator, a random text generator, and a test text synthesizer. The keyword set generator generates a keyword set with specified characteristics according to the input configuration file. The random text generator generates random text, which serves as the data source for the final text data to be matched. The test text synthesizer generates text data to be matched with specific characteristics from the input configuration file, keyword set, and random text.
The detailed steps of this stage are as follows:
1. Read the configuration file configure. The parameters that can be set in this file and their meanings are listed in Table 1;
Table 1: Configurable parameters for generating the data source
Configuration item | Type | Description |
randseed | Integer | Random seed: the integer used to generate random numbers; defaults to 100 |
sigmasize | Integer | Character set size |
beginASC | Integer | ASCII code of the first character (sigmasize + beginASC must be less than or equal to 256) |
Function | Integer | Function=0: generate the keyword set and the text to be matched together (equivalent to combining functions 1 and 2). Function=1: generate a random keyword set; the file name is given by the patternfile parameter. Function=2: read the keywords from patternfile and generate the text to be matched; the file name of the text is given by the textfile parameter. Function=3: convert an existing keyword set file to the platform's unified binary format |
textsizeM | Integer | Text size (in MB) |
patternnum | Integer | Number of keywords |
patterntype | Integer | patterntype=1: keyword lengths in the keyword set are variable, and the five parameters below take effect. patterntype=0: all keywords in the keyword set have the same length, which is the length given by Lminlen |
patternratio | Integer | High-frequency keyword ratio (%) |
Hminlen | Integer | Minimum length of high-frequency keywords |
Hmaxlen | Integer | Maximum length of high-frequency keywords |
Lminlen | Integer | Minimum length of the other keywords |
Lmaxlen | Integer | Maximum length of the other keywords |
matchtimes | Integer | Number of times each keyword matches in the text |
matchfre | Integer | Percentage of keywords for which a match occurs. matchtimes and matchfre together express the match rate: for a 20% match rate set matchfre to 20; for 80% set matchfre to 80; for 300% set matchfre to 100 and also set matchtimes to 3 |
textfile | Text | Path where the output text is stored when Function=0 or 2; unused when Function=1 |
patternfile | Text | Path where the output keyword set is stored when Function=0, 1, or 3; path of the keyword set to be read when Function=2 |
verifyfile | Text | Output file for the data to be verified when Function=0 or 2; unused when Function=1 |
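The parameters in Table 1 can be held in a small validated structure. The sketch below assumes a plain `name = value` line syntax for the configure file, which the text does not specify; the class `GeneratorConfig` and the function `parse_config` are hypothetical. Only the defaults for randseed (100) and the three file names come from the description; the other default values are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneratorConfig:
    randseed: int = 100               # default stated in Table 1
    sigmasize: int = 256              # placeholder default
    beginASC: int = 0                 # placeholder default
    Function: int = 0                 # placeholder default
    textsizeM: int = 1                # placeholder default
    patternnum: int = 1000            # placeholder default
    patterntype: int = 0              # placeholder default
    patternratio: int = 0             # placeholder default
    Hminlen: int = 0                  # placeholder default
    Hmaxlen: int = 0                  # placeholder default
    Lminlen: int = 8                  # placeholder default
    Lmaxlen: int = 8                  # placeholder default
    matchtimes: int = 1               # placeholder default
    matchfre: int = 20                # placeholder default
    textfile: str = "text.dat"        # default named in the description
    patternfile: str = "pattern.cfg"  # default named in the description
    verifyfile: str = "toverify.dat"  # default named in the description

    def validate(self) -> None:
        if not 1 <= self.sigmasize <= 256:
            raise ValueError("sigmasize must be in 1..256")
        if self.sigmasize + self.beginASC > 256:
            raise ValueError("sigmasize + beginASC must not exceed 256")
        if self.Function not in (0, 1, 2, 3):
            raise ValueError("Function must be 0, 1, 2 or 3")
        if self.patterntype == 1 and not (
            self.Hminlen <= self.Hmaxlen and self.Lminlen <= self.Lmaxlen
        ):
            raise ValueError("minimum lengths must not exceed maximum lengths")


def parse_config(text: str) -> GeneratorConfig:
    """Parse 'name = value' lines (an assumed syntax) into a validated config."""
    cfg = GeneratorConfig()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        current = getattr(cfg, name)  # raises AttributeError for unknown items
        setattr(cfg, name, int(value) if isinstance(current, int) else value)
    cfg.validate()
    return cfg
```

Validating the constraint on sigmasize and beginASC up front keeps every generated character code inside a single byte, which the generators depend on.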
2. Parse the configuration file. Depending on the value of the Function item, different data sources are produced: the keyword set, the text to be matched, or both.
2.1 If the Function item is 1, a keyword set is generated. The characteristics of the keyword set are given by the corresponding items in the configuration file. The generated keyword set is saved as a binary file whose name is given by the patternfile parameter and defaults to pattern.cfg.
(1) The parameters in the configuration file configure that concern keyword set generation are: the character set size sigmasize, the number of keywords patternnum, whether keyword length is variable patterntype, the minimum length Hminlen and maximum length Hmaxlen of high-frequency keywords, the high-frequency keyword ratio patternratio, the minimum length Lminlen and maximum length Lmaxlen of the other keywords, and so on. The method of generating the keyword set from these parameters is as follows:
(1.1) Check the patterntype parameter, which indicates whether the keywords in the keyword set vary in length. If patterntype is 0, all generated keywords have the same length: read the value of the Lminlen parameter as the keyword length and go to step (1.3). If patterntype is 1, the keywords are required to have different lengths: go to step (1.2) and continue reading parameters;
(1.2) Read the Hminlen and Hmaxlen parameters, which are the minimum and maximum lengths of high-frequency keywords. Read the patternratio parameter, which is the percentage of all keywords that are high-frequency keywords. Read the Lminlen and Lmaxlen parameters, which are the minimum and maximum lengths of the keywords other than the high-frequency keywords. Then go to the next step;
(1.3) Read the sigmasize parameter to obtain the size of the character set used to generate keywords (it should be between 1 and 256), and read the patternnum parameter to obtain the number of keywords to generate;
(1.4) Pass the parameters read above, together with the keyword set file name given by the patternfile parameter, to the keyword set generator module, which produces a random keyword set file. Each keyword record has the format "keyword number + tab + keyword + newline".
An example of the configuration parameters for generating a keyword set is shown in Table 2:
Table 2: Example parameters for generating a keyword set
Configuration item | Value | Description |
randseed | 100 | Random seed, set to the default value |
sigmasize | 256 | Character set size, set to 256 |
beginASC | 0 | ASCII code of the first character (sigmasize + beginASC must be less than or equal to 256) |
Function | 1 | Function=1: generate a random keyword set |
patternnum | 50000 | 50,000 keywords |
patterntype | 1 | Keyword length is variable |
patternratio | 80 | High-frequency keyword ratio (80%): keywords of high-frequency length account for 80% of all keywords |
Hminlen | 8 | Minimum length of high-frequency keywords: 8 bytes |
Hmaxlen | 16 | Maximum length of high-frequency keywords: 16 bytes |
Lminlen | 4 | Minimum length of the other keywords: 4 bytes |
Lmaxlen | 100 | Maximum length of the other keywords: 100 bytes |
patternfile | Pattern.cfg | The output keyword set is stored in the file pattern.cfg |
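Steps (1.1) to (1.4) of keyword set generation can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, keywords are numbered from 0 (the text does not say where numbering starts), characters are drawn uniformly from the configured character set, and duplicate keywords are not filtered.

```python
import random


def generate_keywords(patternnum, sigmasize, begin_asc, patterntype,
                      patternratio, hmin, hmax, lmin, lmax, seed=100):
    """Generate a random keyword set from the Table 1 parameters."""
    rng = random.Random(seed)
    # Number of high-frequency keywords; only meaningful for variable length.
    n_high = patternnum * patternratio // 100 if patterntype == 1 else 0
    keywords = []
    for i in range(patternnum):
        if patterntype == 0:
            length = lmin                      # fixed length given by Lminlen
        elif i < n_high:
            length = rng.randint(hmin, hmax)   # high-frequency length range
        else:
            length = rng.randint(lmin, lmax)   # length range of the others
        keywords.append(bytes(
            rng.randrange(begin_asc, begin_asc + sigmasize)
            for _ in range(length)
        ))
    return keywords


def format_keyword_lines(keywords):
    """Render records as 'keyword number + tab + keyword + newline'."""
    return b"".join(
        str(number).encode() + b"\t" + keyword + b"\n"
        for number, keyword in enumerate(keywords)
    )
```

With the Table 2 values, the call would be `generate_keywords(50000, 256, 0, 1, 80, 8, 16, 4, 100)`. One open point: with a 256-character set a keyword may itself contain tab or newline bytes, and the text does not say how the record format handles that, so a real file format would need length prefixes or escaping.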
(2) If a keyword set file already exists, such as a virus signature file or a spam signature file, the Function item in the configuration file can be set to 3 to convert that keyword set file to the platform's unified binary format; the output file name is given by the patternfile parameter.
2.2 If the Function item is 2, the text to be matched is generated. It is produced from the keyword set file named by the patternfile parameter and from the other configuration items. The output is a binary file whose name is given by the textfile parameter and defaults to text.dat.
(1) The parameters in the configuration file configure that concern generation of the text to be matched are: the character set size sigmasize, the size of the generated text textsizeM, the number of times a keyword matches in the text matchtimes, the percentage of all keywords for which a match occurs matchfre, the number of keywords read in patternnum, the keyword set file name patternfile, the name of the output verification file verifyfile, and so on. The method of generating the text from these parameters is as follows:
(1.1) Read the patternfile parameter to obtain the keyword set file name, and open that file;
(1.2) Read the matchfre parameter and compute the number of keywords to extract from the given percentage and the total number of keywords. If matchfre < 100, the number is computed directly from matchfre and patternnum. For example, with patternnum=5000 and matchfre=20 (the keywords for which a match occurs account for 20% of all keywords), 1000 keywords are drawn at random from the keyword set to be inserted into the text in a later step. If matchfre=100, all keywords are to be inserted into the text, and the matchtimes parameter must also be read: every keyword is inserted into the text matchtimes times. For example, matchtimes=2 means that every keyword is inserted at random positions in the text twice;
(1.3) Read the sigmasize parameter to obtain the size of the character set (it should be between 1 and 256), and read the textsizeM parameter to obtain the size of the text to generate;
(1.4) The random text generator module generates random text as the data source for the final text to be matched. The size of the random text is the configured text size textsizeM minus the total size of the keywords to be inserted;
(1.5) The test text synthesizer module draws the computed number of keywords at random and inserts each at a randomly chosen position in the text produced in (1.4). After each insertion it records the number of the inserted keyword and its insertion position. All keyword numbers and insertion positions are recorded in the file named by the verifyfile parameter (default toverify.dat) so that they can be compared with the matching results output by the matching algorithm to verify its correctness.
An example of the configuration parameters for generating the text to be matched is shown in Table 3:
Table 3: Example parameters for generating the text to be matched
Configuration item | Value | Description |
randseed | 100 | Random seed, set to the default value |
sigmasize | 256 | Character set size |
beginASC | 0 | ASCII code of the first character (sigmasize + beginASC must be less than or equal to 256) |
Function | 0 | Function=0: generate the keyword set and the text to be matched together |
patternnum | 50000 | 50,000 keywords |
textsizeM | 64 | Text size (64 MB) |
matchtimes | 1 | Each randomly drawn keyword matches once |
matchfre | 20 | The keywords for which a match occurs account for 20% of all keywords |
textfile | Text.dat | The output text is stored in the file text.dat |
patternfile | Pattern.cfg | The keyword set file to read is pattern.cfg |
verifyfile | Toverify.dat | The output verification file is toverify.dat |
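Steps (1.2) to (1.5) of text generation can be sketched as below. This is a hedged illustration: `extraction_count` and `synthesize_text` are hypothetical names, the text size is given in bytes rather than MB to keep the example small, and matchtimes is applied only when matchfre is 100, following step (1.2). Insertion points are chosen in the filler first and the output is then built left to right, so each recorded position is the final offset of the keyword in the finished text.

```python
import random


def extraction_count(patternnum, matchfre):
    """Number of keywords to draw when matchfre < 100, as in step (1.2)."""
    return patternnum * matchfre // 100


def synthesize_text(keywords, text_size, matchfre, matchtimes,
                    sigmasize=256, begin_asc=0, seed=100):
    """Return (text, expected), where expected lists (keyword number, position)."""
    rng = random.Random(seed)
    if matchfre >= 100:
        # Every keyword is inserted matchtimes times.
        chosen = list(range(len(keywords))) * matchtimes
    else:
        count = extraction_count(len(keywords), matchfre)
        chosen = rng.sample(range(len(keywords)), count)
    rng.shuffle(chosen)

    inserted = sum(len(keywords[number]) for number in chosen)
    filler_len = text_size - inserted          # step (1.4): size difference
    if filler_len < 0:
        raise ValueError("text size too small for the keywords to insert")
    filler = bytes(rng.randrange(begin_asc, begin_asc + sigmasize)
                   for _ in range(filler_len))

    # Random cut points in the filler, processed in ascending order.
    cuts = sorted(rng.randint(0, filler_len) for _ in chosen)
    out, expected, prev = bytearray(), [], 0
    for cut, number in zip(cuts, chosen):
        out += filler[prev:cut]
        expected.append((number, len(out)))    # final offset of this keyword
        out += keywords[number]
        prev = cut
    out += filler[prev:]
    return bytes(out), expected
```

The `expected` list plays the role of the verifyfile content. Note that random filler can by chance contain a keyword, and adjacent insertions can form extra occurrences, so a correct matcher may report more hits than were recorded.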
2.3 If the Function item is 0, the keyword set and the text to be matched are generated together. Their file names are given by the patternfile and textfile parameters respectively, and their characteristics are given by the other items in the configuration file.
This step is carried out by executing steps 2.1 and 2.2 in turn, and it finally outputs the keyword set file, the text to be matched, and the correctness verification file.
The second stage is matching algorithm performance testing. As shown in Fig. 3, the matching algorithm performance test module comprises three submodules and one generic matching algorithm calling interface: a keyword set preprocessing performance test submodule, a search stage performance test submodule, a test result verification and data statistics submodule, and the generic interface through which matching algorithms can be swapped. Testing mainly comprises two stages: evaluation of the preprocessing stage and evaluation of the search stage. The detailed steps are as follows:
1. Obtain the multi-keyword precise matching algorithm to be evaluated, the keyword set file, and the text to be matched.
2. Evaluation of the preprocessing stage of the matching algorithm
(2.1) The keyword set preprocessing performance test submodule calls the preprocessing interface of the matching algorithm through the generic matching algorithm calling interface;
(2.2) With the keyword set as the input file, the preprocessing stage of the multi-pattern matching algorithm to be evaluated is executed;
(2.3) After the preprocessing stage finishes, the keyword-related data structure has been generated, and the key information of the algorithm's results is collected. This information comprises the preprocessing time and the maximum memory occupied by the data structure generated from the keywords.
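The measurement in steps (2.1) to (2.3) can be sketched with Python's standard library. The helper `measure` and the toy preprocessing function `build_first_byte_index` are hypothetical. `tracemalloc` reports peak memory allocated through the Python allocator while the call runs, which is used here as a stand-in for the "maximum memory occupied"; a platform testing native code would need a process-level measurement instead.

```python
import time
import tracemalloc


def measure(func, *args):
    """Return (result, elapsed_seconds, peak_bytes) for a single call."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def build_first_byte_index(keywords):
    """Toy preprocessing stage: group numbered keywords by their first byte."""
    index = {}
    for number, keyword in enumerate(keywords):
        index.setdefault(keyword[0], []).append((number, keyword))
    return index
```

A call such as `measure(build_first_byte_index, keywords)` yields the data structure for the search stage together with the two preprocessing indices named in step (2.3).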
3. Evaluation of the search stage of the matching algorithm
(3.1) The search stage performance test submodule calls the search program interface of the matching algorithm under test through the generic matching algorithm calling interface;
(3.2) With the data structure generated at the end of the preprocessing stage as input, the search program of the matching algorithm under test is executed and scans the text file to be matched;
(3.3) The number of each keyword that occurs in the text to be matched and the position at which it occurs are recorded and saved to the output result file. At the same time, the search time and information such as the maximum memory used during the search are recorded.
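Steps (3.1) to (3.3) can be sketched as follows, assuming the preprocessing stage produced a dictionary that maps a first byte to a list of (keyword number, keyword) pairs. The functions `search`, `timed_search`, and `format_results` are hypothetical, the result record layout (number, tab, position, newline) is an assumption, and expressing matching speed in MB per second is also an assumption. Memory measurement is omitted to keep the example short.

```python
import time


def search(index, text):
    """Scan text and return (keyword number, position) for every occurrence."""
    hits = []
    for pos in range(len(text)):
        # Only keywords whose first byte equals text[pos] can match here.
        for number, keyword in index.get(text[pos], ()):
            if text.startswith(keyword, pos):
                hits.append((number, pos))
    return hits


def timed_search(index, text):
    """Return (hits, elapsed_seconds, speed_in_MB_per_second)."""
    start = time.perf_counter()
    hits = search(index, text)
    elapsed = time.perf_counter() - start
    megabytes = len(text) / (1024 * 1024)
    speed = megabytes / elapsed if elapsed > 0 else float("inf")
    return hits, elapsed, speed


def format_results(hits):
    """Render the output result file: one 'number<TAB>position' line per hit."""
    return "".join(f"{number}\t{pos}\n" for number, pos in hits)
```

Keeping the timing wrapper separate from the search routine means the same wrapper can time any algorithm's search entry point without modification.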
4. Search result verification and statistical report generation
After the preprocessing stage and the search stage are complete, the test result verification and statistics module takes the expected result data (the file named by the verifyfile parameter) and the actual test result data as input and compares them to verify the correctness of the algorithm. It then takes the performance information produced by the preprocessing module and the search module together as input, compiles statistics, and outputs the evaluation report. The content of the report is as shown in Table 4.
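The comparison of expected and actual results can be sketched as a set difference over (keyword number, position) pairs. The function `verify` is hypothetical, and the pass criterion is an assumption: the sketch treats the algorithm as correct when no expected occurrence is missing, and reports additional hits separately, because randomly generated text may contain keyword occurrences that were never deliberately inserted. The text does not state how the platform treats such extra hits.

```python
def verify(expected, actual):
    """Compare expected insertions with the hits reported by the matcher."""
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)     # inserted but not found
    unexpected = sorted(actual_set - expected_set)  # found but not recorded
    return {
        "correct": not missing,
        "missing": missing,
        "unexpected": unexpected,
    }
```

A stricter policy, which also fails the run on unexpected hits, would be a one-line change to the "correct" entry if the test data is known to be free of accidental occurrences.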
Table 4: Example of the performance index report that this evaluation platform can produce
5. If several multi-keyword precise matching algorithms have each been evaluated and each has produced an evaluation report like Table 4, the reports can be fed into the test result verification and statistics module to produce a side-by-side comparison report of the performance indices (preprocessing time, storage space occupied, and matching speed).
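A side-by-side comparison report of this kind can be sketched as a small text table. The function `comparison_report`, the dictionary keys, and the column names are hypothetical, since the layout of Table 4 is not reproduced in the text. Rows are sorted by matching speed, fastest first, which is one possible ordering among several.

```python
def comparison_report(reports):
    """Format per-algorithm reports as an aligned plain-text table.

    reports maps an algorithm name to a dict with the keys
    'preprocess_s', 'memory_bytes' and 'speed_MBps' (assumed names).
    """
    header = ("algorithm", "preprocess_s", "memory_bytes", "speed_MBps")
    ordered = sorted(reports.items(), key=lambda item: -item[1]["speed_MBps"])
    rows = [header] + [
        (name,
         f"{report['preprocess_s']:.3f}",
         str(report["memory_bytes"]),
         f"{report['speed_MBps']:.1f}")
        for name, report in ordered
    ]
    # Pad every column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )
```

Each input dictionary would be filled from the preprocessing and search measurements of one algorithm run on the same keyword set and text, so that the rows are directly comparable.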
Claims (6)
1. the performance test methods of a large-scale multi-keyword precise matching algorithm is characterized in that, described method comprises the steps:
S1: test data produces step, specifically comprises:
S11: generate keyword set at random;
S12: generate the random text data;
S13: keyword set is inserted in the text data, produces text to be matched;
S2: keyword set pre-service performance test step specifically comprises:
S21:, call the pre-service interface of matching algorithm by general matching algorithm calling interface;
S22: as input file, carry out and generate the keyword related data structure with keyword set, the key message of statistic algorithm result, described key message comprise the maximum memory information that the data structure of pretreatment time and keyword generation takies;
2. the performance test methods of large-scale multi-keyword precise matching algorithm as claimed in claim 1 is characterized in that, described method also comprises the steps:
S3: the search performance testing procedure of matching algorithm specifically comprises:
S31: general matching algorithm calling interface, call the search utility interface of matching algorithm to be measured;
S32: carry out the back data structure that generates of end as input with step S2, carry out the search utility of matching algorithm to be measured, treat the matched text file and scan;
S33: write down keyword numbering that occurs in the text to be matched and the position that in text, occurs, these information are saved in the output destination file, simultaneously the maximum memory information of using in record searching time and the search procedure;
3. the performance test methods of large-scale multi-keyword precise matching algorithm as claimed in claim 1 is characterized in that, described method also comprises the steps:
S4: verification search result and generation statistical report step specifically comprise:
After completing steps S2 and S3, expected results data message and actual test result data are compared as input, the correctness of verification algorithm, the performance information that step S2 and S3 are produced is together as importing then, adds up and outputs test result.
4. A performance test system for a large-scale multi-keyword precise matching algorithm, characterized in that the system comprises the following modules:
F1: a test data generation module, specifically comprising:
F11: a random keyword generation submodule, used to generate a random keyword set;
F12: a random text data generation submodule, used to generate random text data;
F13: a text-to-be-matched generation submodule, used to insert the keyword set into the text data to produce the text to be matched;
F2: a keyword set preprocessing performance test module, specifically comprising:
F21: a matching algorithm preprocessing interface calling submodule, used to call the preprocessing interface of the matching algorithm through the general matching algorithm calling interface;
F22: a test information generation submodule, used to take the keyword set as the input file, execute preprocessing to generate the keyword-related data structure, and collect key information about the algorithm's execution result, the key information comprising the preprocessing time and the maximum memory occupied by the data structure generated from the keywords.
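The test data generation of module F1 could look roughly like the sketch below. All function names and parameters are hypothetical. To keep the expected results exact, this sketch makes simplifying assumptions that the patent does not state: keywords are distinct, fixed-length and uppercase, the filler text is lowercase, and every inserted keyword is preceded by at least one filler character, so no accidental matches can arise.

```python
import random
import string


def generate_keywords(rng, count, length):
    """F11: draw `count` distinct fixed-length uppercase keywords."""
    keywords = set()
    while len(keywords) < count:
        keywords.add("".join(rng.choices(string.ascii_uppercase, k=length)))
    return sorted(keywords)


def generate_background(rng, length):
    """F12: random lowercase filler text."""
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def build_text_to_match(rng, keywords, insertions, gap):
    """F13: interleave filler and keywords; return the text and expected hits."""
    parts, expected, offset = [], [], 0
    for _ in range(insertions):
        filler = generate_background(rng, gap)
        idx = rng.randrange(len(keywords))
        parts += [filler, keywords[idx]]
        # The keyword starts right after the filler that precedes it.
        expected.append((idx, offset + len(filler)))
        offset += len(filler) + len(keywords[idx])
    return "".join(parts), sorted(expected)


rng = random.Random(0)  # fixed seed so a test run can be reproduced
keywords = generate_keywords(rng, count=5, length=4)
text, expected = build_text_to_match(rng, keywords, insertions=10, gap=8)
```

Recording the insertion positions while the text is built yields the expected result data against which the output of the algorithm under test can later be compared.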
5. The performance test system for a large-scale multi-keyword precise matching algorithm as claimed in claim 4, characterized in that the system further comprises the following module:
F3: a search performance test module for the matching algorithm, specifically comprising:
F31: a matching algorithm search program interface calling submodule, used to call the search program interface of the matching algorithm under test through the general matching algorithm calling interface;
F32: a search program scanning submodule, used to take the data structure generated after processing by module F2 as input and execute the search program of the matching algorithm under test to scan the text file to be matched;
F33: a test information generation submodule, used to record the number of each keyword that occurs in the text to be matched and the position at which it occurs in the text, save this information to an output result file, and at the same time record the search time and the maximum memory used during the search.
6. The performance test system for a large-scale multi-keyword precise matching algorithm as claimed in claim 5, characterized in that the system further comprises the following module:
F4: a module for verifying the search results and generating a statistical report, specifically comprising:
a statistics generation submodule, used to, after processing by modules F2 and F3, take the expected result data and the actual test result data as input and compare them to verify the correctness of the algorithm, then take the performance information produced by the processing of modules F2 and F3 together as input, compile statistics, and output the test results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102368178A CN101714166B (en) | 2009-10-30 | 2009-10-30 | Method and system for testing performance of large-scale multi-keyword precise matching algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102368178A CN101714166B (en) | 2009-10-30 | 2009-10-30 | Method and system for testing performance of large-scale multi-keyword precise matching algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101714166A (en) | 2010-05-26 |
CN101714166B (en) | 2011-12-28 |
Family
ID=42417812
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009102368178A Active CN101714166B (en) | 2009-10-30 | 2009-10-30 | Method and system for testing performance of large-scale multi-keyword precise matching algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101714166B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103198160A (en) * | 2013-04-28 | 2013-07-10 | 南京安讯科技有限责任公司 | Keyword combination matching method |
CN104317888A (en) * | 2014-10-23 | 2015-01-28 | 电信科学技术第十研究所 | Text retrieval test data generation method |
CN105760292A (en) * | 2014-12-18 | 2016-07-13 | 阿里巴巴集团控股有限公司 | Assertion verification method and device for unit testing |
CN108983135A (en) * | 2018-05-17 | 2018-12-11 | 济南置真电气有限公司 | A method of for verifying low-current ground fault line selection algorithm accuracy rate |
CN109213921A (en) * | 2017-06-29 | 2019-01-15 | 广州涌智信息科技有限公司 | A kind of searching method and device of merchandise news |
CN112199297A (en) * | 2020-10-30 | 2021-01-08 | 久瓴(江苏)数字智能科技有限公司 | Data testing method and device, nonvolatile storage medium and processor |
CN112580345A (en) * | 2020-12-28 | 2021-03-30 | 成都网安科技发展有限公司 | Text recognition method and device based on regular matching and electronic equipment |
CN113157722A (en) * | 2021-04-01 | 2021-07-23 | 北京达佳互联信息技术有限公司 | Data processing method, device, server, system and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100452055C (en) * | 2007-04-13 | 2009-01-14 | 清华大学 | Large-scale and multi-key word matching method for text or network content analysis |
2009-10-30: CN application CN2009102368178A, patent CN101714166B (en), status Active
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103198160A (en) * | 2013-04-28 | 2013-07-10 | 南京安讯科技有限责任公司 | Keyword combination matching method |
CN103198160B (en) * | 2013-04-28 | 2017-02-22 | 南京安讯科技有限责任公司 | Keyword combination matching method |
CN104317888A (en) * | 2014-10-23 | 2015-01-28 | 电信科学技术第十研究所 | Text retrieval test data generation method |
CN104317888B (en) * | 2014-10-23 | 2018-04-27 | 电信科学技术第十研究所 | A kind of full-text search test data generating method |
CN105760292A (en) * | 2014-12-18 | 2016-07-13 | 阿里巴巴集团控股有限公司 | Assertion verification method and device for unit testing |
CN105760292B (en) * | 2014-12-18 | 2019-01-08 | 阿里巴巴集团控股有限公司 | A kind of assertion verification method and apparatus for unit testing |
CN109213921A (en) * | 2017-06-29 | 2019-01-15 | 广州涌智信息科技有限公司 | A kind of searching method and device of merchandise news |
CN108983135A (en) * | 2018-05-17 | 2018-12-11 | 济南置真电气有限公司 | A method of for verifying low-current ground fault line selection algorithm accuracy rate |
CN112199297A (en) * | 2020-10-30 | 2021-01-08 | 久瓴(江苏)数字智能科技有限公司 | Data testing method and device, nonvolatile storage medium and processor |
CN112580345A (en) * | 2020-12-28 | 2021-03-30 | 成都网安科技发展有限公司 | Text recognition method and device based on regular matching and electronic equipment |
CN113157722A (en) * | 2021-04-01 | 2021-07-23 | 北京达佳互联信息技术有限公司 | Data processing method, device, server, system and storage medium |
CN113157722B (en) * | 2021-04-01 | 2023-12-26 | 北京达佳互联信息技术有限公司 | Data processing method, device, server, system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN101714166B (en) | 2011-12-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |