Summary of the invention
In view of the above problems, the present invention is proposed to provide a method and a device for searching sample data that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for searching sample data is provided, the method including:
Collecting sample data from each data source;
Formatting each piece of collected sample data, and storing each piece of formatted sample data in a sample database;
Receiving a search word sent by a client, and converting the search word into a query condition;
Searching the sample database for sample data that satisfies the query condition;
Returning the sample data found to the client for display.
Optionally, collecting sample data from each data source includes:
Using a crawler to crawl sample data from each data source;
And/or,
Using a crawler to crawl logs from each data source, using a distributed file processing framework to parse the logs of each data source in batches, and obtaining sample data from the logs of each data source.
Optionally, formatting each piece of collected sample data includes: converting each piece of collected sample data into sample data in a specified format;
Converting the search word into a query condition includes: converting the search word into a query condition in the specified format.
Optionally, converting each piece of collected sample data into sample data in the specified format includes:
For each piece of sample data,
Extracting from the sample data the fields that satisfy a preset condition;
For each extracted field, extracting the value of the field from the sample data, and forming the two-dimensional feature corresponding to the field from the field and its value;
Taking the feature set composed of the two-dimensional features corresponding to the extracted fields as the sample data in the specified format obtained by converting this sample data.
Optionally, converting the search word into a query condition in the specified format includes:
For each search word, extracting from the search word the fields that satisfy the preset condition, extracting the value of each such field from the search word, and taking the two-dimensional feature formed by the field and its value as the query condition in the specified format obtained by converting the search word.
Optionally, the method further includes: establishing a feature field library, the feature field library including a plurality of feature fields;
Extracting from the sample data the fields that satisfy the preset condition includes: according to the feature field library, traversing the fields contained in the sample data and extracting the fields that hit the feature field library;
Extracting from the search word the fields that satisfy the preset condition includes: according to the feature field library, traversing the fields contained in the search word and extracting the fields that hit the feature field library.
Optionally, the feature field library includes one or more of the following feature fields:
A field indicating that the data created a specified process, a field indicating that the data contains macro code, a field indicating that the data accessed a specified website, a field representing an email address, a field representing a domain name, a field representing an IP address, and a field representing a URL address.
Optionally, the method further includes:
At every preset time interval, collecting feature fields again and adding them to the feature field library, so as to update the feature field library;
After the feature field library is updated, re-executing the operation of formatting each piece of collected sample data.
Optionally, searching the sample database for sample data that satisfies the query condition includes:
Traversing each piece of sample data in the sample database;
For each piece of sample data, traversing the two-dimensional features contained in the sample data, and, if there is a two-dimensional feature identical to a two-dimensional feature in the query condition, determining that the sample data satisfies the query condition.
Optionally, the sample database includes: a distributed file system in a distributed file processing framework.
Optionally, before returning the sample data found to the client for display, the method further includes:
Obtaining the data format adapted to the client;
Converting the format of the sample data found into the data format adapted to the client;
Returning the sample data found to the client for display then includes: returning the sample data that has been found and format-converted to the client for display.
According to another aspect of the present invention, a device for searching sample data is provided, the device including:
A sample data collection unit, adapted to collect sample data from each data source;
A sample data processing unit, adapted to format each piece of sample data collected by the sample data collection unit and to store each piece of formatted sample data in a sample database;
A search interaction unit, adapted to receive a search word sent by a client, convert the search word into a query condition, and send the query condition to a search query unit;
The search query unit, adapted to search the sample database for sample data that satisfies the query condition and return it to the search interaction unit;
The search interaction unit, further adapted to return the sample data found by the search query unit to the client for display.
Optionally, the sample data collection unit is adapted to use a crawler to crawl sample data from each data source; and/or to use a crawler to crawl logs from each data source, use a distributed file processing framework to parse the logs of each data source in batches, and obtain sample data from the logs of each data source.
Optionally, the sample data processing unit is adapted to convert each piece of collected sample data into sample data in a specified format;
The search interaction unit is adapted to convert the received search word into a query condition in the specified format.
Optionally, the sample data processing unit is adapted to, for each piece of sample data, extract from the sample data the fields that satisfy a preset condition; for each extracted field, extract the value of the field from the sample data and form the two-dimensional feature corresponding to the field from the field and its value; and take the feature set composed of the two-dimensional features corresponding to the extracted fields as the sample data in the specified format obtained by converting this sample data.
Optionally, the search interaction unit is adapted to, for each search word, extract from the search word the fields that satisfy the preset condition, extract the value of each such field from the search word, and take the two-dimensional feature formed by the field and its value as the query condition in the specified format obtained by converting the search word.
Optionally, the device further includes: a feature field library establishing unit;
The feature field library establishing unit is adapted to establish a feature field library, the feature field library including a plurality of feature fields;
The sample data processing unit is adapted to, according to the feature field library, traverse the fields contained in the sample data and extract the fields that hit the feature field library;
The search interaction unit is adapted to, according to the feature field library, traverse the fields contained in the search word and extract the fields that hit the feature field library.
Optionally, the feature field library includes one or more of the following feature fields:
A field indicating that the data created a specified process, a field indicating that the data contains macro code, a field indicating that the data accessed a specified website, a field representing an email address, a field representing a domain name, a field representing an IP address, and a field representing a URL address.
Optionally, the feature field library establishing unit is further adapted to, at every preset time interval, collect feature fields again and add them to the feature field library, so as to update the feature field library;
The sample data processing unit is further adapted to re-execute the operation of formatting each piece of collected sample data after the feature field library is updated.
Optionally, the search query unit is adapted to traverse each piece of sample data in the sample database; and, for each piece of sample data, to traverse the two-dimensional features contained in the sample data and, if there is a two-dimensional feature identical to a two-dimensional feature in the query condition, determine that the sample data satisfies the query condition.
Optionally, the sample database includes: a distributed file system in a distributed file processing framework.
Optionally, the search interaction unit is further adapted to, before returning the sample data found to the client for display, obtain the data format adapted to the client; convert the format of the sample data found into the data format adapted to the client; and return the sample data that has been found and format-converted to the client for display.
According to the technical solution of the present invention, the sample data of each data source is integrated into a single sample database. During a search, the search word sent by the client is converted into a query condition, sample data is looked up in the sample database according to that query condition, and the sample data found is returned to the client as the search result and displayed. Under this solution, the sample data of every data source is integrated into a unified format, so the search process is not constrained by each data source's own data format. This provides a unified search interface that finds, in a single search, the sample data from all data sources that satisfies the query condition, which can greatly improve search efficiency and reduce the manual cost of searching. In addition, the query condition is generated from the search word sent by the client, so it can be a search on common search features or a search on fields that were not defined in advance. This gives the sample data search process considerable flexibility and meets different types of search needs.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features and advantages of the present invention become more apparent, specific embodiments of the present invention are set out below.
Detailed description of the invention
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flow chart of a method for searching sample data. As shown in Fig. 1, the method includes:
Step S110: collect sample data from each data source.
Step S120: format each piece of collected sample data, and store each piece of formatted sample data in a sample database.
Sample data from different data sources each has its own data format and therefore does not support a unified search process. For this reason, this step formats sample data of different data formats so that the format of the sample data is unified, which makes the subsequent search process easier to carry out.
Step S130: receive a search word sent by a client, and convert the search word into a query condition.
Step S140: search the sample database for sample data that satisfies the query condition.
Step S150: return the sample data found to the client for display.
It can be seen that the method shown in Fig. 1 integrates the sample data of each data source into a single sample database. During a search, the search word sent by the client is converted into a query condition, sample data is looked up in the sample database according to that query condition, and the sample data found is returned to the client as the search result and displayed. Under this solution, the sample data of every data source is integrated into a unified format, so the search process is not constrained by each data source's own data format. This provides a unified search interface that finds, in a single search, the sample data from all data sources that satisfies the query condition, which can greatly improve search efficiency and reduce the manual cost of searching. In addition, the query condition is generated from the search word sent by the client, so it can be a search on common search features or a search on fields that were not defined in advance. This gives the sample data search process considerable flexibility and meets different types of search needs.
In one embodiment of the invention, collecting sample data from each data source in step S110 includes: using a crawler to crawl sample data from each data source; and/or using a crawler to crawl logs from each data source, using a distributed file processing framework to parse the logs of each data source in batches, and obtaining sample data from the logs of each data source. For example, a crawler may be used to crawl sample data from a specified website or from a specified file (such as the body of a report). Alternatively, a crawler may first crawl the logs corresponding to a data source, and the Hadoop framework may then be used to parse the logs in batches to obtain the sample data of that data source. The specific form of the data source is not limited: any text information that is valuable to the search process can be collected, regardless of its origin.
In one embodiment of the invention, the sample database includes: a distributed file system in a distributed file processing framework, such as HDFS, or an in-memory database, such as a Redis database.
In one embodiment of the invention, formatting each piece of collected sample data in step S120 includes: converting each piece of collected sample data into sample data in a specified format. The specific conversion process may be: for each piece of sample data, extract from the sample data the fields that satisfy a preset condition; for each extracted field, extract the value of the field from the sample data and form the two-dimensional feature corresponding to the field from the field and its value; and take the feature set composed of the two-dimensional features corresponding to the extracted fields as the sample data in the specified format obtained by converting this sample data.
For example, suppose that the fields extracted from a piece of sample data that satisfy the preset condition are field a and field b, and that the value of field a extracted from the sample data is "true". The two-dimensional feature formed by field a and its value, (field a, true), is in essence metadata in key-value form.
In one case, field b is the parent field of field a, so the value of field b contains the metadata formed by field a and its value. If the value of field b is "field a:true; field c:10010", the two-dimensional feature formed by field b and its value, (field b, field a:true; field c:10010), is likewise metadata in key-value form. In the feature set {(field a, true), (field b, field a:true; field c:10010)} composed of these two features, (field b, field a:true; field c:10010) already contains (field a, true). In this case the feature set can be simplified directly to (field b, field a:true; field c:10010), which serves as the converted sample data in the specified format.
In another case, field b and field a are in a parallel relationship, and the value of field b and the value of field a are independent of each other. If the value of field b is "www.microsoft.com", the two-dimensional feature formed by field b and its value, (field b, www.microsoft.com), is likewise metadata in key-value form, and the feature set {(field a, true), (field b, www.microsoft.com)} serves as the converted sample data in the specified format.
It can be seen that differences in the relationships among the fields extracted from the sample data make the resulting sample data in the specified format differ slightly, but in essence it is always composed of metadata, which facilitates the subsequent unified search process.
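The conversion of a piece of sample data into a set of two-dimensional features can be sketched as follows. This is only a minimal illustration of the parallel-field case: the field names in `FEATURE_FIELDS`, the helper names `normalise` and `format_sample`, and the example record are all assumptions introduced here, not part of the described method.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Feature = Tuple[str, str]  # one two-dimensional feature: (field, value)

# Hypothetical set of fields that "satisfy the preset condition".
FEATURE_FIELDS: Set[str] = {"md5", "type", "attachment", "domain"}


def normalise(value: Any) -> str:
    """Render a field value as text, writing booleans as 'true'/'false'."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def format_sample(raw: Dict[str, Any], feature_fields: Set[str]) -> FrozenSet[Feature]:
    """Keep only the qualifying fields and pair each one with its value."""
    return frozenset(
        (field, normalise(value))
        for field, value in raw.items()
        if field in feature_fields
    )


# Hypothetical raw record; "size" is dropped because it is not a feature field.
raw_sample = {"md5": "abc123", "type": "Email", "attachment": True, "size": 2048}
formatted = format_sample(raw_sample, FEATURE_FIELDS)
print(sorted(formatted))
```

Storing the result as a set of pairs keeps every data source in the same shape, which is what allows one lookup routine to serve all of them.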
Correspondingly, converting the search word into a query condition in step S130 includes: converting the search word into a query condition in the specified format. The specific conversion process may be: for each search word, extract from the search word the fields that satisfy the preset condition, extract the value of each such field from the search word, and take the two-dimensional feature formed by the field and its value as the query condition in the specified format obtained by converting the search word. One or more fields satisfying the preset condition may be extracted from a search word. For example, suppose the fields "type" and "attachment" are extracted. The extracted value of "type" is "Email", giving the two-dimensional feature (type, Email), which is metadata in key-value form. The value of "attachment" is "true", giving the two-dimensional feature (attachment, true), which is likewise metadata in key-value form. The query condition in the specified format obtained by converting this search word is {(type, Email), (attachment, true)}. This query condition expresses the intent to find sample data that satisfies the condition "email with an attachment".
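A search word can be turned into a query condition in the same key-value form. The sketch below assumes, purely for illustration, that the search word is a whitespace-separated string in which each recognised field is followed directly by its value; the function name `to_query_condition` and this tokenisation rule are assumptions, since the text does not fix a search word syntax.

```python
from typing import FrozenSet, List, Set, Tuple

Feature = Tuple[str, str]  # one two-dimensional feature: (field, value)


def to_query_condition(search_word: str, feature_fields: Set[str]) -> FrozenSet[Feature]:
    """Pair every token that is a known field with the token that follows it."""
    tokens = search_word.split()
    features: List[Feature] = []
    i = 0
    while i + 1 < len(tokens):
        if tokens[i] in feature_fields:
            features.append((tokens[i], tokens[i + 1]))
            i += 2  # skip the value that was just consumed
        else:
            i += 1  # not a known field, move on
    return frozenset(features)


# Hypothetical search word for "email with an attachment".
query = to_query_condition("type Email attachment true", {"type", "attachment"})
print(sorted(query))
```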
As the above embodiments of step S120 and step S130 show, the collected sample data is unified into sample data in the specified format, and the search word received from the client is converted into a query condition in the specified format. Because the sample data in the specified format corresponds to the query condition in the specified format, the query condition can be used directly to check whether a piece of sample data satisfies the search need. That is, searching the sample database for sample data that satisfies the query condition in step S140 includes: traversing each piece of sample data in the sample database; and, for each piece of sample data, traversing the two-dimensional features contained in the sample data and, if there is a two-dimensional feature identical to a two-dimensional feature in the query condition, determining that the sample data satisfies the query condition. In one embodiment of the invention, the query condition includes multiple two-dimensional features, such as the query condition {(type, Email), (attachment, true)} mentioned above. When the sample database is traversed, each piece of sample data is first checked for (type, Email); if it is present, the sample data is then checked for (attachment, true); if that is also present, the sample data is determined to satisfy the query condition. Otherwise, it does not satisfy the query condition.
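The lookup described above amounts to checking that every two-dimensional feature of the query condition appears in a sample's feature set. A minimal sketch follows, assuming an in-memory dictionary stands in for the sample database; the sample identifiers and the names `satisfies` and `search` are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Feature = Tuple[str, str]  # one two-dimensional feature: (field, value)


def satisfies(sample_features: Set[Feature], query: Set[Feature]) -> bool:
    """A sample matches when it contains every feature in the query condition."""
    return query.issubset(sample_features)


def search(database: Dict[str, Set[Feature]], query: Set[Feature]) -> List[str]:
    """Traverse every sample and collect the identifiers of those that match."""
    return [
        sample_id
        for sample_id, features in database.items()
        if satisfies(features, query)
    ]


# Hypothetical database of already formatted samples.
database = {
    "sample-1": {("type", "Email"), ("attachment", "true")},
    "sample-2": {("type", "Email"), ("attachment", "false")},
    "sample-3": {("type", "PE")},
}
query = {("type", "Email"), ("attachment", "true")}
hits = search(database, query)
print(hits)
```

A full traversal is linear in the number of samples; the text does not describe any index, so none is assumed here.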
Further, the method shown in Fig. 1 also includes: establishing a feature field library, the feature field library including a plurality of feature fields. Extracting from the sample data the fields that satisfy the preset condition then includes: according to the feature field library, traversing the fields contained in the sample data and extracting the fields that hit the feature field library. Likewise, extracting from the search word the fields that satisfy the preset condition includes: according to the feature field library, traversing the fields contained in the search word and extracting the fields that hit the feature field library. The feature fields in the feature field library may be common, fixed feature fields that indicate basic information about the sample data, such as a field representing a file hash value, a field representing a file name, a field representing an email address, a field representing a domain name, or a field representing a URL address. They may also be more flexible, non-fixed feature fields that describe information features and/or behavior features of the sample data, such as a field indicating that the data created a specified process, a field indicating that the data contains macro code, a field indicating that the data accessed a specified website, a field indicating that the data was flagged by a specified antivirus engine, or a field indicating that the data accessed a specified dynamic domain name.
According to this embodiment, it can be expected that the scope of the feature field library directly affects whether the search process provided by this solution can satisfy the client's search needs to the greatest extent, so the feature field library needs to be expanded and updated in a timely manner. This solution therefore further includes: at every preset time interval, collecting feature fields again and adding them to the feature field library, so as to update the feature field library; and, after the feature field library is updated, re-executing the operation of formatting each piece of collected sample data. Specifically, in this embodiment, updated feature fields can be obtained by collecting search words.
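The update step can be sketched as adding the newly collected fields to the library and then formatting the stored raw samples again. The sketch assumes that the raw samples are retained so they can be reprocessed, and leaves the timer that triggers the update out of scope; `format_sample`, `refresh`, and the example data are illustrative assumptions.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Feature = Tuple[str, str]  # one two-dimensional feature: (field, value)


def format_sample(raw: Dict[str, Any], library: Set[str]) -> Set[Feature]:
    """Extract the fields that hit the library, paired with their values."""
    return {(field, str(value)) for field, value in raw.items() if field in library}


def refresh(
    library: Set[str],
    new_fields: Iterable[str],
    raw_samples: List[Dict[str, Any]],
) -> List[Set[Feature]]:
    """Extend the library in place, then re-run formatting on every raw sample."""
    library.update(new_fields)
    return [format_sample(raw, library) for raw in raw_samples]


# Hypothetical state: "domain" is not yet a feature field.
library = {"type"}
raw_samples = [{"type": "Email", "domain": "example.org"}]

before = [format_sample(raw, library) for raw in raw_samples]
after = refresh(library, ["domain"], raw_samples)  # e.g. called once per interval
print(before, after)
```

Re-formatting after each update is what makes a newly added field searchable in samples that were collected before the field was known.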
The implementation of this solution is illustrated with a specific example. In this example, the user wants to find all sample data that the Microsoft antivirus engine reports as Locky. The search word received from the client is "scans Microsoft result Locky", and it is converted into the query condition (result, Locky) or (scans Microsoft result, Locky), where Microsoft is the parent field of result and scans is the parent field of Microsoft. The meaning of this query condition is: find the result field that lies under the Microsoft field under the scans field and whose value is Locky. The sample database is traversed according to this query condition, and each piece of sample data is checked against it. The content of one piece of sample data in the specified format is:
Traversing this sample data shows that it contains "result": "RansomWin32/Locky.lrfn", that is, it contains (result, Locky). The parent of this result field is the Microsoft field, and the parent of the Microsoft field is the scans field, so this is a result field under the Microsoft field under the scans field whose value is Locky, and the query condition is satisfied. The sample data is returned to the client, and through the query condition the client obtains all other information associated with this sample data, such as its hash value "md5", its download address "download_url", the values of the fields under the Microsoft field under the scans field, and the values of the fields under the SUPERAntispyware field under the scans field, which meets the search need.
Specifically, in one embodiment of the invention, before the sample data found is returned to the client for display, this solution further includes: obtaining the data format adapted to the client; and converting the format of the sample data found into the data format adapted to the client. Returning the sample data found to the client for display in step S150 then includes: returning the sample data that has been found and format-converted to the client for display.
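Adapting the result to the client can be sketched as a small conversion step applied just before the data is returned. The two format names used here, "json" and "text", and the function name `adapt_for_client` are assumptions for illustration; the text does not say which formats a client may request.

```python
import json
from typing import Set, Tuple

Feature = Tuple[str, str]  # one two-dimensional feature: (field, value)


def adapt_for_client(features: Set[Feature], client_format: str) -> str:
    """Serialise a found sample into the format the client declared."""
    record = dict(sorted(features))  # stable field order for display
    if client_format == "json":
        return json.dumps(record, sort_keys=True)
    if client_format == "text":
        return "\n".join(f"{field}={value}" for field, value in record.items())
    raise ValueError(f"unsupported client format: {client_format}")


found = {("type", "Email"), ("attachment", "true")}
as_json = adapt_for_client(found, "json")
as_text = adapt_for_client(found, "text")
print(as_json)
print(as_text)
```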
In another example, the user wants to find all sample data that accessed the f3322.org dynamic domain name. The search word received from the client is "Domain f3322.org", and it is converted into the query condition in the specified format (Domain, f3322.org). The meaning of this query condition is: find the Domain field whose value is f3322.org. The sample database is traversed according to this query condition, and each piece of sample data is checked against it. The content of one piece of sample data in the specified format is:
Traversing this sample data shows that it contains "Domain": "ssxx33.f3322.org*218.244.134.107*112.213.125.52*", that is, it contains (Domain, f3322.org), a Domain field whose value is f3322.org, so the query condition is satisfied. The sample data is returned to the client, and through the query condition the client obtains all other information associated with this sample data, such as the values of its hash value "md5", its type "type", and its upload time "up_load time", which meets the search need.
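The Domain value quoted above packs several entries into one string. One way to test such a value against a queried domain is sketched below, assuming that "*" separates the entries and that a subdomain such as ssxx33.f3322.org should count as a hit for f3322.org; both points are assumptions, and `domain_field_matches` is an illustrative name.

```python
def domain_field_matches(field_value: str, wanted_domain: str) -> bool:
    """True when any '*'-separated entry is the domain or one of its subdomains."""
    entries = [entry for entry in field_value.split("*") if entry]
    return any(
        entry == wanted_domain or entry.endswith("." + wanted_domain)
        for entry in entries
    )


domain_value = "ssxx33.f3322.org*218.244.134.107*112.213.125.52*"
matched = domain_field_matches(domain_value, "f3322.org")
print(matched)
```

Matching on whole entries avoids accidental hits such as "notf3322.org", which a plain substring test would accept.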
In other examples, a search may target sample data that contains a url field whose value contains "115.239.230.228" (an illustration only, not a limitation), or sample data that contains a url field whose value contains "LSQZA.swf" (likewise an illustration only), and so on. A user of the client can take a description of any aspect of a piece of sample data as the query condition and obtain all associated information of the sample data that satisfies it. The search process is convenient and effective, covers a wide range of search dimensions, and can meet search needs of various forms.
When entering a search at the client, the user may enter it through a page template or through other forms of input.
Fig. 2 shows a schematic diagram of a device for searching sample data. As shown in Fig. 2, the search device 200 for sample data includes:
A sample data collection unit 210, adapted to collect sample data from each data source.
A sample data processing unit 220, adapted to format each piece of sample data collected by the sample data collection unit and to store each piece of formatted sample data in a sample database.
A search interaction unit 240, adapted to receive a search word sent by a client, convert the search word into a query condition, and send the query condition to a search query unit 230.
The search query unit 230, adapted to search the sample database for sample data that satisfies the query condition and return it to the search interaction unit 240.
The search interaction unit 240, further adapted to return the sample data found by the search query unit 230 to the client for display.
It can be seen that, through the cooperation of its units, the device shown in Fig. 2 integrates the sample data of each data source into a single sample database. During a search, the search word sent by the client is converted into a query condition, sample data is looked up in the sample database according to that query condition, and the sample data found is returned to the client as the search result and displayed. Under this solution, the sample data of every data source is integrated into a unified format, so the search process is not constrained by each data source's own data format. This provides a unified search interface that finds, in a single search, the sample data from all data sources that satisfies the query condition, which can greatly improve search efficiency and reduce the manual cost of searching. In addition, the query condition is generated from the search word sent by the client, so it can be a search on common search features or a search on fields that were not defined in advance. This gives the sample data search process considerable flexibility and meets different types of search needs.
In one embodiment of the invention, the sample data collection unit 210 is adapted to use a crawler to crawl sample data from each data source; and/or to use a crawler to crawl logs from each data source, use a distributed file processing framework to parse the logs of each data source in batches, and obtain sample data from the logs of each data source.
In one embodiment of the invention, the sample database includes: a distributed file system in a distributed file processing framework.
In one embodiment of the invention, the search interaction unit 240 is further adapted to, before returning the sample data found to the client for display, obtain the data format adapted to the client; convert the format of the sample data found into the data format adapted to the client; and return the sample data that has been found and format-converted to the client for display.
In one embodiment of the invention, the sample data processing unit 220 is adapted to convert each piece of collected sample data into sample data in a specified format. The search interaction unit 240 is adapted to convert the received search word into a query condition in the specified format.
In one embodiment of the invention, sample data processing unit 220, be suitable to for each sample data, from this
Sample data is extracted and meets pre-conditioned field;For each field extracted, from this sample data, extract this word
The value of section, is formed, with the value of this field, the two dimensional character that this field is corresponding by this field;The each field correspondence that will extract
The characteristic set of two dimensional character composition change as this sample data after the sample data of specified format that obtains.
Then search interactive unit 240, is suitable to for each search word, extracts and meet pre-conditioned word from this search word
Section, extracts the value of this field from this search word, using the two dimensional character that is made up of the value of this field He this field as this
The querying condition of the specified format obtained after search word conversion.
Accordingly, the search query unit 230 is adapted to traverse each sample data item in the sample database and, for each sample data item, traverse the two-dimensional features contained in it; if a two-dimensional feature identical to the two-dimensional feature in the query condition is present, the sample data item is determined to meet the query condition.
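The traversal described above can be sketched as follows. The sample database is modelled here as an in-memory dictionary from a sample identifier to its feature set; this stands in for the actual storage and is an assumption for illustration, as are the identifiers and values shown.

```python
def search(sample_database, query_condition):
    """Return the identifiers of samples containing a feature equal to the query condition."""
    matches = []
    for sample_id, features in sample_database.items():
        # Traverse the two-dimensional features of this sample.
        for feature in features:
            if feature == query_condition:
                matches.append(sample_id)
                break  # one identical feature is enough
    return matches


# Hypothetical database of formatted samples.
sample_database = {
    "sample-1": {("domain", "example.com"), ("ip", "192.0.2.1")},
    "sample-2": {("domain", "example.org")},
    "sample-3": {("ip", "192.0.2.1")},
}
hits = search(sample_database, ("ip", "192.0.2.1"))
```

The nested loops mirror the described traversal directly. Since the features in this sketch are stored in sets, the inner loop could be replaced by a membership test without changing the result.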
Fig. 3 shows a schematic diagram of a search device for sample data. The search device 300 for sample data includes: a sample data collection unit 310, a sample data processing unit 320, a search interaction unit 340, a search query unit 330 and a feature field library establishing unit 350.
The sample data collection unit 310, the sample data processing unit 320, the search interaction unit 340 and the search query unit 330 have the same functions as the corresponding sample data collection unit 210, sample data processing unit 220, search interaction unit 240 and search query unit 230 shown in Fig. 2; the common parts are not repeated here.
The feature field library establishing unit 350 is adapted to establish a feature field library, which includes multiple feature fields.
The sample data processing unit 320 is adapted to, for each sample data item: traverse the fields contained in the sample data item according to the feature field library, and extract the fields that hit the feature field library; for each extracted field, extract the value of the field from the sample data item, and form from the field and its value the two-dimensional feature corresponding to the field; and take the feature set composed of the two-dimensional features corresponding to the extracted fields as the sample data in the specified format obtained by converting the sample data item.
The search interaction unit 340 is adapted to, for each search word: traverse the fields contained in the search word according to the feature field library, extract the field that hits the feature field library, extract the value of the field from the search word, and take the two-dimensional feature composed of the field and its value as the query condition in the specified format obtained by converting the search word.
Specifically, the feature field library includes one or more of the following feature fields: a field indicating that the data has created a specified process, a field indicating that the data contains macro code, a field indicating that the data has accessed a specified website, a field representing an email address, a field representing a domain name, a field representing an IP address, and a field representing a URL address.
Further, the feature field library establishing unit 350 is further adapted to, at every preset time interval, collect feature fields again and add them to the feature field library, so as to update the feature field library. Accordingly, the sample data processing unit 320 is further adapted to, after the feature field library is updated, re-execute the operation of formatting each collected sample data item.
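The relationship between a library update and re-formatting can be sketched as follows. The class name FeatureFieldLibrary, the function format_all and the field names are hypothetical, and the scheduling of the preset time interval is left out: the sketch only shows that the collected samples are formatted again once the library has changed.

```python
class FeatureFieldLibrary:
    """A minimal stand-in for the feature field library: a set of field names."""

    def __init__(self, fields):
        self.fields = set(fields)

    def update(self, new_fields):
        """Add newly collected feature fields; return True if the library changed."""
        size_before = len(self.fields)
        self.fields |= set(new_fields)
        return len(self.fields) != size_before


def format_all(raw_samples, library):
    """Format every collected sample into its set of (field, value) features."""
    return [
        frozenset((f, v) for f, v in sample.items() if f in library.fields)
        for sample in raw_samples
    ]


library = FeatureFieldLibrary({"domain", "ip"})
raw_samples = [{"domain": "example.com", "email": "a@example.com"}]
formatted = format_all(raw_samples, library)

# In a real system this step would run at each preset time interval.
if library.update({"email"}):
    formatted = format_all(raw_samples, library)
```

Re-formatting after the update is what makes the newly added field "email" searchable for samples that were collected before the field entered the library.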
It should be noted that the embodiments of the devices shown in Fig. 2 and Fig. 3 correspond to the embodiments of the method shown in Fig. 1, which have been described in detail above and are not repeated here.
In summary, the technical solution provided by the present invention integrates the sample data of each data source into a sample database. During a search, the search word sent by the client is converted into a query condition, sample data is looked up in the sample database according to the query condition, and the sample data found is returned to the client as the search result for display. With this solution, the sample data of the data sources is integrated into a uniform format, so the search is not constrained by the native data format of each data source, and a unified search interface is provided that retrieves, in a single search, the sample data from all data sources that meets the query condition. This can greatly improve search efficiency and reduce the manual cost of searching. Moreover, the query condition is generated from the search word sent by the client, so it can be a search on common search features or a search on fields that have not been predefined, which gives the search process for sample data considerable flexibility and meets different types of search needs.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual device or other apparatus. Various general-purpose devices may also be used with the teachings herein. From the description above, the structure required to construct such devices is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the present invention are sometimes grouped together in a single embodiment, figure or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single embodiment disclosed above. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of an embodiment may be combined into one module or unit or component, and may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the search device for sample data according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order. These words may be interpreted as names.
The invention discloses A1, a search method for sample data, wherein the method includes:
Collecting sample data from each data source;
Formatting each collected sample data item, and storing each formatted sample data item in a sample database;
Receiving a search word sent by a client, and converting the search word into a query condition;
Searching the sample database for sample data that meets the query condition;
Returning the found sample data to the client for display.
A2. The method according to A1, wherein the collecting sample data from each data source includes:
Crawling sample data from each data source by using a crawler;
And/or,
Crawling logs from each data source by using a crawler, batch-parsing the logs of each data source by using a distributed file processing framework, and obtaining sample data from the logs of each data source.
A3. The method according to A1, wherein,
The formatting each collected sample data item includes: converting each collected sample data item into sample data in a specified format;
The converting the search word into a query condition includes: converting the search word into a query condition in the specified format.
A4. The method according to A3, wherein the converting each collected sample data item into sample data in a specified format includes:
For each sample data item,
Extracting from the sample data item the fields that meet a preset condition;
For each extracted field, extracting the value of the field from the sample data item, and forming from the field and its value the two-dimensional feature corresponding to the field;
Taking the feature set composed of the two-dimensional features corresponding to the extracted fields as the sample data in the specified format obtained by converting the sample data item.
A5. The method according to A4, wherein the converting the search word into a query condition in the specified format includes:
For each search word, extracting from the search word a field that meets the preset condition, extracting the value of the field from the search word, and taking the two-dimensional feature composed of the field and its value as the query condition in the specified format obtained by converting the search word.
A6. The method according to A4 or A5, wherein,
The method further includes: establishing a feature field library, the feature field library including multiple feature fields;
The extracting from the sample data item the fields that meet a preset condition includes: traversing the fields contained in the sample data item according to the feature field library, and extracting the fields that hit the feature field library;
The extracting from the search word a field that meets the preset condition includes: traversing the fields contained in the search word according to the feature field library, and extracting the field that hits the feature field library.
A7. The method according to A6, wherein the feature field library includes one or more of the following feature fields:
A field indicating that the data has created a specified process, a field indicating that the data contains macro code, a field indicating that the data has accessed a specified website, a field representing an email address, a field representing a domain name, a field representing an IP address, and a field representing a URL address.
A8. The method according to A6, wherein the method further includes:
At every preset time interval, collecting feature fields again and adding them to the feature field library, so as to update the feature field library;
After the feature field library is updated, re-executing the operation of formatting each collected sample data item.
A9. The method according to A5, wherein the searching the sample database for sample data that meets the query condition includes:
Traversing each sample data item in the sample database;
For each sample data item, traversing the two-dimensional features contained in the sample data item, and, if a two-dimensional feature identical to the two-dimensional feature in the query condition is present, determining that the sample data item meets the query condition.
A10. The method according to A1, wherein,
The sample database includes: a distributed file system in a distributed file processing framework.
A11. The method according to A1, wherein,
Before the returning the found sample data to the client for display, the method further includes:
Obtaining a data format suited to the client;
Converting the found sample data into the data format suited to the client;
The returning the found sample data to the client for display then includes: returning the found, format-converted sample data to the client for display.
The invention also discloses B12, a search device for sample data, wherein the device includes:
A sample data collection unit, adapted to collect sample data from each data source;
A sample data processing unit, adapted to format each sample data item collected by the sample data collection unit, and to store each formatted sample data item in a sample database;
A search interaction unit, adapted to receive a search word sent by a client, convert the search word into a query condition, and send the query condition to a search query unit;
The search query unit, adapted to search the sample database for sample data that meets the query condition and return it to the search interaction unit;
The search interaction unit, adapted to return the sample data found by the search query unit to the client for display.
B13. The device according to B12, wherein,
The sample data collection unit is adapted to crawl sample data from each data source by using a crawler; and/or, to crawl logs from each data source by using a crawler, batch-parse the logs of each data source by using a distributed file processing framework, and obtain sample data from the logs of each data source.
B14. The device according to B12, wherein,
The sample data processing unit is adapted to convert each collected sample data item into sample data in a specified format;
The search interaction unit is adapted to convert the received search word into a query condition in the specified format.
B15. The device according to B14, wherein,
The sample data processing unit is adapted to, for each sample data item, extract from the sample data item the fields that meet a preset condition; for each extracted field, extract the value of the field from the sample data item, and form from the field and its value the two-dimensional feature corresponding to the field; and take the feature set composed of the two-dimensional features corresponding to the extracted fields as the sample data in the specified format obtained by converting the sample data item.
B16. The device according to B14, wherein,
The search interaction unit is adapted to, for each search word, extract from the search word a field that meets the preset condition, extract the value of the field from the search word, and take the two-dimensional feature composed of the field and its value as the query condition in the specified format obtained by converting the search word.
B17. The device according to B15 or B16, wherein the device further includes: a feature field library establishing unit;
The feature field library establishing unit is adapted to establish a feature field library, the feature field library including multiple feature fields;
The sample data processing unit is adapted to traverse the fields contained in the sample data item according to the feature field library, and to extract the fields that hit the feature field library;
The search interaction unit is adapted to traverse the fields contained in the search word according to the feature field library, and to extract the field that hits the feature field library.
B18. The device according to B17, wherein the feature field library includes one or more of the following feature fields:
A field indicating that the data has created a specified process, a field indicating that the data contains macro code, a field indicating that the data has accessed a specified website, a field representing an email address, a field representing a domain name, a field representing an IP address, and a field representing a URL address.
B19. The device according to B17, wherein,
The feature field library establishing unit is further adapted to, at every preset time interval, collect feature fields again and add them to the feature field library, so as to update the feature field library;
The sample data processing unit is further adapted to, after the feature field library is updated, re-execute the operation of formatting each collected sample data item.
B20. The device according to B16, wherein,
The search query unit is adapted to traverse each sample data item in the sample database; and, for each sample data item, to traverse the two-dimensional features contained in the sample data item and, if a two-dimensional feature identical to the two-dimensional feature in the query condition is present, determine that the sample data item meets the query condition.
B21. The device according to B12, wherein the sample database includes: a distributed file system in a distributed file processing framework.
B22. The device according to B12, wherein,
The search interaction unit is further adapted to, before the returning the found sample data to the client for display, obtain a data format suited to the client; convert the found sample data into the data format suited to the client; and return the found, format-converted sample data to the client for display.