CN105205104A

CN105205104A - Cloud platform data acquisition method

Info

Publication number: CN105205104A
Application number: CN201510531172.6A
Authority: CN
Inventors: 张鹏 (Zhang Peng)
Original assignee: BEIJING BLTSFE INFORMATION TECHNOLOGY Co Ltd
Current assignee: BEIJING BLTSFE INFORMATION TECHNOLOGY Co Ltd
Priority date: 2015-08-26
Filing date: 2015-08-26
Publication date: 2015-12-30

Abstract

The invention provides a cloud platform data acquisition method. The method integrates multiple query methods in a distributed environment. It treats both unstructured queries and structured data queries as execution units, and provides the user with a unified query interface. It converts the user's query requests into formats that the multiple member query methods can recognize, and finally returns the query results to the user in a given format. The cloud platform financial data query method provided by the invention overcomes the shortcomings of conventional structured data queries in flexibility and practicality. It also lowers the technical barrier for non-specialists who need to query a database, and makes better use of the value of business data.

Description

Cloud platform data acquisition method

Technical field

The present invention relates to financial data processing, and in particular to a cloud platform data acquisition method.

Background art

Financial data is an important basis for investors' investment decisions and for the work of securities firms' investment research departments. Providing corporate clients and investment research departments with timely, accurate and easy-to-use financial data has long been an arduous challenge for the departments concerned. Networked information has become abundant and the big-data era has arrived. As a result, financial data now contains large amounts of both structured and unstructured information, and it grows rapidly. Cloud computing technology is also developing quickly, and databases must be built as carriers for this data so that useful information is not lost.

However, data retrieval in cloud computing environments currently suffers from inconsistent retrieval specifications, which lead to differing interpretations of what is being retrieved. Divergent requirements in turn lead to non-standard functional design, which directly hinders vertical integration between upper-level and lower-level applications. Existing query methods adapt poorly to new requirements and offer weak management and control. When the data structure is extended or changed, it is difficult to extend the scope of query applications.

Summary of the invention

To solve the above problems of the prior art, the present invention proposes a cloud platform data acquisition method. The method performs data retrieval and querying in a cloud-computing-based financial data retrieval system, and comprises:

integrating multiple query methods in a distributed environment, treating both unstructured search and structured data query as execution units, and providing the user with a unified query interface; converting the user's query request into formats that the multiple member query methods can recognize; and finally returning the query results to the user in a given format.

Preferably, for the unstructured search, the cloud-computing-based retrieval system provides resource management, data integration and index storage, and an unstructured data query service system is built on top of it. The system adopts the open-source Hadoop framework. It relies on the ZooKeeper mechanism for distributed coordination and for storing cluster metadata and configuration.

The system is organized into layers:
- The retrieval layer provides index update, index deletion, query, word segmentation, index library and external interface modules.
- The data collection layer provides infrastructure and management modules for data resources.
- The inter-layer interface coordinates data exchange and service delivery between the two layers.

The index library is designed according to business format standards. Document content is divided by manual preprocessing into text segments corresponding to different key terms, and these segments serve as the raw input for building the index library. The interface functions provided by open-source Servlet technology are used to build, add, update, delete and query the index. This forms an inverted index of the form user keyword → key term → document, and an HTTP calling interface is exposed externally through customized secondary development;

For the structured query, keyword query is applied to a relational database. The database structure is modeled as a graph that characterizes its topology, forming a structured data schema graph, so that the query problem becomes a graph query problem. The structured data schema graph is an undirected graph $G=(V,E)$, where:
- $V$ is the set of vertices, each vertex corresponding to a relation table in the database;
- $E$ is the set of edges, each edge corresponding to a foreign-key relationship between tables.

The specific query process comprises:

Step 1: create a node index table. For each vertex of the structured data schema graph, this table characterizes an index structure of the keywords the vertex contains. It is created as follows:
1. the fields of each row in each relation table are concatenated into a document;
2. keywords are extracted from the document;
3. an inverted index from keywords to table names and row names is formed;

Step 2: locate relation tables by keyword. For each keyword entered by the user, the vertices of the schema graph that contain the keyword are located by querying the node index table;

Step 3: perform keyword-centered data query. Expansion starts from the vertices produced in Step 2 to generate candidate data query patterns. Each query pattern is a subgraph of the structured data schema graph and contains all keywords. Query patterns are expanded by breadth-first traversal, as follows:

1) Define queues $Q$ and $V$, and add all generated central vertices to $Q$ and $V$ as initial patterns;

2) Take a pattern $P$ from $Q$ and add its associated patterns $\{P_1, P_2, \ldots, P_n\}$ to $Q$ and $V$. Each associated pattern $P_i$ ($i = 1, 2, \ldots, n$) satisfies the following conditions:
   1. $|P_i| = |P| + 1$, where $|P_i|$ is the number of vertices contained in $P_i$;
   2. $P_i$ is a connected graph and does not already exist in $V$;

3) Traverse all patterns in $Q$ in turn until $Q$ is empty. Select as output the query patterns that satisfy the following conditions:

1. the output pattern contains all keywords;

2. every leaf vertex contains at least one keyword;

3. the number of vertices in the output pattern is less than a predetermined maximum $S_{max}$;

4) Assemble Structured Query Language (SQL) statements from the query patterns:
   1. an SQL query statement is assembled for each candidate query pattern;
   2. the index table is queried with the user's keywords to obtain table-name and row-name information, which is written into the SQL statement;
   3. the database is queried with the SQL statement, and the query results are returned.

Preferably, the financial data retrieval system comprises a service server, an application server, a data server, an integration server and the individual databases:
- The service server performs information retrieval by calling the application server, and uses the data to provide push services.
- The application server indexes and maintains the data in a unified way.
- The integration server integrates structured and unstructured data. It uses a duplicate-checking mechanism and data push technology to classify, summarize and normalize the data. It provides information services to users through protocol interfaces, front-end pages and the service server;

The integration server integrates the financial data scattered across database systems, file systems and the Internet. It collects and cleans the data and, using a data integration strategy based on business domains, integrates data from different source entities into the data server.

The main process of the data integration service is:
1. The query request is passed to the data extraction module in XML Schema form.
2. The data extraction module converts the XML into an SQL query statement and extracts data according to the query results.
3. The extracted result set is converted to XML and passed to the integration processing module.
4. Unstructured data likewise has to be converted into XML.
5. The integration processing module processes the XML documents and finally generates the unified data server;

A text duplicate-checking mechanism based on paragraph topics compares texts by the similarity of their topic information. This allows financial data with the same topic and identical content to be classified together. Each paragraph of a text yields one feature value, so a text is represented as a set of paragraph-topic feature values. The similarity of two texts is computed by comparing their paragraph feature values. When the final similarity exceeds a set threshold, the texts are considered duplicates and are deduplicated.

The duplicate-checking framework has three parts: duplicate checking, duplicate-check management and duplicate-check result analysis.
- In the duplicate-checking part, a semantic analysis engine segments the data content into words. A feature value generator then produces the data's feature value from the segmentation result. Each 64-bit feature value is split by the same rule into 4 equal groups for index storage. During feature value comparison, dimensionality reduction is applied first. The Hamming distance between the data's feature value and the feature values in the feature library is then computed and compared against a threshold of 3.
- Duplicate-check management logs the duplicate-check results and allows them to be inspected;

In addition, the data push system within the retrieval system uses a push algorithm based on user behavior clustering to provide personalized data push services. A binary correspondence between users and data is established. The similarity of user behaviors is then used to discover objects each user is potentially interested in, and these are pushed to that user.

The data push system consists of three parts:
- The user behavior log recording module records the user's behavior at each business touchpoint. This includes page dwell time, click sequences, content browsing records, the user's personal information, transaction history (from the centralized trading system) and market quote browsing history (from the market quote system).
- The user preference model analysis module analyzes the user behavior logs and computes and scores user attributes from multiple angles, building a multi-attribute description of each user. It uses domain knowledge and data mining tools to cluster users by their attribute scores, grouping users with similar behavior patterns together.
- The push algorithm module uses a combination of algorithms and the classified and graded user model to compute, in real time, the user's interest in each data item in the data server. It returns the results to the business front end for display.

Compared with the prior art, the present invention has the following advantages:

The present invention proposes a cloud platform financial data acquisition method. The method overcomes the shortcomings of conventional structured data queries in flexibility and practicality, and lowers the technical barrier for non-specialists querying a database. It also makes better use of the value of business data.

Brief description of the drawings

Fig. 1 is a flowchart of the cloud platform data acquisition method according to an embodiment of the present invention.

Embodiment

A detailed description of one or more embodiments of the present invention is provided below, together with accompanying drawings that illustrate the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is defined only by the claims, and the invention encompasses many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be practiced according to the claims without some or all of them.

Fig. 1 is a flowchart of the cloud platform data acquisition method according to an embodiment of the present invention. The cloud-computing-based financial data retrieval system of the present invention mainly comprises a service server, an application server, a data server, an integration server and the individual databases:
- The service server performs information retrieval by calling the application server, and uses the data to provide push services.
- The application server has retrieval and indexing capabilities, and indexes and maintains the data in a unified way.
- The integration server can integrate structured and unstructured data. It uses a duplicate-checking mechanism and data push technology to classify, summarize and normalize the data. It provides information services to users through protocol interfaces, front-end pages and the service server.

The integration server integrates the financial data scattered across database systems, file systems and the Internet. It collects and cleans the data and, using a data integration strategy based on business domains, integrates data from different source entities into the data server.

The main process of the data integration service is:
1. The query request is passed to the data extraction module in XML Schema form.
2. The data extraction module converts the XML into an SQL query statement and extracts data according to the query results.
3. The extracted result set is converted to XML and passed to the integration processing module.
4. Unstructured data likewise has to be converted into XML.
5. The integration processing module processes the XML documents and finally generates the unified data server.
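The request → SQL → result set → XML flow described above can be illustrated with a minimal Python sketch. It is not the patent's implementation. The `<query>` request layout (a `table` attribute with `column` and `condition` children), the helper names and the sample `quotes` table are hypothetical. An in-memory SQLite database stands in for the member databases. Identifiers are validated before being spliced into SQL, and condition values are passed as query parameters.

```python
import sqlite3
import xml.etree.ElementTree as ET


def _identifier(name: str) -> str:
    """Reject anything that is not a plain identifier before it is spliced into SQL."""
    if not name or not name.isidentifier():
        raise ValueError(f"illegal identifier: {name!r}")
    return name


def xml_request_to_sql(request_xml: str) -> tuple[str, list[str]]:
    """Translate a hypothetical <query> document into a parameterised SELECT."""
    root = ET.fromstring(request_xml)
    table = _identifier(root.get("table", ""))
    columns = [_identifier(c.text or "") for c in root.findall("column")] or ["*"]
    where, params = [], []
    for cond in root.findall("condition"):
        where.append(f"{_identifier(cond.get('field', ''))} = ?")
        params.append(cond.text or "")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


def extract_as_xml(conn: sqlite3.Connection, request_xml: str) -> str:
    """Run the extraction and hand the result set on to integration as XML."""
    sql, params = xml_request_to_sql(request_xml)
    cursor = conn.execute(sql, params)
    names = [d[0] for d in cursor.description]
    resultset = ET.Element("resultset")
    for row in cursor:
        record = ET.SubElement(resultset, "record")
        for name, value in zip(names, row):
            ET.SubElement(record, name).text = str(value)
    return ET.tostring(resultset, encoding="unicode")


# Toy data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (code TEXT, name TEXT, close REAL)")
conn.executemany("INSERT INTO quotes VALUES (?, ?, ?)",
                 [("600000", "Bank A", 10.5), ("600036", "Bank B", 35.2)])
request = ('<query table="quotes"><column>code</column><column>close</column>'
           '<condition field="code">600036</condition></query>')
xml_out = extract_as_xml(conn, request)
print(xml_out)
```

With the sample data, this prints a single `<record>` for code 600036.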

Financial industry systems handle very large volumes of data and have very high data security requirements. The Hadoop framework uses the Hadoop Distributed File System (HDFS) as its underlying storage. HDFS provides a highly fault-tolerant, high-throughput storage solution for massive data. Its support for dynamic capacity expansion without downtime, and for automatic data detection and replication, underpins big-data access on the platform and high data security.

Because HDFS stores files in blocks, the system's distribution algorithm can automatically migrate data blocks and upgrade capacity when system capacity is expanded. No system downtime or manual maintenance is needed. The data self-replication strategy of HDFS and its automatic data consistency monitoring meet the high security requirements for the data. HDFS's optimized resource allocation and multi-replica access mechanism substantially increase the system's data read rate. For single-block access, HDFS performance is several times that of conventional storage schemes.

From top to bottom, the HDFS data storage model of this platform has three layers: DaaS, PaaS and SaaS.
1. The DaaS (data as a service) layer is mainly used for data storage and search. It exploits the flexibility, low latency and distributed nature of HDFS to provide external data services from the normalized data of the data server.
2. The PaaS (platform as a service) layer is mainly used for data and file access and supports secondary development. Unified authentication is performed by an LDAP server. The platform uses a JDBC data access interface to shield the service server from the differences between heterogeneous DBMSs.
3. The SaaS (software as a service) layer uses client-layer virtualization technology to implement systems such as a centralized transaction log storage and analysis system and a historical quote data management and search platform. It provides multi-tenant, extensible software services externally.

After integration by the retrieval system, the volume of financial data is very large. Much of it consists of the same underlying data, processed and handled from different perspectives by different publishing entities, research bodies and industry media. The platform therefore faces low retrieval efficiency and large amounts of duplicate, redundant information in retrieval results. To improve the efficiency and convenience of information use and the user experience, big-data processing techniques such as full-text search and data deduplication are used. Together they provide users with comprehensive and accurate information retrieval services.

At present, unstructured information accounts for more than 80% of the total information in financial data. The field search technology of traditional relational databases is inherently ill-suited to processing unstructured information, especially in massive volumes. Full-text search technology is therefore used to handle unstructured information. Based on the open-source Lucene development framework, a text retrieval system is built through customized development of the Lucene core layer and related interfaces.

The retrieval system has the Lucene full-text query method at its core and is functionally divided into three parts: indexing, query and maintenance.
- The indexing part processes the data stored in the database and builds the index structure.
- The query part receives retrieval requests submitted by the front-end system and searches the index.
- The maintenance part handles index additions, modifications, deletions and similar maintenance work.

The retrieval system is implemented as follows:
1. Documents are preprocessed.
2. Documents are segmented into words and a document index is created. For Chinese word segmentation, Lucene uses bigram segmentation.
3. A query function is provided that queries the index built by Lucene.
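The following simplified Python illustration shows bigram segmentation feeding an index lookup. It does not use Lucene. It only mimics the idea of splitting Chinese text into overlapping two-character tokens (as Lucene's CJK bigram analysis does) and storing them in a positional inverted index. The function names and sample documents are hypothetical. The conjunctive search ignores token adjacency, so it is looser than a true phrase query.

```python
from collections import defaultdict


def bigram_tokens(text: str) -> list[str]:
    """Overlapping two-character tokens, similar in spirit to CJK bigram analysis."""
    chars = [c for c in text if c.isalnum()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def build_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Positional inverted index: token -> {doc_id: [token positions]}."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(bigram_tokens(text)):
            index[token][doc_id].append(pos)
    return index


def search(index, query: str) -> set[str]:
    """Conjunctive query: documents that contain every bigram of the query."""
    tokens = bigram_tokens(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], {}))
    for token in tokens[1:]:
        result &= set(index.get(token, {}))
    return result


docs = {"d1": "甲银行发布年报", "d2": "乙银行季度报告", "d3": "年报披露时间"}
index = build_index(docs)
print(sorted(search(index, "年报")))   # ['d1', 'd3']
print(sorted(search(index, "银行")))   # ['d1', 'd2']
```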

The Lucene development framework used by the system comprises two parts: the Lucene core module and the customized development module.

The Lucene core module comprises the index/searcher layer, the storage layer and the inverted index file layer. For full-text queries, the inverted index stores the mapping from a word to its storage locations in a document or set of documents. It is the core technique that allows Lucene to query quickly.

The customized development module, built on the core layer, comprises a lexical analysis layer, a text parsing layer and an application layer.
- The text parsing layer uses various document parsers to parse documents of different formats and extract their text content in an easily processed form.
- The lexical analysis layer mainly splits text into words and selects suitable words to build the index. A corresponding Chinese analyzer is used for Chinese retrieval.

To achieve better retrieval results, the system also needs to deduplicate the various financial data loaded each day. More efficient duplicate checking significantly improves both the performance of the retrieval system and the user experience. The present invention therefore adopts a new duplicate-checking framework: a text duplicate-checking mechanism based on paragraph topics. It compares texts by the similarity of their topic information, so that financial data with the same topic and identical content are classified together, which further improves deduplication.

The mechanism takes full account of the structure of the text and the distribution of its features. Each paragraph of a text yields one feature value, so a text can be represented as a set of paragraph-topic feature values. For the same text, this set carries more information than a single feature value. When the Hamming distances of the feature value sets are computed, this extra information amplifies the differences between texts and improves the accuracy of text similarity judgments.

The overall steps of the duplicate-checking method are:
1. Extract a feature value for each paragraph according to the paragraph topics of the text.
2. Compute the similarity of two texts by comparing their paragraph feature values.
3. When the final similarity exceeds a set threshold, consider the texts duplicates and deduplicate them.
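The paragraph-level comparison can be sketched as below, under several assumptions the source leaves open:
- A 64-bit SimHash over character bigrams, hashed with MD5, stands in for the semantic analysis engine and feature value generator.
- Paragraphs are separated by newlines.
- Two paragraph feature values "match" when their Hamming distance is at most 3.
- Text similarity is the share of paragraphs that have a match, averaged over both directions.

The 0.8 threshold is illustrative only.

```python
import hashlib


def bigrams(text: str) -> list[str]:
    """Character bigrams as a stand-in for real word segmentation."""
    chars = [c for c in text if c.isalnum()]
    return [a + b for a, b in zip(chars, chars[1:])] or chars


def simhash64(tokens: list[str]) -> int:
    """64-bit SimHash: each token votes +1/-1 on every bit of its hash."""
    votes = [0] * 64
    for token in tokens:
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def paragraph_fingerprints(text: str) -> list[int]:
    """One feature value per non-empty paragraph."""
    return [simhash64(bigrams(p)) for p in text.split("\n") if p.strip()]


def text_similarity(a: str, b: str, max_distance: int = 3) -> float:
    """Share of paragraphs with a near-identical counterpart, averaged over both directions."""
    fa, fb = paragraph_fingerprints(a), paragraph_fingerprints(b)
    if not fa or not fb:
        return 0.0

    def covered(xs: list[int], ys: list[int]) -> float:
        return sum(any(hamming(x, y) <= max_distance for y in ys) for x in xs) / len(xs)

    return (covered(fa, fb) + covered(fb, fa)) / 2


def is_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Duplicate when the similarity exceeds the set threshold."""
    return text_similarity(a, b) > threshold


original = "公司发布二零一四年年度报告\n净利润同比增长"
reposted = "公司发布二零一四年年度报告\n净利润同比增长"
other = "公司发布二零一四年年度报告\n董事会决议公告全文"
unrelated = "国债收益率周度回顾\n外汇市场日报"
print(text_similarity(original, reposted), is_duplicate(original, other))
```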

The data duplicate-checking framework comprises three parts: duplicate checking, duplicate-check management and duplicate-check result analysis.

In the duplicate-checking part, a semantic analysis engine segments the data content into words, and a feature value generator produces the data's feature value from the segmentation result. Each 64-bit feature value is split by the same rule into 4 equal groups for index storage. During feature value comparison, dimensionality reduction is first applied according to the pigeonhole principle. The Hamming distance between the data's feature value and the feature values in the feature library is then computed and compared against a threshold of 3.

Duplicate-check management logs the duplicate-check results and allows them to be inspected.
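The 4 × 16-bit grouping can be sketched as a lookup structure. The sketch assumes the common SimHash convention that fingerprints within a Hamming distance of 3 count as near duplicates. That is exactly the case the pigeonhole argument covers: 3 differing bits can touch at most 3 of the 4 groups, so at least one group must match exactly. The class and document names are hypothetical.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


class FingerprintIndex:
    """Stores 64-bit fingerprints in 4 tables, each keyed by one 16-bit group.

    If two fingerprints differ in at most 3 bits, at least one of the 4 groups is
    identical, so exact lookups on the groups find every candidate within that
    distance while comparing against only a small part of the library.
    """

    GROUPS, BITS = 4, 16

    def __init__(self) -> None:
        self.tables = [defaultdict(set) for _ in range(self.GROUPS)]
        self.fingerprints: dict[str, int] = {}

    def _groups(self, fp: int) -> list[int]:
        mask = (1 << self.BITS) - 1
        return [(fp >> (self.BITS * i)) & mask for i in range(self.GROUPS)]

    def add(self, doc_id: str, fp: int) -> None:
        self.fingerprints[doc_id] = fp
        for table, group in zip(self.tables, self._groups(fp)):
            table[group].add(doc_id)

    def near_duplicates(self, fp: int, max_distance: int = 3) -> list[str]:
        if max_distance >= self.GROUPS:
            raise ValueError("the pigeonhole guarantee only holds for max_distance < GROUPS")
        candidates: set[str] = set()
        for table, group in zip(self.tables, self._groups(fp)):
            candidates |= table.get(group, set())
        # Exact Hamming check on the (few) candidates.
        return sorted(d for d in candidates if hamming(self.fingerprints[d], fp) <= max_distance)


library = FingerprintIndex()
base = 0x0123456789ABCDEF
library.add("report-a", base)
library.add("report-b", base ^ 0xF)                                   # 4 bits away from report-a
print(library.near_duplicates(base ^ (1 << 20 | 1 << 40 | 1 << 60)))  # ['report-a']
```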

To further improve the user experience, the retrieval system of the present invention also includes a data push system. It uses a push algorithm based on user behavior clustering to provide personalized data push services. Personalized push establishes a binary correspondence between users and data, and uses the similarity of user behaviors to discover objects each user is potentially interested in. These objects are then pushed to that user. In essence, this is a form of information filtering.
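The binary user-data relation and the behavior-similarity idea can be sketched minimally as follows. The sketch assumes cosine similarity between users' sets of accessed items, plus a simple user-based scoring rule. The source does not name a similarity measure, and the user and item identifiers are made up.

```python
import math
from collections import defaultdict


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two users' binary user-data relations."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(relation: dict[str, set[str]], user: str, top_n: int = 3) -> list[str]:
    """Score items the user has not seen by the similarity of the users who accessed them."""
    seen = relation[user]
    scores: dict[str, float] = defaultdict(float)
    for other, items in relation.items():
        if other == user:
            continue
        similarity = cosine(seen, items)
        for item in items - seen:
            scores[item] += similarity
    ranked = sorted((i for i in scores if scores[i] > 0), key=lambda i: (-scores[i], i))
    return ranked[:top_n]


relation = {
    "u1": {"bond_weekly", "bank_sector_news"},
    "u2": {"bond_weekly", "bank_sector_news", "rate_outlook"},
    "u3": {"bond_weekly", "fx_daily"},
    "u4": {"ipo_calendar"},
}
print(recommend(relation, "u1"))   # ['rate_outlook', 'fx_daily']
```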

The data push system consists of three parts: a user behavior log recording module, a user preference model analysis module and a push algorithm module. The user behavior log recording module records the user's behavior at each business touchpoint. This includes page dwell time, click sequences, content browsing records, the user's personal information, transaction history (from the centralized trading system), market quote browsing history (from the market quote system), and so on. This information is the data foundation for subsequent analysis and data push.

The user preference model analysis module analyzes the user behavior logs and computes and scores user attributes from multiple angles, building a multi-attribute description of each user. It then uses domain knowledge and data mining tools to cluster users by their attribute scores, that is, to group users with similar behavior patterns together. The system builds a classified and graded user data usage model from multiple attributes, including:
- risk preference, asset status and position distribution;
- trading activity and profitability;
- preferred investment instruments and life cycle;
- data usage preferences and data usage history.

Building this model effectively is the hardest part of the whole push system. The push algorithm module uses a combination of algorithms and the classified and graded user model to compute, in real time, the user's interest in each data item in the data server. It returns the top N items to the business front end for display. The push algorithm module is the core of the whole push system.
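The clustering and top-N steps could look roughly like the sketch below. Plain k-means and a dot-product interest score are stand-ins chosen for illustration, since the source only says "cluster" and "combination of algorithms". The three attributes (risk appetite, trading activity, fixed-income preference), the users and the items are hypothetical.

```python
def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[list[float]], k: int, iterations: int = 20):
    """Plain k-means seeded with the first k points (deterministic for the demo)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def push_top_n(user: str, users: dict[str, list[float]],
               items: dict[str, list[float]], k: int = 2, n: int = 2) -> list[str]:
    """Rank items by the dot product of the user's cluster centroid and each item's profile."""
    names = list(users)
    labels, centroids = kmeans([users[u] for u in names], k)
    profile = centroids[labels[names.index(user)]]
    interest = {item: sum(p * w for p, w in zip(profile, vec)) for item, vec in items.items()}
    return sorted(interest, key=lambda i: (-interest[i], i))[:n]


# Attributes: (risk appetite, trading activity, fixed-income preference), all hypothetical.
users = {"alice": [0.9, 0.8, 0.1], "bob": [0.1, 0.2, 0.9],
         "carol": [0.8, 0.9, 0.2], "dave": [0.2, 0.1, 0.8]}
items = {"small_cap_momentum": [0.9, 0.7, 0.0],
         "treasury_yield_brief": [0.1, 0.1, 1.0],
         "margin_trading_alert": [0.7, 1.0, 0.1]}
print(push_top_n("alice", users, items))   # ['margin_trading_alert', 'small_cap_momentum']
```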

Based on the above retrieval system, the present invention proposes the following concurrent database data acquisition method, which makes full use of the advantages of cloud computing. Through its own design, it uses a traditional keyword query style to fuse structured and unstructured information. It hides the complex underlying structured data schemas, overcoming the shortcomings of conventional structured data queries in flexibility and practicality. As a result, the method effectively lowers the technical barrier for non-specialists querying business databases, and makes better use of the value of business data.

Financial applications need both structured data and unstructured text, so fusing the two kinds of information is a key problem. The key to solving it is an efficient information query method that allows both kinds of information to be queried freely.

The query method proposed by the present invention uses a meta-search engine architecture. It integrates the multiple query methods in the distributed environment into a whole and provides the user with a unified query interface. The user's query request is converted into formats that the multiple member query methods can recognize. Through query-method scheduling, the normalized queries are distributed to the member query methods, and the query results are finally returned to the user in a given format. In the overall architecture, unstructured and structured data queries both serve as execution units of the query method.
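The meta-search arrangement can be sketched with two toy member engines. For each query, it translates the request for every member, dispatches it, and merges the results into one format. All class names, the scoring rule (share of keywords matched) and the sample data are assumptions for illustration, not the patent's scheduling logic.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str
    title: str
    score: float


class FullTextMember:
    """Unstructured member: its native query is a list of lower-cased terms."""
    name = "unstructured"

    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def translate(self, keywords: list[str]) -> list[str]:
        return [k.lower() for k in keywords]

    def execute(self, terms: list[str]) -> list[Hit]:
        hits = []
        for title, body in self.docs.items():
            matched = sum(term in body.lower() for term in terms)
            if matched:
                hits.append(Hit(self.name, title, matched / len(terms)))
        return hits


class TableMember:
    """Structured member: its native query is a set of exact cell values."""
    name = "structured"

    def __init__(self, rows: list[dict[str, str]]):
        self.rows = rows

    def translate(self, keywords: list[str]) -> set[str]:
        return set(keywords)

    def execute(self, values: set[str]) -> list[Hit]:
        hits = []
        for row in self.rows:
            matched = len(values & set(row.values()))
            if matched:
                hits.append(Hit(self.name, row["name"], matched / len(values)))
        return hits


class MetaSearch:
    """Unified interface: translate, dispatch to every member, merge into one format."""

    def __init__(self, members):
        self.members = members

    def query(self, keywords: list[str]) -> list[dict]:
        hits = []
        for member in self.members:
            hits.extend(member.execute(member.translate(keywords)))
        hits.sort(key=lambda h: (-h.score, h.source, h.title))
        return [{"source": h.source, "title": h.title, "score": round(h.score, 2)} for h in hits]


docs = {"Bank B annual report": "Bank B annual report for the banking sector",
        "Bond weekly": "Treasury yields fell"}
rows = [{"name": "Bank B", "code": "600036", "sector": "bank"},
        {"name": "Bank A", "code": "600000", "sector": "bank"}]
engine = MetaSearch([FullTextMember(docs), TableMember(rows)])
results = engine.query(["bank", "600036"])
for r in results:
    print(r)
```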

The present invention uses keyword queries to reduce the complexity of business data queries, so that users can obtain the results they need quickly and easily. Two kinds of technology are used, cloud computing and vertical query methods. On the one hand, the above cloud-computing-based retrieval system provides resource management, data integration, index storage and related functions. On the other hand, an unstructured data query service system is built from the basic modules of traditional directory methods.

In terms of implementation, the retrieval system adopts the open-source Hadoop framework. It relies on the ZooKeeper mechanism for distributed coordination and for storing cluster metadata and configuration, which improves the system's performance and scalability. The query method is organized in three layers:
- The retrieval layer provides index update, index deletion, query, word segmentation, index library, external interface and other modules.
- The data collection layer provides infrastructure and management modules for data resources.
- The inter-layer interface coordinates data exchange and service delivery between the two layers.

Financial data has strong business characteristics and unified format standards. The query method of the present invention therefore designs the index library according to business format standards. In implementation, document content is divided by manual preprocessing into text segments corresponding to different key terms, and these segments serve as the raw input for building the index library. On this basis, the interface functions provided by open-source Servlet technology are used to build, add, update, delete and query the index. This forms an inverted index of the form user keyword → key term → document, and an HTTP calling interface is exposed externally through customized secondary development.
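The user keyword → key term → document index could be modeled as below. The alias table that maps user keywords to key terms, the class name and the sample segments are hypothetical. The HTTP/Servlet layer is omitted. The sketch only shows the add, update, delete and query operations on manually pre-divided segments.

```python
class KeyTermIndex:
    """Two-level inverted index: user keyword -> key terms -> document segments."""

    def __init__(self, aliases: dict[str, set[str]]):
        self.aliases = aliases                          # user keyword -> key terms (hypothetical)
        self.postings: dict[str, dict[str, str]] = {}   # key term -> {doc_id: segment text}

    def add(self, doc_id: str, segments: dict[str, str]) -> None:
        """Index the manually pre-divided segments of one document."""
        for term, text in segments.items():
            self.postings.setdefault(term, {})[doc_id] = text

    def delete(self, doc_id: str) -> None:
        for docs in self.postings.values():
            docs.pop(doc_id, None)

    def update(self, doc_id: str, segments: dict[str, str]) -> None:
        self.delete(doc_id)
        self.add(doc_id, segments)

    def query(self, keyword: str) -> dict[str, list[tuple[str, str]]]:
        """Resolve the keyword to key terms, then collect matching segments per document."""
        terms = self.aliases.get(keyword, {keyword})
        hits: dict[str, list[tuple[str, str]]] = {}
        for term in sorted(terms):
            for doc_id, text in self.postings.get(term, {}).items():
                hits.setdefault(doc_id, []).append((term, text))
        return hits


index = KeyTermIndex({"profit": {"net profit", "operating income"}})
index.add("report-2014", {"net profit": "net profit section", "risk factors": "risk factors section"})
index.add("report-2015", {"operating income": "operating income section"})
print(index.query("profit"))
```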

Keyword queries are easy to use for unstructured data retrieval. The present invention therefore applies keyword query technology to relational databases, realizing a keyword-based financial database query method.

The method models the database structure as a graph that characterizes its topology, forming a structured data schema graph, so that the query problem becomes a graph query problem. The structured data schema graph is an undirected graph $G=(V,E)$, where:
- $V$ is the set of vertices, each vertex corresponding to a relation table in the database;
- $E$ is the set of edges, each edge corresponding to a foreign-key relationship between tables.

The query method comprises the following steps.
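One way to derive the schema graph $G=(V,E)$ from a live database is to read its declared foreign keys. The sketch below uses SQLite's `PRAGMA foreign_key_list` on a hypothetical four-table schema; a production system would query its own DBMS catalog instead. As a simplification, it keeps one edge per table pair.

```python
import sqlite3


def schema_graph(conn: sqlite3.Connection):
    """Vertices = tables; undirected edges = foreign keys (with their join columns)."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    adjacency = {t: set() for t in tables}
    edges = {}
    for table in tables:
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
            parent, from_col, to_col = row[2], row[3], row[4]
            adjacency[table].add(parent)
            adjacency[parent].add(table)
            # One edge per table pair; several FKs between the same pair would collapse here.
            edges[frozenset((table, parent))] = (table, from_col, parent, to_col)
    return adjacency, edges


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, sector TEXT);
CREATE TABLE stock (code TEXT PRIMARY KEY, company_id INTEGER REFERENCES company(id));
CREATE TABLE quote (code TEXT REFERENCES stock(code), trade_date TEXT, close REAL);
CREATE TABLE report (id INTEGER PRIMARY KEY, company_id INTEGER REFERENCES company(id), title TEXT);
""")
adjacency, edges = schema_graph(conn)
print(adjacency)
```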

Step 1: create a node index table. For each vertex of the structured data schema graph, this table characterizes an index structure of the keywords the vertex contains. It is created as follows:
1. the fields of each row in each relation table are concatenated into a document;
2. keywords are extracted from the document;
3. an inverted index from keywords to table names and row names is formed.
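The node index of Step 1 and the keyword lookup used in Step 2 can be sketched against SQLite as follows. The sketch makes two assumptions:
- Each field is tokenized on its own rather than after concatenating the whole row, so that the column holding each keyword can be recorded.
- The text's "row name" is taken to mean the column name that a later SQL statement needs.

Table names and data are hypothetical.

```python
import re
import sqlite3
from collections import defaultdict


def build_node_index(conn: sqlite3.Connection) -> dict[str, set[tuple[str, str]]]:
    """Inverted index: keyword -> {(table, column)} where the keyword occurs."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            for column, value in zip(columns, row):
                if value is None:
                    continue
                for keyword in re.findall(r"\w+", str(value).lower()):
                    index[keyword].add((table, column))
    return index


def locate_tables(index, keywords: list[str]) -> dict[str, set[str]]:
    """Step 2: map each user keyword to the schema-graph vertices (tables) containing it."""
    return {kw: {table for table, _ in index.get(kw.lower(), set())} for kw in keywords}


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, sector TEXT);
CREATE TABLE report (id INTEGER PRIMARY KEY, company_id INTEGER, title TEXT);
INSERT INTO company VALUES (1, 'Merchants Bank', 'banking'), (2, 'Ping An', 'insurance');
INSERT INTO report VALUES (10, 1, 'Annual Report 2014');
""")
node_index = build_node_index(conn)
print(locate_tables(node_index, ["Banking", "annual"]))   # {'Banking': {'company'}, 'annual': {'report'}}
```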

Step 2: locate relation tables by keyword. For each keyword entered by the user, the vertices of the schema graph that contain the keyword are located by querying the node index table.

Step 3: perform keyword-centered data query. Expansion starts from the vertices produced in Step 2 to generate candidate data query patterns. Each query pattern is a subgraph of the structured data schema graph and contains all keywords. Query patterns are expanded by breadth-first traversal, as follows.

1) Define queues $Q$ and $V$, and add all generated central vertices to $Q$ and $V$ as initial patterns.

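2) Take a pattern $P$ from $Q$ and add its associated patterns $\{P_1, P_2, \ldots, P_n\}$ to $Q$ and $V$. Each associated pattern $P_i$ ($i = 1, 2, \ldots, n$) satisfies the following conditions:
   1. $|P_i| = |P| + 1$, where $|P_i|$ is the number of vertices contained in $P_i$;
   2. $P_i$ is a connected graph and does not already exist in $V$.

3) Traverse all patterns in $Q$ in turn until $Q$ is empty. Select as output the query patterns that satisfy the following conditions: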

1. the output pattern contains all keywords;

2. every leaf vertex contains at least one keyword;

3. the number of vertices in the output pattern is less than a predetermined maximum $S_{max}$ (typically set to 5).

4) Assemble Structured Query Language (SQL) statements from the query patterns:
   1. an SQL query statement is assembled for each candidate query pattern;
   2. the index table is queried with the user's keywords to obtain table-name and row-name information, which is written into the SQL statement;
   3. the database is queried with the SQL statement, and the query results are returned.
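Steps 3 and 4 can be sketched in Python under the following assumptions:
- The schema graph (`ADJ`) and the foreign-key join columns (`FK`) are hard-coded for a hypothetical schema.
- The keyword locations are given as if they had been produced by Steps 1 and 2.
- Keywords are matched with parameterized `LIKE` predicates.
- The output conditions are checked as each pattern is dequeued rather than in a separate pass over $Q$. This selects the same patterns.

$S_{max} = 5$ follows the text.

```python
from collections import deque

# Hypothetical schema graph (vertex = table) and the join columns of each foreign-key edge.
ADJ = {"company": {"stock", "report"}, "stock": {"company", "quote"},
       "quote": {"stock"}, "report": {"company"}}
FK = {frozenset({"stock", "company"}): ("stock", "company_id", "company", "id"),
      frozenset({"quote", "stock"}): ("quote", "code", "stock", "code"),
      frozenset({"report", "company"}): ("report", "company_id", "company", "id")}


def is_output(adj, pattern, hits, keywords, s_max):
    """Output conditions 1-3: all keywords covered, every leaf holds a keyword, size < S_max."""
    covered = set().union(*(hits.get(v, set()) for v in pattern))
    if not set(keywords) <= covered or len(pattern) >= s_max:
        return False
    leaves = [v for v in pattern if len(adj[v] & pattern) <= 1]
    return all(hits.get(v) for v in leaves)


def expand_patterns(adj, hits, keywords, s_max=5):
    """Breadth-first expansion from the keyword vertices; `seen` plays the role of V."""
    start = [frozenset({t}) for t in hits]
    queue, seen, outputs = deque(start), set(start), []
    while queue:
        pattern = queue.popleft()
        if is_output(adj, pattern, hits, keywords, s_max):
            outputs.append(pattern)
        if len(pattern) + 1 >= s_max:      # a larger pattern could never be output
            continue
        for v in pattern:
            for n in adj[v] - pattern:     # adding a neighbor keeps the pattern connected
                grown = pattern | {n}
                if grown not in seen:
                    seen.add(grown)
                    queue.append(grown)
    return outputs


def pattern_to_sql(adj, fk, pattern, locations, keywords):
    """Join the pattern's tables along foreign keys and filter every keyword with LIKE."""
    root = min(pattern)
    joined, parts, frontier = {root}, [f"SELECT * FROM {root}"], deque([root])
    while frontier:
        table = frontier.popleft()
        for n in sorted((adj[table] & pattern) - joined):
            a, a_col, b, b_col = fk[frozenset({table, n})]
            parts.append(f"JOIN {n} ON {a}.{a_col} = {b}.{b_col}")
            joined.add(n)
            frontier.append(n)
    where, params = [], []
    for kw in keywords:
        cols = sorted((t, c) for t, c in locations[kw] if t in pattern)
        where.append("(" + " OR ".join(f"{t}.{c} LIKE ?" for t, c in cols) + ")")
        params += [f"%{kw}%"] * len(cols)
    return " ".join(parts) + " WHERE " + " AND ".join(where), params


keywords = ["merchants", "annual"]
locations = {"merchants": {("company", "name")}, "annual": {("report", "title")}}  # as from Step 1
hits = {}                                                                          # Step 2
for kw, places in locations.items():
    for table, _ in places:
        hits.setdefault(table, set()).add(kw)
patterns = expand_patterns(ADJ, hits, keywords)
for p in patterns:
    print(pattern_to_sql(ADJ, FK, p, locations, keywords))
```

Here the only surviving pattern is {company, report}. Any pattern that also contains `stock` has a leaf (`stock` or `quote`) that holds no keyword, so it is rejected.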

For text query processing, word segmentation is a pre-processing step that comes before index building. Implementing it as a separate MapReduce job would also work. However, the extra MapReduce task would lengthen the processing cycle of the whole job and add a great deal of I/O, so efficiency would be low. The present invention therefore implements Chinese word segmentation preprocessing as an auxiliary Map step. This step is merged with the core Map and Reduce steps of index building into a single chained MapReduce task that completes the whole job.
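The chaining idea can be simulated in-process with Python generators. In Hadoop itself, chaining several mappers inside one job is what the `ChainMapper` class (package `org.apache.hadoop.mapreduce.lib.chain`) provides. The code below is not Hadoop code, character bigrams stand in for a real Chinese segmenter, and all function names are hypothetical.

```python
from itertools import groupby


def segment_mapper(doc_id, text):
    """Auxiliary Map step: word segmentation (character bigrams stand in for a segmenter)."""
    chars = [c for c in text if c.isalnum()]
    yield doc_id, [a + b for a, b in zip(chars, chars[1:])]


def index_mapper(doc_id, tokens):
    """Core Map step of index building: emit (term, (doc_id, position))."""
    for position, token in enumerate(tokens):
        yield token, (doc_id, position)


def index_reducer(term, postings):
    """Reduce step: one sorted posting list per term."""
    return term, sorted(postings)


def chain(*mappers):
    """Compose mappers so that every record flows through all of them inside one Map phase."""
    def run(key, value):
        records = [(key, value)]
        for mapper in mappers:
            records = [out for k, v in records for out in mapper(k, v)]
        return records
    return run


def run_job(inputs, mapper, reducer):
    """One MapReduce job: map, shuffle (sort and group by key), reduce."""
    pairs = [kv for key, value in inputs.items() for kv in mapper(key, value)]
    pairs.sort(key=lambda kv: kv[0])
    return dict(reducer(k, [v for _, v in group])
                for k, group in groupby(pairs, key=lambda kv: kv[0]))


docs = {"d1": "银行年报", "d2": "年报摘要"}
inverted = run_job(docs, chain(segment_mapper, index_mapper), index_reducer)
print(inverted["年报"])   # [('d1', 2), ('d2', 0)]
```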

Building the text index is the core stage of text processing. A good approach is to perform distributed search over a cluster, which requires a distributed index. Index construction is well suited to the MapReduce programming model, and the distributed index produced by MapReduce is stored in the distributed system, which facilitates subsequent distributed search.
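One common way to distribute an inverted index is to partition it by term, so that each node owns a disjoint set of terms and a query term is routed to exactly one shard. The source does not say how the index is partitioned, so this is a hedged sketch of that one option. It uses `zlib.crc32` for deterministic hashing, similar in spirit to Hadoop's default hash partitioning of map output keys, and the postings format `(doc_id, position)` is assumed.

```python
import zlib


def partition(term, num_shards):
    """Deterministically assign a term to one shard (index node)."""
    return zlib.crc32(term.encode("utf-8")) % num_shards


def shard_index(index, num_shards):
    """Split {term: [(doc_id, position), ...]} into term-partitioned shards."""
    shards = [{} for _ in range(num_shards)]
    for term, postings in index.items():
        shards[partition(term, num_shards)][term] = postings
    return shards


def search(shards, terms):
    """Route each query term to its owning shard and intersect the document sets."""
    result = None
    for term in terms:
        shard = shards[partition(term, len(shards))]
        docs = {doc for doc, _ in shard.get(term, [])}
        result = docs if result is None else result & docs
    return result or set()


if __name__ == "__main__":
    index = {
        "股票": [("d1", 1), ("d2", 0)],
        "价格": [("d1", 2)],
        "上涨": [("d1", 3), ("d2", 1)],
    }
    shards = shard_index(index, num_shards=3)
    print(search(shards, ["股票", "价格"]))
```

Term partitioning keeps each lookup on a single node; partitioning by document instead would require every query to be broadcast to all nodes and merged.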

Building the inverted index table is the key problem of indexing. The output of the auxiliary text-preprocessing Map task serves as the Map input of the index-building MapReduce task. For each Chinese term, the index-building Map task outputs the term string, the relevance between the term and the document, and the term's position in the document; the term strings and their positions can be obtained from the word segmentation package. The Reduce task of the indexing process integrates the output of the Map tasks to form the inverted index list file.
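The Map and Reduce roles described above can be sketched on already-segmented documents. Two details are assumptions because the source leaves them open: relevance is taken to be normalised term frequency (occurrences divided by document length), and each line of the inverted list file is written as `term<TAB>doc:relevance:positions;...`, with postings ordered by descending relevance.

```python
from collections import defaultdict


def index_map(doc_id, words):
    """Map: for one segmented document, emit (term, (doc_id, relevance, positions))."""
    positions = defaultdict(list)
    for pos, word in enumerate(words):
        positions[word].append(pos)
    n = len(words)
    for term, pos_list in positions.items():
        relevance = len(pos_list) / n  # normalised term frequency (assumption)
        yield term, (doc_id, relevance, pos_list)


def index_reduce(term, values):
    """Reduce: merge one term's postings into a single inverted-list line."""
    ranked = sorted(values, key=lambda v: (-v[1], v[0]))  # most relevant first
    entries = ";".join(
        f"{doc}:{rel:.2f}:{','.join(map(str, pos))}" for doc, rel, pos in ranked
    )
    return f"{term}\t{entries}"


def build_inverted_file(segmented_docs):
    """Run Map over every document, group by term, then Reduce each group."""
    grouped = defaultdict(list)
    for doc_id, words in segmented_docs.items():
        for term, value in index_map(doc_id, words):
            grouped[term].append(value)
    return [index_reduce(term, grouped[term]) for term in sorted(grouped)]


if __name__ == "__main__":
    docs = {"d1": ["股票", "价格", "股票", "上涨"], "d2": ["股票", "上涨"]}
    for line in build_inverted_file(docs):
        print(line)
```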

HBase opens its data files at startup and keeps them open during processing, which makes it more suitable for real-time retrieval. A MapReduce job is written to load the index file into the HBase distributed database. In the experiments, the original documents were imported directly into HBase. Because the system does not collect data from the network continuously but only adds to the text dataset at intervals, retrieval is performed against the index file, which further improves retrieval efficiency.
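The following sketch shows one plausible row layout for loading the index file into HBase: the term is the row key and each posting becomes one cell, so a single row read returns every posting of a term. Everything here is an assumption for illustration. The index file is assumed to hold lines of the form `term<TAB>doc:relevance:positions;...`, the column family `p` is hypothetical, and a plain dict stands in for the table. A real loading job would emit HBase `Put` objects (for example through `TableOutputFormat`) rather than fill a dict.

```python
def line_to_cells(line, family="p"):
    """Turn 'term<TAB>doc:rel:pos;...' into (row_key, {"family:qualifier": value})."""
    term, entries = line.split("\t", 1)
    cells = {}
    for entry in entries.split(";"):
        doc_id, relevance, positions = entry.split(":")
        # Qualifier = document id, value = relevance and positions (assumed layout).
        cells[f"{family}:{doc_id}"] = f"{relevance}:{positions}"
    return term, cells


def load_index(lines):
    """Stand-in for the loading job: row key -> {family:qualifier -> value}."""
    table = {}
    for line in lines:
        row_key, cells = line_to_cells(line)
        table.setdefault(row_key, {}).update(cells)
    return table


def lookup(table, term):
    """A single row read returns all postings of the term."""
    return table.get(term, {})


if __name__ == "__main__":
    table = load_index(["股票\td1:0.50:0,2;d2:0.50:0", "价格\td1:0.25:1"])
    print(lookup(table, "股票"))
```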

In summary, the present invention proposes a cloud platform financial data acquisition method that overcomes the shortcomings of traditional structured data query in flexibility and practicality, lowers the technical barrier for non-specialists to query databases, and makes better use of the value of business data.

Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented on a general-purpose computing system. They may be concentrated on a single computing system or distributed over a network formed by multiple computing systems. Optionally, they may be implemented as program code executable by a computing system, and thus stored in a storage system and executed by the computing system. The present invention is therefore not limited to any specific combination of hardware and software.

It should be understood that the above embodiments of the present invention serve only to illustrate or explain the principles of the present invention and do not limit it. Therefore, any modification, equivalent substitution, improvement, or the like made without departing from the spirit and scope of the present invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or within the equivalents of such scope and boundaries.

Claims

1. A cloud platform data acquisition method for data retrieval and query in a cloud-computing-based financial data retrieval system, characterized by comprising:

2. The method according to claim 1, characterized in that, in the unstructured search, the cloud-computing-based retrieval system provides resource management, data integration, and index storage, and an unstructured data query service system is built. The system adopts the Hadoop open framework and relies on the ZooKeeper mechanism for distributed coordination, cluster metadata, and configuration maintenance. The retrieval layer provides index update, index deletion, query, word segmentation, index library, and external interface modules. The data collection layer provides the infrastructure and the data resource management module. The inter-layer interface coordinates data interaction and service delivery between the two layers, and the index library is designed on the basis of business format standards. Document content is divided by manual preprocessing into text segments corresponding to different key terms, which serve as the raw input for building the index library. The interface functions provided by open-source Servlet technology are used to build, add to, update, delete from, and query the index, forming an inverted index of user-input keyword, key term, and document, and an HTTP calling interface is provided externally through customized secondary development;

3. The method according to claim 2, characterized in that the financial data retrieval system comprises a service server, an application server, a data server, an integration server, and the respective databases; the service server performs information retrieval by calling the application server and uses the data to provide push services; the application server performs unified indexing and maintenance of the data; and the integration server integrates structured and unstructured data, uses a duplicate-checking mechanism and data push technology to classify, summarize, and normalize the data, presents the data to users through protocol interfaces and front-end pages, and provides information services to the service server;