With the arrival of the knowledge economy, "knowledge" is receiving ever wider attention. Knowledge is a fluid synthesis that comprises structured experience, values, and information expressed in written form. Information, in turn, is obtained by classifying, integrating, and analyzing raw data drawn from unprocessed sources such as newspapers, magazines, and websites, so that a reader can understand its meaning. How to convert an enormous quantity of data accurately and quickly into useful information, or into deeper knowledge, is a problem of growing concern to the industry.
The World Wide Web, and the Internet in particular, continues to spread and develop and has become an important tool for querying and obtaining data; a wide variety of data can be queried and obtained quickly over the Internet. However, the data on the Internet are diverse in type and enormous in quantity, and users find it hard to locate what they need in this vast sea of data. To help users find the data that meet their needs in the most efficient way, existing keyword query tools and search engine technology can be used to extract and filter data from the Internet. Even so, the number of query results finally obtained is still very large. Deriving relevant analysis results from a statistical analysis of hundreds or even thousands of documents is no easy task for an ordinary user who is not a data analysis professional. Moreover, because current network bandwidth is limited, downloading large amounts of data over the network is slow and costs the user considerable time, and a dropped connection can easily interrupt the download, so that the user cannot obtain the required information in time.
Referring to Figure 1, which shows an existing service mode for providing data analysis to customers, a customer (not shown) uses the computer of a client 3 to query a remote raw database 10 (which may also be a website storing raw data) over the Internet for the data needed, and downloads the data to the client 3. What the customer retrieves is unprocessed raw data, often in large quantities, and processing and tallying such a volume of raw data correctly by hand takes considerable time. Software vendors 2 therefore provide customers with software systems dedicated to statistical analysis of specific raw data, for example the IPAM System of Aurigin™ (http://www.aurigin.com) and the patent data analysis product PatentLab™-II of Wisdomain (http://www.wisdomain.com, http://www.delphion.com). A customer who needs to analyze raw data must first purchase, or download free of charge, the relevant statistical analysis software from the software vendor 2, install it on the computer of the client 3, and then use it to analyze the raw data. This is a typical client/server (Client/Server) service method. In summary, the existing data analysis mode usually involves the following three steps:
(i) the customer purchases (or downloads free of charge) analysis software from a service provider;
(ii) the customer purchases and downloads the raw data files;
(iii) the customer analyzes the raw data with the analysis software on his or her own.
The existing data analysis mode described above has the following shortcomings. First, customers differ in their needs and purposes. Some customers care only about the results or conclusions of the analysis and not about the process; others are not data analysis professionals and know neither how to query for the raw data they want nor how to analyze the data to obtain the results they want. Because the analysis in the existing mode is carried out by the customers themselves, they may be unable to perform it correctly or to obtain correct results from the raw data. Second, as data analysis technology continues to develop, customers demand more and more of both the volume and the depth of analysis. Analysis software purchased (or downloaded free of charge) from a service provider may gain new or improved functions after some time, so the customer must keep upgrading the software and may incur corresponding costs; the existing service mode therefore cannot dynamically and promptly satisfy the customer's higher requirements. Third, as noted above, the customer must purchase and download a large amount of raw data to the local machine before analysis can begin. When the data volume is very large, the download takes a long time and occupies considerable storage capacity. Yet sometimes the customer needs only the results of the analysis, and the raw data have no further value once the analysis is complete, so the existing service mode wastes customer resources (time, money, and so on).
For ease of understanding, the technical terms involved in the present invention are briefly explained below:
Hyperlink: a navigational link from one document to another. A hyperlink is usually displayed as highlighted text; clicking it with the mouse jumps to the linked document.
Hypertext: a global information mechanism that links different parts of documents together through keywords by means of hyperlinks, so that information can be searched interactively.
Internet: in the general sense, a network formed by interconnecting a plurality of computer networks, constituting one large network in function and in logic; in the specific sense, the largest open computer network in the world, formed by interconnecting numerous networks.
Web page: a hypermedia display page on the Internet, generally written in HTML (described below), which can serve as a medium for multimedia such as text, graphics, and sound.
HTML (Hypertext Markup Language): an interpreted markup language for writing Web pages. HTML allows the text to contain code that defines fonts, appearance, graphics, and hypertext links, and uses hyperlinks to let the user navigate the content of Web pages.
URL (Uniform Resource Locator): a notation for specifying the location of information on the World Wide Web service of the Internet. For example, www.uspto.gov/index.htm denotes the home page location of the United States Patent and Trademark Office.
Browser: the client program of the Web service. It can send various requests to a web server, and it interprets, displays, and plays the hypertext information and the various multimedia data formats, defined in HTML, that the web server returns.
Referring to Figure 2, which shows the overall architecture of the agency service for online data capture and analysis of the present invention, the service system comprises a raw database 10 (which may also be a website storing raw data), a service provider 2, and a client 3. A local database 23 is provided at the service provider 2 for storing the raw data captured from the raw database 10. The service flow is as follows. First, a customer (not shown) sends a data analysis request message from the computer of the client 3 to the service provider 2. On receiving the request message, the service provider 2 automatically converts it into a query message in a standard format and sends the standard-format message to the raw database 10, whose query engine performs the search and returns the raw data that satisfy the standardized query conditions. The relevant fields are then extracted from the text of the raw data, classified, and downloaded to the local database 23. The service provider 2 analyzes the data in the local database 23, and finally sends the analysis results to the client 3 and collects a service fee from the customer.
As described above, the agency service for online data capture and analysis of the present invention provides customers with a value-added agency service mechanism that performs online data querying, downloading, and analysis on the customer's behalf.
Referring to Figure 3, which is a system block diagram of the data querying, downloading, and analysis performed by the agency service for online data capture and analysis of the present invention, the system comprises a raw data website 1, a service provider 2, and a client 3. The raw data website 1 has a web server 11 and a raw database 10. The raw database 10 stores a large amount of raw data, which the web server 11 presents to the outside in the form of Web pages (written in HTML) in response to external query requests; any external computer with Web browsing capability can access the web server 11 and query the raw data it needs. The service provider 2 comprises a control processing module 21, a data analysis module 22, and a local database 23, and the client 3 comprises a web browser 31.
After the raw data website 1 receives a data query condition, the corresponding query result Web page 5 can be obtained by accessing the web server 11. The query results shown on this page are a number of hypertexts 50, each of which is hyperlinked to a corresponding detail Web page 51. The control processing module 21 of the service provider 2 automatically captures this query result message, calculates the total number of records and the required service fee, and notifies the client 3 so that the purchase can be confirmed (described in detail below).
On receiving the customer's purchase confirmation message, the control processing module 21 begins to download and analyze automatically the detail Web pages 51 corresponding to the query results. Figure 4 is a flowchart of this automatic data download and analysis. Its steps are described below and, to make the flow easier to understand, are illustrated with a concrete example:
(a) Obtain the query result Web page 5 according to the query condition. Figure 5 shows part of the query result Web page 5 obtained from the United States Patent and Trademark Office (http://www.uspto.gov) with the query condition "ICL/G06F". The patents that satisfy the condition are displayed as hypertexts 50, each of which is hyperlinked to the content of the corresponding patent specification.
(b) Capture the HTML source code of the Web page 5. Figure 6 shows a fragment of the HTML source code of the Web page 5 of Figure 5.
(c) Find the HTML source code that corresponds to a hypertext 50 and obtain the URL of that hypertext. In the HTML source code of Figure 6, the tag <A HREF="http://patents.uspto.gov/cgi-bin/ifetch4?ENG+PATBIB-1999-2000+0+990662+0+1+165850+F+1+19984+1+ICL%2fg06f"> ... </A> (first and second lines) carries the URL of the hyperlink address of one hypertext 50.
(d) Using this URL, the control processing module 21 opens the corresponding detail Web page 51. Figure 7 shows the detail Web page 51 for the URL of step (c), that is, the page linked by the hypertext 50 in Figure 5, and Figure 8 shows a fragment of the HTML source code of the detail Web page 51 of Figure 7.
(e) In the source code of the detail Web page 51, search for the character strings that correspond to predetermined database field names and write the associated data into the database. For example, if one of the preset fields of the local database 23 is "Inventor", then when the control processing module 21 finds the string "Inventor" in the HTML source code of the detail Web page 51, it automatically captures the inventor-name strings that follow it into the corresponding field of the local database 23. In this example the control processing module 21 captures the four strings "Goodwin; David W.", "Cohn; Robert S.", "Lowney; Paul G.", and "Rubin; Norman" into the "Inventor" field of the local database 23. By the same principle and steps, the other relevant field data of this raw data item can be captured into the local database 23, forming a document record 6 (as shown in Figure 3).
(f) Look for the next hypertext 50 in the HTML source code of the query result Web page 5. If there is one, return to step (c); if not, the process ends. With this method, the raw data linked by all the hypertexts 50 of the query result Web page 5 can be captured, converted, and stored in the relevant fields of the local database 23, producing a number of document records 6.
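The link-walking part of this flow, steps (b), (c), and (f), can be illustrated with a minimal sketch in Python using the standard-library html.parser module. This is an illustration under stated assumptions, not the implementation of the control processing module 21: the class name HyperlinkCollector, the function extract_hyperlinks, the sample page, and the host example.invalid are all hypothetical, and a real query result page would have a different layout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HyperlinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs, in page order
        self._href = None    # URL of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_hyperlinks(result_page_html, base_url):
    """Steps (c) and (f): return the URL and text of each hypertext on the page."""
    parser = HyperlinkCollector(base_url)
    parser.feed(result_page_html)
    parser.close()
    return parser.links


# Hypothetical stand-in for the HTML source of query result Web page 5.
SAMPLE_RESULT_PAGE = """
<html><body>
<A HREF="/detail?id=1">First patent title</A><br>
<A HREF="/detail?id=2">Second patent title</A>
</body></html>
"""

links = extract_hyperlinks(SAMPLE_RESULT_PAGE, "http://example.invalid/search")
for url, title in links:
    print(url, "->", title)
```

Each URL in the returned list would then be opened in turn, as in step (d); the sketch leaves the download out so that it runs without network access.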
With the above method, the required raw data can be queried from the remote raw database 10 and automatically captured into the local database 23 by the control processing module 21 at the service provider 2.
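The field capture of step (e) can likewise be sketched in Python, here with a regular expression and an in-memory SQLite database standing in for the local database 23. Everything specific in the sketch is an assumption made for illustration: the table-row layout of the sample detail page, the field list, the table name document_record, and the helper names extract_fields and store_record. The description itself only states that the text following a field name such as "Inventor" is captured into the matching database field.

```python
import re
import sqlite3

# Hypothetical fragment of a detail Web page 51; real page layouts differ.
SAMPLE_DETAIL_PAGE = """
<table>
<tr><th>Title:</th><td>Example title</td></tr>
<tr><th>Inventor:</th><td>Goodwin; David W., Cohn; Robert S.</td></tr>
<tr><th>Assignee:</th><td>Example Corp.</td></tr>
</table>
"""

# Preset database field names to search for in the page source (assumed).
FIELD_NAMES = ["Title", "Inventor", "Assignee"]


def extract_fields(page_html, field_names):
    """Find each field name in the source and capture the cell that follows it."""
    record = {}
    for name in field_names:
        pattern = re.escape(name) + r":?\s*</th>\s*<td>(.*?)</td>"
        match = re.search(pattern, page_html, re.IGNORECASE | re.DOTALL)
        record[name] = match.group(1).strip() if match else None
    return record


def store_record(connection, record):
    """Write one document record into the local database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS document_record "
        "(title TEXT, inventor TEXT, assignee TEXT)"
    )
    connection.execute(
        "INSERT INTO document_record (title, inventor, assignee) VALUES (?, ?, ?)",
        (record["Title"], record["Inventor"], record["Assignee"]),
    )
    connection.commit()


connection = sqlite3.connect(":memory:")  # stand-in for local database 23
record = extract_fields(SAMPLE_DETAIL_PAGE, FIELD_NAMES)
store_record(connection, record)
rows = connection.execute("SELECT inventor FROM document_record").fetchall()
print(rows)
```

A field that is not found on the page is stored as NULL rather than raising an error, which seems a reasonable choice when page layouts vary between records.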
Referring to Figure 9, which is a schematic diagram of the transaction process of the agency service for online data capture and analysis of the present invention, the transaction mechanism involves the client 3 having a web browser 31, the service provider 2 having a web server 11, and the raw data website 1. A customer (not shown) who needs the data query and analysis service uses the web browser 31 of the client 3 to access the web server 11 of the service provider 2, enters a request message on the request input Web page 40, and sends the request message back to the web server 11. The control processing module 21 processes the request message, converts it into a query statement in a standard format, and queries the remote raw data website 1 with this query statement as the restricting condition. From the query result message (the total number and the list of raw data items) and the corresponding fee, the control processing module 21 automatically generates a purchase confirmation Web page 41. This page shows the list and total number of raw data items in the query result, together with the total fee calculated from the number of items and the analysis mode selected by the customer. The fee may be calculated as: total fee = (number of data items × unit price of analysis) × analysis mode weight, where the analysis mode weight is derived by conversion from the total resources (working time, workload, and so on) that the service provider 2 expends on the analysis. Each analysis mode corresponds to a different content and depth of analysis, and customers can choose according to their own needs. As an example, the purchase confirmation Web page 41 shown in Figure 9 lists three analysis modes for the customer to choose from, A, B, and A+B, with analysis mode weights of 1, 1.2, and 1.5 respectively. If the query returns 200 data items, the unit price is 3 yuan per item, and the customer selects analysis mode B, the total fee given by the formula is 200 × 3 yuan × 1.2 = 720 yuan. After the customer clicks the desired analysis mode and confirms the purchase, a purchase confirmation message is generated.
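The fee formula can be checked with a short Python sketch. The weights 1, 1.2, and 1.5 and the figures 200 items and 3 yuan per item come from the example above; the names MODE_WEIGHTS and total_fee are hypothetical, and the use of Decimal is a choice made here to avoid binary floating-point rounding in money amounts, not something the description prescribes.

```python
from decimal import Decimal

# Analysis mode weights from the example: A = 1, B = 1.2, A+B = 1.5.
MODE_WEIGHTS = {
    "A": Decimal("1"),
    "B": Decimal("1.2"),
    "A+B": Decimal("1.5"),
}


def total_fee(item_count, unit_price, mode):
    """total fee = (number of data items x unit price) x analysis mode weight."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    return item_count * Decimal(unit_price) * MODE_WEIGHTS[mode]


fee = total_fee(200, "3", "B")
print(fee)  # 720.0
```

Passing the unit price as a string keeps the amount exact; an unknown analysis mode raises KeyError, which a caller would presumably turn into an error message on the purchase confirmation page.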
On receiving the customer's purchase confirmation message, the control processing module 21 captures the queried raw data and writes them into the local database 23, forming a number of document records 6. The analysis module 22 analyzes these document records 6 according to the analysis mode selected by the customer and automatically generates an analysis report 7 from the results. Finally, a Web page 42 containing the analysis report 7 is presented to the customer, who, after receiving the analysis report 7, pays the corresponding fee to the service provider 2 as agreed in the purchase confirmation, completing the transaction.