CN102708114A - Method for realizing real-time online searching through mutually connected computer network - Google Patents

Info

Publication number
CN102708114A
Authority
CN
China
Prior art keywords
seller
information
price
buyer
explanation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100285558A
Other languages
Chinese (zh)
Other versions
CN102708114B (en)
Inventor
谢越辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kaichuang Research Co., Ltd.
Original Assignee
LINGQIU CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LINGQIU CO Ltd
Publication of CN102708114A
Application granted
Publication of CN102708114B
Anticipated expiration
Expired - Lifetime

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for performing real-time online searching through an interconnected computer network. Information about a plurality of seller sites on the interconnected computer network is stored in an offline database. The information comprises a URL, a search form URL, a domain description, and a seller description, and the seller description comprises generalized rules describing how product information is organized on each seller site. When a price comparison request is received from an online user or buyer and/or from the system, the information stored in the offline database is used to process the parameters of the price comparison request for a target product. Real-time price and product information is then extracted from the identified sites among the seller sites, wherein the extracted price and product information appears in the native language of the site, and the extracted price and product information is displayed to the user.

Description

Method for performing real-time online searching through an interconnected computer network
This application is a divisional application. The original application has application number 01819690.X and a filing date of September 27, 2001, and is entitled "Online intelligent information comparison agent for multilingual electronic data sources over an interconnected computer network".
Computer program listing appendix
Please refer to the computer program listing appendix, 10 pages in total, submitted herewith. The material contained in that appendix is incorporated herein by reference.
Technical field
The present invention relates generally to automating tasks on the World Wide Web, and more particularly to automating tasks for online buyers or users, such as interacting or comparison shopping with multilingual sellers on the World Wide Web through a single interface, in order to improve communication efficiency and provide a personalized shopping experience.
Background art
Since the World Wide Web emerged in the mid-1990s, the Internet has grown thousands of times in scale. People today are "connected", interacting not face to face but through virtual communication channels. This technological revolution has fundamentally changed the way people live.
Developing in parallel with the World Wide Web is the "information technology age", which has brought a dizzying variety of online information resources, from product information to scientific papers. Because e-commerce exploits the low cost and convenience that the Internet offers, the scale of e-commerce has grown exponentially.
There are millions or more online sellers on the World Wide Web. Although current comparison-shopping or price-comparison search engines can retrieve, from competing online sellers, search results that are to some degree relevant to the product and price the online buyer or user wants, the buyer or user may face a boundless ocean of information. Sometimes the buyer or user receives a "search failed" page because the search engine has missed the sites of online multilingual sellers in other countries among the countries currently connected to the Internet (245 at present), even though the desired goods or services are sold in those countries. Moreover, although information about products and sellers is easy to obtain online, the buyer or user is still left to wander through every stage of the purchasing process.
So far, the potential of the Internet to transform current e-commerce into a truly global marketplace has largely gone unrealized, and electronic purchasing has not yet been automated. Online shopping is far from simple, efficient, and enjoyable. Search engines and aggregated directory services are not sufficient for finding the products an online buyer needs and the merchants willing to sell them. Moreover, a typical online shopping process is carried out almost entirely by hand: to find what he or she is looking for, the buyer must enter all of the search terms and keywords. A prospective buyer therefore faces a daunting task: collecting and understanding information about products and merchants, making a decision based on that information, and finally entering purchase and payment information. The result is that too much information overwhelms users or buyers, who have neither the time nor the expertise to deal with it.
Depending on the degree of sophistication, two less-than-ideal approaches have been adopted to partially automate the online catalog price comparison process:
(1) the non-real-time approach
(2) the real-time hard-coded wrapper approach
The non-real-time approach is the simplest way to implement a price comparison agent. It involves manually collecting all the necessary information from the Web and then writing a separate HTML file for each search result item so that the search results can be displayed visually.
The benefits of this approach are obvious: it is easy to implement and fast to search. Despite these benefits, it has three main drawbacks. First, because the price comparison is done manually, maintaining a large wrapper repository becomes extremely expensive, especially given the continuing growth of the Internet. Second, a great deal of effort must go into updating prices and other information. Finally, the database needed to store and coordinate all of this information must be very large.
The hard-coded wrapper approach is a real-time alternative to the non-real-time approach. Unlike the non-real-time approach, which captures items directly, the real-time approach tries to generalize HTML pages into a specific format. To perform the extraction task, a custom wrapper program called pcwrapHLRT (an abbreviated program name) is invoked. Fig. 1 shows an example of the relevant part of this program, which contains a "while" loop. In this example, the algorithm behind the wrapper bounds the target data on an HTML (HyperText Markup Language) page with a pair of delimiters. The pcwrapHLRT program works because the site follows a uniform formatting convention: product items are shown in bold and prices in italics. pcwrapHLRT works by scanning the HTML file for the strings {"<B>", "</B>", "<I>", "</I>"} and identifying the text fragments that these strings delimit. The pcwrapHLRT program treats these strings as l_i, r_i, l_p, and r_p respectively. The symbol l_k (k ∈ {i, p}) denotes the string that delimits the left boundary of the attribute to be extracted, and r_k denotes the delimiter on the right. Other attributes that the wrapper might extract include product name, image, condition, status, and so on.
When given an HTML page, pcwrapHLRT scans the page sequentially, starting from the head. The outer loop searches the unscanned part of the page for the delimiter "<B>" to check whether there are additional model and/or price pairs to extract. Whenever the start of a model is found, the inner loop is invoked to extract the appropriate page substring.
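The scanning loop described above can be sketched in Python. This is a minimal illustration, not the pcwrapHLRT program of Fig. 1: the function name `pcwrap_sketch`, the sample page, and the omission of head and tail handling are all assumptions made for brevity.

```python
def pcwrap_sketch(page, delimiters):
    """Extract one tuple per item using fixed (left, right) delimiter pairs."""
    results = []
    pos = 0
    first_left = delimiters[0][0]
    # Outer loop: continue while the unscanned part still contains an item start.
    while page.find(first_left, pos) != -1:
        row = []
        # Inner loop: extract one attribute for each delimiter pair.
        for left, right in delimiters:
            start = page.find(left, pos)
            if start == -1:
                return results
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return results
            row.append(page[start:end].strip())
            pos = end + len(right)
        results.append(tuple(row))
    return results


# Hypothetical page: items in bold, prices in italics, as in the text.
PAGE = ("<HTML><TITLE>Price list</TITLE><BODY>"
        "<B>Model A100</B> <I>$199.00</I><BR>"
        "<B>Model B200</B> <I>$249.50</I><BR>"
        "</BODY></HTML>")
DELIMS = [("<B>", "</B>"), ("<I>", "</I>")]  # (l_i, r_i), (l_p, r_p)

print(pcwrap_sketch(PAGE, DELIMS))
```

Because the delimiters are fixed in the code, any change to the site's formatting convention silently breaks the extraction, which is the weakness the following paragraphs describe.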
Few sites publish their formatting conventions. The designer of an information gathering system that uses pcwrapHLRT therefore has to build such a wrapper by hand for each information source. Unfortunately, this hard-coding process is both tedious and error-prone, because a typical HTML page may contain thousands of lines of code. Moreover, most sites change their formatting conventions periodically, which breaks the wrapper.
Another drawback of pcwrapHLRT is that searching is not very fast, because the agent can contact the seller site only after it has received a request from the user. And because the wrapper is only semi-automatic, additional administrative work is needed to analyze the HTML page format manually in order to define the wrapper.
Summary of the invention
In view of the commonly encountered problems described above, and based on a new Internet strategy, the invention offers an automatic alternative to manual and semi-automatic operation: an online intelligent price comparison agent. It relieves the burden of the price comparison process for online buying or selling (auctions and the like) from catalogs and, at the same time, provides a better navigation environment through an Internet-friendly interactive agent character graphical user interface (IACGUI). The method will be especially useful as the so-called fourth-generation global marketplace architecture, in which agents mediate e-procurement in B-to-C, C-to-C, B-to-B, auction, and G-to-B/C (government to business/consumer) models, makes e-business (electronic commerce) and m-business (mobile commerce) "softer", that is, more widespread. The system of the present invention therefore provides a better environment for consumer-to-merchant transactions.
In brief, an online intelligent price comparison agent is an automated online buying and selling assistant that rapidly searches many online multilingual stores around the world and then finds the best deal for each item. Such agents also provide value-added Business-Web services (graded by customer level) to online buyers or users. They are highly attractive because they relieve users of the tedium of manually carrying out each step of the consumer buying behavior model.
Conventionally, a buyer or user communicates with the web server of an online service through a front-end interface, which provides a form in which the user or buyer enters the item to be searched for. Once the buyer or user submits a search request, the web server of the online service queries its database for matching content and then sends the query results to the user's web browser.
In the present invention, a user agent in the online catalog price comparison process (the online intelligent price comparison agent, acting on behalf of the human buyer or user) searches for the item category and keywords of interest. For the maximum benefit of the buyer or user, the user agent communicates with multilingual web servers in the 246 networked countries linked by computer networks on the World Wide Web. The user agent then sorts the online seller addresses it has found and presents a summary of the search results to the online user (end user) through the web browser.
Applying the system of the present invention can multiply the market share of e-business (electronic commerce), and the benefits are significant. The efficiency and effectiveness of communication increase considerably, while as much time and cost as possible is saved for online sellers and online buyers alike. Most importantly, the buyer or user gains access to an unprecedented number of information sources and merchandise worldwide, along with immeasurable business opportunities. The system and method of the present invention also help remove barriers of time, language, and demographics, making e-commerce truly global. In addition, because these user agents are personalized, run continuously, and operate autonomously, they are well suited to mediating buyer or consumer behavior. Accordingly, the present invention helps optimize the entire purchasing process and brings a revolutionary change to present-day e-commerce.
Accordingly, one object of the present invention is to provide an improved method for comparing the prices of online sellers' products or services.
Another object of the present invention is to create seller descriptions (vendor descriptions) of online stores.
Another object of the present invention is to collect data, including product samples and URLs, for use in training.
Another object of the present invention is to retrieve the training data before searching online stores or seller sites.
Another object of the present invention is to use information obtained from the training data to collect training web pages from online sellers.
Another object of the present invention is to generate seller descriptions from the training data and the collected training web pages.
Another object of the present invention is to store the generated seller descriptions in a database.
Another object of the present invention is to provide an interface through which a system administrator can add, modify, and delete the sellers that the system supports.
Another object of the present invention is to provide an interface through which a system operator can view seller information.
Another object of the present invention is to provide a price comparison method by which a customer can actively carry out a price comparison.
Another object of the present invention is to parse HTML into useful data.
Another object of the present invention is to classify and filter the desired products or services.
Another object of the present invention is to provide a single interface for comparing the prices of different online multilingual sellers and different regions on the World Wide Web or the Internet.
A first user agent embodied in the system of the present invention operates as a Semantic Recognition Learner Agent (SRLA). Using an inductive learning method, it learns the URL and domain description of each seller site, thereby performing autonomous wrapper induction in real time, and it generalizes the organizational rules of the seller site from training examples previously edited or prepared by the system administrator. (In one embodiment, the SRLA is attached to a Microsoft-brand back-end SQL-compliant server or a Microsoft Access database and generates a seller and product description only once for each online store.) This wrapper induction is accomplished by creating, in real time, wrappers for the examples drawn from the seller and product descriptions stored in the offline database. The SRLA then automatically and quickly reaches, over the Internet and in real time, the host of the remote seller site according to the URL supplied with the examples in order to visit the page that displays the example instances, intelligently fills in the relevant search form with the domain name or product information, and virtually "presses the Enter key", thereby submitting a search request to the site. The search result pages returned in response may contain exact data from a successful search or the content of a failed search. These pages carry product and seller descriptions unique to the particular seller (whether or not the seller is already registered with the system) and are therefore stored among the seller descriptions in the offline database maintained by the system administrator (such as a SQL-compliant server or a Microsoft Access database). The seller URLs, seller descriptions, and other information are preferably updated automatically once a day on schedule.
A second user agent embodied in the system of the present invention is called the Semantic Recognition Buyer Agent (SRBA). While visiting different online multilingual seller sites on the World Wide Web simultaneously, the SRBA uses the seller descriptions previously learned by the Semantic Recognition Learner Agent to search for a match. The SRBA intelligently fills in a seller's search form with the product information supplied by the online user or buyer and virtually "presses the Enter key". The seller then returns a search result page to the SRBA over the World Wide Web, in such a way that the page arrives at almost the same time as the pages returned by other sellers. (The Semantic Recognition Buyer Agent stores these returned pages as samples in a separate memory or cache location for later use by other SRBAs.) The SRBA analyzes the returned pages according to the corresponding seller descriptions, extracts the relevant information and data, sorts the data by price and model, and displays the pages as a formatted summary in a window on the screen of the client computer through the web browser of the online user or buyer.
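The buyer-agent flow described above (query several sellers at about the same time, parse each returned page with that seller's description, then sort by price) might look roughly like the following Python sketch. Everything named here is hypothetical: the seller hosts, the delimiter-based descriptions, and the `fetch` stub, which returns canned pages instead of contacting a real site.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical seller descriptions: (left, right) delimiters per attribute.
SELLERS = {
    "shop-a.example": {"item": ("<B>", "</B>"), "price": ("<I>", "</I>")},
    "shop-b.example": {"item": ("<td class=n>", "</td>"),
                       "price": ("<td class=p>", "</td>")},
}

# Canned result pages stand in for the responses of real seller sites.
FAKE_PAGES = {
    "shop-a.example": "<B>Cam X</B> <I>$120.00</I><BR><B>Cam X kit</B> <I>$150.00</I>",
    "shop-b.example": "<tr><td class=n>Cam X</td><td class=p>USD 99.50</td></tr>",
}


def fetch(site, query):
    """Stub for submitting the filled-in search form; returns a canned page."""
    return FAKE_PAGES[site]


def between(text, left, right, pos):
    """Return (text between left and right, new position), or (None, pos)."""
    start = text.find(left, pos)
    if start == -1:
        return None, pos
    start += len(left)
    end = text.find(right, start)
    if end == -1:
        return None, pos
    return text[start:end], end + len(right)


def to_number(price_text):
    """Pull a sortable number out of the site's own price text."""
    match = re.search(r"\d+(?:\.\d+)?", price_text.replace(",", ""))
    return float(match.group()) if match else float("inf")


def query_seller(site, query):
    """Fetch one seller's result page and parse it with that seller's description."""
    desc, page, rows, pos = SELLERS[site], fetch(site, query), [], 0
    while True:
        item, pos = between(page, desc["item"][0], desc["item"][1], pos)
        if item is None:
            break
        price, pos = between(page, desc["price"][0], desc["price"][1], pos)
        if price is None:
            break
        rows.append({"site": site, "item": item,
                     "price_text": price, "price": to_number(price)})
    return rows


def compare_prices(query):
    """Query all sellers concurrently and return rows sorted by price."""
    with ThreadPoolExecutor() as pool:
        per_site = list(pool.map(lambda site: query_seller(site, query), SELLERS))
    rows = [row for site_rows in per_site for row in site_rows]
    return sorted(rows, key=lambda row: row["price"])


for row in compare_prices("Cam X"):
    print(row["site"], row["item"], row["price_text"])
```

The sketch keeps `price_text` exactly as the site printed it and uses the parsed number only for sorting, which mirrors the idea of showing extracted information in the site's own form.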
According to the present invention, a computer-implemented method is provided by which a Semantic Recognition Learner Agent carries out inductive learning. The method includes retrieving training data for a particular online seller so that a corresponding seller description can be generated from the interconnected computer network. The method includes collecting training web pages, using the training web pages provided and the training data stored in a seller list. Using the training data and the retrieved training pages, the method includes an inductive learning method that generates a seller description for the particular seller from the information extracted from the training data and the retrieved training pages.
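One simple way to picture inductive learning from training examples is to derive a delimiter pair from known values on a training page. The Python sketch below is an assumption-laden illustration, not the learning procedure of the invention: it takes the longest common suffix of the text before each known value as the left delimiter and the longest common prefix of the text after each value as the right delimiter. All function names and sample pages are hypothetical.

```python
import os


def common_suffix(strings):
    """Longest common suffix, computed as a common prefix of reversed strings."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def learn_delimiters(page, known_values):
    """Induce (left, right) delimiters from values known to appear on the page."""
    before, after = [], []
    for value in known_values:
        idx = page.find(value)
        if idx == -1:
            raise ValueError(f"training value {value!r} not found on page")
        before.append(page[:idx])
        after.append(page[idx + len(value):])
    left, right = common_suffix(before), os.path.commonprefix(after)
    if not left or not right:
        raise ValueError("no consistent delimiters found for these examples")
    return left, right


def extract_all(page, left, right):
    """Apply the learned delimiters to a page and return every match."""
    values, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            return values
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            return values
        values.append(page[start:end])
        pos = end + len(right)


# Training page plus the prices an administrator labeled on it (hypothetical).
TRAINING_PAGE = "<B>Cam X</B> <I>$10</I>\n<B>Cam Y</B> <I>$20</I>\n"
LEFT, RIGHT = learn_delimiters(TRAINING_PAGE, ["$10", "$20"])

# The learned pair is then applied to a page the learner has never seen.
NEW_PAGE = "<B>Cam Z</B> <I>$30</I>\n<B>Cam W</B> <I>$45</I>\n"
print(extract_all(NEW_PAGE, LEFT, RIGHT))
```

With more training examples the common context shrinks toward the part of the markup that is truly invariant, which is why several labeled examples per seller are useful.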
The invention provides a method for storing the retrieved and extracted seller descriptions in an offline database; these seller descriptions are later used by the Semantic Recognition Buyer Agent (SRBA).
The invention provides a method for comparing the prices of online sellers' products or services. The method includes an online user initiating a request for a particular product or service, after which the Semantic Recognition Buyer Agent uses the previously determined seller descriptions to create a plurality of search query parameters. The method includes sending requests to different online sellers, preferably simultaneously, and using a parser built from the seller description to extract data from the pages returned by the online sellers. The method includes the Semantic Recognition Buyer Agent creating or adapting the stored and filtered data in HTML form so that the data can be presented to the online buyer or user.
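Creating search query parameters from a seller description can be sketched as follows. The dictionary keys (`search_form_url`, `query_field`, `category_field`, `fixed_params`) and the example host are hypothetical; the sketch only assumes that a seller description records where the search form lives and what its fields are called.

```python
from urllib.parse import urlencode


def build_search_url(description, product, category=None):
    """Fill in a seller's search form fields and return the request URL."""
    # Start from any constant fields the seller's form always sends.
    params = dict(description.get("fixed_params", {}))
    params[description["query_field"]] = product
    if category and "category_field" in description:
        params[description["category_field"]] = category
    return description["search_form_url"] + "?" + urlencode(params)


# Hypothetical seller description entry.
SHOP_A = {
    "search_form_url": "http://shop-a.example/search",
    "query_field": "q",
    "category_field": "cat",
}

print(build_search_url(SHOP_A, "digital camera", category="electronics"))
```

Because each seller names its form fields differently, keeping those names in the description lets one request from the user be turned into a differently shaped query for every seller.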
The invention provides a method by which the Semantic Recognition Buyer Agent analyzes the pages returned by online sellers in order to retrieve useful data. The method also includes retrieving a seller description from the offline database, analyzing pages returned by any online seller in the 246 countries currently networked on the World Wide Web, and using the information in the retrieved seller description to collect useful data from the returned pages.
In one embodiment of the invention, the functions described above become available on a member's web page only after an online buyer has registered as a temporary member or a long-term member.
The invention provides a method for performing real-time online searching for selected types of information through an interconnected computer network. The method comprises the following steps: building site descriptions for a large number of sites in an interconnected computer network, the description of each site comprising (a) the URL of the site and the URL of the site's search form; (b) generalized rules describing how the selected types of information are organized on the site; (c) sample data retrieved from the site that corresponds to the selected types of information; and (d) a domain description found at the site; receiving a request for a particular type of information from an online user; identifying, according to the site descriptions, the sites that may contain the particular type of information; for each identified site, using the site description to create a search request for the particular type of information; submitting the created search requests to the identified sites; receiving search results from the identified sites; after finding an exact match in the received search results, extracting the information that corresponds to the particular type of information; and displaying the extracted information to the user in the language of the site itself.
More generally, the present invention relates to a method for performing real-time online searching on the interconnected computer network. The method comprises the following steps: (a) storing, in an offline database, information about a plurality of seller sites from the interconnected computer network, the information comprising a URL, a search form URL, a domain description, and a seller description, wherein the seller description comprises generalized rules such as how product information is organized on each of the seller sites; (b) when a price comparison request is received from an online user and/or a Semantic Recognition Buyer Agent, using the information stored in the offline database to process the parameters of the price comparison request for a target product; (c) extracting real-time price and product information from the identified sites among the plurality of seller sites, wherein the extracted price and product information appears in the native (original) language of the site; and (d) displaying the extracted price and product information to the user.
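Step (a) and the site lookup in step (b) can be illustrated with a small offline store. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the text itself mentions a SQL-compliant server or a Microsoft Access database, and the table layout, column names, and sample rows here are assumptions.

```python
import json
import sqlite3


def open_db():
    """Create an in-memory stand-in for the offline seller database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE sellers (
               url TEXT PRIMARY KEY,
               search_form_url TEXT NOT NULL,
               domain_description TEXT NOT NULL,
               seller_description TEXT NOT NULL  -- JSON-encoded rules
           )"""
    )
    return conn


def save_seller(conn, url, search_form_url, domain, description):
    """Insert or refresh one seller record (for example, on a daily update)."""
    conn.execute(
        "INSERT OR REPLACE INTO sellers VALUES (?, ?, ?, ?)",
        (url, search_form_url, domain, json.dumps(description)),
    )
    conn.commit()


def sellers_for_domain(conn, domain):
    """Identify the stored sites whose domain description matches a request."""
    rows = conn.execute(
        "SELECT url, search_form_url, seller_description FROM sellers "
        "WHERE domain_description = ? ORDER BY url",
        (domain,),
    ).fetchall()
    return [(url, form, json.loads(desc)) for url, form, desc in rows]


db = open_db()
save_seller(db, "http://shop-a.example", "http://shop-a.example/search",
            "electronics", {"price": ["<I>", "</I>"]})
save_seller(db, "http://shop-b.example", "http://shop-b.example/find",
            "books", {"price": ["<td class=p>", "</td>"]})

print(sellers_for_domain(db, "electronics"))
```

Keeping the learned rules in the database means the price comparison step never has to relearn a site while a user is waiting; it only reads the stored description.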
Brief description of the drawings
Fig. 1 is an example of the relevant part of the pcwrapHLRT program, which retrieves information from a seller site in the existing real-time hard-coded wrapper approach.
Fig. 2 is a general diagram depicting the interaction, by way of the World Wide Web or the Internet, among a preferred embodiment of the present invention, the user agents of the present invention, users or buyers, and online sellers.
Fig. 3 is a simplified flow chart 100 outlining how the Semantic Recognition Learner Agent (SRLA) uses training data to generate a seller description.
Fig. 4 illustrates the various kinds of information that can be included in a seller description according to the present invention.
Fig. 5 is an example, provided according to the present invention, of data that can populate the fields of the seller description.
Fig. 6 is a flow chart 200 that gives an overview of how the Semantic Recognition Learner Agent (SRLA) performs inductive learning and generates a valid, general seller description across web pages.
Fig. 7 provides an example of part of the layout of a web site page, in the same form in which the page is displayed to a person browsing the Web, together with the corresponding HTML code used to generate or define the layout.
Fig. 8 provides an example of labels that, according to the present invention, are used to locate the price information and item details in a training web page.
Fig. 9 is a general illustration of what the labels used in the training process of one embodiment of the present invention represent.
Figs. 10A and 10B provide examples of possible candidate delimiters in the example training process depicted in Fig. 5 through Fig. 9.
Fig. 11 is a schematic of a screen shot of a web page that, according to the present invention, applies the navigation rule; the page has a searchable catalog and product domain fields so that a specific desired database can be accessed easily.
Fig. 12 is a schematic of a screen shot of a web page illustrating the uniformity rule, under which all items are laid out in one simple, consistent format. The frame of this page contains the search results for the queried information, all formatted uniformly.
Fig. 13 is a schematic of the same screen shot shown in Fig. 12, illustrating the vertical separation rule: the search results show a product catalog laid out in the central part, between the head and the tail.
Fig. 14 is a general illustration of the operation of the Semantic Recognition Learner Agent according to the present invention.
Fig. 15A is a screen shot showing the search results for the keyword "electronics" on the seller site "www.800.com". On this site, each product is summarized by a brief description, such as its features and functions (left and middle of the listing frame), together with the related "list price" and "your bid" information, which is displayed on the right of the listing frame. The intelligent price recognition algorithm of the Semantic Recognition Learner Agent of the present invention can distinguish between the two while learning the seller description.
Fig. 15B is a general illustration of the operation of the Semantic Recognition Buyer Agent of the present invention as it visits the seller site "www.800.com" at some time after the seller description has been learned; the seller description design shown in Fig. 14 is subsequently integrated into a whole, as Fig. 15B shows.
Fig. 15C is a flow chart 300 outlining how the Semantic Recognition Buyer Agent (SRBA) 20 of Fig. 2 interacts with the seller descriptions in order to respond to an online user's or buyer's price comparison request for one or even all of the online multilingual sellers searched.
Fig. 16 is an example of a learner interface screen, playing the role of an interactive agent character, that can be used to obtain training information for use in the present invention.
Fig. 17 is an example in which training information has been submitted for the seller site "1cache.com".
Fig. 18 is a schematic of the learner interface screen, which can be used to display the information learned for a seller description.
Fig. 19 is a screen shot of the learner interface in which the "seller information" tab is marked; seller information can be entered and retrieved through this tab.
Fig. 20 provides a screen shot of the learner interface showing the training examples previously entered for a particular seller.
Fig. 21 is a screen shot of the learner interface as displayed in response to opening a "seller description" file.
Fig. 22 illustrates the selection of a learning option according to the present invention: the "learn one" option is shown selected, and the seller name filled in is "1cache.com".
Fig. 23 shows the result of learning the seller "1cache.com".
Fig. 24 illustrates a solution to the wrapper induction problem, formulated in terms of a simple model of information extraction.
Fig. 25 provides the pseudocode of "execHLRT".
Fig. 26 is a module, written in pseudocode, of a simple method for learning the head and tail delimiters.
Figs. 27A and 27B provide a table and a subroutine in detail for the learnHLRT program.
Fig. 28 illustrates how, according to one embodiment of the present invention, a buyer or user communicates with the server through an ASP (Active Server Pages) file (NextGen.asp) so that the in-process DLL file (NextGen.dll) runs on the server machine.
Fig. 29 illustrates one way in which the Semantic Recognition Buyer Agent makes communication between the user and the database server more convenient.
Fig. 30 provides a detailed flow chart illustrating how the SQL Server database is set up according to one embodiment of the present invention.
Fig. 31 illustrates how the Semantic Recognition Buyer Agent virtually passes keystrokes to an online seller site.
Figure 32 is a rough schematic view of main menu screen, representes that one is used for GUI of the present invention or interactive proxy server role buyer/Buyer's view (IACS/BI).It should be noted that in the upper right corner of this main menu screen and select for this user provides " channel " (being kind) of product.What the left side of screen also provided is quick search function.A frame has been designed on the right below it, wherein has a movable part indication online user from key entry how to use quick search option.The screen panel on the left side also provides a cover dialog box, logins with the permanent membership so that interim member is on probation.(please note that most inlet function of the present invention all is deactivated; Differentiated to effective up to user identity) in the lower left corner; Provide a cover to be connected to enter the mouth registered online seller's method of the present invention,, can observe a big message box that is marking " feedback " on the right side; This is used for the online user and imports annotation information through Email to mail server, preferably uses the EMAIL server of the Outlook Express brand of Microsoft.
Figure 33 briefly illustrates a screen of the GUI or shopper/buyer interface of the present invention in which companies are displayed in response to the government-to-business text icon being clicked on a previous online buyer/user screen (not shown). It should be noted, however, that this screen cannot function fully, because these companies, the so-called government-to-business e-commerce service or platform vendors, currently restrict access to their site databases on the basis of strict membership rights, by inserting an authentication security interface within an entirely closed-loop computer network environment.
Figure 34 briefly illustrates a screen display of the GUI or shopper/buyer interface of the present invention in which, after the user clicks the "Advanced Search" option in the screen of Figure 33, details about the company the user selected from the choices offered are also provided. Note that in this screen the banner in the frame is positioned beneath the five domain types, and a message marked in capital letters, "advanced agent in service", can be seen. In addition, the bottom of the screen provides a user dialog box that can be filled in to run a search using the semantic recognition buyer agent functionality provided by the present invention. It should again be noted, however, that this screen cannot function fully, because this company, a so-called government-to-business e-commerce service or platform vendor, currently restricts access to its site database on the basis of strict membership rights, by inserting an authentication security interface within an entirely closed-loop computer network environment.
Figure 35 briefly illustrates a screen of the GUI or shopper/buyer interface of the present invention in which companies are displayed in response to the business-to-business text icon being clicked on a previous online buyer or user screen (not shown). It should be noted, however, that this screen cannot function fully, because these companies, the so-called business-to-business e-commerce service or platform vendors, currently restrict access to their site databases on the basis of strict membership rights, by inserting an authentication security interface within an entirely closed-loop computer network environment.
Figure 36 briefly illustrates a screen of the GUI or shopper/buyer interface of the present invention that provides details about the company the user selected from the choices offered, after the user clicks the "Advanced Search" option in the screen of Figure 35.
Figure 37 briefly illustrates a screen of the GUI or shopper/buyer interface of the present invention in which the selected items and their descriptions are displayed in response to the user selecting the Domain A tab.
Figure 38 briefly illustrates a screen display of the GUI or shopper/buyer interface of the present invention in which the items that sellers offer in Domain A are listed in response to the user clicking the "Advanced Search" option in the screen of Figure 37.
Figure 39 briefly illustrates a screen display of the GUI or shopper/buyer interface of the present invention that provides details of the results of a search performed with the semantic recognition buyer agent feature of the present invention. The shopper/buyer interface responds to the user, who submits a search request through the search parameter interface displayed at the bottom of the screen of Figure 38.
Embodiment
Referring to Fig. 2, a general diagram illustrates the interaction, in the preferred embodiment 10 of the present invention, between a user/buyer 12 and online sellers 14 through the World Wide Web/Internet 16.
In the preferred embodiment 10 of the present invention, a learner agent 18 (also called the semantic recognition learner agent, SRLA) and a shopper agent 20 (also called the semantic recognition buyer agent, SRBA) are provided. A server 22 provides access to an offline database 24, which stores global multilingual seller information. A system administrator 26 prepares and edits training data for selected vendor websites and stores it, through the server 22, in the "seller list" 27 of the offline database 24. The system administrator 26 then uses this training data with the SRLA 18 to perform "inductive learning" on training pages retrieved from the seller sites through the World Wide Web 16. The inductive learning generates seller descriptions, in the form of a seller description list 28, which is stored in the offline database 24.
A user/buyer 12 can use the preferred embodiment of the present invention to retrieve specified information on a specified topic by means of the semantic recognition buyer agent (SRBA) 20. The SRBA 20 processes a request from the user/buyer 12 using the information in the previously learned seller descriptions 28. The information in the seller descriptions allows the SRBA 20 to prepare search commands and send them over the World Wide Web 16 to many seller sites at once. The seller descriptions also allow the SRBA 20 to process the returned search results immediately and to deliver to the buyer/user 12 the results from all the seller sites searched, with extraneous and irrelevant information filtered out.
Referring now to Fig. 3, flowchart 100 illustrates the operation of the semantic recognition learner agent (SRLA) 18 in one embodiment of the present invention. In a preferred embodiment, the SRLA 18 is implemented as a computer program running on a PC or server. In step 110, the SRLA 18 retrieves previously defined or prepared training data from the "seller list" 27, which is stored in the training database 24. The training database 24 is preferably offline.
The training data comprises a set of data about an online seller from which information is to be learned. The data may include a URL, a domain description, product samples, attributes, and other domain-specific information, as shown in the right-hand column of the following table:
[Table image not reproduced in this text.]
Fig. 4 provides an example of the types and descriptions of the name tags used for the training data and the "learned" data of the present invention. Fig. 5 is an illustrative example of the format of the actual learned "data elements", which result from the seller description learning process shown in Fig. 4 and are stored as seller descriptions in the offline database maintained by the system administrator 26.
The "training" data is preferably stored in a SQL-compliant or Microsoft Access database. This leaves room to choose different data containers for different sellers. In general, the training data is independent of the product domain, the character set used, and the online seller's presentation style. The URL path in the training data is an exception: it is required in order to identify each seller uniquely.
Returning to Fig. 3, step 120 performs a mandatory check to determine whether the SRLA 18 needs to learn more sellers. If sellers remain to be learned, the SRLA 18 proceeds to step 130; otherwise the learning session terminates. In step 130, the SRLA 18 intelligently visits the specified online seller using the predefined training data that corresponds to it. For each product specified in the training data, the SRLA 18 searches for that product through the search feature of the seller site. In general, the SRLA 18 retrieves a number of pages for the training data, which is either learned by the system of the present invention or entered manually by the system administrator. Such pages are called "training pages" and are used later in the inductive learning process. In the preferred embodiment, control data (that is, training data that leads the seller site to return an error page) is also included at this stage.
Then, in step 140, the computer program runs the inductive learning process on the training pages obtained by the SRLA 18. The purpose of inductive learning is to obtain a general description of the website and to determine the logic by which it organizes its product data and presents that data to potential online customers. The learned result is called the "seller description" 28. This stage is further described and explained with reference to Fig. 6.
Then, in step 150, the SRLA 18 stores the learned result, preferably in the SQL-compliant or Microsoft Access database 24. (The seller information, or "seller description", stored in the offline database 24 is used later by the online semantic recognition buyer agent 20.) After storing step 150 is complete, the SRLA 18 returns to step 120 to determine whether more sellers remain to be learned; if so, steps 130 to 150 are repeated. Otherwise, the learning process terminates.
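The control flow of steps 110 to 150 can be sketched as a simple loop. The following Python sketch is illustrative only and is not the implementation of the present invention: the function name, the callables passed in, and the dictionary standing in for the offline database 24 are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List


def run_learning_session(
    seller_list: Iterable[str],
    fetch_training_pages: Callable[[str], List[str]],
    induce_description: Callable[[List[str]], dict],
    database: Dict[str, dict],
) -> None:
    """Learn one seller description per pending seller (hypothetical helper)."""
    # Step 110: the seller list holds the prepared training data.
    for seller in seller_list:
        # Step 120: the loop body runs only while sellers remain to be learned.
        # Step 130: visit the seller site and collect its training pages.
        pages = fetch_training_pages(seller)
        # Step 140: run inductive learning over those pages.
        description = induce_description(pages)
        # Step 150: store the learned seller description.
        database[seller] = description
```

Passing the page fetcher and the induction routine in as parameters keeps the loop independent of how a particular seller site is visited or learned.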
Seller description learning process
Referring now to Fig. 7, the seller description learning process is explained in more detail using a simple information extraction model and a simplified example training page. The left side of Fig. 7 shows the arrangement of the model and price information as it appears to a potential customer browsing the seller site. The right side of Fig. 7 shows the HTML (Hypertext Markup Language) code that produces this arrangement. For example, the first three (3) lines on the right identify the code as HTML, provide the page title, "simple product catalog", and mark the beginning of the information to be displayed. The fourth line (4) gives the text of the table heading, "MD price". The sixth line (6) and the seventh line (7) give the text of the column headings "model" and "price (dollars)" respectively. The eighth through eleventh lines (8-11) give the model and price information. The remaining lines mark the end of the table rows, of the table, and of the product catalog as a whole.
First, the wrapper function generates a set of labels for the given training page. A label identifies the position of the training product information on the training page. To illustrate with the simple product training page of Fig. 7, Fig. 8 depicts a set of labels generated by the SRLA 18. The labels in Fig. 8 indicate that the simple product catalog page of Fig. 7 contains four (4) "tuples", each of which contains an "item" value and a "price" value. Each value is represented by a pair of integers.
Consider the first pair of values, <174,180>. These integers indicate that the item attribute of the first tuple is the substring between positions 174 and 180, that is, the string "HM381MD". In this example, a position means the number of characters counted from a specified starting point, such as the beginning of the page or the end of the page "head". A space between text characters counts as one character position. Examining Fig. 7 shows that, counting from the opening "<" symbol on the first line, the letter "H" of the string "HM381MD" appears at character position 174 and the "D" of that string appears at character position 180. Likewise, the last integer pair of the last tuple, <356,361>, indicates that the last price attribute lies between character positions 356 and 361 and identifies the string "399.95". It will be appreciated that, although character positions are used to define the "labels" in this example, other criteria can also be devised and used in accordance with the present invention in the same way. For example, examining Fig. 7 again, consider that the SRLA 18 of Fig. 2 automatically assigns values to the model and "your price" of the four electronic products, which appear as in the following formula:
Thus, if "b" stands for begin and "e" stands for end, the array that identifies the position of the second tuple contains the value $b_{2,i}$, that is, the position at which the model number beginning with "M" starts, and the value $e_{2,i}$, the position at which that model number, ending in "0", ends. Similarly, it can be understood that the present invention automates labeling by launching a modular heuristic search based on the standard relational data model. The present invention includes an item recognizer and an intelligent price recognizer, in which a tuple is treated repeatedly as a vector $\langle b_{2,i}, b_{2,p} \rangle$ of two strings. The string $b_{\cdot,i}$ is the value of the item attribute, and the string $b_{\cdot,p}$ is the value of the price attribute. Attributes therefore represent columns, and tuples represent rows. The numeral between "b" and "," in "$b_{2,i}$" indicates that the position value being computed (that is, labeled) belongs to the second row. This computation is performed in real time, automatically and on the fly during the wrapper induction call in which seller description learning takes effect. It labels every Ppc (product catalog page, where a page "P" is a web page containing the required information), regardless of whether the web page, which follows the format of the seller site (in this example, www.800.com), is written in native character strings of any natural language or is encoded in HTML, XML, cXML, Java, or the like.
The content labels of a training page are shown more generally in Fig. 9. The first column identifies the information to be labeled; in this embodiment, product and price are the target information to be searched. The second column gives the "tuple" entries that correspond to each "label", such as <PRODUCT LEFT DELIMITER, PRODUCT RIGHT DELIMITER> and <PRICE LEFT DELIMITER, PRICE RIGHT DELIMITER>.
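The character-position labels described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the labeling routine of the present invention: the function name `label_page` is hypothetical, positions are counted from 0 at the start of the page (the text does not state the origin precisely), both positions of a pair are inclusive, and the sample page and price are made up.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (begin, end) character positions


def label_page(page: str, values: List[Tuple[str, str]]) -> List[Tuple[Span, Span]]:
    """Return ((b_i, e_i), (b_p, e_p)) for each known (item, price) pair."""
    labels = []
    cursor = 0  # search forward only, so repeated strings map to later rows
    for item, price in values:
        b_i = page.index(item, cursor)
        e_i = b_i + len(item) - 1
        b_p = page.index(price, e_i + 1)
        e_p = b_p + len(price) - 1
        labels.append(((b_i, e_i), (b_p, e_p)))
        cursor = e_p + 1
    return labels


# Hypothetical one-row page; the real page of Fig. 7 is not reproduced here.
sample_page = "<tr><td>HM381MD</td><td>299.95</td></tr>"
sample_labels = label_page(sample_page, [("HM381MD", "299.95")])
```

With inclusive positions, a value is recovered from a label as `page[b : e + 1]`.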
After the system administrator runs the learning system once, the system retrieves the training pages from the seller list in the offline database while compiling a set of possible candidate delimiters, such as the candidates shown in Figures 10A and 10B. Then, using another set of training pages, it automatically performs the same real-time, on-the-fly computation (labeling), including the position values, as in the embodiment described above. The intersection of the two candidate sets yields a set of valid candidates, from which the SRLA 18 then selects one as the seller description.
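The intersection of candidate sets can be sketched as follows. This Python sketch is a simplified illustration, not the candidate generation of the present invention: it considers only right delimiters, it treats every prefix of the text following a labeled value as a candidate, and the function names and the `max_len` cutoff are assumptions.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # inclusive (begin, end) character positions


def right_delimiter_candidates(page: str, spans: List[Span], max_len: int = 8) -> Set[str]:
    """Strings that immediately follow every labeled value on one page."""
    per_value = []
    for _, end in spans:
        after = page[end + 1 : end + 1 + max_len]
        # Every non-empty prefix of the following text is a candidate.
        per_value.append({after[:n] for n in range(1, len(after) + 1)})
    return set.intersection(*per_value) if per_value else set()


def valid_candidates(first: Set[str], second: Set[str]) -> Set[str]:
    """Candidates that hold for both sets of training pages."""
    return first & second


page_one = "<td>A1</td><td>B2</td>"   # values at positions 4-5 and 15-16
page_two = "<td>C3</td>"              # value at positions 4-5
candidates_one = right_delimiter_candidates(page_one, [(4, 5), (15, 16)])
candidates_two = right_delimiter_candidates(page_two, [(4, 5)])
valid = valid_candidates(candidates_one, candidates_two)
```

A fuller treatment would also reject candidates that occur inside a value; that check is omitted here for brevity.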
With reference now to Fig. 2,, flow process Figure 200 has explained an embodiment of the meaning of one's words identification person of knowing proxy server 18 of the present invention.The method has been used 3 environment property rule, arrange the layout of the product details (explanation) that shopping website provides with this rule, and this website permission information extraction goes on with being independent of this territory.But this systematicness comprises navigation rule, uniform rules and the vertical separation rule that has search index.
Under the navigation regularity, online stores or seller sites are designed to serve queries from consumers and business buyers. Nearly all online sellers therefore provide a searchable index to ease access to a specific database. Using a seller site's search form enables the SRLA 18 to generalize over the formats of multilingual home pages and web pages. Fig. 11 is a simplified diagram of a home page with a searchable index and product domain fields.
Under the uniformity regularity, although catalog formats for product details differ greatly from one online seller or store to another, any given online seller lays out all its item descriptions in a simple, consistent format. Fig. 12 is a simplified web page screen image illustrating the layout of search results in a uniform format. It can be seen that each listed search result begins with a "MODEL NUMBER" string, followed by a "PRODUCT DESCRIPTION". In addition, "price 1" and "price 2" are located to the right of the product description.
Figure 13 provides a simplified view of the same search result screen image, illustrating the use of the vertical separation regularity to display a product catalog. This vertical format can be divided into the beginning, the content, and the end of the document.
Website architecture, the basic mechanism for presenting information, reflects the original design thinking of the Internet: online sellers' product descriptions and technical content are all built for human use. This is evident in the query mechanisms and output standards, which are designed specifically for direct human operation. Online sellers obey these regularities because the regularities make online sales to buyers and shoppers possible. There is no guarantee that a tool that makes online stores easy for people to navigate will also be friendly and suitable for an intelligent software agent to master. Nevertheless, the online intelligent information comparison of multilingual electronic data sources, that is, the system of the present invention, is designed to make full use of these regularities.
According to the present invention, the wrapper is constructed through an inductive learning process. The method learns a seller's wrapper by reasoning over sample pages from the seller's website. In the methodology of the present invention, an "instance" corresponds to a seller page, the label of a page corresponds to its relevant content, and the hypothesis is the wrapper to be constructed.
In addition, the present invention incorporates wrapper classes that can be learned efficiently, such as the HLRT wrapper class.
Moreover, to ensure that the method works well when the training data is highly noisy, noise-tolerant techniques are adopted. For example, in the screen snapshot of www.800.com in Figure 15A, an intelligent price recognizer can distinguish the "list price" from "your price". The recognized objects are then used to label the whole page. Given one recognizer for product items and another for prices, the recognition results yield a labeling scheme that marks the pages containing such attribute pairs.
In practice, sellers always try to create a consistent feel across all types of products by using a uniform appearance. For example, a seller presents MD product information in the same format as DVD product information. Because of this regularity, every product is described in roughly the same format.
The SRLA 18 of Fig. 2 learns a wrapper from only one sample domain and attempts to apply it to all other domains (all other product catalogs, which contain entirely different items). These domains are organized online by category, in a consistent format, on the remaining websites of the 245 networked countries on the World Wide Web. It is therefore feasible for the SRLA 18 of Fig. 2 to maintain a complete, up-to-date glossary of a global product database without either coding on a database manager compatible with Structured Query Language or manually entering each product under each domain name into a Microsoft Access database.
Continuing the discussion of Fig. 6, in step 210 the SRLA 18 generates a set of labels that represent the content of the training pages. In other words, the rationale for labeling position values is to identify the position of the training product information on the training page. The SRLA 18 of Fig. 2 generates the labels automatically, in real time and on the fly. The labels contain position values as follows:
$$
L = \left\{
\begin{array}{l}
\langle b_{1,i},\, e_{1,i} \rangle,\ \langle b_{1,p},\, e_{1,p} \rangle \\
\langle b_{2,i},\, e_{2,i} \rangle,\ \langle b_{2,p},\, e_{2,p} \rangle \\
\langle b_{3,i},\, e_{3,i} \rangle,\ \langle b_{3,p},\, e_{3,p} \rangle \\
\langle b_{4,i},\, e_{4,i} \rangle,\ \langle b_{4,p},\, e_{4,p} \rangle
\end{array}
\right\}
$$
For further illustration, see the right-hand column of Fig. 8.
In step 220, the SRLA 18 performs inductive learning on the retrieved pages with their associated labels and outputs a set of possible candidate "seller descriptions". Because these candidates are generated from specific training pages using specific training data, they cannot be invalid for those pages. However, to determine whether the candidates are valid across the seller's whole website, a cross-page validation is performed to derive a general seller description that holds across the entire website.
In step 240, a seller description validator (VDV) validates a possible candidate seller description against another set of training pages (retrieved in step 130 of Fig. 3). If the seller description is satisfactory, the learning process terminates at step 250; otherwise the validation process cycles through steps 230, 240 and 250 with the remaining candidates, continuing to select among the candidate seller descriptions. One criterion for a "satisfactory" seller description is that the number and character of the candidates obtained from each successively analyzed training page are the same as those obtained from the previous training page. If a successively analyzed training page yields a different number of candidates, another training page is analyzed.
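Cross-page validation of candidates can be sketched as below. This Python sketch is only an approximation of the validator described above: the acceptance test used here (every validation page yields the same number of items and prices, and every price looks numeric) is an assumption chosen for the example, and all function names and sample pages are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# (item left, item right, price left, price right) delimiters
Candidate = Tuple[str, str, str, str]


def extract(page: str, left: str, right: str) -> List[str]:
    """Return every substring enclosed by the given delimiter pair."""
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    return re.findall(pattern, page, flags=re.DOTALL)


def is_satisfactory(candidate: Candidate, pages: List[str]) -> bool:
    item_l, item_r, price_l, price_r = candidate
    for page in pages:
        items = extract(page, item_l, item_r)
        prices = extract(page, price_l, price_r)
        # Assumed criterion: each item must be paired with exactly one price.
        if not items or len(items) != len(prices):
            return False
        # Assumed criterion: a price is a plain decimal number.
        if not all(re.fullmatch(r"\d+(\.\d+)?", p.strip()) for p in prices):
            return False
    return True


def select_description(candidates: List[Candidate], pages: List[str]) -> Optional[Candidate]:
    """Return the first candidate that holds on every validation page."""
    for candidate in candidates:
        if is_satisfactory(candidate, pages):
            return candidate
    return None
```

Returning `None` when no candidate survives leaves it to the caller to decide whether to gather more training pages or fall back to manual wrapper input.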
Figures 10A and 10B are examples of seller description candidates, giving the left and right delimiters of the item description and price information, together with the beginning (head) and end (tail) of a training page. Fig. 5 provides an example of a "seller description", comprising the delimiters identified for the beginning, end, item and price information of an example page.
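How a head, a tail, and left and right delimiters work together can be illustrated by the standard execution procedure for an HLRT (head, left, right, tail) wrapper as it is commonly described in the wrapper induction literature. The Python sketch below is a generic illustration and is not taken from the figures of the present invention; the function name, the delimiter values and the sample page are all made up for the example.

```python
from typing import List, Sequence, Tuple


def exec_hlrt(page: str, head: str, tail: str,
              delimiters: Sequence[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Extract one tuple per row using head, tail and (left, right) pairs."""
    # Skip everything up to and including the head delimiter.
    pos = page.index(head) + len(head)
    first_left = delimiters[0][0]
    rows = []
    while True:
        next_left = page.find(first_left, pos)
        next_tail = page.find(tail, pos)
        # Stop when no further row starts before the tail delimiter.
        if next_left == -1 or (next_tail != -1 and next_tail < next_left):
            break
        row = []
        for left, right in delimiters:
            start = page.index(left, pos) + len(left)
            end = page.index(right, start)
            row.append(page[start:end])
            pos = end
        rows.append(tuple(row))
    return rows


# Hypothetical catalog page: the title and footer also contain markup,
# which the head and tail delimiters keep out of the result.
catalog = (
    "<html><title>Simple Product Catalog</title><body>"
    "<b>HM381MD</b> <i>299.95</i><br>"
    "<b>MZ-R70</b> <i>399.95</i><br>"
    "<hr><b>End of catalog</b></body></html>"
)
rows = exec_hlrt(catalog, "<body>", "<hr>", [("<b>", "</b>"), ("<i>", "</i>")])
```

The head and tail matter here because the footer reuses the item delimiter `<b>`; without the tail check, "End of catalog" would be mistaken for a product.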
The training data for a particular seller is preferably compiled by the system administrator 26. As explained further below, to add a particular seller to the system of the present invention, the seller name, the seller URL, the URL of the search form, and the domain data of the corresponding training example are all provided and stored in the offline database 24, which can be a Microsoft Access database. The seller name is the primary key of the record. Manual wrapper input can also be provided as an option. To supply an accurate data set for the training example, it is important that the system administrator or other person preparing the training example data be knowledgeable about web page URLs and domain names, and about the native language of any multilingual vendor website being processed, so as to be able to identify the type of information targeted for learning. Such data sets in turn greatly improve the accuracy and efficiency with which the SRLA 18 produces learned seller information when it generates seller descriptions in real time, automatically and on the fly. This person does not need extensive coding knowledge.
Once the seller information has been provided, the administrator 26 can run the SRLA program for each seller. After running the SRLA 18 once for a seller, and with any required options selected as in Figures 16 to 23, the administrator navigates step by step through the learning program by way of the interactive agent character learner interface (IACLI) screens. Finally, the "seller description" results obtained from the returned training pages are stored in the offline database 24, which can be, for example, a Microsoft Access database. To delete a particular seller, the administrator can delete the record directly from the "seller list" or the "seller description list". To modify a particular seller, the system administrator can edit the record in the "seller list" or the "seller description list" in the database.
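The add, modify and delete operations on the seller list can be sketched with a small relational table keyed on the seller name. The text names Microsoft Access or a SQL-compliant database; the Python sketch below uses the standard-library `sqlite3` module purely as a stand-in, and the table name, column names and sample values are assumptions.

```python
import sqlite3


def open_seller_db() -> sqlite3.Connection:
    """Create an in-memory seller list keyed on the seller name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE seller_list (
            seller_name TEXT PRIMARY KEY,
            seller_url  TEXT NOT NULL,
            form_url    TEXT NOT NULL,
            domain      TEXT NOT NULL
        )
        """
    )
    return conn


def add_seller(conn, name, seller_url, form_url, domain):
    conn.execute(
        "INSERT INTO seller_list VALUES (?, ?, ?, ?)",
        (name, seller_url, form_url, domain),
    )


def update_form_url(conn, name, form_url):
    conn.execute(
        "UPDATE seller_list SET form_url = ? WHERE seller_name = ?",
        (form_url, name),
    )


def delete_seller(conn, name):
    conn.execute("DELETE FROM seller_list WHERE seller_name = ?", (name,))
```

Because the seller name is the primary key, inserting a second record with the same name raises `sqlite3.IntegrityError`, which mirrors the requirement that each seller be identified uniquely.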
In short, the SRLA 18 of the present invention generates a seller description that is unique to a particular seller. A seller description is the set of general rules that govern how a seller organizes its product information in a particular format. The data input to the wrapper construction system of the present invention is therefore, in essence, a sample of the behavior of the wrapper to be learned. Under this model, wrapper construction becomes the process of rebuilding a wrapper from samples of its behavior.
The methodology of the semantic recognition learner agent (SRLA) 18 is summarized through a simplified example in Figure 14. In step 1, two pieces of information are fed to the system to perform wrapper induction: (1) the URL of the vendor website (for example, http://www.800.com) and (2) a domain description, which contains training examples for specific domains. For example, a domain description might be "electronic products" with a domain record called "Sony HM381MD", which is the model used to fill in the seller's search form. In steps 2 and 3, the SRLA 18 automatically goes over the Internet to the vendor website using the URL and the domain name/model of the training example. In this example, the learner agent goes to the www.800.com web page at the URL provided in step 1. It then fills in the necessary product information (for example, the domain description "electronic products" and "HM381MD") in the relevant search form. Finally, it "submits" the search form to request a search and waits for a response.
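Filling in and submitting a search form, as in steps 2 and 3, can be approximated by encoding the form fields into a request URL. The Python sketch below is a minimal illustration that assumes the seller's form is submitted with an HTTP GET request; the function name, the field names and the form URL are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_search_request(form_url: str, fields: dict) -> str:
    """Encode the filled-in form fields as a GET request URL (assumed form method)."""
    return f"{form_url}?{urlencode(fields)}"


# Hypothetical form URL and field names; a real seller description
# would record the actual ones learned for that seller.
request_url = build_search_request(
    "http://vendor.example/search",
    {"category": "electronics", "model": "HM381MD"},
)
```

Sellers whose forms use POST would need the encoded fields sent as the request body instead; that variant is not shown.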
In step 4, a page is returned according to the search criteria. The result may be a success page carrying the relevant product description, or it may be a failure page. Note that the important content of the returned page is the HTML code, the item description, the item price, and the position of this information relative to the HTML code.
In steps 5 and 6, the search result page is returned over the Internet to the learner agent 18 for analysis. In step 7, the analysis performed is called "wrapper induction", in which the page is generalized into a set of layout and format rules that the seller follows to present its product descriptions in a consistent manner. With these rules, during operation of the semantic recognition buyer agent of the present invention, the buyer agent 20 can extract product information from the same seller when a user/buyer searches that seller site for product information in the same domain.
It will be understood that, according to the invention, the procedure of the semantic recognition learner agent is started for each seller from which a seller description is required. Because of the delimiter-based method that the invention uses, a seller description can be obtained from any seller site in any language. In short, although the language presented to the user takes the form of specific native character strings, the underlying code identified as the delimiters of the required information stays the same, whatever form the native strings of that language take. In other words, the information in a seller description is obtained from a seller site in the native language that the site uses; there is no need to translate the native language into a standard language. Moreover, because the candidate delimiters identified for each seller site are written in the underlying code of the programming language used for that site, the search can proceed without regard to the various programming languages used by the sites being searched. This allows the semantic recognition buyer agent 20 to search across multiple languages and multiple domains (product categories), independently of any programming language.
Referring to Figure 15C, flow chart 300 illustrates an embodiment of the semantic recognition buyer agent 20 of the invention. In step 310, the buyer agent 20 accepts a request from the user/buyer 12 of Fig. 2, who wishes to compare prices for a product. Also in step 310, the buyer agent 20 preferably establishes a connection through an ActiveX component in order to communicate with the buyer/user 12. The user 12 must supply at least one parameter, which may include, for example, the name of the target product, the required price range, the target online sellers, or the sorting criterion. Step 312 checks whether the memory or cache, which holds the required information from identified seller sites, contains any "hits". If it does, the buyer agent proceeds to step 370 to sort the extracted target information. The buyer agent then proceeds to step 380 to generate a result page in HTML form from the target information, and in step 390 it displays the result page to the online buyer/end user.
If no hits are found in step 312, step 320 activates the buyer agent 20, which uses the input parameters to retrieve seller descriptions from the offline database 24. These "seller descriptions" are the ones determined by the semantic recognition learner agent 18 while learning each seller's description information. In step 330, the buyer agent 20 composes new user requests for visiting the different online sellers identified in the "seller description list". Each new request is based on the parameters given by the user and on the data in the seller description. If requests (such as a product model request) are to be sent to N online sellers, the semantic recognition buyer agent 20 composes N new requests.
The semantic recognition buyer agent 20 uses the seller descriptions to obtain price information from the seller sites in real time. The buyer agent 20 reaches a seller site using the seller's URL and the seller's name, both of which are part of the information that makes up the seller description. The seller description also contains the seller's search form URL. In step 340, after reaching the seller site, the buyer agent 20 "virtually" fills in the seller's search form on the basis of the new user request and "virtually" presses the Enter key to send it. This is done for every identified online seller.
As stated above, a seller description stored in the offline database 24 contains a field that gives the seller's search form URL, such as "http://www.onlineshop.com/search.asp?item=". The semantic recognition buyer agent 20 uses the user's input parameters and the search form URL to generate a new HTTP request for each identified online seller. For example, if the user wants to buy a "hard disk", the new request generated by the buyer agent 20 is as follows:
“http://www.onlineshop.com/search.asp?item=harddisk,”
The semantic recognition buyer agent 20 sends this HTTP request to the online seller exactly as if the user had sent the request directly. If there are N identified sellers, the buyer agent 20 starts N threads, one to fill in the search form of each identified seller. The buyer agent 20 preferably processes the search forms of the online sellers in parallel and sends each search request once its form is filled in.
In step 350, the semantic recognition buyer agent 20 waits for responses from the online sellers for a specified or user-defined time. If the time-out expires, the buyer agent 20 goes to step 370; otherwise it enters steps 358 and 360 to process further the search result data received.
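The request generation, parallel dispatch and time-out of steps 330 to 350 can be sketched as follows. This is only an illustration under stated assumptions: the names `build_request` and `query_sellers`, the mapping from seller name to search form URL, and the injected `fetch` callable are hypothetical and do not come from the patent; a thread pool stands in for the N threads mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from urllib.parse import quote_plus


def build_request(search_form_url, keyword):
    """Append the URL-encoded user keyword to the seller's search form URL."""
    return search_form_url + quote_plus(keyword)


def query_sellers(search_forms, keyword, fetch, timeout_s=5.0):
    """Send one request per seller in parallel and keep the timely responses.

    search_forms: hypothetical mapping of seller name -> search form URL.
    fetch: any callable taking a URL and returning page text; it is injected
           so that the sketch can run without network access.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(search_forms)))
    futures = {
        pool.submit(fetch, build_request(url, keyword)): name
        for name, url in search_forms.items()
    }
    # Wait up to the time-out; late responses are simply ignored.
    done, _late = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)
    # Keep only the responses that arrived in time and did not raise.
    return {futures[f]: f.result() for f in done if f.exception() is None}
```

In a real deployment `fetch` would perform the HTTP request; here any function that maps a URL to page text can be passed in, which also makes the time-out behaviour easy to try out.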
During the time-out period, the semantic recognition buyer agent 20 collects the responses of the different online sellers to the search requests. In step 358, the buyer agent 20 receives the search result responses from the online sellers and stores them in the cache or memory of server 22. In step 360, the data of interest are extracted from the responses received. The buyer agent 20 extracts the required data using the seller description information stored in the seller description list 28 or in the offline database 24. For example, the seller description contains fields holding the codes that identify the left and right wrappers. First, the semantic recognition buyer agent (SRBA) 20 uses the left wrapper information to locate the starting position of the valid data in the response page. Then the data at the extraction position of the target data (determined by the information in the seller description list) are extracted and stored in memory. (Note that the information in the seller description is obtained by the semantic recognition learner agent 18 during the learning of Fig. 3 and Fig. 6.) Extraction of target information is repeated until the end of the page.
During extraction, the product description and the product price are extracted. It should be understood that the seller description information defined by the semantic recognition learner agent 18 is domain independent and multilingual. For example, suppose the platform of the online user or buyer runs the Windows 98 operating system in language version "B" (or, preferably, the platform runs the English Professional edition of Windows 2000 and/or has the "B" version of Personal Web Server installed). Microsoft Internet Explorer will prompt the user to download the "B" language display software when the user logs on to the portal of the invention. After the online buyer or user "A" has typed a product model, in native characters of language "B", as the keyword in the text box provided by the portal, the semantic recognition buyer agent 20 of Fig. 2 performs data extraction using the sample data described earlier (the sample data being the seller descriptions extracted in real time by wrapper induction learning), which include product model strings written in language "B". These seller descriptions are stored in the predefined data structure of the seller description list in the offline database 24 (preferably a Microsoft Access database). Data extraction also includes a search for "hits", which comprise the price, the description and the product information related to earlier search results and which reside in the memory or cache of server 22. This search uses the native "B" language string of the product model as typed by the user, as in step 312 of Figure 15C. Because the "B" language string is in a specific native language, any "hits" found will lead to identified seller sites that use native language "B" and contain strings written in language "B".
Returning to step 7 of Figure 14: during the learning process, after the semantic recognition learner agent 18 of Fig. 2 has learned the wrapper of an online seller, the seller description is stored in the seller description list of database 24 (preferably a Microsoft Access database) or of database server 22 (preferably a Microsoft SQL-compliant database server). Because it would be inefficient for the buyer agent to fetch seller description data from the database on every online buyer request, the buyer agent 20 preferably fetches the seller descriptions from the offline database 24 or server 22 the first time it performs the requested search-match-extract operation. Thereafter the seller descriptions are kept in memory or cache, so that they can be retrieved and used immediately for later requests that the buyer agent receives from the same or from new online users.
The seller descriptions held in memory or cache are preferably refreshed automatically once a day.
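A minimal sketch of this caching behaviour is given below, assuming a loader function that reads one seller description from the offline database. The class name `DescriptionCache`, the `load` callable and the injectable clock are hypothetical; the one-day default follows the preferred daily refresh stated above.

```python
import time


class DescriptionCache:
    """Keep seller descriptions in memory and reload them when they are stale."""

    def __init__(self, load, max_age_s=24 * 60 * 60, clock=time.monotonic):
        self._load = load            # hypothetical: seller name -> description
        self._max_age_s = max_age_s  # refresh interval, one day by default
        self._clock = clock          # injectable so the sketch can be tested
        self._entries = {}           # seller name -> (loaded_at, description)

    def get(self, seller):
        now = self._clock()
        entry = self._entries.get(seller)
        # Load on first use, and again once the entry is older than max_age_s.
        if entry is None or now - entry[0] >= self._max_age_s:
            entry = (now, self._load(seller))
            self._entries[seller] = entry
        return entry[1]
```

The first call for a seller goes to the database; later calls within the refresh interval are served from memory.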
In other words, the semantic recognition buyer agent 20 can use the data in a seller description to locate target data in different domains and in different languages. This is because, for a given seller, the underlying code surrounding the target information does not change even though the language may. Since three "format standards" dominate most seller sites, such as B-to-C, C-to-B and C-to-C online stores, the different domains of a seller site consistently use the same format and underlying code to present target information such as the item description and the price.
Therefore, for each search response returned, the semantic recognition buyer agent 20 uses the seller description to perform data extraction. If the time-out expires, the buyer agent 20 jumps to step 370 of Figure 15C. In step 370, the buyer agent 20 sorts the data extracted from the different online sellers according to the user-defined sorting criterion. If the user has not defined a sorting criterion, the default is the price of the product. Alternatively, the criterion may be to identify the best price found and to present to the user/buyer only the information of the seller offering that best price (other ordering rules may of course also be used).
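The sorting of step 370 might look like the following sketch. The offer records, field names and function names are hypothetical; only the default criterion (price) and the "best price only" variant are taken from the description above.

```python
from decimal import Decimal


def rank_offers(offers, criterion=None):
    """Sort offers by a user-defined key; fall back to price when none is given."""
    key = criterion if criterion is not None else (lambda offer: offer["price"])
    return sorted(offers, key=key)


def best_price_only(offers):
    """Keep only the offers that carry the lowest price found."""
    if not offers:
        return []
    lowest = min(offer["price"] for offer in offers)
    return [offer for offer in offers if offer["price"] == lowest]


# Hypothetical extracted data from three sellers.
OFFERS = [
    {"seller": "shop-a", "item": "HM381MD", "price": Decimal("399.95")},
    {"seller": "shop-b", "item": "HM381MD", "price": Decimal("389.00")},
    {"seller": "shop-c", "item": "HM381MD", "price": Decimal("410.50")},
]
```

Prices are held as `Decimal` rather than floating-point values so that amounts such as 399.95 compare exactly.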
After sorting is complete, the semantic recognition buyer agent 20 proceeds to step 380. In step 380, the buyer agent 20 generates an HTML page from the data filtered and sorted in step 370. In step 390, the buyer agent 20 responds to the user's request by delivering the generated HTML page to the user as the "result" page, through the ActiveX component established earlier.
If no time-out occurs in step 350, the semantic recognition buyer agent 20 jumps to step 358. In step 358, the buyer agent 20 stores the data of the queried pages in memory or cache, so that it can respond immediately to further requests from the same user/buyer or to requests from new users/buyers. After step 358, the buyer agent 20 jumps to step 360, where it extracts the target information from the queried pages. In step 370 it sorts the data extracted from the different online sellers according to the user-defined sorting criterion. In step 380 it generates an HTML page from the data filtered and sorted in step 370, and finally, in step 390, it responds to the user/buyer through the ActiveX component established earlier.
The default language of the semantic recognition buyer agent 20 is English. By default, when a user request is received, the buyer agent 20 forwards it to all sellers. When the responses return, the buyer agent 20 uses the seller descriptions learned by the semantic recognition learner agent 18 to filter out invalid results.
In another embodiment of the invention, the sellers may be grouped according to the user's locality, so that the user 12 can select "Advanced Search" to search a particular group of sellers.
The method adopted by the invention is multilingual in nature. When the semantic recognition learner agent 18 learns a seller site, the learning is carried out in the native language of that site. The results retrieved are in the site's native language, and these results are used to build the seller description. Therefore, when an online user/buyer 12 sends a request in a specific native language in step 310 of Figure 15C, the semantic recognition buyer agent 20, in step 312, looks for "hits" in memory or cache using the string exactly as the user typed it. Because the string is expressed in a specific native language, any identified seller site for which a "hit" is found will use the same language and contain the same string. It should thus be understood that the invention needs no "translation" step to convert a search request in one language into a "standard" language. By using the search request in its native language, the errors and ambiguities of a translation process are avoided.
The computer program modules used in the preferred embodiment of the system of the invention are built with standard development tools together with a database server (preferably a Microsoft SQL-compliant database). The invention can use any relational database, such as those from Oracle Corporation of Redwood Shores, California, the SQL database server from Sybase, Inc. of Emeryville, California, and other databases that support ODBC. As stated above, simultaneous multithreaded searching is essential to the preferred embodiment of the invention. In this respect, the Windows NT 4.0 platform (a Microsoft product) provides the required multithreading capability.
Referring now to Figures 16 to 23, these figures illustrate in detail how the training data in the "seller list" 27 and the seller description data in the "seller description list" 28 are written and prepared. Figure 16 is an actual screen of the interactive agent character learner interface (IACLI), which can be used to obtain the training information used by the invention. The screen corresponding to the "Add Vendor" tab, shown in Figure 18, provides a data entry point at which the system administrator enters the information obtained after browsing the seller's website "1cache.com". An example is given in Figure 17, in which the information about the seller "1cache.com" has been entered. This information comprises the content shown in the right-hand column of the following table:
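[Table reproduced only as an image in the original publication (Figure BDA0000134717110000281); its contents are not recoverable here.]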
The above information is then stored in the seller list 27 as training data in the offline database 24. Note that the training examples entered are a list of specific products; during training these data are searched in real time on the identified seller site in order to obtain its training pages. Only then can the "seller description" be "learned" from the training pages returned.
The information is then displayed on the screen, as shown in Figure 18, under the "Vendor Information" tab. The "Vendor Information" screen (shown in Figure 19) provides a "Search" function on the seller's name. When a seller name is typed and the "Search" button is pressed, the information of that seller is retrieved from the offline database 24 and displayed. Note that on this "Vendor Information" screen the "wrapper" fields ("Head", "Tail", "item left delimiter", "item right delimiter", "price left delimiter" and "price right delimiter") are empty. These wrapper fields have yet to be "learned".
Figure 20 is a screen snapshot of the learner interface used to display the training examples previously entered for a seller. This screen appears after the folder named "training data" is opened. The "Vendor Information" screen of the learner interface has a search function for searching the "training data". To use it, the system administrator types the seller's name and presses the "Go" button. The list of training data previously entered for the specified seller is then displayed on the screen. Note that the "training data" interface also provides other functions, such as "Add" (add more examples), "Delete" (delete a training example), "Edit" (edit a training example), "Save" (save the list of examples in its current state) and "Cancel" (cancel the changes that have been entered).
Turning now to Figure 21, after the folder named "seller description" is opened, a screen snapshot of the learner interface is displayed. This interface starts the process of "learning" seller descriptions. It lets the system administrator choose either to learn the descriptions of "all" sellers whose training data have been entered, or to learn the description of "one" seller whose name the administrator types in the box provided.
Figure 22 shows the "Learn One" option selected and the seller name "1chache.com" entered. Once the system is connected to the World Wide Web, pressing the "Learn Now" button starts the semantic recognition learner agent (SRLA) 18, which learns the information about the specified seller "1chache.com" in real time on that seller's website, using the training examples specified for this seller.
After the learning/training process is complete, the training or learning results returned from the seller's website are displayed on the learner interface screen (as shown in Figure 23). To display this information, the system administrator can also use the search function on the vendor information screen of Figure 19, typing the seller's name ("1chache.com" in this example) and pressing the search button. Figure 23 shows the learning results for the seller "1chache.com". Note that the "wrapper" fields are now filled in. The "Head" of this page is displayed as the value "5230". The value "5230" can identify a position, or other information about a line and character position. The "Tail" shows the position of the items, identified by the following delimiter:
“D></TD><TD></TD></TR></”
For the item description, the left delimiter is identified as the following string:
“G?SRC=/Lmg/trans+1X1.gifBORDER-0WID…”
The right delimiter of the item description is identified as the string: "</b>"
The left delimiter of the price is identified as the following string:
“</b></A></TD><TD?ALIGH=right><FON…”
Finally, the right delimiter of the price is identified as the string: "T"
Although the strings of the item left delimiter and the price left delimiter in Figure 23 appear truncated because of the fixed-width display of the learner interface, it should be understood that all the characters of the left delimiter strings identified by the semantic recognition learner agent 18 are stored in the seller description list 28 (preferably held in a Microsoft Access database) and are used later by the semantic recognition buyer agent 20.
The basic method applied by the semantic recognition learner agent 18 will now be described in more detail from a "proof-of-concept" standpoint.
Key concept
As shown in Figure 24, the wrapper induction problem is framed in terms of a simple information extraction model.
In Figure 24, the page P is a web page that contains the required information. P is a string over some alphabet of characters. In general the alphabet is the ASCII character set and the pages are HTML files. An example is the very simple page of Fig. 7, shown earlier, obtained from a seller's website. In the "labeling terminology", this page is called Ppc (product catalog page). Note that the method of the invention is motivated by the use of HTML but does not depend on HTML. For example, the page may be natural language text or text that conforms to the XML standard.
The system adopts the standard relational data model. Each product record has two attributes, item and price: "item" represents the name and model of the product, and "price" represents the price of the product.
A "tuple" is a vector <Ai, Ap> of two strings. The string Ai is the value of the "item" attribute and the string Ap is the value of the "price" attribute. Just as attributes correspond to columns in the relational model, tuples correspond to rows. Thus, as shown in Fig. 8, the example product catalog page of Fig. 7 contains four tuples, the first of which is <HM381MD, 399.95>.
The content of a page is the set of tuples it contains. Writing the tuples out as text strings would be adequate, but because pages are not limited in length, a cleaner and more concise representation of page content is used. The "label" of a page represents the content of the page by means of a set of indices into the page, rather than by listing the attribute values themselves.
For example, the label Lpc of the simple product catalog page (Ppc) is shown in the right-hand column of Fig. 8.
The label Lpc states that the simple product catalog page contains four tuples, each consisting of an item value and a price value. Each value is represented by a pair of integers, such as the first pair <174, 180>. These integers indicate that the first attribute of the first tuple is the substring between position 174 and position 180, that is, the string "HM381MD". By examining the character string on the right-hand side of Fig. 7, one can see that these integers are character positions counted from the "<" of "<HTML>" in the first line. Likewise, the last pair <356, 361> indicates that the price of the fourth tuple, the last attribute on the page, lies between positions 356 and 361, that is, the string "399.95".
More generally, the content of a page P can be represented by a label L:
$$
L = \left\{
\begin{array}{c}
\langle \langle b_{1,i}, e_{1,i} \rangle, \langle b_{1,p}, e_{1,p} \rangle \rangle \\
\vdots \\
\langle \langle b_{k,i}, e_{k,i} \rangle, \langle b_{k,p}, e_{k,p} \rangle \rangle \\
\vdots \\
\langle \langle b_{m,i}, e_{m,i} \rangle, \langle b_{m,p}, e_{m,p} \rangle \rangle
\end{array}
\right\}
$$
For a page that contains only one tuple, the label is:
L = {<<b_{1,i}, e_{1,i}>, <b_{1,p}, e_{1,p}>>}
The label L encodes the content of page P. The page contains |L| > 0 tuples, each with two attributes, item and price. An integer m with 1 ≤ m ≤ |L| identifies a tuple on the page. Each pair <b_{m,i}, e_{m,i}> encodes an item value and each pair <b_{m,p}, e_{m,p}> encodes a price value. In P, b_{m,i} is the index at which the item value of the m-th tuple begins and e_{m,i} is the index at which it ends. Likewise, b_{m,p} is the index at which the price value of the m-th tuple begins and e_{m,p} is the index at which it ends. The item attribute of the m-th tuple therefore lies between b_{m,i} and e_{m,i}, and its price attribute lies between b_{m,p} and e_{m,p}. For instance, the pair <b_{2,i}, e_{2,i}> = <229, 234> in the example of Fig. 8 encodes the item attribute of the second tuple of the simple product catalog page of Fig. 7.
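The index-pair encoding can be illustrated with a short sketch. The page below is a hypothetical one-row catalog page, not the page of Fig. 7, and 0-based indexing is an assumption; the inclusive end index follows the example above, where <174, 180> covers the seven characters of "HM381MD".

```python
def span_of(page, value, start=0):
    """Inclusive (begin, end) indices of the first occurrence of value at or after start."""
    begin = page.index(value, start)
    return (begin, begin + len(value) - 1)


def value_at(page, span):
    """Recover the substring encoded by an inclusive (begin, end) pair."""
    begin, end = span
    return page[begin:end + 1]


# Hypothetical one-row product catalog page.
PAGE = "<HTML><TABLE><TR><TD><B>HM381MD</B></TD><TD><I>399.95</I></TD></TR></TABLE></HTML>"

item = span_of(PAGE, "HM381MD")
price = span_of(PAGE, "399.95", item[1] + 1)
LABEL = [(item, price)]  # one tuple: (item span, price span)
```

Applying `value_at` to each pair of `LABEL` returns the item and price strings, so the label carries the same content as the tuple list while staying independent of the length of the values.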
As indicated above, a wrapper W is a function from pages to labels; the notation W(P) = L means that invoking wrapper W on page P yields the label L. At this level of abstraction, a wrapper is an arbitrary procedure.
A wrapper class is a set of wrappers. As will be seen below, the wrappers used by the invention belong to the HLRT wrapper class.
With the terminology explained and the method of the invention described above, it will now be explained how the learner learns the wrapper of a seller's product catalog page.
Intuitively, the input to the learning system of the invention is a sample of product catalog pages together with their associated labels. At this point it is assumed that these labels have been identified and supplied; the method of generating labels for the sample pages is explained in detail later. The output is a wrapper W belonging to the wrapper class. Ideally, W would output the correct label for every page. In general no such guarantee can be made, so (in the spirit of inductive learning) W is required to generate the correct labels for a specified set of training examples.
To find a solution, the wrapper induction problem (with respect to a particular wrapper class) is stated as follows:
Input: a set of training examples ε = {..., <Pn, Ln>, ...}, where each Pn is a page and each Ln is a label;
Output: a wrapper W in the wrapper class such that W(Pn) = Ln for every <Pn, Ln> ∈ ε.
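The output condition can be written as a one-line check. The sketch below treats a wrapper as any Python callable from a page to a label; the function name `is_consistent` is hypothetical.

```python
def is_consistent(wrapper, examples):
    """True when wrapper(page) equals the label for every (page, label) training pair."""
    return all(wrapper(page) == label for page, label in examples)
```

A learner can use such a check to accept or reject a candidate wrapper against the training set.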
HLRT wrapper class
As explained above, the pcwrapHLRT procedure embodies a "programming acronym": it uses a head delimiter, left delimiters, right delimiters and a tail delimiter to extract the relevant product information and prices from a seller's product catalog. The Head-Left-Right-Tail (HLRT) wrapper class is one way of formalizing this acronym. The procedure "execHLRT" shown in Figure 25 is a generalization of pcwrapHLRT that allows the delimiters to be arbitrary strings rather than the particular values, such as "<B>" and "</B>", used by pcwrapHLRT.
Note that, although the delimiters in this example are all HTML tags, the method adopted by the invention is not limited to HTML tags. Indeed, the text need not be HTML at all. A dollar sign ("$"), for example, can be a valid left delimiter for the price where prices are written with a leading "$".
The execHLRT routine shows how an HLRT wrapper works. As before, W(P) is the label produced by invoking wrapper W on page P. When W is an HLRT wrapper, execHLRT is the procedure that, given W and P, determines W(P).
The values li and ri are the left and right delimiters of the item attribute, lp and rp are the left and right delimiters of the price attribute, and h and t denote the head and the tail of the page respectively. (Note that h is a line number, not a string. For example, if h = 100, the first 100 lines form the head of the page, and the semantic recognition buyer agent 20 can skip these lines at once when searching for products.) For example, if execHLRT is invoked with the parameters h = 7, li = "<B>", ri = "</B>", lp = "<I>", rp = "</I>" and t = "</TABLE>", it behaves exactly like pcwrapHLRT.
More generally, any HLRT wrapper for a seller's website is equivalent to a vector (h, li, ri, lp, rp, t), and any such vector can be interpreted as an HLRT wrapper. The two are therefore treated as the same, and the notation (h, li, ri, lp, rp, t) is used as shorthand for the HLRT wrapper obtained by evaluating execHLRT with the specified delimiters.
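The behaviour described for execHLRT can be approximated by the sketch below. It is not the procedure of Figure 25, which is not reproduced in this text; it is an illustration under stated assumptions: h is taken as a number of lines to skip, labels use 0-based inclusive index pairs, and the scan stops when the tail delimiter occurs before the next item delimiter. The sample page is hypothetical.

```python
def exec_hlrt(page, h, li, ri, lp, rp, t):
    """Return a label [((item_b, item_e), (price_b, price_e)), ...] for the page."""
    pos = 0
    # Skip the head: move past the first h lines.
    for _ in range(h):
        newline = page.find("\n", pos)
        if newline == -1:
            return []
        pos = newline + 1
    label = []
    while True:
        next_item = page.find(li, pos)
        next_tail = page.find(t, pos)
        # Stop when no item remains or the tail delimiter comes first.
        if next_item == -1 or (next_tail != -1 and next_tail < next_item):
            return label
        spans = []
        for left, right in ((li, ri), (lp, rp)):
            start = page.find(left, pos)
            if start == -1:
                return label
            begin = start + len(left)
            end = page.find(right, begin)
            if end == -1:
                return label
            spans.append((begin, end - 1))
            # Resume at the right delimiter, which may overlap the next left one.
            pos = end
        label.append(tuple(spans))


def values(page, label):
    """Turn a label back into (item, price) strings."""
    return [tuple(page[b:e + 1] for b, e in row) for row in label]


# Hypothetical two-row catalog page with a two-line head.
SAMPLE = (
    "<HTML><TITLE>Catalog</TITLE>\n"
    "<TABLE>\n"
    "<TR><TD><B>HM381MD</B></TD><TD><I>399.95</I></TD></TR>\n"
    "<TR><TD><B>MD2070</B></TD><TD><I>249.00</I></TD></TR>\n"
    "</TABLE><B>Footer</B></HTML>\n"
)
```

Calling `exec_hlrt(SAMPLE, 2, "<B>", "</B>", "<I>", "</I>", "</TABLE>")` yields one index-pair row per product; the bold "Footer" text after the tail delimiter is not mistaken for an item.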
Because an HLRT wrapper is a vector (h, li, ri, lp, rp, t), HLRT wrapper induction for the example of Fig. 7 and Fig. 8 amounts to determining, from a set ε = {..., (Pn, Ln), ...} of example pages and their labels, the head, the tail and the four attribute delimiter strings that make up one such vector (h, li, ri, lp, rp, t). More precisely, the following constraint satisfaction problem is solved:
Variables: the head delimiter of page P: h
the tail delimiter of page P: t
the left delimiter of the item attribute: li
the right delimiter of the item attribute: ri
the left delimiter of the price attribute: lp
the right delimiter of the price attribute: rp
Domain: each delimiter is an arbitrary string, except the head delimiter;
Constraint: for each <Pn, Ln> ∈ ε, W(Pn) = Ln, where W is the HLRT wrapper (h, li, ri, lp, rp, t).
To describe the method for operating of learnHLRT now, it will solve above constraint satisfaction problem.
Subsequent use delimiter
First, note that the domains of the delimiter variables are tightly constrained by the examples ε. At the very least, each delimiter must be a substring of the example pages. It is possible to do considerably better. From the example (Ppc, Lpc) it can be seen that rp (the right delimiter of the price attribute) must be a prefix of "</I></TD></TR>⇓", where ⇓ denotes a newline character. A "prefix" here means a run of characters beginning at the leftmost character of the string, such as "<", "</", "</I" and so on.
Note that if rp is not a prefix of this string, every wrapper that uses this delimiter will fail at least to extract "399.95" as the price attribute of the fourth tuple of Ppc. The candidate values of rp are therefore all prefixes of "</I></TD></TR>⇓". These candidate delimiters are shown in Figure 10A.
In detail, the candidate delimiters for the simple product catalogue page are generated as follows:
Candidates for li and lp
Let lp be the left delimiter of the price attribute. Consider the fragments that precede each price in Fig. 7, such as "HM381MD</B></TD><TD><I>" and "MD2070</B></TD><TD><I>". These fragments show that lp must be a suffix of "</B></TD><TD><I>". The candidates for lp are therefore the 16 non-empty suffixes of this string, which are shown in Figure 10A. A "suffix" here means a run of characters ending at the rightmost character of the string, such as ">", "I>", "<I>" and so on.
The delimiter li is more complicated, because the text that precedes the first attribute occurs both between the last attribute of the previous tuple and the first attribute of the current tuple, and between the head of the page and the first tuple. The strings considered in the example are "<TR><TD><B>" and the string shown in Figure BDA0000134717110000342, and li must be a suffix of these strings. The candidates for li can therefore be generated by enumerating the suffixes of such fragments.
In summary, for an example set ε, the candidates for the delimiters li and lp, written candsl(k, ε) for k ∈ {i, p}, are generated by enumerating the suffixes of the shortest string that occurs to the left of each instance of the item attribute and of the price attribute. (As noted in the previous paragraph, the item attribute is a special case: the shortest string between adjacent tuples, or before the first tuple, must be enumerated.) For example, if ε = {<Ppc, Lpc>}, then:
candsl(i, ε) = { </I></TD></TR>⇓<TR><TD><B>, .... }
candsl(p, ε) = { </B></TD><TD><I>, .... }
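Enumerating the left-delimiter candidates can be sketched as follows. The function names are hypothetical, and the sketch assumes that the strings found to the left of each attribute instance have already been collected from the example pages.

```python
def suffixes(text):
    """All non-empty suffixes of text, longest first."""
    return [text[i:] for i in range(len(text))]


def cands_l(left_contexts):
    """Candidates for a left delimiter.

    left_contexts holds, for every instance of the attribute, the string
    that occurs to its left. The candidates are the suffixes of the
    shortest of these strings.
    """
    shortest = min(left_contexts, key=len)
    return suffixes(shortest)


# The string that separates item and price in the simple product catalogue.
price_candidates = cands_l(["</B></TD><TD><I>", "</B></TD><TD><I>"])
```

For this 16-character string the sketch yields the 16 non-empty suffixes mentioned in the text.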
Candidates for ri and rp
Candidates for the right delimiters are generated in much the same way as for the left delimiters, with two differences. First, the strings considered occur to the right (rather than the left) of the attribute in question. Second, ri and rp must be prefixes (not suffixes) of these strings. For example, in the simple product catalogue example, ri must be a prefix of the string "</B></TD><TD><I>", and rp must be a prefix of both "</I></TD><TR>" and the string shown in image BDA0000134717110000352.GIF.
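The right-delimiter case mirrors the left-delimiter case with prefixes in place of suffixes. The sketch below uses hypothetical function names and assumes the strings to the right of each attribute instance have already been collected.

```python
def prefixes(text):
    """All non-empty prefixes of text, shortest first."""
    return [text[:i] for i in range(1, len(text) + 1)]


def cands_r(right_contexts):
    """Candidates for a right delimiter.

    right_contexts holds, for every instance of the attribute, the string
    that occurs to its right. The candidates are the prefixes of the
    shortest of these strings.
    """
    shortest = min(right_contexts, key=len)
    return prefixes(shortest)


# The string that follows each item in the simple product catalogue.
item_candidates = cands_r(["</B></TD><TD><I>", "</B></TD><TD><I>"])
```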
Specifically, for an example set ε, the candidates for the right delimiters, written candsr(k, ε), are generated by enumerating the prefixes of the shortest string that occurs to the right of attribute k in each example. (Just as li is a special case, so is rp: what is enumerated are the prefixes of the shortest string that occurs between adjacent tuples or after the last tuple.) For example:
candsr(i, ε) = { </B></TD><TD><I>, .... }
candsr(p, ε) = { </I></TD></TR>⇓<TR><TD><B>, .... }
Candidates for the head and tail
A similar analysis applies to the head and tail delimiters. The "head" is the prefix of the page that precedes the first item attribute. Note that the head is represented here as a character string. When a wrapper is actually executed, it is preferable for performance to represent the head as an integer, so that a client or buyer agent using the wrapper to look for product information can skip the head of the page without examining its content. To convert the head string into an integer, it suffices to determine the number of lines that the head string spans.
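The conversion from a head string to a line count can be sketched in one function. The name head_line_count is hypothetical, and the sketch assumes lines are separated by newline characters.

```python
def head_line_count(head):
    """Number of lines spanned by the head string."""
    return len(head.splitlines())


# A head in the style of the simple product catalogue (hypothetical text).
HEAD = "<HTML>\n<TITLE>Catalogue</TITLE>\n<TR><TH>PRICE ($US)</TH></TR>"
h = head_line_count(HEAD)
```

If the head string ends partway through a line, skipping that whole line would also skip whatever follows the head on it, so this representation suits pages whose first product row starts on a new line.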
The tail delimiter is determined in much the same way as the delimiters li and lp. The tail candidates are the suffixes of the string that follows the last price attribute on the page.
cands(head, ε) = { <HTML>⇓<TITLE>...PRICE($US)</TH></TR>, .... }
cands(tail, ε) = { </I></TD></TR>⇓</TABLE>...<HTML>, .... }
Independence of the delimiters
Given these candidates for each delimiter, a straightforward pseudo-code module for learning the delimiters is shown in Figure 26.
Because this module runs in time proportional to the product of the numbers of candidates for each delimiter, and because each delimiter can have many candidates, execution may be slow.
More efficient processing is possible because ri, lp and rp are mutually independent: for any one of these delimiters, whether a candidate is valid does not depend on any other delimiter. For example, whether "</B>" is suitable for ri can be decided without examining the other delimiters.
To see why this independence holds, consider the execHLRT procedure. At each point in its execution, execHLRT searches the input page P for one of the delimiters ri, lp and rp. If a search does not locate the correct position in P, the label output by execHLRT will be wrong. Whether these searches return the correct answer, however, depends only on the delimiter in question and the example pages, not on the other delimiters.
Put another way, if a particular candidate is valid for one of the delimiters ri, lp or rp, it remains valid whatever candidates are chosen for the other delimiters. The converse gives the same intuition: if a candidate is invalid, no choice of candidates for the other delimiters, however careful, can repair it. Note that this independence is guaranteed; it is not merely a heuristic adopted to simplify learning.
The significance of this observation is that the three delimiters ri, lp and rp can be learned separately. In pseudo-code, learning proceeds as follows:
1. Generate the candidate sets;
2. Select a valid candidate for each delimiter.
This method is much faster than the procedure in Figure 26: it runs in time proportional to the sum (rather than the product) of the numbers of candidates for each delimiter.
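The two steps above, and the difference between the sum and the product of the candidate counts, can be illustrated with a short Python sketch. The function name, the shortened candidate lists and the toy validity table are all hypothetical stand-ins for the real candidate generation and validity tests.

```python
from math import prod


def learn_independently(candidates, is_valid):
    """Pick the first valid candidate for each delimiter on its own.

    Returns the chosen delimiters and the number of validity checks made.
    """
    chosen, checks = {}, 0
    for name, options in candidates.items():
        for option in options:
            checks += 1
            if is_valid(name, option):
                chosen[name] = option
                break
    return chosen, checks


# Toy candidate sets (shortened) and a toy table of valid choices.
CANDIDATES = {
    "ri": ["<", "</", "</B", "</B>"],
    "lp": [">", "I>", "<I>"],
    "rp": ["<", "</", "</I", "</I>"],
}
VALID = {("ri", "</B>"), ("lp", "<I>"), ("rp", "</I>")}

chosen, checks = learn_independently(CANDIDATES, lambda n, c: (n, c) in VALID)
# A joint search would have to consider up to this many combinations.
joint_combinations = prod(len(options) for options in CANDIDATES.values())
```

In this toy case the independent search makes at most 4 + 3 + 4 checks, whereas a joint search faces 4 × 3 × 4 combinations.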
Note, however, that not all delimiters are mutually independent. For the delimiters h, t and li, whether a particular string is valid for one of them depends on the choice of the other two. For example, is "<B>" valid for li? The answer depends on the choice of h and t. If h="<HTML>", then "<B>" is not a valid delimiter for li, because execHLRT will not skip the irrelevant bold text "<B>A Simple Product Catalogue</B>". If, on the other hand, h="</TH></TR>", then li="<B>" causes no problem. Likewise, li and t interact: if t="</HTML>", then li="<B>" is unacceptable, but if t="</TABLE>" it is acceptable. The candidates for the three delimiters h, t and li must therefore be considered together: all combinations of h, t and li are enumerated, and a valid combination is selected.
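The joint enumeration of h, t and li can be sketched as below. The function name is hypothetical, and the consistency test is supplied by the caller; in the usage example it is a toy rule that merely encodes the "<B>" case discussed above.

```python
from itertools import product


def learn_jointly(cands_h, cands_t, cands_li, is_consistent):
    """Return the first (h, t, li) combination accepted by is_consistent."""
    for h, t, li in product(cands_h, cands_t, cands_li):
        if is_consistent(h, t, li):
            return h, t, li
    return None


def toy_rule(h, t, li):
    """Toy stand-in: "<B>" works for li only with these choices of h and t."""
    return li != "<B>" or (h == "</TH></TR>" and t == "</TABLE>")


combo = learn_jointly(
    ["<HTML>", "</TH></TR>"], ["</HTML>", "</TABLE>"], ["<B>"], toy_rule
)
```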
Validity of candidates
The second step of this improved method requires a precise statement of the conditions under which a delimiter candidate is valid.
Consider first the delimiters ri and rp. Once the method has located the start of an instance of an attribute, it tries to locate the end of that instance. A candidate "u" for the delimiter ri or rp must therefore satisfy two constraint conditions:
Constraint condition C1: in any example page, " u " must not be the substring of any one attribute instance.
Constraint condition C2: in each example page, "u" must be a prefix of the text that follows each attribute instance.
If a candidate "u" for the delimiter ri or rp violates these constraint conditions, every wrapper that uses it will be inconsistent with at least one example in ε. If constraint condition C1 is violated, the extracted attribute k will be too short; if constraint condition C2 is violated, it will be too long.
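Constraint conditions C1 and C2 translate into a short test. In this sketch the name valid_r and the representation of the examples as (attribute value, following text) pairs are assumptions made for illustration.

```python
def valid_r(u, instances):
    """Check C1 and C2 for a right-delimiter candidate u.

    instances is a list of (value, following) pairs: the attribute value
    and the page text that comes directly after it.
    """
    for value, following in instances:
        if u in value:                    # C1: u must not occur inside a value
            return False
        if not following.startswith(u):   # C2: u must begin the following text
            return False
    return True


# Item instances from the simple product catalogue (prices hypothetical).
ITEMS = [
    ("HM381MD", "</B></TD><TD><I>399.95"),
    ("MD2070", "</B></TD><TD><I>299.95"),
]
```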
In general, for a given example set ε, a candidate "u" must satisfy these conditions to be valid for the delimiter ri or rp. The conditions are written validr(u, k, ε). For an example set ε, validr(u, k, ε) holds exactly when the candidate "u" satisfies constraint conditions C1 and C2 for the right delimiter of attribute k. Returning to the example, applying the validr test to the candidates generated by candsr gives:
Right side delimiter for item attribute:
validr(</B></TD><TD><I>,i,ε)=TRUE
.....
Right side delimiter for the price attribute:
validr(</I></TD><TR>⇓<TR><TD><B>, p, ε) = FALSE
...
Constraint condition for lp
The execHLRT procedure searches for the delimiter lp. A candidate "u" for the delimiter lp must satisfy the following constraint condition:
Constraint condition C3: in each example page, "u" must be a proper suffix of the text that occurs before each instance of attribute k.
If this condition is violated, every wrapper that uses "u" will be inconsistent with the examples ε. At least one of the start indices bm,p computed by execHLRT will be incorrect, either larger or smaller than the correct value, depending on how "u" violates the condition.
In general, for a given example set ε, a candidate "u" must satisfy this condition to be valid for the delimiter lp. The condition is written validl(u, k, ε). For ε, validl(u, k, ε) holds exactly when the candidate "u" satisfies constraint condition C3 for the delimiter lp. Returning to the simple product catalogue example Ppc:
validl(</B></TD><TD><I>,p,ε)=TRUE
....
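The validl test can be sketched as follows. The text does not define "proper suffix", so this sketch assumes it means a suffix that occurs nowhere else in the preceding text, which is what a forward search for the delimiter needs in order to stop at the right place. The name valid_l and the input representation are likewise hypothetical.

```python
def valid_l(u, preceding_texts):
    """Check C3 for a left-delimiter candidate u.

    preceding_texts holds, for each attribute instance, the text between
    the point where the search starts and the start of the instance.
    Assumption: a proper suffix is a suffix whose first occurrence in the
    text is the one at the very end.
    """
    if not u:
        return False
    for text in preceding_texts:
        if not text.endswith(u):
            return False
        if text.find(u) != len(text) - len(u):
            return False
    return True


# Text between each item and its price in the simple product catalogue.
BEFORE_PRICE = ["</B></TD><TD><I>", "</B></TD><TD><I>"]
```

Under this reading ">" is rejected, because a forward search would stop at the first ">" inside "</B>" rather than at the start of the price.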
Whether the candidates Uh, Ut and Uli for h, t and li are valid is determined by the following constraint conditions:
Constraint condition C4: Uh must be a proper suffix of the head part of each page.
Constraint condition C5: Uh must be a proper suffix of the portion of the head part of each page that follows the first occurrence of Uh.
Constraint condition C6: in every page, Ut must not occur between the first occurrence of h and the subsequent occurrence of li.
Constraint condition C7: Ut must be a substring of the tail of each page.
Constraint condition C8: Uli must not occur before t in the tail of any page.
Constraint condition C9: in each page, Uli must be a proper suffix of the text between tuples.
Constraint condition C10: in every page, Ut must not occur before Uli in the text between tuples.
HLRT induction
With this background, the learnHLRT procedure is now described. A detailed listing of the procedure and its subroutines is given in Figures 27A and 27B.
Obtaining training data
In the foregoing description of the invention, it was assumed that the semantics-recognition learner agent 18 has access to a body of training data. That is, a set ε = {..., <Pn, Ln>, ...} of training examples exists, where each Pn is a page and each Ln is a label. For further detail on how the learner agent 18 uses the training examples, see Fig. 7 and Figure 13.
As described above, during the shopping/purchasing stage the semantics-recognition buyer agent 20 can perform five different functions, described in turn below:
(1) Labels (LabelOracle) are produced using the modular heuristic search method described in Figure 15C. The components that do this are called recognizers: one is the item recognizer and the other is the intelligent price recognizer.
(2) Because it would be inefficient for the semantics-recognition buyer agent 20 to retrieve the seller description data every time it receives a request from a buyer or user over the network, the descriptions are fetched from the database (preferably a Microsoft Access database) or from the SQL-compliant database server 22 only when the agent 20 first needs a given set of descriptions for search, match and extraction. The seller descriptions are then held in memory or cache so that later requests to the buyer agent 20 can retrieve them more quickly.
(3) The seller descriptions in memory or cache are preferably updated automatically once a day.
(4) The system of the invention can spawn multiple threads and run several semantics-recognition buyer agents simultaneously, so as to contact each designated online seller site over the World Wide Web. This multithreading is preferably built on Microsoft's DCOM technology. Each semantics-recognition buyer agent can intelligently fill in a seller's search form with the product information supplied by the buyer or user and virtually press "Enter".
(5) In addition, by shortening seller response times and using multithreading to place the returned search results at different memory addresses, the semantics-recognition buyer agent 20 can cope with heavy traffic on the World Wide Web, which currently dominates the whole online purchasing process of the client/buyer.
Training pages Pn
Obtaining training pages involves issuing sample queries to seller sites. For example, Figure 12 shows the appearance of an example page returned by a queried site (for example http://www.800.com).
The algorithm for building the recognizers (LabelOracle) by the modular heuristic search method is now described in more detail. A recognizer searches a page for instances of a particular attribute. On the sample page of Figure 12, for example, the item recognizer identifies every "item" on the page, such as the products "HM381MD", "MD2070" and "MD203". A recognizer should be intelligent enough to eliminate noise.
For instance, in the example of Figure 12 the intelligent price recognizer should be able to distinguish "price 1" from "price 2", that is, which is the "list price" and which is "your price". The recognized instances are then combined to label the whole page. For example, one recognizer is used for "item" and another for "price", and their results are combined to produce a LabelOracle that labels pages containing these paired attributes.
If the item recognizer knew all items in advance, recognizing an "item" would be a simple pattern-matching problem. This is not feasible, however, because it would require a large list of item names and models. Maintaining such a large item database is also costly, so there is no guarantee that such a list would be complete and up to date.
Fortunately, sellers try to create a consistent feel by giving all of their products a uniform appearance. For example, a seller presents information on MiniDisc (MD) products in the same format as it uses for DVD products. Given this regularity, it can be assumed that all products are described in the same format.
The invention learns a wrapper only from examples in a particular domain, and then attempts to apply it to all other domains, including foreign-language ones, that are organized in the same format on the global Internet. In the preferred embodiment, the training examples come from a single domain, such as the MD domain on a seller site. The recognizer generated in this case only needs to recognize that particular product domain, such as MD. In this way, a fully up-to-date catalogue of item names for the particular domain can be maintained.
The invention identifies the "price" by invoking the modular heuristic search. For example, a price is always preceded by a dollar sign ($), and a price is usually a floating-point number. If an item is found to have several prices, keywords such as "your price", "our price", "list price" and "original price" are extracted accordingly.
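The price heuristics just listed can be sketched with a regular expression. This is an illustration only: the pattern, the function name and the sample text are assumptions, and the real recognizer of Figure 15C may apply further rules.

```python
import re

# A dollar sign followed by a number with optional thousands separators
# and an optional fractional part.
PRICE_RE = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)")

# Keywords named in the text; "your price" is listed before "our price"
# because the latter is a substring of the former.
KEYWORDS = ("your price", "our price", "list price", "original price")


def recognize_prices(text):
    """Return (keyword, amount) pairs for every price found in text.

    The keyword is looked up in the text between the previous price and
    the current one, and is None when no keyword is present.
    """
    found = []
    last_end = 0
    for match in PRICE_RE.finditer(text):
        context = text[last_end:match.start()].lower()
        keyword = next((k for k in KEYWORDS if k in context), None)
        amount = float(match.group(1).replace(",", ""))
        found.append((keyword, amount))
        last_end = match.end()
    return found


prices = recognize_prices("List price: $499.00 Your price: $399.95")
```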
Detailed steps of the shopping stage
As briefly described above, the operation of the semantics-recognition buyer agent 20 is shown in Figure 14 and Figures 15A-15C. The control flow is divided into the eight (8) steps shown in the figures.
Step (1)
Once a user has decided that a particular product or service is needed, the invention provides a portal through which the user enters the product requirements via an interactive-agent-character graphical user interface (IACGUI, commonly referred to as the interactive agent client/buyer interface) in order to search for the product. This approach yields better and more reliable results more quickly than manually browsing seller sites in various languages on the World Wide Web one by one for product information and prices.
The product description to be searched for is stored in the member variable m_ProdDesc of the SRBA 20. The search also allows the user to specify how the agent works through an "Advanced Search" function, which offers optional parameters such as the sellers to query, a timeout limit, a price range, a manufacturer and keywords.
Step (2)
For example, suppose the online buyer or user runs the Windows 98 operating system in language version "B" (or, preferably, the English version of Windows 2000 Professional and/or a platform equipped with the "B" version of Personal Web Server). On entering the portal of the invention, Microsoft Internet Explorer prompts the user to download the display software for language "B". When the online buyer or user "A" enters a product model as a keyword, in native characters of language "B", in the text box of the portal, the semantics-recognition buyer agent 20 of Fig. 2 performs data extraction using previously described sample data (data retrieved from the seller descriptions after earlier real-time wrapper induction and learning), which include native character strings of language "B". These seller descriptions are stored in a predefined data structure in the seller description list in database 24 (preferably a Microsoft Access database). Data extraction involves using the exact native language "B" string entered by the user to look for "hits" in the earlier search results, including prices, descriptions and product-related information (as shown in step 312 of Figure 15C). This information resides in the memory and cache of server 22. Because the language "B" string is specific to one native language, any "hits" found apply to the designated seller sites that use native language "B" and contain the language "B" string.
As shown in step 7 of Figure 14, during learning, after the semantics-recognition learner agent 18 of Fig. 2 has learned wrappers for the seller sites from the network, the seller descriptions are stored in the seller description list 28 of the offline database 24 (preferably a Microsoft Access database) and the database server 22 (preferably a Microsoft SQL-compliant database server). Because it would be inefficient for the semantics-recognition buyer agent 20 to retrieve the seller descriptions every time it receives a request from a buyer or user over the network, the descriptions are retrieved from database 24 or server 22 only when the buyer agent 20 first needs a given set of descriptions for search, match and extraction. The seller descriptions are then held in memory or cache so that later requests to the buyer agent 20, from the same user or a new one, can retrieve them more quickly.
The seller descriptions in memory or cache are preferably updated automatically once a day.
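The load-once, refresh-daily behaviour described above can be sketched as a small cache. The class name, the loader callback and the injectable clock are hypothetical; the loader stands in for whatever actually reads database 24 or server 22.

```python
import time


class SellerDescriptionCache:
    """Hold seller descriptions in memory and reload them once a day."""

    def __init__(self, loader, max_age_seconds=24 * 60 * 60, clock=time.time):
        self._loader = loader            # reads descriptions from the database
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the sketch is testable
        self._loaded_at = None
        self._descriptions = None

    def get(self):
        """Return cached descriptions, loading them on first use or when stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._descriptions = self._loader()
            self._loaded_at = now
        return self._descriptions
```

The first call to get() triggers the only database read until the cached copy is a day old, which matches the first-request behaviour described in the text.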
Step (3)
Using the retrieved seller descriptions, the system of the invention can spawn multiple threads and run several semantics-recognition buyer agents simultaneously, so as to contact each designated online seller site over the World Wide Web.
Step (4)
This multithreading is preferably built on Microsoft's DCOM technology. Each semantics-recognition buyer agent can intelligently fill in a seller's search form with the product information supplied by the buyer or user and virtually press "Enter".
Step (5)
Each seller then returns a search results page containing either information on the requested product or an error message.
Steps (6) and (7)
The search results pages are returned to the semantics-recognition buyer agent 20 over the World Wide Web. Note that several results pages may be returned to the agent 20 at the same time. By shortening seller response times and using multithreading to place the returned search results at different memory addresses, the semantics-recognition buyer agent 20 of the invention can cope with heavy traffic on the World Wide Web, which currently dominates the whole online shopping process of the client/buyer.
Step (8)
The semantics-recognition buyer agent 20 analyses each returned page according to the corresponding seller description. After all search results pages have arrived, or the search has timed out, the relevant information and data are extracted from the returned pages and displayed in a formatted output.
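Steps (3) to (8) can be illustrated with a thread pool that queries every seller at once and stops waiting at a timeout. This Python sketch does not use DCOM; the function names, the example URLs and the fake fetch function (which stands in for filling in and submitting a seller's search form) are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def query_sellers(seller_urls, fetch, product, timeout):
    """Query all sellers in parallel, one thread per seller.

    Returns (pages, errors, timed_out): result pages by URL, error
    messages by URL, and the URLs that did not answer within timeout.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(seller_urls)))
    futures = {pool.submit(fetch, url, product): url for url in seller_urls}
    done, pending = wait(futures, timeout=timeout)
    pages, errors = {}, {}
    for future in done:
        url = futures[future]
        try:
            pages[url] = future.result()
        except Exception as exc:  # a seller returned an error
            errors[url] = str(exc)
    timed_out = sorted(futures[f] for f in pending)
    pool.shutdown(wait=False)  # do not block on sellers that are still busy
    return pages, errors, timed_out


def fake_fetch(url, product):
    """Stand-in for a real request to a seller's search form."""
    if "slow" in url:
        time.sleep(1.0)
    if "bad" in url:
        raise RuntimeError("product not found")
    return f"results for {product} from {url}"
```

A call such as query_sellers(urls, fake_fetch, "Radar detector", timeout=0.3) separates sellers that answered, sellers that reported an error, and sellers that timed out, after which each returned page would be parsed with its seller's wrapper.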
As shown in Figure 28, the user/buyer communicates with server 22 through an Active Server Pages (ASP) file (NextGen.asp), which runs the in-process DLL file (NextGen.dll) on server 22.
Preferably, the buyer agent 20 is implemented as an ActiveX component, which has several benefits. First, overall performance improves. Writing the semantics-recognition buyer agent in Visual C++ makes the agent more capable and gives it the power of an ActiveX component. There is no need for workarounds in HTML and script code to meet application requirements. With an ActiveX component, the agent can be run by adding a few lines of code to the client HTML file, leaving all complex processing to the server.
Second, an ActiveX component offers reusability: other applications can use it instead of duplicating similar functionality in each application module. The component can be accessed by every Active Server Pages (ASP) module, so the ASP modules need not contain all of the code logic, which eliminates redundancy. Although the semantics-recognition buyer agent is created within a single application, this does not prevent it from being integrated with other applications. This property also helps reduce development time significantly.
Third, implementing the ASP component as a dynamic link library (DLL) file is advantageous because DLLs are compiled and linked independently. The ASP component can be updated without additional recompiling or relinking. Using a DLL-based ActiveX component therefore improves processing speed and makes it easier to add new functions later. In addition, DLLs reduce memory and disk space requirements by sharing a single copy of common code among several modules.
If several components use the same static library, several identical copies of the library must be stored and executed. If the components run at the same time, several identical copies are needed in memory. Static libraries thus cause redundancy and wasted space.
If a DLL is used instead of a static library, only one copy of the code and resources is needed. This allows the server to handle many concurrent connections from the Internet with minimal workload.
The semantics-recognition buyer agent 20 is preferably developed as an ActiveX component in the form of an in-process dynamic link library (in-process DLL). This allows users to create SRBA objects over the World Wide Web. For communication between user and server, ASP serves as the gateway between them.
ASP is an open application environment in which HTML pages, scripts and ActiveX components are combined to create Web-based applications. It is built as an Internet Server Application Programming Interface (ISAPI) application and runs on Microsoft's Internet Information Server (IIS) or on a comparable Web server.
To execute ASP, a Microsoft ActiveX scripting language such as Visual Basic (VB) Script is used to manage the ActiveX components. By adding functionality in DLL form, this language dynamically activates the ActiveX components that run on the server.
Program logic: creating the semantics-recognition buyer agent object
When the user begins to query the price of a desired product, an object of the semantics-recognition buyer agent 20 is created.
In the Active Server Page, the module, written in pseudo-code form, is as follows:
Figure BDA0000134717110000451
When the page is loaded, the above module creates the semantics-recognition buyer agent 20 object, where NextGen is the name of the ActiveX component and Semantics-Recognition Buyer is the name of the agent within the NextGen component.
Connecting user and server
Once the instance of the semantics-recognition buyer agent 20 has been created, a connection is established between user and server, as shown in Figure 29.
The semantics-recognition buyer agent uses a connectable object to maintain a one-to-one channel through which user and server communicate, for example when the user asks the server to compare prices. The agent 20 uses an outgoing interface as the connection (channel) through which the server communicates with the user, for example when the server responds by returning the query results to the requesting user. The user can access the properties of the semantics-recognition buyer agent and invoke its methods through IConnectionPoint.
The semantics-recognition buyer agent 20 provides the following methods:
1. OnStartPage(unknown agent)
This method initializes the semantics-recognition buyer agent object. It is called automatically when the ASP page is loaded.
2. OnEndPage()
This method terminates the semantics-recognition buyer agent object. It is called automatically when the ASP page is unloaded.
3. GetSearch(BSTR input, BSTR *output)
This method looks up the price of the requested product on the Internet after the user has supplied a product description (such as a model). The input is the product description supplied by the user, and the output is the content of the search results page. The syntax for calling this method is:
OutputName=AdObjectName.GetSearch(“Product?Name”)
In above-mentioned code, " AdObjectName " is the Instance Name of object, and " Product Name " is the ProductName that buyer/user wants comparative price, and " OutputName " is the variable of depositing the end value that is obtained.Pseudo-code with reference to following example:
If?result=Agent
Get?search(“Radar?detector”)
Server logic
Registering the component
Before the user initializes the Semantics-Recognition Buyer Agent object, the component must be registered on the server with the following command:
Register path\Nextgen.dll
where path is the absolute path at which Nextgen.dll is stored.
Responding to the user's request
When the Semantics-Recognition Buyer Agent object calls the GetSearch method through IConnectionPoint, the instance of the Semantics-Recognition Buyer Agent on the server machine executes the dynamic link library (DLL). See Figure 30.
Connecting to the database
A data source name (DSN), a user ID and a password must be provided in order to connect to SQL Server through ODBC. RETCODE is the variable that stores the value returned by SQL Server. SQL_SUCCESS indicates that the operation succeeded.
Figure BDA0000134717110000471
Executing the SQL query
Before the required data can be obtained, the SQL query statement must be specified.
Figure BDA0000134717110000482
Fetching the fields
After the query executes successfully, the seller description information is stored in the vendor_description array. Each element of this array has two member variables: the wrapper and the seller URL.
Figure BDA0000134717110000483
Figure BDA0000134717110000491
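The database steps described above (connect, run a query, fetch the wrapper and seller URL of each matching seller description) can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the ODBC connection to SQL Server, and the table layout, the column names, the example URLs and the second wrapper string are hypothetical, not taken from the patent.

```python
import sqlite3


def make_demo_database():
    """Build a small in-memory stand-in for the offline database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vendor_description "
        "(seller_url TEXT, wrapper TEXT, domain_description TEXT)"
    )
    conn.executemany(
        "INSERT INTO vendor_description VALUES (?, ?, ?)",
        [
            ("http://seller-a.example/search", "7,<b>,</B>,<l>,</l>,</TABLE>",
             "radar detector, MD player"),
            ("http://seller-b.example/search", "3,<b>,</b>,<i>,</i>,</table>",
             "books"),
        ],
    )
    return conn


def load_seller_descriptions(conn, keyword):
    """Return the wrapper and seller URL of every description that mentions keyword."""
    rows = conn.execute(
        "SELECT wrapper, seller_url FROM vendor_description "
        "WHERE domain_description LIKE ?",
        (f"%{keyword}%",),
    ).fetchall()
    return [{"wrapper": wrapper, "seller_url": url} for wrapper, url in rows]
```

Each returned dictionary plays the role of one vendor_description element with its two members.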
Filling in the forms
If there are N seller descriptions, the buyer agent initializes N threads, one to fill in the form specified by each seller description.
The syntax for running each thread is:
Figure BDA0000134717110000492
Each thread is preferably limited to about 5 seconds. If a seller does not return a result within 5 seconds, that seller is abandoned for this query; otherwise its result is stored in memory for use by the next procedure.
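A minimal sketch of the one-thread-per-seller scheme with a time limit is given here, using Python's standard concurrent.futures module rather than the MFC threads of the original. The function name query_sellers and the fetch callback are hypothetical; the 5-second default mirrors the preferred limit stated above.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def query_sellers(sellers, fetch, timeout=5.0):
    """Run fetch(seller) in one thread per seller and keep only timely results.

    Sellers that have not answered within `timeout` seconds, or whose fetch
    raised an exception, are dropped for this query.
    """
    if not sellers:
        return {}
    pool = ThreadPoolExecutor(max_workers=len(sellers))
    futures = {pool.submit(fetch, seller): seller for seller in sellers}
    done, _not_done = wait(list(futures), timeout=timeout)
    pool.shutdown(wait=False)  # do not block on sellers that are still running
    results = {}
    for future in done:
        if future.exception() is None:
            results[futures[future]] = future.result()
    return results
```

The dictionary returned here corresponds to the results kept in memory for the next procedure; late sellers simply never appear in it.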
When the user enters the keywords of a purchase request in the text box provided at the portal of the present invention, the system determines whether there are relevant seller descriptions, that is, seller descriptions containing the keywords. All relevant seller descriptions, each comprising a wrapper and a URL, are extracted from the offline database. The Semantics-Recognition Buyer Agent 20 then fills in the forms and sends them to the seller sites, working in parallel across the search engines of the online sellers. For each seller site, the buyer agent calls the member function HttpPost to do the work. HttpPost sends the URL and form data to the seller according to the seller description and returns the HTML response in a string variable. HttpPost returns a Boolean value: true means that the HTML document was obtained successfully, and false means that an error occurred. If the return value is true, the item name and price are extracted from the returned HTML document. The form submission flow is shown in Figure 31.
In step 1002, a CInternetSession object is created for the session. The CInternetSession class connects to the server through an Internet session. This class is typically used early in a session to establish the connection to the web server.
In step 1004, a CHttpConnection object is created by calling the GetHttpConnection member function of the CInternetSession object. The CHttpConnection class establishes the HTTP connection to the server.
In step 1006, a CHttpFile object is created by calling the OpenRequest member function of the CHttpConnection object. The CHttpFile class lets a file transferred over the Internet be handled as if it were a file on the local disk. It works together with the CHttpConnection object to read and write Internet data.
Step 1008 calls the SendRequest member function of the CHttpFile object to send the POST request and the form data to the remote HTTP server.
Steps 1010, 1012 and 1014 call the Read member function of the CHttpFile object to return the response data to the program in blocks. When Read returns 0, no more data remains to be fetched.
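The flow of steps 1002 to 1014 (open a session, build the POST request, send the form data, read the response until nothing is left) can be approximated with Python's standard urllib. This is a sketch, not the MFC code of the patent: the helper names build_post_request, read_all and http_post are hypothetical, and http_post returns a (success, html) pair in place of the Boolean return value and string output variable described above.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_post_request(url, form):
    """Encode the form fields and wrap them in a POST request object."""
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(url, data=data)  # a request with data is a POST


def read_all(stream, chunk_size=4096):
    """Read a response in blocks until read() returns nothing more."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty result: no data left to fetch
            break
        chunks.append(chunk)
    return b"".join(chunks)


def http_post(url, form, timeout=5.0):
    """Send the form to the seller site; return (True, html) or (False, '')."""
    try:
        request = build_post_request(url, form)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, read_all(response).decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError, ValueError):
        return False, ""
```

A caller would check the first element of the pair before attempting any extraction, just as the text checks the Boolean result of HttpPost.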
Extracting the price
After the pages are obtained, the Semantics-Recognition Buyer Agent 20 matches each page against a generic error template. If a page does not match the template, the query is considered successful. The buyer agent 20 then uses the wrapper of the corresponding seller to strip the header and trailer information from the successful page. For example, suppose the user searches for an MD product with model number MD203, and the given wrapper is 7, <b>, </B>, <l>, </l>, </TABLE>; the page is then as follows.
According to the wrapper, the useful information starts at line 7 and ends at </TABLE>, so the Semantics-Recognition Buyer Agent 20 removes the useless information before extracting the model and price. The HTML file after the header and trailer information has been removed is:
Figure BDA0000134717110000521
Next, the Semantics-Recognition Buyer Agent 20 uses pattern matching to extract the model and price of the product. In the wrapper, the pattern for the model is <b>*</B> and the pattern for the price is <l>#</l>, where * stands for the model and # stands for the price. The agent first extracts the model HM381MD and compares it with the model MD203 requested by the user. Because they do not match, the Semantics-Recognition Buyer Agent 20 looks for the next model, continuing until it finds model MD203. Once the model is found, the agent uses the price pattern to extract the first price that follows this model. After the model and price have been extracted, the Semantics-Recognition Buyer Agent 20 stops extracting information from this page and sends the information to an array named array_item[].
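The wrapper-driven extraction just described can be sketched in a few lines of Python. The function names and the sample page used to exercise them are hypothetical; the wrapper values (start line 7, model tags <b> and </B>, price tags <l> and </l>, end marker </TABLE>) follow the example in the text. Tags are matched case-insensitively here, which is an assumption made so that <b> pairs with </B>.

```python
import re


def strip_head_and_tail(html, start_line, end_marker):
    """Keep the text from line `start_line` (1-based) up to the end marker."""
    body = "\n".join(html.splitlines()[start_line - 1:])
    end = body.find(end_marker)
    return body if end == -1 else body[:end]


def extract_price(body, wanted_model, model_tags, price_tags):
    """Find the wanted model, then return it with the first price after it."""
    model_re = re.compile(
        re.escape(model_tags[0]) + r"(.*?)" + re.escape(model_tags[1]),
        re.IGNORECASE | re.DOTALL,
    )
    price_re = re.compile(
        re.escape(price_tags[0]) + r"(.*?)" + re.escape(price_tags[1]),
        re.IGNORECASE | re.DOTALL,
    )
    for match in model_re.finditer(body):
        if match.group(1).strip() == wanted_model:
            price = price_re.search(body, match.end())
            return (wanted_model, price.group(1).strip()) if price else None
    return None  # model not present on this page
```

On a page listing HM381MD before MD203, the loop skips the first model and returns the price that follows MD203, matching the walk-through above.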
Critical section
array_item[] is data shared by the N threads, and every thread can access this member variable. The danger is the access conflict that arises when several threads access array_item[] at the same time. To keep this shared data in a consistent state, a critical section is used to prevent more than one thread from modifying the data simultaneously, as described below:
CCriticalSection m_csDoor;
Before an element is inserted into array_item, the line
m_csDoor.Lock();
is added to enter the critical section. All variables inside the critical section are locked, which prevents other threads from accessing them. After the insertion is complete, the line
m_csDoor.Unlock();
is added to mark the end of the critical section. All locked variables are released, allowing other threads to access the member variable. In this way, the member variable array_item can be shared safely by all threads.
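The same lock-insert-unlock discipline can be illustrated with Python's threading.Lock in place of the MFC CCriticalSection. The class name SharedItems and its methods are hypothetical; the point of the sketch is only that every access to the shared list happens while the lock is held.

```python
import threading


class SharedItems:
    """A result list shared by all seller threads, guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of m_csDoor
        self._items = []               # plays the role of array_item[]

    def insert(self, model, price):
        # Entering the with-block acquires the lock (Lock()); leaving it,
        # even through an exception, releases the lock (Unlock()).
        with self._lock:
            self._items.append((model, price))

    def snapshot(self):
        """Return a copy of the items, taken while holding the lock."""
        with self._lock:
            return list(self._items)
```

Using the with-statement guarantees the unlock step cannot be forgotten, which is the main practical risk of the explicit Lock/Unlock pair.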
Sorting the prices
Within the specified time interval, the array sort_item that holds the product prices is sorted using quicksort.
Quicksort is implemented as follows:
IF left < right THEN
BEGIN
A "key" (pivot) value is selected from the structure at each level of recursion. The function repeatedly scans the structure in both directions. Values smaller than the key are placed on the left side of the structure, and larger values are placed on the right side. This "left to right" and "right to left" scanning and swapping continues until the status flags indicate that the scans have finished.
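Since only the first two lines of the original pseudocode survive, here is a hedged sketch of a standard in-place quicksort with the two-directional scan described above. Choosing the middle element as the key and the optional key parameter are assumptions of this sketch, not details confirmed by the patent.

```python
def quicksort(items, left=0, right=None, key=lambda value: value):
    """Sort items[left..right] in place, in ascending order of key(item)."""
    if right is None:
        right = len(items) - 1
    if left < right:
        pivot = key(items[(left + right) // 2])  # the "key" value for this level
        i, j = left, right
        while i <= j:
            while key(items[i]) < pivot:  # scan left to right
                i += 1
            while key(items[j]) > pivot:  # scan right to left
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        quicksort(items, left, j, key)    # part holding the smaller values
        quicksort(items, i, right, key)   # part holding the larger values
```

Calling quicksort(offers, key=lambda offer: offer[1]) sorts a list of (seller, price) pairs by price, cheapest first.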
Returning the response to the user
An HTML file is returned to the user. This file is stored in the member variable m_output and shows the sorted results found by SRBA 20.
//String used to display content to browser
#define HTTP_HEADER "Content-type:text/html\n\n"
//Codes to display to browser
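As an illustration of how the sorted results might be turned into the returned HTML, the following sketch prepends the same header string to a simple table. The function name render_results, the three-column layout and the (seller, model, price) tuple shape are hypothetical; the patent does not show the actual output code.

```python
import html

HTTP_HEADER = "Content-type:text/html\n\n"


def render_results(items):
    """Build the response text: header followed by one table row per result."""
    rows = "".join(
        f"<tr><td>{html.escape(seller)}</td>"
        f"<td>{html.escape(model)}</td>"
        f"<td>{price:.2f}</td></tr>"
        for seller, model, price in items
    )
    return f"{HTTP_HEADER}<html><body><table>{rows}</table></body></html>"
```

Escaping seller and model names keeps characters such as & or < in seller data from breaking the generated page.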
The computer program listing appendix mentioned earlier was submitted with this application. It provides code sections that generate and implement selected features of the present invention. Specifically, in the part labeled "3.1 The Learning Phase", source code is provided for: "3.1.1 Main COOSA Application Class", the main class file for the COOSA application; "3.1.2 Add Vendor Class", which adds a seller to the database; "3.1.3 COOSADoc Class", which invokes the Semantics-Recognition Learner interface together with its screen display and files; "3.1.4 COOSA View Class", the learner interface and its function screens; "3.1.5 Training Data Class", which invokes the Semantics-Recognition Learner Agent; and "3.1.6 Vendor Class", which describes how the labeling algorithm processes all seller web pages. In the part labeled "Shopping Phase", the following source code is provided: "3.2.1 Agent Class", which describes the Semantics-Recognition Buyer Agent; and "3.2.2 Thread Process", the subroutine of the Semantics-Recognition Buyer Agent.
With reference to Figures 32 to 39, the GUI used with the present invention, the Interactive-Agent-Character Shopper/Buyer Interface (IACS/BI), is described in detail below. Figure 32 schematically illustrates the "main menu" screen of the GUI or IACS/BI used with the present invention. Note that product "channels" (categories) for the user to choose from are provided in the upper right corner of this main menu screen. A "Quick Search" feature is provided on the left side of the screen. Below it is a text box with animated typing that shows the online user how to use the Quick Search option. The left side of the screen also provides a group of text boxes through which a visitor can sign in as a temporary or permanent member. (Note that most functions of the portal of the present invention remain disabled until the user has been properly validated.) The lower left corner provides a group of links to online sellers that are registered with the portal of the present invention. On the right is a large message box labeled "Feedback", in which the online user can enter comments to be sent by e-mail to an e-mail server, preferably a Microsoft Outlook Express e-mail server.
Figure 33 schematically illustrates the GUI or shopper interface used with the present invention, showing some company information in response to a click on the "Government-to-Business" text icon on a preceding online buyer/user screen (not shown). Note, however, that this screen is not operational, because these companies, or the government-to-business e-commerce service or platform providers, currently restrict access to their web server databases strictly to authorized members, by integrating authentication security interfaces throughout an entirely closed connected computer network environment.
Figure 34 schematically illustrates a screen display of the GUI or shopper interface used with the present invention, providing details about a company that the user selected from the companies offered by clicking the "Advanced Search" option on the screen of Figure 33. Note that in this screen, below the tabs in the frame for the five types of domains, the capitalized message "ADVANCED SEARCH AGENTS ARE ON" can be seen. In addition, a dialog box at the bottom of the screen can be filled in by the user to run a search using the functions of the Semantics-Recognition Buyer Agent provided by the present invention. Note again, however, that this screen is not operational, because this company, or the government-to-business e-commerce service or platform providers, currently restrict access to their web server databases strictly to authorized members, by integrating authentication security interfaces throughout an entirely closed connected computer network environment.
Figure 35 schematically illustrates a screen of the GUI or shopper interface used with the present invention, showing some companies in response to a click on the "Business-to-Business" text icon on a preceding online shopper/user screen (not shown). Note, however, that this screen is not operational, because these companies, or the business-to-business e-commerce service or platform providers, currently restrict access to their web server databases strictly to authorized members, by integrating authentication security interfaces throughout an entirely closed connected computer network environment.
Figure 36 schematically illustrates a screen display of the GUI or shopper interface used with the present invention, providing details about a company that the user selected from the companies offered by clicking the "Advanced Search" option on the screen of Figure 35.
Figure 37 schematically illustrates a screen of the GUI or shopper interface used with the present invention, showing the selectable items and their descriptions in response to the user selecting the "Domain A" tab.
Figure 38 schematically illustrates a screen display of the GUI or shopper/buyer interface used with the present invention, listing the product items sold by sellers in Domain A in response to the user clicking the "Advanced Search" option on the screen of Figure 37.
Figure 39 schematically illustrates a screen display of the GUI or shopper/buyer interface used with the present invention, providing details of the search results obtained with the Semantics-Recognition Buyer Agent of the present invention. The shopper/buyer interface responds to the user who sent the search request through the search parameters at the bottom of the screen in Figure 38.
It should further be understood that although the present invention has been described in terms of the Internet and the World Wide Web, it is equally applicable to recently introduced systems and to next-generation systems. For example, a wireless application development tool, J2ME (Java 2 Micro Edition), can be used to bring the online intelligent multilingual, domain-independent price comparison capability to mobile/wireless platforms, including all models of 3G or web phones, interactive and satellite interactive TV services (Ultimate TV), Pocket PCs, Palm organizers, all-in-one Web-enabled Palm synchronizers, wireless tablets and so on. Multi-product and multilingual value-added commercial web services on a home page can thus be delivered to mobile workers and Internet users, with the home page serving as a one-stop, anywhere access point on a 24/7/365 basis.
Furthermore, the present invention can deliver various products and multilingual value-added commercial web services through wired and mobile/wireless platforms. These products and services have the following capabilities, functions and features: price comparison; e-wallet integration; inter-agent communication with negotiation capability, that is, Agent-to-Agent (A-to-A) contract negotiation; and real-world simulation capability for multiple e-commerce segments, including consumer-to-business (C-to-B), consumer-to-consumer (C-to-C) and business-to-business (B-to-B) auctions, government-to-business (G-to-B) transactions, and so on. These A-to-A commerce, or A-commerce, activities will be constructed and activated within the framework of a world marketplace, and can be carried out dynamically in real time with the user needing only a keyboard, mouse or pointing device.
The terms and expressions used here are terms of description and not of limitation. Their use is not intended to exclude any equivalents of the features shown and described, and it is recognized that various modifications are possible within the scope of the invention claimed.
Figure BDA0000134717110000571
Figure BDA0000134717110000581
Figure BDA0000134717110000591
Figure BDA0000134717110000601
Figure BDA0000134717110000611
Figure BDA0000134717110000621
Figure BDA0000134717110000631
Figure BDA0000134717110000641
Figure BDA0000134717110000661

Claims (2)

1. A method for performing real-time online search processing through an interconnected computer network, the method comprising the steps of:
a. accessing an offline database through the interconnected computer network, the database holding seller descriptions for a plurality of seller sites, including seller sites in different native languages, the seller descriptions having information about each seller site of the plurality of seller sites, the information comprising:
i. the URL of each seller site of the plurality of seller sites,
ii. the search form URL of each seller of the plurality of sellers,
iii. the domain description found on each seller site of the plurality of seller sites,
iv. generalized rules about how product information is organized on each seller site of the plurality of seller sites;
v. samples of the price and product information obtained from the plurality of sellers;
b. receiving from an online user, in one of the different native languages, a price comparison request for a desired product;
c. identifying, from the seller descriptions, seller sites likely to have price information relevant to the price comparison request;
d. constructing, for each identified seller site, a search request for the desired product using the site description, which includes the corresponding search form URL;
e. sending the constructed search requests directly to the identified seller sites;
f. extracting price and product information from the search results received in response to the sent search requests, wherein the extracted price and product information is in one of the different native languages; and
g. displaying the extracted price and product information to the user.
2. A method for performing real-time online search processing through an interconnected computer network, the method comprising the steps of:
a. storing in an offline database, through the interconnected computer network, information about a plurality of seller sites, including seller sites in different native languages, the information comprising URLs, search form URLs, domain descriptions and seller descriptions, wherein the seller descriptions comprise a plurality of generalized rules about how product information is organized on each of the seller sites;
b. using the information stored in the offline database to process the parameters of a price comparison request for a desired product received from an online user in one of the different native languages, including identifying, from the seller descriptions, seller sites likely to have price information relevant to the price comparison request;
c. extracting real-time price and product information from the information received, in response to search requests, from the seller sites identified among the plurality of seller sites, wherein the search requests are constructed using the information stored in the offline database about each identified seller site, and the extracted price and product information is in one of the different native languages; and
d. displaying the extracted price and product information to the user.
CN201210028555.8A 2000-09-29 2001-09-27 The method carrying out real-time online search process by interconnective computer network Expired - Lifetime CN102708114B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US60/192,999 2000-03-28
US23657400P 2000-09-29 2000-09-29
US60/236,574 2000-09-29
US19299901P 2001-06-19 2001-06-19

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CNA01819690XA Division CN1478237A (en) 2000-09-29 2001-09-27 Online intelligent information comparison agent of multilingual electronic data sources over inter-connected computer networks

Publications (2)

Publication Number Publication Date
CN102708114A true CN102708114A (en) 2012-10-03
CN102708114B CN102708114B (en) 2016-08-03

Family

ID=46900906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210028555.8A Expired - Lifetime CN102708114B (en) 2000-09-29 2001-09-27 The method carrying out real-time online search process by interconnective computer network

Country Status (1)

Country Link
CN (1) CN102708114B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998032289A2 (en) * 1997-01-17 1998-07-23 The Board Of Regents Of The University Of Washington Method and apparatus for accessing on-line stores
WO2000028455A1 (en) * 1998-11-12 2000-05-18 Ac Properties B.V. A system, method and article of manufacture for advanced mobile bargain shopping
CN1255680A (en) * 1998-12-01 2000-06-07 韩国电子通信研究院 Information searching method and system for on-line shop products
EP1024448A2 (en) * 1999-01-28 2000-08-02 R-U-Sure Ltd. E-commerce system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI567628B (en) * 2016-01-20 2017-01-21 Long press the message immediately after the search method
CN110347480A (en) * 2019-06-26 2019-10-18 联动优势科技有限公司 The preferred access path method and device of data source containing coincidence data item label
CN110347480B (en) * 2019-06-26 2021-06-25 联动优势科技有限公司 Data source preferred access path method and device containing coincident data item label
CN111259732A (en) * 2019-12-31 2020-06-09 维沃移动通信有限公司 Information display method and electronic equipment

Also Published As

Publication number Publication date
CN102708114B (en) 2016-08-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20160527

Address after: Delaware

Applicant after: Kaichuang Research Co., Ltd.

Address before: Delaware

Applicant before: Lingqiu Co., Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20160803