WO2009006844A1 - Method and system for applying webpage semantics - Google Patents


Info

Publication number
WO2009006844A1
Authority
WO
WIPO (PCT)
Prior art keywords
webpage
content
file
description file
keyword
Prior art date
Application number
PCT/CN2008/071587
Other languages
English (en)
Chinese (zh)
Inventor
Zhiping Meng
Original Assignee
Zhiping Meng
Priority date
Filing date
Publication date
Application filed by Zhiping Meng
Publication of WO2009006844A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising

Definitions

  • Embodiments of the present invention relate to IT technologies, and in particular to a method and system for applying webpage semantics.
  • Background art
  • Web pages are usually generated when a website is built or maintained, and most of them are scripts.
  • Advertisers have developed multiple strategies to maximize the value of advertising.
  • One strategy is for advertisers to use common channels to provide interactive media or services, so that ads are directed at a more focused audience and reach it more effectively. For example, advertisers can post the latest game news in the game section of Sina.com to readers who like games.
  • Another strategy is for advertisers to spread ads to as wide an audience as possible (general advertising), in the hope of better advertising results.
  • Web-based advertising, or web advertising, is often presented to a website viewer (hereinafter, the user) in the form of a banner ad.
  • By clicking a banner ad, the user is taken to the advertiser's website that the banner links to.
  • The ratio of the number of times users click an ad to the number of times the ad is displayed is called the click-through rate.
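The click-through rate defined above is simply clicks divided by impressions. A minimal Python sketch follows; the function name and the choice to return 0.0 for an ad that has never been shown are assumptions made for illustration, not part of the patent.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks / impressions.

    Assumption for this sketch: an ad with zero impressions has a CTR of 0.0
    rather than raising a division error.
    """
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if impressions == 0:
        return 0.0
    return clicks / impressions


if __name__ == "__main__":
    # 30 clicks over 1,500 impressions
    print(f"{click_through_rate(30, 1500):.2%}")  # prints 2.00%
```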
  • The problem is that although advertisers place ads on a large number of websites, click-through rates remain low, so advertisers are dissatisfied with the return on their advertising investment.
  • Some advertisers try to improve the efficiency of their ads by tracking users' online habits, but this practice often infringes on user privacy.
  • The owner of a website (hereinafter, the website owner) also faces the problem of increasing advertising revenue without degrading the user experience.
  • Some website owners have chosen to expand advertising at the expense of the user experience, and have lost a large number of users as a result.
  • The other type is search engine websites, such as Google, which let advertisers target their ads by presenting them to users alongside search result pages associated with the ads. Although search result pages give advertisers an opportunity to direct ads at searchers, search results cover only a small part of the World Wide Web, so not all ads that need targeting can be delivered to such potential customers (here, the searchers mentioned above) in this way.
  • Embodiments of the present invention provide a method and system for applying webpage semantics that represent the semantics of webpage content well.
  • An embodiment of the present invention provides an active method for adding auxiliary information according to webpage content, including:
  • determining whether the content of the webpage matches a keyword and, if it does, retrieving the auxiliary information corresponding to the keyword.
  • An embodiment of the present invention further provides an active system for adding auxiliary information according to webpage content, including a client and a server:
  • the server sends stored auxiliary information to the client at the client's request;
  • the client is connected to the server and includes a keyword matching module configured to determine whether a keyword matches the webpage content and, if it does, to retrieve the auxiliary information corresponding to the keyword.
  • An embodiment of the present invention further provides a passive method for linking webpage content to keyword-related auxiliary information, including:
  • sending to the user a webpage source file in which keywords have been hyperlinked to the addresses of their associated auxiliary information, and extracting the auxiliary information at the user end.
  • An embodiment of the invention further provides a method for adding logic control statements to a webpage source file, including:
  • the client parses the webpage file and, when the trigger condition of a logic control statement is met, executes the operations defined in that statement.
  • An embodiment of the invention further provides a passive method for generating a content description file for a webpage, including:
  • matching the displayable text content of the webpage against a keyword list and, if a match succeeds, recording the correspondence between the matched keyword entry and the webpage content in the content description file.
  • An embodiment of the invention further provides a method for selecting or restricting the users (suitable objects) a webpage is served to, including:
  • while the user requests a webpage, the user's own information is compared with the webpage's suitable-object information; if the user information satisfies it, the webpage is sent to the user. The webpage description file includes at least one of a content description file and a function description file of the webpage.
  • An embodiment of the present invention further provides a method for implementing a personalized web browsing client, including:
  • the web browser obtains user information;
  • when browsing a webpage, the browser loads the user information and interacts with the webpage's server according to that information.
  • An embodiment of the invention further provides a passive method for generating a webpage function description text, including:
  • the operation logic specifies the operation a webpage element must perform under given conditions.
  • An embodiment of the invention further provides a passive system for using webpage description files, including a server and a client:
  • the server includes a processing module that processes webpage files and generates, on the server, the content description file and/or function description file of each webpage;
  • the client includes a webpage browsing module and a function description file parser;
  • the webpage browsing module parses and displays webpage source files;
  • the function description file parser parses the function description file and, together with the webpage browsing module, carries out the operation logic defined in it;
  • the operation logic specifies the operation a webpage element must perform under given conditions.
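The patent does not fix a syntax for the function description file. The sketch below assumes, purely for illustration, a JSON list of rules, each naming a webpage element ID, a triggering event, and an operation; the field names, the `show_aux` and `log` operations, and the `dispatch` helper are all hypothetical. It shows how a function description file parser could carry out "element X performs operation Y under condition Z" on behalf of the browsing module.

```python
import json
from typing import Callable, Dict, List

# Hypothetical function description file: one rule per element/event pair.
FUNCTION_DESCRIPTION = """
[
  {"element": "ID1", "event": "click", "operation": "show_aux", "arg": "http://beijing.html"},
  {"element": "ID4", "event": "hover", "operation": "log", "arg": "title viewed"}
]
"""

# Operations the client is willing to perform; names are illustrative only.
OPERATIONS: Dict[str, Callable[[str], str]] = {
    "show_aux": lambda url: f"display auxiliary information from {url}",
    "log": lambda msg: f"log: {msg}",
}


def load_rules(text: str) -> List[dict]:
    """Parse the JSON rule list of the (hypothetical) function description file."""
    return json.loads(text)


def dispatch(rules: List[dict], element: str, event: str,
             operations: Dict[str, Callable[[str], str]]) -> List[str]:
    """Run every rule whose element ID and event match; return each result.

    Rules naming an operation the client does not know are skipped.
    """
    results = []
    for rule in rules:
        if rule["element"] == element and rule["event"] == event:
            op = operations.get(rule["operation"])
            if op is not None:
                results.append(op(rule["arg"]))
    return results


if __name__ == "__main__":
    rules = load_rules(FUNCTION_DESCRIPTION)
    print(dispatch(rules, "ID1", "click", OPERATIONS))
```

Skipping unknown operations, rather than executing whatever the file names, is a deliberate choice in this sketch: the description file arrives from the website, so the client only runs operations it already trusts.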
  • An embodiment of the present invention further provides a data processing method between websites, or between a website and a user, including:
  • the processing party receives from the generating party a webpage file together with the function description file and/or content description file corresponding to it;
  • the processing party processes the webpage and/or its content description file according to the operation logic described in the function description file;
  • where the generating party is the party that generates the webpage file and the content description file and/or function description file corresponding to it, and the processing party is the party that processes the received webpage file and its content description file and/or function description file.
  • The semantics of webpage content are abstracted in the form of keywords, which makes it convenient to add auxiliary information to webpage content. Besides controlling webpages effectively, the method can also be used for online advertising.
  • FIG. 1 is a schematic structural diagram of a system according to an active embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of another system according to an active embodiment of the present invention;
  • FIG. 3 is a schematic diagram of the data structure of the keyword matching module according to an embodiment of the present invention;
  • FIG. 4 is a flowchart of the first active webpage information processing process according to an embodiment of the present invention;
  • FIG. 5 is a flowchart of the second active webpage information processing process according to an embodiment of the present invention;
  • FIG. 6 is a flowchart of the third active webpage information processing process according to an embodiment of the present invention;
  • FIG. 7 is a flowchart of adding links to auxiliary information addresses to a passive webpage source file according to an embodiment of the present invention;
  • FIG. 8 is a structural diagram of a system that adds links pointing to auxiliary information to a passive webpage source file according to an embodiment of the present invention;
  • FIG. 9 is a schematic diagram of analyzing webpage content and forming a tree according to an embodiment of the present invention;
  • FIG. 10 is a schematic diagram of passively generating a content description file and a function description file according to an embodiment of the present invention;
  • FIG. 11 is a schematic diagram of passively generating and using a function description file according to an embodiment of the present invention;
  • FIG. 12 is a schematic diagram of passively serving a webpage by setting its suitable objects according to an embodiment of the present invention.
  • In a specific implementation, the improvement makes the webpage itself more practical and interactive, and allows a large amount of auxiliary information to be added while the original webpage is preserved. In addition, the present invention adds two description file structures to the webpage, which greatly enriches its functionality.
  • The core of the present invention revolves around one theme: extracting the semantic information of a webpage from its own content, combined with a keyword list and the webpage's own attributes (its basic information), and performing predetermined operational logic according to that semantic information. There are two ways of analyzing webpage semantics: passive and active.
  • In the so-called active mode, the client runs programs or plug-ins that analyze webpage semantics without modifying the existing webpage or adding new files, and performs applications based on those semantics. For example, when a webpage containing a keyword that has corresponding auxiliary information is found, auxiliary information related to the webpage content (e.g., an advertisement) is provided with the webpage.
  • In the so-called passive mode, the webpage is pre-processed: the original webpage script is modified, or a webpage content description file, function description file, etc. is added, and the user's client, after a program upgrade or plug-in installation, can recognize the modified webpage file or the newly added webpage description files (content description files and function description files).
  • Both passive and active methods can analyze webpage semantics and can control some actions of the web browser, or of programs external to it, through scripting languages or preset programs. That is, the browser's behavior is not entirely controlled by the user; it is partly determined by the webpage content itself or by scripts or other description files preset for the webpage (in this patent, the content description file and the function description file). Once those skilled in the relevant art understand the working principles and concepts of the present invention, techniques and systems that make simple adjustments and modifications according to them fall within the scope of the present invention.
  • FIG. 1 is a schematic structural diagram of a system according to an active embodiment of the present invention. The system has two main parts: the client side and the server side.
  • The client includes five main modules (secondary or general modules are not shown). The web browsing module 120 parses webpages and displays them on the client; the user browses requested webpages through it.
  • The content importing module 130 extracts part or all of the webpage content, depending on the application, and passes the extracted content to the keyword matching module.
  • The content importing module supports several common import modes:
  • first, importing the content of the webpage requested by the user (i.e., the webpage source file) into the keyword matching module;
  • second, importing the webpage content in the window the user is viewing, or in a particular frame, i.e., part of the webpage content, into the keyword matching module;
  • third, importing the webpage content around the mouse pointer, or in an area selected by the user, into the keyword matching module.
  • The keyword matching module 150 maintains a keyword data structure that can be updated or edited; as shown in FIG. 3, it usually includes a keyword list 310 and the corresponding auxiliary information locations 320.
  • The keyword matching module looks for matches between the webpage content imported by the content importing module 130 and the keyword list 310.
  • When a match is found, the communication module 140 sends a request for the auxiliary information to the server according to the auxiliary information location 320 corresponding to the keyword.
  • The server 170 finds the corresponding auxiliary information in the auxiliary information storage module and sends it to the client 110, whose auxiliary information playing module then plays it.
  • FIG. 2 is another schematic structural diagram of an active-mode system in the present invention; it differs from FIG. 1 in that the keyword matching module is moved from the client to the server.
  • Communication between the content importing module and the keyword matching module therefore goes through the communication module 140 of the client 110 and the communication module 180 of the server 170, rather than taking place inside the client as in FIG. 1.
  • This may reduce the client's load from matching operations in the keyword matching module, but it may increase the communication load between client and server.
  • The server then transmits the auxiliary information corresponding to the matched keyword to the client.
  • The auxiliary information is not necessarily stored on the same server as the keyword matching module; the auxiliary information storage module may reside on another associated server. For simplicity, the present invention describes only the simplest case.
  • The auxiliary information playing module 160 and the content importing module 130 can run as plug-ins or programs within a web browser (such as IE), or as separate programs outside the web browser.
  • When playing auxiliary information, the auxiliary information playing module can play it at a position within the webpage or at a position outside the browser window.
  • The content importing module usually interfaces with the web browser, so that it can obtain the content the user requests or browses more flexibly and conveniently.
  • FIG. 3 is a schematic diagram of a data structure of a keyword matching module in the present invention.
  • The keyword matching module 150 generally retains at least the keyword list 310 and the corresponding auxiliary information locations 320, so that a simple lookup table can determine the location of the required auxiliary information.
  • Other information may be added to the data structure as needed, such as the playback position of the auxiliary information, keyword priority, client IP address, and server IP address; this information is optional and is not shown in FIG. 3.
  • The keyword list can map either to the auxiliary information itself or to auxiliary information addresses; FIG. 3 shows only the case of mapping to addresses.
  • The correspondence between keywords and auxiliary information (or auxiliary information addresses) can be one-to-many (one keyword to several items or addresses), many-to-one (several keywords to one item or address), or one-to-one.
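A minimal Python sketch of the FIG. 3 lookup table follows, assuming auxiliary information is identified by address strings. The class and method names are hypothetical and the addresses are illustrative. A dictionary of lists is enough to express all three correspondence types described above.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class KeywordTable:
    """Keyword list (310) mapped to auxiliary information locations (320).

    One keyword may have several locations and several keywords may share a
    location, so one-to-one, one-to-many and many-to-one all fit.
    """

    def __init__(self) -> None:
        self._locations: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, keyword: str, location: str) -> None:
        """Record that `keyword` points to `location` (duplicates ignored)."""
        if location not in self._locations[keyword]:
            self._locations[keyword].append(location)

    def lookup(self, keyword: str) -> List[str]:
        """Return the locations for `keyword`, or [] if it is unknown."""
        # .get() does not trigger the defaultdict factory, so unknown
        # keywords are not inserted as a side effect of a lookup.
        return list(self._locations.get(keyword, []))

    def keywords_for(self, location: str) -> Set[str]:
        """Reverse lookup: every keyword that points to `location`."""
        return {kw for kw, locs in self._locations.items() if location in locs}


if __name__ == "__main__":
    table = KeywordTable()
    table.add("Beijing", "http://beijing.html")
    table.add("Beijing", "http://2008.html")  # one keyword, two locations
    table.add("Olympic", "http://2008.html")  # two keywords, one location
    print(table.lookup("Beijing"))  # ['http://beijing.html', 'http://2008.html']
    print(sorted(table.keywords_for("http://2008.html")))  # ['Beijing', 'Olympic']
```

Optional fields such as priority or playback position could be added by storing small records instead of plain address strings.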
  • FIG. 4 is a flowchart of the first active webpage information processing process in the present invention.
  • This process corresponds to the system of FIG. 1 and specifically includes:
  • the client receives the webpage, and the keyword matching module searches the content of the webpage the user is browsing, i.e., the HTML or XML (Extensible Markup Language) file of that webpage;
  • if a keyword is matched, the auxiliary information is retrieved according to the address corresponding to the keyword.
  • The webpage content may also be the content in the webpage frame the user is viewing, or the content displayed in the user's browser window; such content can be extracted through APIs such as JavaScript or those of the operating system.
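The FIG. 4 flow (scan the page, match keywords, fetch auxiliary information by address) can be sketched in Python as below. This is a simplified illustration under several assumptions: tags are stripped with a crude regular expression instead of a real DOM, matching is case-insensitive substring search, and `fetch` is a caller-supplied stub standing in for the communication module. The function names, the "Shanghai" entry and its address are hypothetical.

```python
import re
from typing import Callable, Dict, List

_TAG = re.compile(r"<[^>]+>")


def visible_text(html_source: str) -> str:
    """Crude tag stripper; a real client would read text from the browser's DOM."""
    return _TAG.sub(" ", html_source)


def match_keywords(text: str, keyword_table: Dict[str, str]) -> List[str]:
    """Return the keywords found in `text`, ordered by first occurrence."""
    hits = []
    for kw in keyword_table:
        m = re.search(re.escape(kw), text, flags=re.IGNORECASE)
        if m:
            hits.append((m.start(), kw))
    return [kw for _, kw in sorted(hits)]


def retrieve_auxiliary(html_source: str, keyword_table: Dict[str, str],
                       fetch: Callable[[str], str]) -> Dict[str, str]:
    """Match the page against the table, then fetch each matched keyword's
    auxiliary information from its address."""
    text = visible_text(html_source)
    return {kw: fetch(keyword_table[kw]) for kw in match_keywords(text, keyword_table)}


if __name__ == "__main__":
    table = {"Olympic": "http://olympicgames.html", "Beijing": "http://beijing.html",
             "Shanghai": "http://shanghai.html"}  # last entry is illustrative only
    page = "<html><body><p>Welcome to the Beijing Olympic Games</p></body></html>"
    # Stub fetcher; a real client would issue a request to the server here.
    print(retrieve_auxiliary(page, table, lambda url: f"ad from {url}"))
```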
  • FIG. 5 is a flowchart of the second active webpage information processing process in the present invention; this process corresponds to the system of FIG. 2.
  • After the user obtains the webpage content, a simple operation can accompany it, for example, 510, pointing the mouse or cursor at a word the user does not understand;
  • 520, the webpage content pointed to by the mouse or cursor is sent back to the server, which determines whether it matches a keyword;
  • if it matches, the server retrieves the auxiliary information corresponding to the keyword, or retrieves it from the address corresponding to the keyword, and sends it back to the client; 540, the auxiliary information is finally played on the client.
  • The word under the mouse or cursor can be captured with JavaScript or other scripting techniques, or by calling underlying API functions of the operating system; for example, on Windows, hooks can be used to capture words from the screen.
  • FIG. 6 is a flowchart of the third active webpage information processing process in the present invention. It differs substantially from FIGS. 4 and 5 in that it is triggered when the user clicks on the webpage: 610, the user clicks a text entry with a hypertext link in the webpage; 620, the system determines whether the text entry matches a keyword; 630, if it matches, the auxiliary information is retrieved according to the address corresponding to the keyword; 640, the retrieved information is played on the client.
  • FIG. 7 is a flowchart of adding a link to an auxiliary information address for a passive webpage source file according to the present invention.
  • A feature of this method is that links to auxiliary information addresses must be added before the webpage is browsed, unlike FIGS. 4, 5, and 6, in which auxiliary information is retrieved directly by the keyword matching module while the webpage is being browsed. That is, the existing webpage file is processed before it is browsed, and links to auxiliary information addresses are added to it.
  • The specific process is: 710, the content of the webpage source file is obtained; 720, it is determined whether the content of the source file matches a keyword; 730, if so, a link to the auxiliary information address is added at each matching place in the source file; 740, the webpage with the added links, or a newly formed auxiliary information description file, is sent to the user. This requires a small adjustment to the system structure: FIG. 8 shows the structure of a system that adds links pointing to auxiliary information to passive webpage source files.
  • The webpage before links to the auxiliary information are added is called the old webpage;
  • the webpage after the links are added is called the new webpage.
  • This patent is mainly directed to, but not limited to, processing text information in web pages.
  • Auxiliary information includes, for example, comments and functions.
  • All displayable text information falls into two types: text that carries link information and text that does not.
  • A link structure takes the form <a href="address"></a>.
  • For text that already has a link structure and matches a keyword, a new link structure pointing to the auxiliary information can be added to it; for text that has no link structure but matches a keyword, a new link structure pointing to the auxiliary information can likewise be added.
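Step 730 for the second kind of text (matching text with no link structure) can be illustrated with a small string-processing sketch. It assumes the keyword-to-address table is a plain dictionary and that regular expressions are adequate for the old webpage; a production system would more likely work on a DOM tree, as described for FIG. 9. Existing links are deliberately left untouched, so nesting a new link inside an existing one is not attempted here. The function name is hypothetical.

```python
import html
import re
from typing import Dict

_ANCHOR = re.compile(r"(<a\b[^>]*>.*?</a>)", re.IGNORECASE | re.DOTALL)
_TAG = re.compile(r"(<[^>]+>)")


def add_aux_links(source: str, keyword_urls: Dict[str, str]) -> str:
    """Wrap keyword occurrences in unlinked text with <a href="..."> links.

    Text already inside an <a> element is left unchanged in this sketch, and
    keywords are never replaced inside tag markup such as attributes.
    """
    if not keyword_urls:
        return source
    # Try longer keywords first so "Olympic Games" wins over "Olympic".
    alternatives = sorted(keyword_urls, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in alternatives))

    def to_link(m: re.Match) -> str:
        kw = m.group(0)
        url = html.escape(keyword_urls[kw], quote=True)
        return f'<a href="{url}">{kw}</a>'

    out = []
    # re.split with a capturing group puts the captured anchors at odd indices.
    for i, chunk in enumerate(_ANCHOR.split(source)):
        if i % 2 == 1:
            out.append(chunk)  # existing link: keep as-is
            continue
        for j, piece in enumerate(_TAG.split(chunk)):
            # Odd pieces are tags; only even pieces are text to be linked.
            out.append(piece if j % 2 == 1 else pattern.sub(to_link, piece))
    return "".join(out)


if __name__ == "__main__":
    old_page = '<p>Beijing hosts the Olympic Games. <a href="http://x.html">Beijing</a></p>'
    urls = {"Beijing": "http://beijing.html", "Olympic Games": "http://olympicgames.html"}
    print(add_aux_links(old_page, urls))
```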
  • For example, the addresses of the auxiliary information corresponding to the keywords are http://2008.html, http://beijing.html, and http://olympicgames.html. Here the keyword matching module (FIG. 8) differs from that of FIGS. 1 to 6, which retrieves auxiliary information according to its address: here the keyword matching module is also responsible for adding the auxiliary information address at each matched location in the old webpage, forming a new webpage 820. There are several ways to match:
  • Link structures of different priorities can be displayed with different underline styles, text colors, or fonts; for example, no color indicates the highest priority, red the next, and yellow a priority lower than red.
  • The user can activate existing auxiliary information links by: (1) pointing the mouse at the text so that it is captured and the auxiliary information carried by the webpage is displayed; (2) performing a prescribed mouse action on text carrying an auxiliary information link, such as drawing a circle around it; (3) clicking a webpage entry with a link structure, which retrieves and displays the auxiliary information at the same time; (4) having the browser retrieve and display the auxiliary information automatically according to priority or time.
  • The retrieved auxiliary information can be displayed by: (1) opening a new webpage, i.e., starting a new browser thread or process; (2) having a program inside the original webpage call or execute a scripting language such as JavaScript, execute an ActiveX control, or invoke a browser plug-in; (3) calling a system API, a system device, or a new software or hardware program to display it outside the browser.
  • When the browser parses the link structures of the webpage, it recognizes link structures nested inside other link structures and opens the auxiliary information of these links under certain conditions. This is also part of the invention, because the original link structure does not support such nesting.
  • the source file content of the new web page is:
  • A blue underline represents the outermost nesting level, a red underline an inner level, and so on; this requires a browser that supports multi-level nesting.
  • The user can activate existing auxiliary information links by: (1) pointing the mouse at the text so that it is captured and the auxiliary information carried by the webpage is displayed; (2) performing a prescribed mouse action on text carrying an auxiliary information link, such as drawing a circle around it; (3) clicking text that carries an auxiliary information link; (4) having the browser retrieve and display the auxiliary information automatically according to nesting level or time.
  • The retrieved auxiliary information can be displayed by: (1) opening a new webpage, i.e., starting a new browser thread or process; (2) having a program inside the original webpage call or execute a scripting language such as JavaScript, execute an ActiveX control, or invoke a browser plug-in; (3) calling a system API, a system device, or a new software or hardware program to display it outside the browser.
  • The user's operation can also be specified in the webpage, for example, a user click.
  • Adding logical control statements to a webpage is a technique that distinguishes this approach from existing ways of expressing webpages.
  • Various embedding methods and expressions are possible. For example, there may be various control keywords (if and while are only a few examples), multiple language unit tags (for example, statement end tags and statement separator tags), and control keywords inserted at different positions. All of these variations are within the scope of the invention.
  • The control keywords and semantic logic segmentation in the present invention can follow those of the C language or other programming languages.
  • A dedicated compiler is needed: either one similar to a C compiler, which compiles the whole file, or one like MATLAB's for the M language, which processes the code line by line.
  • The control logic structure can be added to the webpage file itself, or to a description file corresponding to the webpage (for example, the function description file in the present invention); in the latter case, the grammatical structure, keyword types, semantic logic division, etc. are similar to those used when adding control logic directly to a webpage file.
  • The invention introduces new file formats and description files for webpages; the auxiliary information description file is taken as an example here.
  • Because the auxiliary information description file is generated by processing the webpage, it can support functions such as retrieving and playing auxiliary information.
  • the auxiliary information description file is actually a kind of function description file.
  • A description file usually corresponds to one webpage, though multiple description files may also correspond to the same webpage.
  • a content description file can be generated.
  • It can be generated as follows: construct a keyword database with vector semantics, such as Coca-Cola > Beverage > Food; each keyword can have such a semantic vector pointing to it. This approach is similar to how search engines classify search keywords.
  • Such search engines include Google, Baidu, and so on.
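The "vector semantics" example Coca-Cola > Beverage > Food can be read as a path from a specific keyword to ever broader categories. The sketch below assumes that reading and a `>`-separated text format; the second path (Green Tea) and all function names are hypothetical.

```python
from typing import Dict, List


def parse_semantic_path(path: str) -> List[str]:
    """'Coca-Cola>Beverage>Food' -> ['Coca-Cola', 'Beverage', 'Food']."""
    return [part.strip() for part in path.split(">") if part.strip()]


def build_keyword_database(paths: List[str]) -> Dict[str, List[str]]:
    """Map every keyword to its chain of broader categories, nearest first.

    If a keyword appears in several paths, the first path seen wins.
    """
    db: Dict[str, List[str]] = {}
    for path in paths:
        parts = parse_semantic_path(path)
        for i, keyword in enumerate(parts):
            db.setdefault(keyword, parts[i + 1:])
    return db


def keywords_under(db: Dict[str, List[str]], category: str) -> List[str]:
    """All keywords whose category chain contains `category`."""
    return [kw for kw, chain in db.items() if category in chain]


if __name__ == "__main__":
    db = build_keyword_database(["Coca-Cola>Beverage>Food", "Green Tea>Beverage>Food"])
    print(db["Coca-Cola"])                 # ['Beverage', 'Food']
    print(keywords_under(db, "Beverage"))  # ['Coca-Cola', 'Green Tea']
```

With such a database, a page matched on "Coca-Cola" can also be classified under "Beverage" and "Food", which is what makes the keyword semantics usable for broader targeting.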
  • When the webpage is analyzed, existing techniques such as the DOM are used to parse it into an object tree, and the nodes of the tree are then mapped onto the keyword database; through this mapping, a file of the following kind can be created for the webpage.
  • This file is called a content description file, and it can also contain some basic content about the web page, such as URL, time information, and so on.
  • The objects controlled or edited in the webpage source file, and the auxiliary information of the present invention, include all media, such as video, audio, images, and text.
  • FIG. 9 is a schematic diagram of analyzing webpage content and forming a tree.
  • A webpage is usually analyzed with the DOM to generate a tree.
  • By parsing an HTML or XML document, the DOM logically builds a tree model of the document.
  • Each node of the tree is an object, so HTML or XML documents can be manipulated through the tree and its objects. This provides a good conceptual framework for working with every aspect of the document and prepares for the later generation of content description files and/or function description files.
  • An element is the basic building block of XML and describes its basic information.
  • In FIG. 9, the element nodes are: root element 920, head element 930, document body element 940, title element 950, link element 960, title element 970, form element 980, table item element 986, and body element 987.
  • Attribute nodes contain information about element nodes, usually contained within elements, describing the attributes of the elements.
  • FIG. 9 contains hyperlink attribute 962 and table attribute 985.
  • Text nodes contain text information, or only whitespace. In FIG. 9, 951, 961, 962, 971, 981, 982, 983, and 984 are all text nodes. For convenience, all the text is gathered in text box 900, which also makes it easy to pass it to the keyword matching module as a whole.
  • a document node is the parent of all other nodes in the entire document.
  • An ID number or name can be assigned to each element node, so that operations need not traverse the entire tree; this also makes it easier to generate content description files or function description files later.
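Python's standard `html.parser` module is not a full DOM, but it is enough to sketch the idea of walking a page, numbering its text-bearing nodes, and gathering all text into one block for keyword matching (the role text box 900 plays in FIG. 9). The sketch assumes IDs are assigned per non-blank text node in document order, which is close to, but not necessarily identical with, the numbering in FIG. 10; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Elements that never have an end tag, so they must not stay on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TextNodeCollector(HTMLParser):
    """Give each non-blank text node an ID (ID1, ID2, ...) and remember the
    element that contains it. Script/style text is not filtered here."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: List[str] = []
        self.nodes: List[Tuple[str, str, str]] = []  # (id, parent tag, text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; ignore stray end tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            parent = self._stack[-1] if self._stack else "#document"
            self.nodes.append((f"ID{len(self.nodes) + 1}", parent, text))


def collect_text_nodes(html_source: str) -> List[Tuple[str, str, str]]:
    parser = TextNodeCollector()
    parser.feed(html_source)
    parser.close()  # flush any trailing buffered text
    return parser.nodes


if __name__ == "__main__":
    page = ("<html><head><title>Beijing 2008</title></head>"
            "<body><h1>Olympic Games</h1><p>Tickets<br>on sale</p></body></html>")
    nodes = collect_text_nodes(page)
    for node in nodes:
        print(node)
    # The whole-page text (the role of text box 900) is the joined texts.
    print(" ".join(text for _, _, text in nodes))
```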
  • FIG. 10 is a schematic diagram of a content description file and a function description file generated in the passive mode of the invention. Different ID numbers represent different elements:
  • the title element 950 is represented by ID1;
  • the link element 960 corresponds to two different texts, 961 and 962;
  • the title element 970 is represented by ID4;
  • likewise, three body elements 987 correspond to three different texts, represented by ID5, ID6 and ID7;
  • the last body element 987 is represented by ID8.
  • Text box 900 represents all the text content extracted from the web page file.
  • This content is fed into the keyword matching module 150 to generate the content description file 1000 for the web page (HTML file).
  • The keyword matching module here maintains a keyword database, but it does not necessarily hold link addresses for the auxiliary information.
  • The content description file 1000 generated this way may therefore contain only keyword information and basic information about the webpage.
  • The content description file 1000 contains at least some of the following:
  • ID1 corresponds to the keyword “Beijing”;
  • ID2 corresponds to the keyword “Olympic”, and so on;
  • basic information about the webpage, such as its address http://..., its creation time, its intended audience, the type of information it publishes and, for some webpages, metadata. This information helps users understand the webpage and makes it easier to process the webpage again.
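The keyword-matching step that turns the extracted text into a content description file could look roughly like the sketch below. It rests on three assumptions: the text has already been indexed by element ID, matching is plain case-insensitive substring search, and the file is represented as a Python dict. The field names (`basic_info`, `keywords`, `address`, `suitable_age`) are hypothetical.

```python
def build_content_description(texts, keywords, basic_info):
    """texts: element ID -> text; keywords: iterable of keyword strings.

    Returns a dict holding the page's basic information and, for each
    element ID, the keywords found in that element's text (case-insensitive
    substring match; elements with no match are omitted)."""
    matches = {}
    for element_id, text in texts.items():
        lowered = text.lower()
        found = [kw for kw in keywords if kw.lower() in lowered]
        if found:
            matches[element_id] = found
    return {"basic_info": dict(basic_info), "keywords": matches}


# Hypothetical usage with made-up page text.
texts = {
    "ID1": "Beijing welcomes the world",
    "ID2": "Olympic torch relay",
    "ID3": "Weather today",
}
cdf = build_content_description(
    texts,
    ["Beijing", "Olympic"],
    {"address": "http://example.com/page.html", "suitable_age": 16},
)
```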
  • The intended-audience field of the basic information is used to target users or to restrict which users may browse the web page. This field can also be placed in the function description file, but it usually goes in the content description file. Not all web pages, for example, are suitable for children. Adding intended-audience or restriction information to the content description files of such webpages can keep unhealthy content from spreading. It can also help a webpage reach a more suitable audience.
  • For example, the content description file 1000 includes an intended-audience field indicating that the webpage is suitable for users older than 16.
  • When the personalized client needs to actively acquire webpage content, there are several ways to judge whether the webpage is suitable for the user:
  • the user obtains the content description file, which states that the webpage is suitable for users aged 16 or older. The personalized client 111 finds that the user information does not satisfy this condition and stops requesting the webpage;
  • the user obtains the restriction information in the content description file, for example "older than 16". The personalized client 111 finds that the user information does not satisfy this condition and stops requesting the webpage;
  • the personalized client 111 first sends the user information, possibly encrypted, to the website. The website finds that the user of personalized client 111 is 10 years old and checks this against the intended-audience condition in the content description file of the requested webpage. Because the condition is not satisfied, the webpage request is stopped.
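Below is a sketch of the suitability check shared by the three approaches above, whether it runs on the personalized client or on the website. The `AudienceRule` type, its `inclusive` flag (which distinguishes "16 or older" from "older than 16") and the choice to stop the request when the age is unknown are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudienceRule:
    """Hypothetical encoding of the intended-audience field."""
    min_age: int
    inclusive: bool = True  # True: "min_age or older"; False: "older than min_age"

    def admits(self, user_age: int) -> bool:
        if self.inclusive:
            return user_age >= self.min_age
        return user_age > self.min_age


def should_continue(rule: AudienceRule, user_info: dict) -> bool:
    """Return False (stop the webpage request) if the user's age is
    missing from the user information or does not satisfy the rule."""
    age = user_info.get("age")
    return age is not None and rule.admits(age)
```

Treating a missing age as "stop" is a conservative choice. A deployment could instead fall back to asking the user.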
  • The keyword matching module can also hold the link addresses of auxiliary information and define operation logic for the web page (HTML or XML file), extending the web page's own functionality. This produces the function description file 2000. Alternatively, the content description file 1000 may be generated first and then processed by other functional modules to generate the function description file 2000.
  • A function description file usually contains at least some of the following:
  • basic information about the webpage, such as its address http://..., its creation time, its intended audience, the type of information it publishes and, for some webpages, metadata. This information helps users understand the webpage and makes it easier to process the webpage again;
  • operation logic, which the user (client) executes actively or passively while using and browsing the webpage:
  • active execution means that specific program actions are carried out automatically according to the operation logic, without any user operation. Examples include opening/playing/closing auxiliary information, opening/closing a new webpage, or adding/deleting objects in a webpage;
  • passive execution means operation logic that is activated and executed by the user's own operations, such as logic triggered when the user moves the mouse over or clicks on the webpage. It likewise covers opening/playing/closing auxiliary information, opening/closing new webpages, adding/deleting objects in webpages, and so on. The function description file 2000 in FIG. 10 gives examples:
  • The record “ID1: http://beijing.html: click: new window” means that when the element with ID1 (the title element) is clicked, “http://beijing.html” is fetched and the resulting web page is opened in a new window. Likewise, the record “ID4: http://pingpang.jpg: create: beside” means that when the element with ID4 is created, “http://pingpang.jpg” is fetched and the file is displayed on both sides of the original page. The record “ID5: http://bootball.swf: mouse on: new layer” means that when the mouse moves over the element with ID5, “http://bootball.swf” is fetched and played in a newly created layer.
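The record format in FIG. 10 can be read with a small parser. This sketch assumes each record has exactly four fields (element ID, address, trigger event, display target) separated by a colon followed by whitespace. Under that assumption, the `://` inside a URL is not mistaken for a separator. `Rule` and `parse_rule` are hypothetical names.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    element_id: str  # e.g. "ID1"
    url: str         # address of the auxiliary information
    event: str       # trigger, e.g. "click", "create", "mouse on"
    target: str      # where to show the result, e.g. "new window", "beside"


# A colon counts as a field separator only when whitespace follows it,
# so "http://..." (colon followed by "/") is left intact.
_FIELD_SEP = re.compile(r"\s*:\s+")


def parse_rule(record: str) -> Rule:
    parts = _FIELD_SEP.split(record.strip())
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {record!r}")
    return Rule(*parts)
```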
  • The operation logic can take many forms. More complex operation logic can be used when needed, which in turn requires stronger support from the function description file parser. For example, complex operation logic may involve advanced constructs such as conditional, concurrent and selection logic.
  • In that case, the function description file 2000 can follow high-level languages and add if (then), while, switch, for and other more complex control structures. The operation logic can also be described in an existing programming language, such as C, C++, Java, C# or any scripting language.
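To illustrate conditional and selection logic, here is a toy evaluator for rules expressed as Python dicts. The `if`/`then`/`else` and `switch`/`cases`/`default` keys are an encoding chosen for this example. The source does not define a concrete syntax.

```python
import operator

# Comparison operators allowed in an "if" condition.
OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


def evaluate(rule: dict, context: dict):
    """Return the action chosen by a conditional or selection rule.

    {"if": (field, op, value), "then": a, "else": b}  -> a or b
    {"switch": field, "cases": {...}, "default": d}    -> matching case or d
    """
    if "if" in rule:
        field, op, value = rule["if"]
        branch = "then" if OPS[op](context[field], value) else "else"
        return rule.get(branch)
    if "switch" in rule:
        return rule["cases"].get(context[rule["switch"]], rule.get("default"))
    raise ValueError(f"unsupported rule: {rule!r}")
```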
  • The two new file structures in Figure 10, the content description file 1000 and the function description file 2000, are both designed to make better use of web page functionality.
  • The main role of the content description file 1000 is to capture the content of the web page at the semantic level. Existing HTML or XML files mostly describe a web page by its syntactic structure. They tell the browser how to display the file and how its parts relate to one another, but not what the file contains or what it is roughly about. A semantic description file makes it convenient to classify and process massive amounts of web page data in complex ways.
  • The main role of the function description file 2000 is to describe the active or passive operation logic executed when the user (client) uses the webpage, that is, the actions and functions customized for the user. For example, the function description file can readily provide auxiliary information to the user, but it is by no means limited to that function. Users can get today's weather by clicking the word "weather" on a webpage, send unfamiliar words to a more specialized webpage for translation, or link webpages with local applications to carry out complex functions. The function description file makes these seemingly cumbersome tasks easy.
  • Content description files and function description files can be encoded as plain text or in a binary format, and they can be transmitted encrypted or unencrypted.
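As an illustration of the two encodings, the sketch below writes a description as JSON text and, for the binary form, as zlib-compressed JSON. Both format choices are assumptions. The source does not name a format, and encryption is not shown.

```python
import json
import zlib


def encode_text(description: dict) -> str:
    """Plain-text form: UTF-8-friendly JSON with stable key order."""
    return json.dumps(description, ensure_ascii=False, sort_keys=True)


def encode_binary(description: dict) -> bytes:
    """Binary form (assumed): the JSON text, UTF-8 encoded and zlib-compressed."""
    return zlib.compress(encode_text(description).encode("utf-8"))


def decode_binary(blob: bytes) -> dict:
    """Inverse of encode_binary."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```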
  • In the passive mode, the system for generating and using a function description file includes a web page file 4000, a processing module 3000, a content description file 1000 generated by the processing module 3000, and a function description file 2000 generated by the processing module 3000.
  • The web page file 4000 comprises the HTML or XML script files.
  • The processing module is an abstract module that includes a keyword matching module, among others. Its function is to process the web page file and generate the function description file 2000 or the content description file 1000.
  • The content description file is usually not sent directly to the webpage's final viewer, the client. It is merely an intermediate file used in processing the webpage file and is generally stored on the website.
  • Normally, only the web page file 4000 and the function description file 2000 are sent to the client, i.e. the user.
  • The client's processing flow is as follows. The client obtains the webpage file and its corresponding function description file, either simultaneously or one after the other. It parses and opens the webpage with the webpage browsing module and, at the same time, parses the function description file with the function description file parser. It then executes the operation logic in the function description file through the browser or an external program. It also activates further operation logic by sensing the user's actions. For example, a user's click on an object ID may activate operation logic that retrieves advertisement information.
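The client-side flow can be sketched as a small runtime that indexes parsed rules by (element ID, event) and reports which auxiliary information to fetch. Rules triggered by "create" fire when the page loads (active execution), and other events fire on user actions (passive execution). Three choices here are assumptions made for the example: the class name, the use of `"create"` as the page-load trigger, and returning actions rather than performing them.

```python
from collections import defaultdict


class FunctionDescriptionRuntime:
    """Hypothetical client-side runtime for parsed function description rules."""

    PAGE_LOAD_EVENT = "create"  # assumption: "create" rules fire on page load

    def __init__(self, rules):
        # rules: iterable of (element_id, url, event, target) tuples
        self._by_trigger = defaultdict(list)
        for element_id, url, event, target in rules:
            self._by_trigger[(element_id, event)].append((url, target))

    def on_page_loaded(self, element_ids):
        """Active execution: collect the page-load actions of the elements present."""
        actions = []
        for element_id in element_ids:
            actions.extend(self._by_trigger.get((element_id, self.PAGE_LOAD_EVENT), []))
        return actions

    def on_user_event(self, element_id, event):
        """Passive execution: actions triggered by a user action on an element."""
        return list(self._by_trigger.get((element_id, event), []))
```

A browser plug-in would take each returned `(url, target)` pair, fetch the URL, and display the result in the named target.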
  • The function description file may be generated directly by the processing module 3000. Alternatively, the processing module may first generate the content description file 1000 and then process it to produce the function description file 2000.
  • The client 110 includes a webpage browsing module 120 and a function description file parser 2100.
  • The webpage browsing module 120 parses and displays the webpage file 4000. The function description file parser 2100 parses the function description file and works with the webpage browsing module to carry out the operation logic predefined in the function description file.
  • The function description file parser 2100 can be standalone software or a browser plug-in.
  • The function description file parser is an abstract module capable of parsing function description files. It can be implemented as a functional upgrade to current general-purpose browsers or as a new software module.
  • The function description file can use a language and control structures similar to those of script files (e.g., JavaScript) or XML, including similar control keywords.
  • The language structure resembles inserting logic control statements directly into a web page. The difference is that the content of the web page element need not be written out, because the element's ID can be used instead, which is simpler.
  • the ID of the element is 790410
  • The language in the function description file can take various forms. For example, there may be many control keywords (only a few, such as if and while, are listed above), and there may be various statement-unit markers (for example, statement end markers and statement separators). All of these variations are within the scope of the invention.
  • For the control keywords and the division of statements and semantic logic in the present invention, reference can be made to the control keywords and statement division of C or other programming languages.
  • A dedicated compiler, the function description file parser, is required. It can work like a C compiler and compile the entire file at once, or it can work like MATLAB's handling of M-language code and process the file statement by statement.
  • The scheme involves two parties. One party (a website or user) generates a webpage file together with its corresponding content description file and/or function description file; this party is hereinafter the producer. Another party (a website or user) processes the received webpage file and its corresponding content description file and/or function description file; this party is hereinafter the processing party.
  • The processing party processes the received webpage and/or its content description file according to the operation logic described in the function description file. The processing may include modifying, collecting, calculating, analyzing and forwarding data, generating reports, and other operations as required.
  • Content description files and function description files also make many complex applications possible.
  • Example 1: content description files are exchanged between websites.
  • A search engine such as Google has to crawl and analyze a large number of complex web pages. If content description files are available, it may only need to fetch and analyze the content description file of each page. Likewise, when the content of a website has to be searched and retrieving and analyzing every page would be very difficult, analyzing the content description files of the site's pages makes the task much easier.
  • www.baidu.com is the root node;
  • www.baidu.com/mp3 is a first-level child node of the root node;
  • www.baidu.com/mp3/list is a second-level child node of the root node.
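When content description files are organized by a site's URL tree, a node's level can be computed from the number of path segments. This sketch assumes the level is simply that count (0 for the root). It adds an `http://` scheme when none is given, as in the examples above.

```python
from urllib.parse import urlsplit


def node_level(url: str) -> int:
    """0 for the site root, 1 for a first-level child node, and so on."""
    if "://" not in url:
        url = "http://" + url  # the examples above omit the scheme
    path = urlsplit(url).path
    # Ignore empty segments produced by leading or trailing slashes.
    return len([segment for segment in path.split("/") if segment])
```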
  • Example 2: when using a website, users sometimes need to process some of its data to understand the website better.
  • The function description file of a web page can be used to exchange data between an individual and multiple websites.
  • The function description file can define an interface between the content of the webpage and other websites, so that selected data can be imported into the website specified by the function description file.
  • A simple example is sending particular words directly to a large search or encyclopedia website. On seeing “Cretaceous” on a web page, the user can send the “Cretaceous” entry to Wikipedia with a mouse operation (such as dragging or clicking).
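Sending a word to Wikipedia, as in the example above, can be as simple as building the article URL. This sketch assumes Wikipedia's public `https://<lang>.wikipedia.org/wiki/<Title>` address pattern, with spaces replaced by underscores. It does not check whether the article exists.

```python
from urllib.parse import quote


def wikipedia_url(term: str, lang: str = "en") -> str:
    """Build the article address for a selected word or phrase."""
    title = term.strip().replace(" ", "_")
    # quote() percent-encodes characters that are not URL-safe (e.g. "+").
    return f"https://{lang}.wikipedia.org/wiki/{quote(title)}"
```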
  • The webpage description files (the content description file 1000 and the function description file 2000) may include basic information about the webpage.
  • This basic information includes the following items:
    - the webpage's link address, creation time and save time;
    - the type of information on the webpage, including its content classification (e.g. entertainment or sports);
    - the webpage language (e.g. Chinese or English);
    - the character set used by the webpage (e.g. GB2312);
    - the place where the webpage was generated (e.g. a region or company name);
    - the place where the webpage is hosted (e.g. a region);
    - the webpage's intended audience (e.g. the age, gender and hobbies of its intended users).
  • The content description file should also contain the name or ID of each element in the web page and the keyword information corresponding to that name or ID.
  • To complete personalized delivery of web pages, the user also needs a personalized client 111.
  • A personalized client is a web page receiving and browsing device that holds user information.
  • The user information may be entered by the user or collected by the personalized client by other means. It includes the user's identity information (name, address, gender, age, email address, identity ID, etc.) and the user's hobbies (for example, cars, music or stocks).
  • Besides judging whether a web page is suitable for the user to browse, the personalized client serves another important purpose. It pushes personalized advertisement information to the user according to the hobbies in the user information, which can involve the following step:
  • the website obtains the personalized client's information (including the hobbies in the user information), matches its resources against the user's interests, and pushes personalized information (including advertising information) to the user.
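In its simplest form, the matching step could rank the website's resources by how many topic tags they share with the user's hobbies. The tag sets, resource addresses and overlap score below are illustrative assumptions. A real deployment would likely use a richer relevance measure.

```python
def match_resources(user_hobbies, resources, limit=3):
    """resources: mapping of resource address -> set of topic tags.

    Returns up to `limit` addresses, ranked by the number of tags shared
    with the user's hobbies (ties broken alphabetically); resources with
    no overlap are dropped."""
    hobbies = {h.lower() for h in user_hobbies}
    scored = []
    for address, tags in resources.items():
        overlap = len(hobbies & {t.lower() for t in tags})
        if overlap:
            scored.append((overlap, address))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [address for _, address in scored[:limit]]
```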
  • Intended-audience information is also missing from the existing web page structure, so it can be placed in that structure (such as an HTML file), usually in the page's <head> or metadata. When the user's personalized browser parses the HTML file, it can then quickly determine whether the web page is suitable for the user.
  • Adding this information to the existing web page structure (such as an HTML file) is also within the scope of this patent.
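A personalized browser could read such a field with the standard-library `html.parser` module. The meta name `suitable-age` and the convention that its `content` holds a minimum age are hypothetical. The source only says the information is placed in the page's head or metadata.

```python
from html.parser import HTMLParser


class AudienceMetaParser(HTMLParser):
    """Reads a hypothetical <meta name="suitable-age" content="N"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.min_age = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if attributes.get("name") == "suitable-age":
                self.min_age = int(attributes.get("content") or 0)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def suitable_for(html: str, user_age: int) -> bool:
    """Pages without the field are treated as suitable for everyone."""
    parser = AudienceMetaParser()
    parser.feed(html)
    parser.close()
    return parser.min_age is None or user_age >= parser.min_age
```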
  • The techniques described in the embodiments of the present invention may be implemented in hardware, software, or a combination of both. If implemented in software, the techniques may be embodied in a computer-readable medium containing program code that is executed in a device that encodes a video sequence.
  • The computer-readable medium may include RAM (random access memory), SDRAM (synchronous dynamic RAM), ROM (read-only memory), NVRAM (non-volatile random access memory), EEPROM (electrically erasable programmable read-only memory), FLASH (flash memory), etc.
  • The method of applying webpage semantics in effect abstracts the semantics of webpage content in the form of keywords. It can conveniently be used to add auxiliary information to webpage content.
  • The method can also be used for online advertising.
  • The present invention provides active and passive methods of processing web pages. These methods analyze web page semantics and, through that analysis, provide web-based services and applications.
  • In the active mode, the client runs programs or plug-ins that analyze the semantics of the webpage without modifying the existing webpage, and it carries out specific applications according to those semantics. For example, semantic analysis of a webpage can be used to supply auxiliary information (advertisements, etc.) related to its content.
  • In the passive mode, the webpage is pre-processed, either by modifying the original webpage script or by adding a webpage content description file, function description file, and so on.
  • By upgrading its program or installing a plug-in, the user-side client can recognize the modified webpage file or the newly added web page description files (including content description files and function description files).
  • Both the passive and the active method can analyze web page semantics and can control some of the web browser's actions through a scripting language or preset programs. In other words, the browser's behavior is not controlled entirely by the user. It is determined in part by the content of the web page itself, by a preset script of the web page, or by other description files (in this patent, the content description file and the function description file).
  • Take as an example the active method of analyzing webpage semantics and providing users with auxiliary information based on webpage content. First, the keyword matching module in the (active) semantic analysis system of the present invention checks whether the webpage content that the client receives from the website (for example, an HTML script webpage) matches any predefined keywords. If it does, the keyword matching module supplies the link address of the auxiliary information associated with each matched keyword. The client then initiates a service request to the server using that address and obtains the required auxiliary information.
  • The auxiliary information may introduce, analyze or advertise specific related content on the webpage.
  • The keyword matching module maintained on the client analyzes the content of the user's webpage. The resulting match information, including the address of the auxiliary information corresponding to each successfully matched keyword, is sent to the server. For example, suppose a user opens a sports website and views a web page about a football match. The keyword matching module finds "soccer", one of the keywords it searches for. Through the module, the client obtains the location of the auxiliary information for the word "soccer", such as a specific URL on the network (HTTP://.../bootball.html). This URL is usually on the server, so the client initiates a service request to the server to retrieve the auxiliary information at that address.
  • The auxiliary information may be any media, such as video, images, sound or text.
  • This approach spares the user inconvenience while browsing and can greatly improve the effectiveness of online advertising. It can also push auxiliary information other than advertisements. For example, when the user encounters a mathematical formula on a webpage, its derivation can be pushed to the user in the same way.
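The client-side keyword matching module in the active mode might be sketched as follows. Each predefined keyword maps to the link address of its auxiliary information, and the function lists the requests the client would send to the server. The sketch assumes the page text has already been extracted from the HTML and matches case-insensitive substrings without word boundaries. The `example.com` addresses in the usage example are placeholders.

```python
import re


def find_auxiliary_requests(page_text, keyword_links):
    """keyword_links: keyword -> link address of its auxiliary information.

    Returns (keyword, address) pairs for every keyword found in the page
    text, ordered by first occurrence, each keyword listed once."""
    hits = []
    for keyword, address in keyword_links.items():
        match = re.search(re.escape(keyword), page_text, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), keyword, address))
    hits.sort()
    return [(keyword, address) for _, keyword, address in hits]


# Hypothetical keyword database held by the client.
LINKS = {
    "Beijing": "http://example.com/aux/beijing.html",
    "soccer": "http://example.com/aux/soccer.html",
    "stocks": "http://example.com/aux/stocks.html",
}
```

The client would then issue one service request per returned address and display the auxiliary information it receives.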
  • Obtaining web page semantic information amounts to generating a web page content description file, or generating a new webpage 820. Take the content description file as an example. It is a condensed version of the webpage that summarizes the main body of its information content, potentially saving considerable storage space compared with the webpage itself.
  • Generating such a description file also relies on keyword matching: the processing module 3000 of the present invention processes the webpage source file to produce it.
  • the function description file of the webpage can often be generated through the webpage content description file.
  • the function description file can be directly generated by the processing module 3000.
  • The intended-audience information of a webpage can be added to its description files (the content description file and the function description file) or to the webpage source file. In this way, the present invention can also deliver webpages in a targeted manner or restrict access to them.
  • The present invention can be implemented in hardware, or in software together with the necessary general-purpose hardware platform. On this understanding, the technical solution of the present invention can be embodied as a software product stored in a non-volatile storage medium. That software product includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method and system for applying webpage semantics. Two methods of analyzing webpage semantics are provided: a passive mode and an active mode. The active mode comprises the following steps: running a keyword matching module on a client; analyzing and searching for keywords in the webpage in a designated manner; when a matching keyword is found, sending a request for auxiliary information (usually a request for advertising information) to a server; obtaining the advertising information; and displaying it at an appropriate position. The passive mode comprises pre-processing the webpage, forming a content description file and a function description file for the webpage, and sending the function description file and the webpage to the client, which executes the operation logic in a predefined manner.
PCT/CN2008/071587 2007-07-09 2008-07-08 Procédé et système d'application d'un sémantème de page web WO2009006844A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710118523.6 2007-07-09
CN2007101185236A CN101154231B (zh) 2007-07-09 2007-07-09 一种应用网页语义的方法和系统

Publications (1)

Publication Number Publication Date
WO2009006844A1 (fr) 2009-01-15

Family

ID=39255892

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/071587 WO2009006844A1 (fr) 2007-07-09 2008-07-08 Procédé et système d'application d'un sémantème de page web

Country Status (2)

Country Link
CN (1) CN101154231B (fr)
WO (1) WO2009006844A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177610A (zh) * 2011-12-26 2013-06-26 邹仕洪 一种电子书阅读器及其系统
CN115271822A (zh) * 2022-08-11 2022-11-01 北京创新乐知网络技术有限公司 一种推广信息投放方法及装置

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101154231B (zh) * 2007-07-09 2011-06-29 孟智平 一种应用网页语义的方法和系统
CN101207807B (zh) * 2007-12-18 2013-01-02 孟智平 一种处理视频的方法及其系统
CN101582911B (zh) * 2008-05-14 2014-12-03 华为技术有限公司 一种呈现广告的方法、系统和装置
WO2010117443A1 (fr) * 2009-04-06 2010-10-14 Kinetic Procede et appareil permettant l'affichage de resultats de recherche lors de la preparation d'un plan media
CN102713956B (zh) * 2009-09-08 2017-07-28 启创互联公司 使用消费者提供的上下文同步消息传送
US20110106615A1 (en) * 2009-11-03 2011-05-05 Yahoo! Inc. Multimode online advertisements and online advertisement exchanges
CN101827125B (zh) * 2010-03-31 2013-04-10 吉林大学 语义Web服务本体及其应用
CN102170469B (zh) * 2011-04-12 2017-02-22 百度时代网络技术(北京)有限公司 一种基于web访客唯一性的电话效果监测方法
CN104506426B (zh) * 2012-03-23 2019-03-01 北京奇虎科技有限公司 邮件的信息提示方法及装置
CN102663291B (zh) * 2012-03-23 2015-02-25 北京奇虎科技有限公司 邮件的信息提示方法及装置
CN102722573A (zh) * 2012-06-04 2012-10-10 北京吉亚互联科技有限公司 识别用户来源并推送网页的方法和系统
CN102982135A (zh) * 2012-11-16 2013-03-20 北京百度网讯科技有限公司 一种用于提供呈现信息的方法和设备
CN104239012A (zh) * 2013-06-17 2014-12-24 腾讯科技(深圳)有限公司 一种推送网页应用消息的方法和装置
EP3143512A4 (fr) * 2014-05-14 2017-04-19 Pagecloud Inc. Procédés et systèmes de génération de contenu web
CN105653359B (zh) * 2014-11-28 2020-06-09 金蝶软件(中国)有限公司 生成操作说明书的方法和应用系统
CN105224316A (zh) * 2015-09-14 2016-01-06 北京蓝海讯通科技有限公司 一种Web应用程序中的脚本插入方法及装置
CN105843910A (zh) * 2016-03-23 2016-08-10 网易(杭州)网络有限公司 一种电子书内容搜索方法和装置
CN106682063B (zh) * 2016-10-20 2018-04-03 北京跃盟科技有限公司 一种广告信息推送方法、装置以及系统
CN106850572B (zh) * 2016-12-29 2020-07-21 网宿科技股份有限公司 目标资源的访问方法和装置
CN108664535B (zh) * 2017-04-01 2022-08-12 北京京东尚科信息技术有限公司 信息输出方法和装置
CN112989233A (zh) * 2019-12-02 2021-06-18 北京小米移动软件有限公司 文件下载方法、装置及存储介质
CN111753240A (zh) * 2020-06-30 2020-10-09 上海二三四五网络科技有限公司 一种基于h5页面自动提供匹配信息的控制方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002117049A (ja) * 2000-10-05 2002-04-19 Fuji Xerox Co Ltd ウェブページ生成システム及びウェブページ生成方法
CN1487438A (zh) * 2002-09-23 2004-04-07 国际商业机器公司 根据用户输入的url和/或搜索关键词提供广告的方法和系统
CN1932817A (zh) * 2006-09-15 2007-03-21 陈远 通用互联网内容关键词交互系统
CN1932811A (zh) * 2005-09-13 2007-03-21 中时网路科技股份有限公司 内容网站的文字中关联于关键词的链接的建立系统
CN101154231A (zh) * 2007-07-09 2008-04-02 孟智平 一种应用网页语义的方法和系统

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177610A (zh) * 2011-12-26 2013-06-26 邹仕洪 一种电子书阅读器及其系统
CN115271822A (zh) * 2022-08-11 2022-11-01 北京创新乐知网络技术有限公司 一种推广信息投放方法及装置
CN115271822B (zh) * 2022-08-11 2023-08-11 北京创新乐知网络技术有限公司 一种推广信息投放方法及装置

Also Published As

Publication number Publication date
CN101154231A (zh) 2008-04-02
CN101154231B (zh) 2011-06-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08773141

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08773141

Country of ref document: EP

Kind code of ref document: A1