WO2017000402A1 - Webpage generation method and apparatus - Google Patents

Webpage generation method and apparatus

Info

Publication number
WO2017000402A1
WO2017000402A1 PCT/CN2015/090703 CN2015090703W WO2017000402A1 WO 2017000402 A1 WO2017000402 A1 WO 2017000402A1 CN 2015090703 W CN2015090703 W CN 2015090703W WO 2017000402 A1 WO2017000402 A1 WO 2017000402A1
Authority
WO
WIPO (PCT)
Prior art keywords
webpage
push information
keyword
content
generating
Prior art date
Application number
PCT/CN2015/090703
Other languages
English (en)
French (fr)
Inventor
裘皓萍
徐云峰
陈炜于
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Publication of WO2017000402A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs

Definitions

  • The present application relates to the field of computer technology, specifically to the field of Internet technology, and in particular to a webpage generation method and apparatus.
  • Information push, also known as “webcasting”, is a technology that reduces information overload by pushing the information users need over the Internet according to certain technical standards or protocols. By actively pushing information to users, information push technology can reduce the time users spend searching on the network.
  • However, existing information push methods usually load various pieces of push information directly on a webpage, and this push information differs markedly from the content of the webpage on which it appears. As a result, data related to the webpage content is underused and the information push is poorly targeted.
  • The purpose of the present application is to propose an improved webpage generation method and apparatus to solve the technical problems mentioned in the background section above.
  • In a first aspect, the present application provides a webpage generation method, the method comprising: receiving a webpage browsing request from a user, wherein the webpage browsing request includes a web address; performing content parsing on the webpage corresponding to the web address to extract a keyword set; selecting at least one piece of candidate push information to generate a push information set, based on a matching relationship between the keyword set and each piece of candidate push information; and generating a new webpage based on the content of the webpage and the push information set.
  • In some embodiments, performing content parsing on the webpage corresponding to the web address to extract a keyword set includes: performing statistical analysis and/or semantic analysis on the content of the webpage corresponding to the web address to extract at least one keyword; and generating the keyword set based on the at least one keyword.
  • In some embodiments, generating the keyword set based on the at least one keyword includes: expanding a single keyword of the at least one keyword to generate extended keywords, wherein the extended keywords include the single keyword and at least one of the following: a synonym of the single keyword, a near-synonym of the single keyword, and a related word of the single keyword; and generating the keyword set based on the extended keywords.
  • In some embodiments, selecting at least one piece of candidate push information to generate a push information set based on the matching relationship between the keyword set and each piece of candidate push information includes: performing content parsing on each piece of candidate push information to extract a candidate push information keyword set for each piece; calculating the similarity between the keyword set and each candidate push information keyword set; and selecting at least one piece of candidate push information to generate the push information set based on the results of the similarity calculation.
  • In some embodiments, selecting at least one piece of candidate push information to generate the push information set based on the results of the similarity calculation includes: based on the results of the similarity calculation and a preset quantity of push information, selecting that quantity of pieces of candidate push information to generate the push information set.
  • In some embodiments, generating a new webpage based on the content of the webpage and the push information set includes: generating the new webpage in such a way that the push information in the push information set is associated with the corresponding keywords in the content of the webpage.
  • In some embodiments, generating a new webpage based on the content of the webpage and the push information set includes: generating the new webpage in such a way that the push information in the push information set is set apart from the content of the webpage.
  • In a second aspect, the present application provides a webpage generating apparatus, the apparatus comprising: a receiving unit configured to receive a webpage browsing request from a user, wherein the webpage browsing request includes a web address; a parsing unit configured to perform content parsing on the webpage corresponding to the web address and extract a keyword set; an information selecting unit configured to select at least one piece of candidate push information to generate a push information set, based on a matching relationship between the keyword set and each piece of candidate push information; and a generating unit configured to generate a new webpage based on the content of the webpage and the push information set.
  • In some embodiments, the parsing unit includes: an analyzing module configured to perform statistical analysis and/or semantic analysis on the content of the webpage corresponding to the web address and extract at least one keyword; and a generating module configured to generate a keyword set based on the at least one keyword.
  • In some embodiments, the generating module is further configured to: expand a single keyword of the at least one keyword to generate extended keywords, wherein the extended keywords include the single keyword and at least one of the following: a synonym of the single keyword, a near-synonym of the single keyword, and a related word of the single keyword; and generate the keyword set based on the extended keywords.
  • In some embodiments, the information selecting unit includes: a parsing module configured to perform content parsing on each piece of candidate push information and extract a candidate push information keyword set for each piece; a similarity calculation module configured to calculate the similarity between the keyword set and each candidate push information keyword set; and a selecting module configured to select at least one piece of candidate push information to generate a push information set based on the results of the similarity calculation.
  • In some embodiments, the selecting module is further configured to: based on the results of the similarity calculation and a preset quantity of push information, select that quantity of pieces of candidate push information to generate the push information set.
  • In some embodiments, the generating unit is further configured to generate the new webpage in such a way that the push information in the push information set is associated with the corresponding keywords in the content of the webpage.
  • In some embodiments, the generating unit is further configured to generate the new webpage in such a way that the push information in the push information set is set apart from the content of the webpage.
  • The webpage generation method and apparatus provided by the present application perform content parsing on the webpage corresponding to the web address requested by the user in order to extract a keyword set, then select push information based on the matching relationship between the keyword set and each piece of candidate push information, and finally generate a new webpage based on the content of the webpage and the push information. This makes effective use of the webpage's content data and achieves well-targeted information push.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied.
  • FIG. 2 is a flow chart of one embodiment of a web page generating method according to the present application.
  • FIG. 3 is a schematic diagram of an application scenario of a webpage generating method according to the present application.
  • FIG. 4 is a flow chart of still another embodiment of a web page generating method according to the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of a webpage generating apparatus according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server of an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the webpage generation method or webpage generating apparatus of the present application may be applied.
  • The system architecture 100 can include terminal devices 101, 102, 103, a network 104, and a server 105.
  • The network 104 serves as the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various types of connections, such as wired or wireless communication links, fiber optic cables, and the like.
  • The user can use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
  • The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
  • The server 105 may be a server that provides various services, such as a back-end web server that supports the webpages displayed on the terminal devices 101, 102, 103.
  • The back-end web server may analyze and otherwise process received data such as webpage requests, and feed the processing result (for example, webpage data) back to the terminal device.
  • The webpage generation method provided by the embodiments of the present application is generally executed by the server 105; accordingly, the webpage generating apparatus is generally disposed in the server 105.
  • The numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Depending on implementation needs, there can be any number of terminal devices, networks, and servers.
  • The webpage generation method includes the following steps:
  • Step 201: Receive a webpage browsing request from the user.
  • The electronic device on which the webpage generation method runs (for example, the server shown in FIG. 1) may receive, over a wired or wireless connection, a webpage browsing request from the terminal the user is using to browse webpages, where the webpage browsing request includes the address of the webpage the user wishes to browse, that is, the web address.
  • The web address is generally represented by a Uniform Resource Locator (URL).
  • The wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connection methods that are now known or developed in the future.
  • The user browses webpages using a web browser installed on the terminal.
  • The user can initiate a webpage browsing request to the web server by entering the web address directly or by clicking a link in a webpage presented in the web browser.
  • The webpage may be in html, xhtml, asp, php, jsp, shtml, nsp, or xml format, or in any other webpage format developed in the future (as long as a webpage file in that format can be opened by a browser to view the images, animations, text, and other content it contains).
  • Step 202: Perform content parsing on the webpage corresponding to the web address and extract a keyword set.
  • The electronic device may first obtain the webpage corresponding to the web address and then analyze the content of that webpage using various analysis means to extract one or more keywords.
  • The content of the webpage may be analyzed by statistical analysis. For example, the frequency of occurrence of each word in the content may be counted and sorted, and the one or more words with the highest frequency are then selected as the keywords to be extracted.
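  • The frequency-based selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `top_frequency_keywords`, the regular-expression tokenizer, and the small stop-word list are all assumptions, and Chinese text would additionally need a word segmenter.

```python
import re
from collections import Counter


def top_frequency_keywords(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent words in text (ties keep first-seen order)."""
    # Assumption: simple lowercase alphanumeric tokenization.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Assumption: a tiny illustrative stop-word list, not taken from the source.
    stop_words = {"the", "a", "an", "of", "and", "is", "to", "in", "on"}
    counts = Counter(w for w in words if w not in stop_words)
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    sample = ("Automatic driving cars use sensors. Driving data and sensors "
              "guide the cars. Driving is automatic.")
    print(top_frequency_keywords(sample, 2))
```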
  • The content of the webpage may also be analyzed by semantic analysis.
  • As an example, the content of the webpage may first be divided into words, for example using a full segmentation method; the importance of each resulting word is then calculated (for example, using term frequency-inverse document frequency (TF-IDF)), and keywords are obtained based on the results of the importance calculation.
  • The N-Gram model is a commonly used language model. For Chinese, it may be called the Chinese Language Model (CLM).
  • The N-Gram model is based on the assumption that the appearance of the Nth word depends only on the preceding N-1 words and is unrelated to any other word. The probability of a whole sentence is then the product of the occurrence probabilities of its words, and these probabilities can be obtained by counting, directly from a corpus, how many times N words occur together.
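  • Written out, the assumption above gives the following approximation. The notation is introduced here for illustration only: w_1, ..., w_m are the words of a sentence and C(·) is the number of times a word sequence occurs in the corpus; the count ratio is the usual maximum-likelihood estimate, which the source describes only informally.

```latex
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P\left(w_i \mid w_{i-N+1}, \dots, w_{i-1}\right),
\qquad
P\left(w_i \mid w_{i-N+1}, \dots, w_{i-1}\right) \approx
\frac{C(w_{i-N+1} \dots w_{i-1} w_i)}{C(w_{i-N+1} \dots w_{i-1})}
```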
  • The TF-IDF method can be used to calculate the importance of these words, and words are then selected as keywords based on their importance scores.
  • The main idea of TF-IDF is that if a word or phrase appears with high frequency (term frequency, TF) in one article and rarely appears in other articles, the word or phrase is considered to have good class-distinguishing ability and to be suitable for classification.
  • The inverse document frequency (IDF) mainly means that the fewer documents contain a given word or phrase, the larger its IDF and the better its class-distinguishing ability.
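  • A minimal sketch of the TF-IDF scoring just described. The function name `tf_idf_scores`, the pre-tokenized input, and the unsmoothed formula idf = log(N / df) are assumptions chosen for illustration; the source does not fix a particular variant.

```python
import math
from collections import Counter


def tf_idf_scores(page_words: list[str], other_docs: list[list[str]]) -> dict[str, float]:
    """Score each word of a page by term frequency times inverse document frequency."""
    docs = [page_words] + other_docs
    tf = Counter(page_words)
    scores = {}
    for word, count in tf.items():
        # Number of documents (including the page itself) that contain the word.
        df = sum(1 for doc in docs if word in doc)
        # Assumption: unsmoothed idf = log(N / df); other variants add smoothing.
        idf = math.log(len(docs) / df)
        scores[word] = (count / len(page_words)) * idf
    return scores


if __name__ == "__main__":
    page = ["driving", "driving", "car", "the"]
    others = [["the", "car", "road"], ["the", "weather"]]
    print(tf_idf_scores(page, others))
```

  • In this sketch a word that occurs in every document gets a score of zero, which matches the idea that such a word has little distinguishing ability.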
  • Step 203: Select at least one piece of candidate push information to generate a push information set, based on a matching relationship between the keyword set and each piece of candidate push information.
  • The electronic device on which the webpage generation method runs may pre-store multiple pieces of candidate push information, which can be combined with the content of the webpage and presented as a whole in the browser.
  • The electronic device may match the keyword set against the content of each piece of candidate push information one by one, and determine the matching relationship between that candidate push information and the keyword set according to the number of keywords its content contains. For example, if the content of a piece of candidate push information contains all the keywords in the keyword set, the candidate push information can be determined to have an exact match relationship with the keyword set; if the content contains only some of the keywords in the keyword set, the candidate push information can be determined to have a partial match relationship with the keyword set; and if the content contains none of the keywords in the keyword set, the candidate push information can be determined to be a mismatch with the keyword set.
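  • The three matching relationships above can be sketched as a small classifier. The function name `classify_match`, the result labels, and the use of case-insensitive substring search to decide whether a keyword is "contained" are assumptions made for this illustration.

```python
def classify_match(keywords: set[str], candidate_text: str) -> str:
    """Classify a candidate as 'exact', 'partial', or 'mismatch' against a keyword set."""
    # Assumption: a keyword counts as contained if it occurs as a
    # case-insensitive substring of the candidate's content.
    text = candidate_text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    if keywords and hits == len(keywords):
        return "exact"
    if hits > 0:
        return "partial"
    return "mismatch"


if __name__ == "__main__":
    ad = "New sensor kits for automatic driving"
    print(classify_match({"automatic driving", "sensor"}, ad))
    print(classify_match({"automatic driving", "radar"}, ad))
    print(classify_match({"weather"}, ad))
```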
  • The electronic device may select at least one piece of candidate push information from the multiple pieces of candidate push information and thereby generate a push information set.
  • The electronic device may select candidate push information that has an exact match relationship with the keyword set as the push information to be combined with the content of the webpage.
  • Step 204: Generate a new webpage based on the content of the webpage and the push information set.
  • The electronic device may combine the content of the webpage (that is, the page content of the webpage corresponding to the web address) with the push information set as the content of the new webpage, and thereby generate the new webpage.
  • The electronic device may generate the new webpage in such a way that the push information in the push information set is associated with the corresponding keywords in the content of the webpage.
  • For a given keyword, the electronic device may first determine the keyword's position in the webpage; then find the push information matching that keyword in the push information set; and finally set the found push information to be displayed in association with the keyword. For example, when the user's mouse hovers over the keyword in the webpage, the corresponding push information pops up.
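  • One way to picture the keyword-associated display is to wrap each keyword occurrence in an element whose tooltip carries the push information. The sketch below is illustrative only: the function name `annotate_keywords`, the `push` class name, the plain-text input, and the use of the HTML `title` attribute as a stand-in for the hover pop-up are all assumptions, and keywords are assumed to be non-empty strings.

```python
import html
import re


def annotate_keywords(page_text: str, push_by_keyword: dict[str, str]) -> str:
    """Wrap every keyword occurrence in a span whose title shows the push text."""
    if not push_by_keyword:
        return html.escape(page_text)
    # Longest keywords first so that overlapping keywords prefer the longer match.
    ordered = sorted(push_by_keyword, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    parts, pos = [], 0
    for m in pattern.finditer(page_text):
        # Escape the untouched text between matches, then emit the annotated keyword.
        parts.append(html.escape(page_text[pos:m.start()]))
        push = html.escape(push_by_keyword[m.group(0)], quote=True)
        parts.append(f'<span class="push" title="{push}">{html.escape(m.group(0))}</span>')
        pos = m.end()
    parts.append(html.escape(page_text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(annotate_keywords("Automatic driving & maps",
                            {"Automatic driving": "Try our <new> course"}))
```

  • Matching is done in a single pass over the original text so that a later keyword cannot accidentally match inside markup generated for an earlier one.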
  • The electronic device may instead generate the new webpage in such a way that the push information in the push information set is set apart from the content of the webpage.
  • The electronic device may set the push information and the content of the webpage to be displayed in different display areas of the new webpage.
  • In this way, the user can view the push information in one place and is not disturbed by it when viewing the content of the webpage.
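  • The separate-area layout can be sketched as follows. The function name `build_page_with_sidebar` and the element ids `content` and `push-area` are hypothetical names used only for this example, and the original page markup is assumed to be trusted.

```python
import html


def build_page_with_sidebar(page_html: str, push_items: list[str]) -> str:
    """Place the original content and the push information in separate display areas."""
    # Assumptions: page_html is trusted markup for the original content; the ids
    # "content" and "push-area" are illustrative names only.
    items = "".join(f"<li>{html.escape(item)}</li>" for item in push_items)
    return (
        f'<div id="content">{page_html}</div>'
        f'<aside id="push-area"><ul>{items}</ul></aside>'
    )


if __name__ == "__main__":
    print(build_page_with_sidebar("<p>Hi</p>", ["A & B"]))
```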
  • FIG. 3 is a schematic diagram of an application scenario of a webpage generating method according to the embodiment.
  • The user first initiates a browsing request for a knowledge-type webpage; the web server can then obtain the content of the webpage in the background and extract the keyword “automatic driving”; next, the web server finds, among its pre-stored candidate push information, one or more pieces of information matching the keyword “automatic driving” to serve as the push information; finally, the web server may associate the push information with the keyword “automatic driving” to generate a new webpage.
  • The push information will pop up as shown in FIG.
  • The method provided by the above embodiment of the present application achieves well-targeted information push by associating the content of the webpage with the push information.
  • The flow 400 of the webpage generation method includes the following steps:
  • Step 401: Receive a webpage browsing request from the user.
  • The electronic device on which the webpage generation method runs (for example, the server shown in FIG. 1) may receive, over a wired or wireless connection, a webpage browsing request from the terminal the user is using to browse webpages, where the webpage browsing request includes the address of the webpage the user wishes to browse, that is, the web address.
  • Step 402: Perform statistical analysis and/or semantic analysis on the content of the webpage corresponding to the web address, and extract at least one keyword.
  • The content of the webpage may be analyzed by statistical analysis. For example, the frequency of occurrence of each word in the content may be counted and sorted, and the one or more words with the highest frequency are then selected as the keywords to be extracted.
  • The content of the webpage may be analyzed by semantic analysis. As an example, the content of the webpage may first be divided into words, for example using a full segmentation method; the importance of each resulting word is then calculated, and keywords are obtained based on the results of the importance calculation.
  • Statistical analysis and semantic analysis may also be used together to extract keywords.
  • Step 403: Expand a single keyword of the at least one keyword to generate extended keywords.
  • A single keyword of the at least one keyword may be expanded to generate extended keywords, where the extended keywords include the single keyword and at least one of the following: a synonym of the keyword (for example, the keyword “child” may have the synonym “kid”); a near-synonym of the keyword (for example, the keyword “Chinese medicine” may have the near-synonym “herbal medicine”, and “attendance” may have the near-synonym “participation”); and a related word of the keyword (for example, the keyword “cold” may have related words such as “fever” or “flu”).
  • Step 404: Generate a keyword set based on the extended keywords.
  • Extended keywords may be generated for each keyword; the extended keywords of each of the at least one keyword may then be aggregated to generate the keyword set (which includes all the extended keywords of the at least one keyword).
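  • Steps 403 and 404 can be sketched as a dictionary lookup followed by a set union. The lookup tables and the function names `expand_keyword` and `build_keyword_set` are hypothetical; the source does not say where synonyms, near-synonyms, or related words come from, so a real system would substitute its own thesaurus or word-relation data.

```python
# Hypothetical lookup tables, for illustration only.
SYNONYMS = {"child": ["kid"]}
NEAR_SYNONYMS = {"attendance": ["participation"]}
RELATED = {"cold": ["fever", "flu"]}


def expand_keyword(keyword: str) -> set[str]:
    """Return the keyword together with its synonyms, near-synonyms, and related words."""
    expanded = {keyword}
    for table in (SYNONYMS, NEAR_SYNONYMS, RELATED):
        expanded.update(table.get(keyword, []))
    return expanded


def build_keyword_set(keywords: list[str]) -> set[str]:
    """Union the extended keywords of every extracted keyword (steps 403 and 404)."""
    keyword_set: set[str] = set()
    for kw in keywords:
        keyword_set |= expand_keyword(kw)
    return keyword_set


if __name__ == "__main__":
    print(sorted(build_keyword_set(["cold", "child"])))
```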
  • Step 405: Select at least one piece of candidate push information to generate a push information set, based on a matching relationship between the keyword set and each piece of candidate push information.
  • This step can be performed as follows:
  • The content of each piece of candidate push information is parsed, and a candidate push information keyword set is extracted for each piece.
  • Each piece of candidate push information may be parsed using the same content analysis method as in step 402, so that a corresponding candidate push information keyword set is extracted for each piece.
  • A similarity calculation is performed between the keyword set obtained in step 404 and each candidate push information keyword set.
  • The candidate push information may first be sorted according to the results of the similarity calculation to obtain a candidate push information sequence (for example, in descending order of similarity); then, according to a quantity condition (the required number of pieces of candidate push information) or a threshold condition (for example, a similarity value greater than a preset threshold), at least one piece of candidate push information is selected from the sequence to generate the push information set.
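  • The ranking and selection described in this step can be sketched as below. The source does not name a similarity measure, so Jaccard similarity between keyword sets is used here purely as an assumption; the function names, the `max_items` and `threshold` parameters, and the candidate ids are likewise illustrative.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_push_info(page_keywords, candidates, max_items=None, threshold=None):
    """Rank candidates by similarity, then apply a threshold and/or quantity condition.

    candidates maps a candidate id to that candidate's keyword set.
    """
    ranked = sorted(
        ((jaccard(page_keywords, kws), cid) for cid, kws in candidates.items()),
        key=lambda pair: pair[0],
        reverse=True,  # descending order of similarity
    )
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[0] > threshold]
    if max_items is not None:
        ranked = ranked[:max_items]
    return [cid for _, cid in ranked]


if __name__ == "__main__":
    page = {"cold", "fever", "flu"}
    ads = {"ad1": {"flu", "vaccine"}, "ad2": {"cold", "fever", "flu", "tea"}, "ad3": {"car"}}
    print(select_push_info(page, ads, max_items=2))
    print(select_push_info(page, ads, threshold=0.5))
```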
  • Step 406: Generate a new webpage based on the content of the webpage and the push information set.
  • The electronic device may combine the content of the webpage with the push information set as the content of the new webpage, and thereby generate the new webpage.
  • Compared with the embodiment corresponding to FIG. 2, the flow 400 of the webpage generation method in this embodiment highlights the step of expanding the keywords.
  • The solution described in this embodiment can therefore introduce more keyword-related data, achieving more comprehensive selection of candidate push information and more effective webpage generation.
  • the present application provides an embodiment of a webpage generating apparatus, and the apparatus embodiment and the method shown in FIG.
  • The apparatus can be applied to various electronic devices.
  • The webpage generating apparatus 500 described in this embodiment includes a receiving unit 501, a parsing unit 502, an information selecting unit 503, and a generating unit 504.
  • The receiving unit 501 is configured to receive a webpage browsing request from the user, where the webpage browsing request includes a web address; the parsing unit 502 is configured to perform content parsing on the webpage corresponding to the web address and extract a keyword set; the information selecting unit 503 is configured to select at least one piece of candidate push information to generate a push information set, based on a matching relationship between the keyword set and each piece of candidate push information; and the generating unit 504 is configured to generate a new webpage based on the content of the webpage and the push information set.
  • The receiving unit 501 of the webpage generating apparatus 500 may receive, over a wired or wireless connection, a webpage browsing request from the terminal the user is using to browse webpages, where the webpage browsing request includes the address of the webpage the user wishes to browse, that is, the web address.
  • The parsing unit 502 may first obtain the webpage corresponding to the web address and then analyze the content of the webpage using various analysis means, thereby extracting one or more keywords.
  • The webpage generating apparatus 500 may pre-store multiple pieces of candidate push information, which can be combined with the content of the webpage and presented as a whole in the browser.
  • The information selecting unit 503 of the webpage generating apparatus 500 can match the keyword set against the content of each piece of candidate push information one by one, and determine the matching relationship between each piece of candidate push information and the keyword set according to the number of keywords its content contains.
  • The information selecting unit 503 may select at least one piece of candidate push information from the multiple pieces of candidate push information and thereby generate a push information set.
  • The generating unit 504 may combine the content of the webpage (that is, the page content of the webpage corresponding to the web address) with the push information set as the content of the new webpage, and thereby generate the new webpage.
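  • How the four units hand data to one another can be sketched as a small pipeline. This is an illustrative stand-in rather than the apparatus itself: the class name `WebpageGenerator`, its field names, and the idea of passing each unit in as a callable are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WebpageGenerator:
    """Illustrative stand-in for apparatus 500; each field plays the role of one unit."""
    fetch_page: Callable[[str], str]              # used by the parsing unit to obtain the page
    extract_keywords: Callable[[str], set[str]]   # parsing unit 502
    select_push: Callable[[set[str]], list[str]]  # information selecting unit 503
    compose: Callable[[str, list[str]], str]      # generating unit 504

    def handle_request(self, url: str) -> str:
        """Receiving unit 501: accept a request carrying a web address and run the pipeline."""
        content = self.fetch_page(url)
        keywords = self.extract_keywords(content)
        push_set = self.select_push(keywords)
        return self.compose(content, push_set)


if __name__ == "__main__":
    gen = WebpageGenerator(
        fetch_page=lambda url: "automatic driving news",
        extract_keywords=lambda text: {w for w in text.split() if len(w) > 6},
        select_push=lambda kws: sorted(f"ad:{k}" for k in kws),
        compose=lambda content, push: content + " | " + ", ".join(push),
    )
    print(gen.handle_request("http://example.com/page"))
```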
  • The webpage generating apparatus 500 described above also includes other well-known structures, such as a processor and a memory, which are not shown in FIG. 5 in order not to unnecessarily obscure the embodiments of the present disclosure.
  • FIG. 6 shows a block diagram of a computer system 600 suitable for implementing a terminal device or server of an embodiment of the present application.
  • The computer system 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 602 or a program loaded from the storage portion 608 into random access memory (RAM) 603.
  • Various programs and data required for the operation of the system 600 are also stored in the RAM 603.
  • The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker or the like; a storage portion 608 including a hard disk or the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also connected to the I/O interface 605 as needed.
  • A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed so that a computer program read from it can be installed into the storage portion 608 as needed.
  • An embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method illustrated in the flowchart.
  • The computer program can be downloaded and installed from the network via the communication portion 609, and/or installed from the removable medium 611.
  • Each block of the flowcharts or block diagrams can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. The functions noted in the blocks can also occur in a different order than that illustrated in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • The units involved in the embodiments of the present application may be implemented in software or in hardware.
  • The described units may also be provided in a processor; for example, a processor may be described as including a receiving unit, a parsing unit, an information selecting unit, and a generating unit.
  • In some cases, the names of these units do not limit the units themselves.
  • For example, the receiving unit may also be described as “a unit that receives a webpage browsing request of a user”.
  • The present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the foregoing embodiments, or may be a computer-readable storage medium that exists separately and is not assembled into a terminal.
  • The computer-readable storage medium stores one or more programs that are executed by one or more processors to perform the webpage generation method described in this application.

Abstract

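The present application discloses a webpage generation method and apparatus. One specific implementation of the method includes: receiving a webpage browsing request from a user, wherein the webpage browsing request includes a web address; performing content parsing on the webpage corresponding to the web address to extract a keyword set; selecting at least one piece of candidate push information to generate a push information set, based on a matching relationship between the keyword set and each piece of candidate push information; and generating a new webpage based on the content of the webpage and the push information set. This implementation achieves well-targeted information push.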

Description

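Webpage Generation Method and Apparatus
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201510385768.X, entitled "网页生成方法和装置" (Webpage Generation Method and Apparatus), filed on June 30, 2015 by 百度在线网络技术(北京)有限公司, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and in particular to a webpage generation method and apparatus.
Background
Information push, also known as "webcasting", is a technology that reduces information overload by pushing the information users need over the Internet in accordance with certain technical standards or protocols. By actively pushing information to users, it reduces the time users spend searching the network.
However, existing information push approaches usually load various pieces of push information directly onto a webpage, and this push information differs markedly from the content of the webpage on which it appears. As a result, data related to the webpage content is underused and the information push lacks pertinence.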
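Summary
The purpose of the present application is to provide an improved webpage generation method and apparatus to solve the technical problems mentioned in the Background section above.
In a first aspect, the present application provides a webpage generation method, the method including: receiving a webpage browsing request from a user, where the webpage browsing request includes a web address; performing content parsing on the webpage corresponding to the web address to extract a keyword set; selecting, based on the matching relationship between the keyword set and each piece of candidate push information, at least one piece of candidate push information to generate a push information set; and generating a new webpage based on the content of the webpage and the push information set.
In some embodiments, performing content parsing on the webpage corresponding to the web address to extract a keyword set includes: performing statistical analysis and/or semantic analysis on the content of the webpage corresponding to the web address to extract at least one keyword; and generating the keyword set based on the at least one keyword.
In some embodiments, generating the keyword set based on the at least one keyword includes: expanding each single keyword of the at least one keyword to generate expanded keywords, where the expanded keywords include the single keyword and at least one of the following: a synonym of the single keyword, a near-synonym of the single keyword, and an associated word of the single keyword; and generating the keyword set based on the expanded keywords.
In some embodiments, selecting at least one piece of candidate push information to generate a push information set based on the matching relationship between the keyword set and each piece of candidate push information includes: performing content parsing on each piece of candidate push information to extract a respective candidate push information keyword set; computing the similarity between the keyword set and each candidate push information keyword set; and selecting, based on the results of the similarity computation, at least one piece of candidate push information to generate the push information set.
In some embodiments, selecting at least one piece of candidate push information to generate the push information set based on the results of the similarity computation includes: selecting, based on the results of the similarity computation and a preset quantity of push information, that quantity of pieces of candidate push information to generate the push information set.
In some embodiments, generating a new webpage based on the content of the webpage and the push information set includes: generating the new webpage in such a way that the push information in the push information set is associated with the corresponding keywords in the content of the webpage.
In some embodiments, generating a new webpage based on the content of the webpage and the push information set includes: generating the new webpage in such a way that the push information in the push information set is arranged separately from the content of the webpage.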
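In a second aspect, the present application provides a webpage generation apparatus, the apparatus including: a receiving unit configured to receive a webpage browsing request from a user, where the webpage browsing request includes a web address; a parsing unit configured to perform content parsing on the webpage corresponding to the web address to extract a keyword set; an information selecting unit configured to select, based on the matching relationship between the keyword set and each piece of candidate push information, at least one piece of candidate push information to generate a push information set; and a generating unit configured to generate a new webpage based on the content of the webpage and the push information set.
In some embodiments, the parsing unit includes: an analysis module configured to perform statistical analysis and/or semantic analysis on the content of the webpage corresponding to the web address to extract at least one keyword; and a generation module configured to generate the keyword set based on the at least one keyword.
In some embodiments, the generation module is further configured to: expand each single keyword of the at least one keyword to generate expanded keywords, where the expanded keywords include the single keyword and at least one of the following: a synonym of the single keyword, a near-synonym of the single keyword, and an associated word of the single keyword; and generate the keyword set based on the expanded keywords.
In some embodiments, the information selecting unit includes: a parsing module configured to perform content parsing on each piece of candidate push information to extract a respective candidate push information keyword set; a similarity computation module configured to compute the similarity between the keyword set and each candidate push information keyword set; and a selection module configured to select, based on the results of the similarity computation, at least one piece of candidate push information to generate the push information set.
In some embodiments, the selection module is further configured to: select, based on the results of the similarity computation and a preset quantity of push information, that quantity of pieces of candidate push information to generate the push information set.
In some embodiments, the generating unit is further configured to: generate the new webpage in such a way that the push information in the push information set is associated with the corresponding keywords in the content of the webpage.
In some embodiments, the generating unit is further configured to: generate the new webpage in such a way that the push information in the push information set is arranged separately from the content of the webpage.
The webpage generation method and apparatus provided by the present application parse the content of the webpage corresponding to the web address requested by the user to extract a keyword set, then select push information based on the matching relationship between the keyword set and each piece of candidate push information, and finally generate a new webpage based on the content of the webpage and the push information. The content data of the webpage is thereby used effectively, and highly targeted information push is achieved.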
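Brief Description of the Drawings
Other features, objects and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, given with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which the present application may be applied;
Fig. 2 is a flowchart of one embodiment of the webpage generation method according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the webpage generation method according to the present application;
Fig. 4 is a flowchart of another embodiment of the webpage generation method according to the present application;
Fig. 5 is a schematic structural diagram of one embodiment of the webpage generation apparatus according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of the embodiments of the present application.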
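Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the webpage generation method or webpage generation apparatus of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients and social platform software.
The terminal devices 101, 102, 103 may be any of various electronic devices that have a display screen and support webpage browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
The server 105 may be a server that provides various services, for example a back-end web server that supports the webpages displayed on the terminal devices 101, 102, 103. The back-end web server may analyze and otherwise process received data such as webpage requests, and return the processing result (for example, webpage data) to the terminal device.
It should be noted that the webpage generation method provided by the embodiments of the present application is generally executed by the server 105; accordingly, the webpage generation apparatus is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be present, as the implementation requires.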
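With continued reference to Fig. 2, a flow 200 of one embodiment of the webpage generation method according to the present application is shown. The webpage generation method includes the following steps:
Step 201: receive a webpage browsing request from a user.
In this embodiment, the electronic device on which the webpage generation method runs (for example, the server shown in Fig. 1) may receive, through a wired or wireless connection, a webpage browsing request from the terminal that the user uses for webpage browsing, where the webpage browsing request includes the address of the webpage the user wishes to browse, i.e. the web address. In practice, a web address is generally expressed as a Uniform Resource Locator (URL). It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee and UWB (ultra wideband) connections, as well as other wireless connection methods now known or developed in the future.
Typically, a user browses webpages with a web browser installed on the terminal. The user can then initiate a webpage browsing request to the web server by entering a web address directly or by clicking a link in a webpage presented in the browser. In this embodiment, the webpage may be in html, xhtml, asp, php, jsp, shtml, nsp or xml format, or in any format developed in the future, provided that a webpage file in that format can be opened in a browser to view the pictures, animations, text and other content it contains.
Step 202: perform content parsing on the webpage corresponding to the web address to extract a keyword set.
In this embodiment, based on the web address obtained in step 201, the electronic device (for example, the server shown in Fig. 1) may first fetch the webpage corresponding to the web address, and then analyze the content of the webpage by various analysis means to extract one or more keywords.
In some optional implementations of this embodiment, the content of the webpage may be analyzed statistically. For example, the occurrence frequency of each word in the content may be counted and ranked, and the one or more words with the highest frequencies may then be selected as the keywords to be extracted.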
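The frequency-based selection described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `top_keywords_by_frequency`, the `stopwords` parameter and the assumption that the page content has already been split into tokens are all hypothetical.

```python
from collections import Counter

def top_keywords_by_frequency(tokens, k=3, stopwords=frozenset()):
    """Return the k most frequent tokens, ignoring any stopwords.

    `tokens` is assumed to be the page content already split into words;
    how that split is done is outside the scope of this sketch.
    """
    counts = Counter(t for t in tokens if t not in stopwords)
    # most_common sorts by descending count; ties keep first-seen order.
    return [word for word, _ in counts.most_common(k)]

tokens = ["car", "road", "car", "sensor", "car", "road", "the", "the", "the", "the"]
keywords = top_keywords_by_frequency(tokens, k=2, stopwords={"the"})
```

A stopword filter is included because raw frequency alone tends to rank function words first; the source does not mention one, so treat it as an optional refinement.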
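In some optional implementations of this embodiment, the content of the webpage may instead be analyzed semantically. As an example, the content of the webpage may be processed with a method such as full segmentation to split it into words; the importance of the resulting words is then computed (for example, with the Term Frequency-Inverse Document Frequency (TF-IDF) method), and keywords are obtained from the results of the importance computation.
With the full segmentation method, all possible words that match a language lexicon are first segmented out, and a statistical language model is then used to determine the optimal segmentation. Taking the user input "南京市长江大桥" as an example, lexicon matching is performed first to find all matching words: 南京, 市, 长江, 大桥, 南京市, 长江大桥, 市长, 江大桥, 江大, 桥. These words are represented as a word lattice; a path search is then carried out over the lattice, and the optimal path is found using a statistical language model (for example, an N-Gram model). If, for instance, the path "南京市 / 长江大桥" obtains the highest language-model score, it is the optimal segmentation of "南京市长江大桥". The N-Gram model mentioned here is a commonly used language model; for Chinese it may be called a Chinese Language Model (CLM). The N-Gram model rests on the assumption that the occurrence of the N-th word depends only on the preceding N-1 words and on no other words. The probability of a whole sentence is the product of the occurrence probabilities of its words, and these probabilities can be obtained by directly counting, in a corpus, how often the N words occur together.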
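The lattice search described above can be sketched with dynamic programming. The sketch below makes several assumptions that are not in the source: it uses a unigram model (the N = 1 case of an N-Gram model) for brevity, the log-probabilities in `lexicon` are invented for illustration, and the names `best_segmentation`, `max_len` and `unk` are hypothetical.

```python
import math

def best_segmentation(text, unigram_logprob, max_len=4, unk=-20.0):
    """Find the highest-scoring segmentation of `text` under a unigram model.

    Every lexicon word matching a substring is an edge in the word lattice;
    single characters missing from the lexicon get the penalty `unk` so that
    a path through the lattice always exists.
    """
    n = len(text)
    best = [(-math.inf, None)] * (n + 1)  # (best score, start of last word)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in unigram_logprob:
                logp = unigram_logprob[word]
            elif len(word) == 1:
                logp = unk
            else:
                continue  # not a lexicon word, so not an edge in the lattice
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    words, pos = [], n
    while pos > 0:  # follow back-pointers from the end of the text
        start = best[pos][1]
        words.append(text[start:pos])
        pos = start
    return words[::-1]

# Illustrative log-probabilities only; a real model would be estimated from a corpus.
lexicon = {"南京": -3.0, "市": -5.0, "长江": -3.0, "大桥": -3.0, "南京市": -2.5,
           "长江大桥": -2.5, "市长": -3.0, "江大桥": -8.0, "江大": -7.0, "桥": -5.0}
segments = best_segmentation("南京市长江大桥", lexicon)
```

With these made-up scores the path 南京市 / 长江大桥 wins because it uses the fewest and most probable words; a bigram or higher-order model would additionally score each word in the context of its predecessors.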
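After the content has been split into words with the full segmentation method, the TF-IDF method may be used to compute the importance of these words; words are then selected as keywords based on their importance, or are assigned importance scores. The main idea of TF-IDF is that if a word or phrase occurs with high frequency in one article (Term Frequency, TF) and rarely occurs in other articles, it is considered to discriminate well between categories and to be suitable for classification. Inverse Document Frequency (IDF) means that the fewer the documents containing a word or phrase, the larger its IDF, indicating that the word or phrase discriminates well between categories. TF-IDF can therefore be used to compute the importance of a word or phrase within a given article.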
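As a rough illustration of the importance computation, the sketch below uses one common TF-IDF variant, tf = count / document length and idf = ln(N / df). The source does not fix a particular formula, so this choice, the function name `tf_idf` and the representation of the corpus as a list of word sets are assumptions.

```python
import math
from collections import Counter

def tf_idf(doc_tokens, corpus):
    """Score each term of one document against a corpus.

    `doc_tokens` is the segmented document; `corpus` is a list of sets, one
    per document (including this one), each holding that document's terms.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        # Document frequency; at least 1 to avoid division by zero.
        df = max(1, sum(1 for doc in corpus if term in doc))
        scores[term] = (count / total) * math.log(n_docs / df)
    return scores

corpus = [{"car", "sensor"}, {"car", "road"}, {"car", "map"}]
scores = tf_idf(["car", "sensor", "sensor"], corpus)
```

Here "car" appears in every document, so its IDF and score are zero, while "sensor" is both frequent in the document and rare in the corpus and therefore scores highest.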
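It should be noted that the various methods of semantic analysis mentioned above are well-known techniques that are widely studied and applied, and are not described further here.
Step 203: based on the matching relationship between the keyword set and each piece of candidate push information, select at least one piece of candidate push information to generate a push information set.
In this embodiment, multiple pieces of candidate push information may be stored in advance on the electronic device on which the webpage generation method runs. This candidate push information can be combined with the content of the webpage so that the two are presented together in the browser.
In this embodiment, the electronic device may match the keyword set against the content of each piece of candidate push information one by one, and determine the matching relationship between a piece of candidate push information and the keyword set from the number of keywords that its content contains. For example, if the content of a piece of candidate push information contains all the keywords in the keyword set, that piece may be determined to fully match the keyword set; if its content contains only some of the keywords in the keyword set, it may be determined to partially match; and if its content contains none of the keywords in the keyword set, it may be determined not to match. Based on the matching relationship, the electronic device may select at least one piece of candidate push information from the multiple pieces and thereby generate the push information set. For example, the electronic device may select the candidate push information that fully matches the keyword set as the push information to be combined with the content of the webpage.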
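The three matching relationships can be sketched as a small classifier. The function name `match_relation`, the returned labels and the use of plain substring containment are assumptions made for illustration.

```python
def match_relation(keywords, candidate_text):
    """Classify a candidate as a 'full', 'partial' or 'none' match.

    A keyword counts as contained if it occurs as a substring of the
    candidate's text; this containment test is an assumption of the sketch.
    """
    hits = sum(1 for kw in keywords if kw in candidate_text)
    if hits == 0:
        return "none"
    if hits == len(keywords):
        return "full"
    return "partial"

keywords = {"autonomous driving", "sensor"}
candidates = {
    "ad1": "New sensor kit for autonomous driving research",
    "ad2": "Autumn sale on driving gloves and sensor lights",
    "ad3": "Fresh coffee delivered daily",
}
relations = {name: match_relation(keywords, text) for name, text in candidates.items()}
push_set = [name for name, rel in relations.items() if rel == "full"]
```

Selecting only the full matches follows the example given in the text; a looser policy could also admit partial matches.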
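Step 204: generate a new webpage based on the content of the webpage and the push information set.
In this embodiment, the electronic device may combine the content of the webpage (i.e. the page content of the page corresponding to the web address) with the push information set to form the content of a new webpage, and thereby generate the new webpage.
In some optional implementations of this embodiment, the electronic device may generate the new webpage in such a way that the push information in the push information set is associated with the corresponding keywords in the content of the webpage. As an example, for a given keyword in the content of the webpage, the electronic device may first determine its position in the webpage; next, look up in the push information set the push information that matches the keyword; and finally, set the push information found to be displayed in association with the keyword. For example, when the user's mouse hovers over the keyword in the webpage, the corresponding push information pops up.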
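One simple way to realize a hover association is the HTML `title` attribute, which browsers show as a tooltip. The sketch below is an assumption-laden illustration: it treats the page content as plain text, annotates only the first occurrence of the keyword, and the function name `annotate_keyword` and the CSS class `push-keyword` are hypothetical.

```python
import html

def annotate_keyword(page_text, keyword, push_text):
    """Wrap the first occurrence of `keyword` in a span carrying a tooltip.

    All inputs are treated as plain text and escaped, so the result is safe
    to embed in an HTML document.
    """
    safe_keyword = html.escape(keyword)
    tooltip = html.escape(push_text, quote=True)
    span = f'<span class="push-keyword" title="{tooltip}">{safe_keyword}</span>'
    return html.escape(page_text).replace(safe_keyword, span, 1)

annotated = annotate_keyword(
    "About autonomous driving today",
    "autonomous driving",
    'Try "X"',
)
```

A production page would more likely attach a script-driven popup to the span, but the lookup-then-annotate structure would be the same.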
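In some optional implementations of this embodiment, the electronic device may generate the new webpage in such a way that the push information in the push information set is arranged separately from the content of the webpage. As an example, the electronic device may set the push information and the content of the webpage to be displayed in different display areas of the new webpage. The user can then view the push information in one place and is not disturbed by it while reading the content of the webpage.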
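The separate-area layout can be sketched by placing the original content and the push information in different HTML elements. The choice of `<main>` and `<aside>` elements and the function name `compose_page` are assumptions for illustration only.

```python
import html

def compose_page(content_html, push_items):
    """Place the original content and the push information in separate regions.

    `content_html` is assumed to be trusted markup from the original page;
    each push item is plain text and is escaped before insertion.
    """
    items = "\n".join(f"    <li>{html.escape(item)}</li>" for item in push_items)
    return (
        "<main>\n" + content_html + "\n</main>\n"
        "<aside>\n  <ul>\n" + items + "\n  </ul>\n</aside>"
    )

page = compose_page("<p>Article</p>", ["Offer <A>", "Offer B"])
```

Because the push information sits in its own element, a stylesheet can position it beside or below the article without touching the article markup.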
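With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the webpage generation method according to this embodiment. In the scenario of Fig. 3, the user first initiates a browsing request for a knowledge-type webpage; the web server then fetches the content of the webpage in the background and extracts the keyword "自动驾驶" (autonomous driving); next, the web server finds, among its pre-stored candidate push information, one or more pieces that match the keyword "自动驾驶" and uses them as push information; finally, the web server generates the new webpage with the push information associated with the keyword "自动驾驶". When the user browses the new webpage and clicks the entry "自动驾驶", the push information pops up as shown in Fig. 3.
The method provided by the above embodiment of the present application achieves highly targeted information push by associating the content of the webpage with the push information.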
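With further reference to Fig. 4, a flow 400 of another embodiment of the webpage generation method is shown. The flow 400 of this webpage generation method includes the following steps:
Step 401: receive a webpage browsing request from a user.
In this embodiment, the electronic device on which the webpage generation method runs (for example, the server shown in Fig. 1) may receive, through a wired or wireless connection, a webpage browsing request from the terminal that the user uses for webpage browsing, where the webpage browsing request includes the address of the webpage the user wishes to browse, i.e. the web address.
Step 402: perform statistical analysis and/or semantic analysis on the content of the webpage corresponding to the web address to extract at least one keyword.
In this embodiment, the content of the webpage may be analyzed statistically. For example, the occurrence frequency of each word in the content may be counted and ranked, and the one or more words with the highest frequencies may then be selected as the keywords to be extracted. Alternatively, the content of the webpage may be analyzed semantically. As an example, the content may be processed with a method such as full segmentation to split it into words; the importance of the resulting words is then computed, and keywords are obtained from the results of the importance computation. Those skilled in the art will appreciate that statistical analysis and semantic analysis may also be used together to extract keywords.
Step 403: expand each single keyword of the at least one keyword to generate expanded keywords.
In this embodiment, each single keyword of the at least one keyword may be expanded to generate expanded keywords, where the expanded keywords include the single keyword itself and at least one of the following: a synonym of the keyword, for example the keyword "孩子" (child) may have the synonym "儿童" (children); a near-synonym of the keyword, for example the keyword "中药" (traditional Chinese medicine) may have the near-synonym "草药" (herbal medicine), and "出席" (attend) may have the near-synonym "参加" (take part in); and an associated word of the keyword, for example the keyword "感冒" (cold) may have associated words such as "发烧" (fever) or "流感" (influenza).
Step 404: generate a keyword set based on the expanded keywords.
In this embodiment, step 403 yields the expanded keywords of each keyword; the expanded keywords of every one of the at least one keyword may then be pooled to generate the keyword set, which thus includes all the expanded keywords of the at least one keyword.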
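Steps 403 and 404 can be sketched as a lookup-and-union over a table of related words. The table `related` below reuses the examples from the text; the table itself, its structure and the function name `expand_keywords` are hypothetical, and a real system would draw on a thesaurus or a learned association model.

```python
def expand_keywords(keywords, related):
    """Return the keyword set: each keyword plus its related words.

    `related` maps a keyword to its synonyms, near-synonyms and associated
    words; keywords missing from the table are kept unchanged.
    """
    expanded = set()
    for keyword in keywords:
        expanded.add(keyword)                       # the keyword itself
        expanded.update(related.get(keyword, ()))   # its expansions, if any
    return expanded

related = {
    "孩子": ["儿童"],           # synonym
    "中药": ["草药"],           # near-synonym
    "感冒": ["发烧", "流感"],   # associated words
}
keyword_set = expand_keywords(["孩子", "感冒", "汽车"], related)
```

Using a set makes the pooling in step 404 automatic: a word that expands from two different keywords appears only once.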
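Step 405: based on the matching relationship between the keyword set and each piece of candidate push information, select at least one piece of candidate push information to generate a push information set.
In this embodiment, this step may proceed as follows:
First, content parsing is performed on each piece of candidate push information to extract a respective candidate push information keyword set. As an example, the same content analysis approach as in step 402 may be applied to each piece of candidate push information, so that a corresponding candidate push information keyword set is extracted for every piece.
Second, the similarity between the keyword set obtained in step 404 and each candidate push information keyword set is computed. In this embodiment, well-known text similarity measures such as cosine similarity or the Jaccard coefficient may be used. Taking the Jaccard coefficient as an example, the similarity between the keyword set obtained in step 404 and a candidate push information keyword set equals the number of words the two sets have in common divided by the number of words contained in the two sets together.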
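The Jaccard coefficient described above is the size of the intersection divided by the size of the union. A minimal sketch follows; the function name `jaccard` and the convention of returning 0.0 for two empty sets are assumptions.

```python
def jaccard(set_a, set_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two keyword collections."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

page_keywords = {"感冒", "发烧", "流感"}
candidate_keywords = {"发烧", "流感", "药店"}
similarity = jaccard(page_keywords, candidate_keywords)
```

In this example the two sets share two words out of four distinct words in total, giving a similarity of 0.5.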
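Finally, based on the results of the similarity computation, at least one piece of candidate push information is selected to generate the push information set. In this embodiment, the pieces of candidate push information may first be sorted by the results of the similarity computation to obtain a candidate push information sequence (for example, in descending order of similarity); at least one piece of candidate push information may then be selected from the sequence according to a quantity condition (the number of pieces of candidate push information required) or a threshold condition (for example, the similarity value must exceed a preset threshold) to generate the push information set.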
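The ranking and selection step can be sketched as below. The text presents the quantity condition and the threshold condition as alternatives; this sketch accepts either or both, which is a design choice of the example, and the names `select_push_info`, `top_n` and `min_score` are hypothetical.

```python
def select_push_info(scored, top_n=None, min_score=None):
    """Rank candidates by similarity and apply the selection conditions.

    `scored` is a list of (candidate, similarity) pairs. `top_n` is the
    quantity condition and `min_score` the threshold condition; either may
    be omitted.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        ranked = [pair for pair in ranked if pair[1] > min_score]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [candidate for candidate, _ in ranked]

scored = [("ad1", 0.2), ("ad2", 0.8), ("ad3", 0.5), ("ad4", 0.05)]
by_quantity = select_push_info(scored, top_n=2)
by_threshold = select_push_info(scored, min_score=0.1)
```

Python's `sorted` is stable, so candidates with equal similarity keep their original relative order.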
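Step 406: generate a new webpage based on the content of the webpage and the push information set.
In this embodiment, the electronic device may combine the content of the webpage with the push information set to form the content of a new webpage, and thereby generate the new webpage.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the webpage generation method in this embodiment highlights the step of expanding the keywords. The solution described in this embodiment can therefore bring in more keyword-related data, enabling a more comprehensive selection of candidate push information and more effective webpage generation.
With further reference to Fig. 5, as an implementation of the methods shown in the figures above, the present application provides an embodiment of a webpage generation apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied in various electronic devices.
As shown in Fig. 5, the webpage generation apparatus 500 of this embodiment includes a receiving unit 501, a parsing unit 502, an information selecting unit 503 and a generating unit 504. The receiving unit 501 is configured to receive a webpage browsing request from a user, where the webpage browsing request includes a web address; the parsing unit 502 is configured to perform content parsing on the webpage corresponding to the web address to extract a keyword set; the information selecting unit 503 is configured to select, based on the matching relationship between the keyword set and each piece of candidate push information, at least one piece of candidate push information to generate a push information set; and the generating unit 504 is configured to generate a new webpage based on the content of the webpage and the push information set.
In this embodiment, the receiving unit 501 of the webpage generation apparatus 500 may receive, through a wired or wireless connection, a webpage browsing request from the terminal that the user uses for webpage browsing, where the webpage browsing request includes the address of the webpage the user wishes to browse, i.e. the web address.
In this embodiment, based on the web address obtained by the receiving unit 501, the parsing unit 502 may first fetch the webpage corresponding to the web address, and then analyze the content of the webpage by various analysis means to extract one or more keywords.
In this embodiment, multiple pieces of candidate push information may be stored in advance on the webpage generation apparatus 500. This candidate push information can be combined with the content of the webpage so that the two are presented together in the browser. The information selecting unit 503 of the webpage generation apparatus 500 may accordingly match the keyword set against the content of each piece of candidate push information one by one, and determine the matching relationship between a piece of candidate push information and the keyword set from the number of keywords that its content contains. Based on the matching relationship, the information selecting unit 503 may select at least one piece of candidate push information from the multiple pieces and thereby generate the push information set.
In this embodiment, the generating unit 504 may combine the content of the webpage (i.e. the page content of the page corresponding to the web address) with the push information set to form the content of a new webpage, and thereby generate the new webpage.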
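The division of the apparatus into units can be pictured as four pluggable stages wired in sequence. The sketch below is only an illustration of that structure: the class name `WebpageGenerator`, the field names and the toy lambdas are all hypothetical, and the receiving unit is reduced to the `url` argument of `handle`.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class WebpageGenerator:
    """Four stages mirroring units 501-504, each supplied as a callable."""
    fetch: Callable[[str], str]         # web address -> page content
    parse: Callable[[str], set]         # page content -> keyword set
    select: Callable[[set], list]       # keyword set -> push information set
    render: Callable[[str, list], str]  # content + push information -> new page

    def handle(self, url: str) -> str:
        content = self.fetch(url)
        keywords = self.parse(content)
        pushes = self.select(keywords)
        return self.render(content, pushes)

# Toy stand-ins for each stage, for demonstration only.
generator = WebpageGenerator(
    fetch=lambda url: "page about cars",
    parse=lambda content: set(content.split()),
    select=lambda kws: ["car ad"] if "cars" in kws else [],
    render=lambda content, pushes: content + " | " + ", ".join(pushes),
)
new_page = generator.handle("http://example.com")
```

Passing the stages in as callables keeps each unit independently replaceable, which matches the statement that the units may be implemented in software or hardware.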
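Those skilled in the art will appreciate that the webpage generation apparatus 500 also includes some other well-known structures, such as a processor and a memory; to avoid unnecessarily obscuring the embodiments of the present disclosure, these well-known structures are not shown in Fig. 5.
Reference is now made to Fig. 6, which shows a schematic structural diagram of a computer system 600 suitable for implementing a terminal device or server of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD) or the like, and a speaker or the like; a storage section 608 including a hard disk or the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the function involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including a receiving unit, a parsing unit, an information selecting unit and a generating unit. In some cases the names of these units do not constitute a limitation on the units themselves; for example, the receiving unit may also be described as "a unit that receives a webpage browsing request of a user".
In another aspect, the present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the above embodiments, or a computer-readable storage medium that exists separately and is not assembled into a terminal. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the webpage generation method described in the present application.
The above description covers only the preferred embodiments of the present application and the technical principles employed. Those skilled in the art should understand that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.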

Claims (16)

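  1. A webpage generation method, characterized in that the method comprises:
    receiving a webpage browsing request from a user, wherein the webpage browsing request includes a web address;
    performing content parsing on the webpage corresponding to the web address to extract a keyword set;
    selecting, based on the matching relationship between the keyword set and each piece of candidate push information, at least one piece of candidate push information to generate a push information set;
    generating a new webpage based on the content of the webpage and the push information set.
  2. The webpage generation method according to claim 1, characterized in that performing content parsing on the webpage corresponding to the web address to extract a keyword set comprises:
    performing statistical analysis and/or semantic analysis on the content of the webpage corresponding to the web address to extract at least one keyword;
    generating the keyword set based on the at least one keyword.
  3. The webpage generation method according to claim 2, characterized in that generating the keyword set based on the at least one keyword comprises:
    expanding each single keyword of the at least one keyword to generate expanded keywords, wherein the expanded keywords include the single keyword and at least one of the following: a synonym of the single keyword, a near-synonym of the single keyword, and an associated word of the single keyword;
    generating the keyword set based on the expanded keywords.
  4. The webpage generation method according to any one of claims 1-3, characterized in that selecting, based on the matching relationship between the keyword set and each piece of candidate push information, at least one piece of candidate push information to generate a push information set comprises:
    performing content parsing on each piece of candidate push information to extract a respective candidate push information keyword set;
    computing the similarity between the keyword set and each candidate push information keyword set;
    selecting, based on the results of the similarity computation, at least one piece of candidate push information to generate the push information set.
  5. The webpage generation method according to claim 4, characterized in that selecting, based on the results of the similarity computation, at least one piece of candidate push information to generate the push information set comprises:
    selecting, based on the results of the similarity computation and a preset quantity of push information, that quantity of pieces of candidate push information to generate the push information set.
  6. The webpage generation method according to any one of claims 1-5, characterized in that generating a new webpage based on the content of the webpage and the push information set comprises:
    generating the new webpage in such a way that the push information in the push information set is associated with the corresponding keywords in the content of the webpage.
  7. The webpage generation method according to any one of claims 1-6, characterized in that generating a new webpage based on the content of the webpage and the push information set comprises:
    generating the new webpage in such a way that the push information in the push information set is arranged separately from the content of the webpage.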
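  8. A webpage generation apparatus, characterized in that the apparatus comprises:
    a receiving unit configured to receive a webpage browsing request from a user, wherein the webpage browsing request includes a web address;
    a parsing unit configured to perform content parsing on the webpage corresponding to the web address to extract a keyword set;
    an information selecting unit configured to select, based on the matching relationship between the keyword set and each piece of candidate push information, at least one piece of candidate push information to generate a push information set;
    a generating unit configured to generate a new webpage based on the content of the webpage and the push information set.
  9. The webpage generation apparatus according to claim 8, characterized in that the parsing unit comprises:
    an analysis module configured to perform statistical analysis and/or semantic analysis on the content of the webpage corresponding to the web address to extract at least one keyword;
    a generation module configured to generate the keyword set based on the at least one keyword.
  10. The webpage generation apparatus according to claim 9, characterized in that the generation module is further configured to:
    expand each single keyword of the at least one keyword to generate expanded keywords, wherein the expanded keywords include the single keyword and at least one of the following: a synonym of the single keyword, a near-synonym of the single keyword, and an associated word of the single keyword;
    generate the keyword set based on the expanded keywords.
  11. The webpage generation apparatus according to any one of claims 8-10, characterized in that the information selecting unit comprises:
    a parsing module configured to perform content parsing on each piece of candidate push information to extract a respective candidate push information keyword set;
    a similarity computation module configured to compute the similarity between the keyword set and each candidate push information keyword set;
    a selection module configured to select, based on the results of the similarity computation, at least one piece of candidate push information to generate the push information set.
  12. The webpage generation apparatus according to claim 11, characterized in that the selection module is further configured to:
    select, based on the results of the similarity computation and a preset quantity of push information, that quantity of pieces of candidate push information to generate the push information set.
  13. The webpage generation apparatus according to any one of claims 8-12, characterized in that the generating unit is further configured to:
    generate the new webpage in such a way that the push information in the push information set is associated with the corresponding keywords in the content of the webpage.
  14. The webpage generation apparatus according to any one of claims 8-13, characterized in that the generating unit is further configured to:
    generate the new webpage in such a way that the push information in the push information set is arranged separately from the content of the webpage.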
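  15. A device, characterized by comprising:
    one or more processors;
    a memory;
    one or more programs stored in the memory which, when executed by the one or more processors:
    receive a webpage browsing request from a user, wherein the webpage browsing request includes a web address;
    perform content parsing on the webpage corresponding to the web address to extract a keyword set;
    select, based on the matching relationship between the keyword set and each piece of candidate push information, at least one piece of candidate push information to generate a push information set;
    generate a new webpage based on the content of the webpage and the push information set.
  16. A non-volatile computer storage medium storing one or more programs which, when executed by a device, cause the device to:
    receive a webpage browsing request from a user, wherein the webpage browsing request includes a web address;
    perform content parsing on the webpage corresponding to the web address to extract a keyword set;
    select, based on the matching relationship between the keyword set and each piece of candidate push information, at least one piece of candidate push information to generate a push information set;
    generate a new webpage based on the content of the webpage and the push information set.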
PCT/CN2015/090703 2015-06-30 2015-09-25 Webpage generation method and apparatus WO2017000402A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510385768.XA CN105095394B (zh) 2015-06-30 2015-06-30 Webpage generation method and apparatus
CN201510385768.X 2015-06-30

Publications (1)

Publication Number Publication Date
WO2017000402A1 true WO2017000402A1 (zh) 2017-01-05

Family

ID=54575831

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/090703 WO2017000402A1 (zh) 2015-06-30 2015-09-25 Webpage generation method and apparatus

Country Status (2)

Country Link
CN (1) CN105095394B (zh)
WO (1) WO2017000402A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609965A (zh) * 2018-05-28 2019-12-24 腾讯科技(深圳)有限公司 Page display method, apparatus and storage medium

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
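CN105426508B (zh) * 2015-11-30 2019-07-05 百度在线网络技术(北京)有限公司 Webpage generation method and apparatus
CN105488161A (zh) * 2015-11-30 2016-04-13 百度在线网络技术(北京)有限公司 Information push method and apparatus
CN105488205B (zh) * 2015-12-09 2019-05-03 百度在线网络技术(北京)有限公司 Page generation method and apparatus
CN105634860B (zh) * 2015-12-21 2019-09-24 中国电子科技集团公司第十五研究所 Method and apparatus for restoring Internet access behavior trajectories
CN105578222B (zh) * 2016-02-01 2019-04-12 百度在线网络技术(北京)有限公司 Information push method and apparatus
CN105808636B (zh) * 2016-02-03 2020-11-27 北京中搜云商网络技术有限公司 Hypertext link push system based on app information data
CN105701232B (zh) * 2016-02-03 2020-11-27 北京中搜云商网络技术有限公司 Hypertext link list push system based on app information data
CN105760523A (zh) * 2016-02-29 2016-07-13 百度在线网络技术(北京)有限公司 Information push method and apparatus
CN107656954A (zh) * 2017-01-19 2018-02-02 深圳市谷熊网络科技有限公司 Information push method, and method and apparatus for acquiring push information
CN108363707B (zh) * 2017-01-26 2020-01-24 百度在线网络技术(北京)有限公司 Method and apparatus for generating a webpage
CN107172151B (zh) 2017-05-18 2020-08-07 百度在线网络技术(北京)有限公司 Method and apparatus for pushing information
CN106982420B (zh) * 2017-05-22 2020-05-15 张胜利 WiFi-based information publishing, dissemination and push method and system
CN110147488B (zh) * 2017-10-23 2023-05-16 腾讯科技(深圳)有限公司 Page content processing method, processing apparatus, computing device and storage medium
CN108171552A (zh) * 2018-01-16 2018-06-15 百度在线网络技术(北京)有限公司 Search promotion method and apparatus
CN109063147A (zh) * 2018-08-06 2018-12-21 北京航空航天大学 Text-similarity-based online course forum content recommendation method and system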
US11250716B2 (en) * 2018-08-30 2022-02-15 Microsoft Technology Licensing, Llc Network system for contextual course recommendation based on third-party content

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
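CN101071424A (zh) * 2006-06-23 2007-11-14 腾讯科技(深圳)有限公司 Personalized information push system and method
CN101866341A (zh) * 2009-04-17 2010-10-20 华为技术有限公司 Information push method, apparatus and system
CN102646248A (zh) * 2012-02-27 2012-08-22 沈文策 Advertisement publishing method and system
CN103530339A (zh) * 2013-10-08 2014-01-22 北京百度网讯科技有限公司 Mobile application information push method and apparatus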

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693271B (zh) * 2012-03-06 2015-11-25 天津奇思科技有限公司 Network information recommendation method and system
US10878044B2 (en) * 2012-10-30 2020-12-29 Sk Planet Co., Ltd. System and method for providing content recommendation service
CN103870461B (zh) * 2012-12-10 2019-09-10 腾讯科技(深圳)有限公司 Topic recommendation method, apparatus and server
CN103577595B (zh) * 2013-11-15 2017-09-22 北京奇虎科技有限公司 Keyword push method and apparatus based on the currently browsed page

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609965A (zh) * 2018-05-28 2019-12-24 腾讯科技(深圳)有限公司 Page display method, apparatus and storage medium
CN110609965B (zh) * 2018-05-28 2023-09-22 腾讯科技(深圳)有限公司 Page display method, apparatus and storage medium

Also Published As

Publication number Publication date
CN105095394A (zh) 2015-11-25
CN105095394B (zh) 2017-06-06

Similar Documents

Publication Publication Date Title
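WO2017000402A1 (zh) Webpage generation method and apparatus
JP6161679B2 (ja) Search engine and method for implementing the same
JP6511487B2 (ja) Method and apparatus for information push
CN107220386B (zh) Information push method and apparatus
CN107172151B (zh) Method and apparatus for pushing information
WO2018192491A1 (zh) Information push method and apparatus
WO2016206210A1 (zh) Information push method and apparatus
Li et al. Opinion community detection and opinion leader detection based on text information and network topology in cloud environment
Li et al. Filtering out the noise in short text topic modeling
WO2018149115A1 (zh) Method and apparatus for providing search results
US11281860B2 (en) Method, apparatus and device for recognizing text type
JP6224731B2 (ja) Method and apparatus for enriching social media to improve the personal user experience
US20090094210A1 (en) Intelligently sorted search results
US20060287988A1 (en) Keyword charaterization and application
US11250203B2 (en) Browsing images via mined hyperlinked text snippets
CN107526718B (zh) Method and apparatus for generating text
WO2017092294A1 (zh) Webpage generation method and apparatus
US20180046628A1 (en) Ranking social media content
CN105760523A (zh) Information push method and apparatus
WO2016173185A1 (zh) Information push method and apparatus
CN113688310A (zh) Content recommendation method, apparatus, device and storage medium
CN108280081B (zh) Method and apparatus for generating a webpage
Hidayatullah et al. Topic modeling on Indonesian online shop chat
KR20120082620A (ko) Ontology alignment method and ontology alignment system applying the same
JP6699031B2 (ja) Model learning method, explanatory text evaluation method, and apparatus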

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15896932

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15896932

Country of ref document: EP

Kind code of ref document: A1