WO2012126259A1 - System having information publishing and search functions, and information publishing method - Google Patents

System having information publishing and search functions, and information publishing method

Info

Publication number
WO2012126259A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
publishing
probability
platform
Prior art date
Application number
PCT/CN2011/083412
Other languages
English (en)
French (fr)
Inventor
李彦宏
廖若雪
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2012126259A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • the present invention relates to the field of search technologies, and in particular to a system having information publishing and search functions and to an information publishing method.
  • search has become an indispensable part of people's lives. Through search, people can easily access knowledge and information in various fields, which greatly accelerates the dissemination of information.
  • the emergence of search technology has greatly changed the traditional way people learn.
  • knowledge that in the past could only be obtained from libraries and educational institutions can now be obtained easily from the vast Internet with a click of a mouse.
  • as search technology becomes more intelligent, people are increasingly able to obtain personalized information.
  • with WEB2.0, the Internet has become a platform for communication and interaction. People who obtain information from the Internet often also contribute information to it, for example through widely used blogs, Weibo, Q&A communities, SNS, etc.
  • the system is capable of determining the user's intention based on the information input by the user, thereby determining whether to return the search result to the user or to post the user input information to the platform described above.
  • each information publishing platform is managed independently. If a user has registered on multiple platforms and wants to publish the same information on different platforms, the user has to log in and publish on each platform separately, which is obviously inconvenient.
  • the technical problem to be solved by the present invention is to provide a system having information publishing and search functions and an information publishing method, which judge the user's intention from the information the user inputs and search or publish the input information according to that intention.
  • the technical solution for solving the technical problem is to provide a system for information publishing and searching, comprising: a display module, configured to provide a user with a use interface of the system, the use interface being used to receive the user's input information and to display the processing result returned by the system to the user;
  • the classifier building module is configured to construct a classifier model according to historical data obtained by offline mining or corpus data provided by the third-party information publishing platform;
  • the information analysis module is configured to analyze the input information according to the classifier model and to output a first probability that the input information has an information publishing requirement, where the first probability describes, from a semantic-feature perspective, the possibility that the input information has the information publishing requirement;
  • the comprehensive decision module is configured to determine whether to retrieve or publish the input information;
  • a publishing module configured to invoke a data interface of the third-party information publishing platform, and connect to the Internet to publish the input information to the third-party information publishing platform;
  • a retrieval module configured to query an index library
  • the system is a search engine system.
  • the form of the use interface includes a WEB page, a WAP page, a combination of a browser with a search plugin and the WEB page, or a combination of a browser with a search plugin and the WAP page.
  • the WEB page or the WAP page includes a search box, an address bar, an input method box or an information input interface.
  • the third-party information publishing platform includes a microblog platform, a social network platform, a forum platform, or an electronic bulletin platform.
  • the classifier model is constructed using a machine learning algorithm based on the historical data or the corpus data.
  • the information distribution requirement includes a specific information distribution requirement or a general information distribution requirement.
  • before publishing, the system prompts the user through the display module to obtain the user's confirmation of the prompt.
  • when the user is prompted, the display module also returns the retrieval result obtained by the retrieval module for the input information.
  • the prompt includes a plurality of prompt information about the third-party information publishing platform.
  • the confirmation information includes selection information or login information for the third party information distribution platform.
  • after receiving the user's confirmation of the prompt, the display module publishes the input information through the publishing module.
  • the publishing module is further configured to publish the input information to a plurality of the third-party information publishing platforms.
  • the system further includes: a user information acquiring module, configured to obtain user information of the user in the third-party information publishing platform, so as to obtain a second probability that the input information has the information publishing requirement, where the second probability describes, from a user-information-feature perspective, the possibility that the input information has the information publishing requirement; the comprehensive decision module uses the first probability and the second probability to determine whether to retrieve or publish the input information.
  • the user information includes account information of the user or usage frequency information of the user.
  • the method for obtaining the user information includes verifying the online status of the user in the third-party information publishing platform, calling the usage record of the user's account on the third-party information publishing platform, or receiving the user's input in the display module.
  • the system further includes: a user behavior analysis module, configured to analyze the past behavior of the user to obtain a third probability that the input information has the information publishing requirement, where the third probability describes, from the perspective of the user's historical behavior features, the possibility that the input information has the information publishing requirement; the comprehensive decision module uses the first probability and the third probability to determine whether to retrieve or publish the input information.
  • the system further includes: an advanced syntax mining module, configured to mine Internet data, extract from the Internet data the keywords that users use to describe the third-party information publishing platform, and semantically extend the keywords to generate a description term library for the third-party information publishing platform; and an advanced syntax matching module, configured to perform matching verification on the input information according to the description term library to determine a fourth probability that the user uses the advanced syntax. When the fourth probability is greater than a first threshold, the advanced syntax matching module decomposes the input information into a content portion and a syntax portion and transmits the content portion and the fourth probability to the information analysis module; when the fourth probability is not greater than the first threshold, the advanced syntax matching module transmits the input information directly to the information analysis module. The information analysis module uses the data passed by the advanced syntax matching module and the classifier model to output the first probability.
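The matching-and-decomposition step described above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: the `DESCRIPTION_TERMS` library, the `"<platform term>: <content>"` input form, the fixed probability of 0.9, and the value of `FIRST_THRESHOLD` are all hypothetical, since the source does not specify the syntax or the threshold.

```python
from typing import Optional, Tuple

# Hypothetical description term library: platform -> terms users use for it.
DESCRIPTION_TERMS = {
    "weibo": ["weibo", "microblog"],
    "forum": ["forum", "bbs"],
}
FIRST_THRESHOLD = 0.5  # assumed value; the source does not give one


def match_advanced_syntax(text: str) -> Tuple[float, Optional[str], str]:
    """Return (fourth_probability, syntax_part, content_part)."""
    head, sep, tail = text.partition(":")
    if sep:
        candidate = head.strip().lower()
        for terms in DESCRIPTION_TERMS.values():
            if candidate in terms:
                # A leading "<platform term>:" prefix is treated as strong
                # evidence of advanced syntax (0.9 is an arbitrary choice).
                return 0.9, head.strip(), tail.strip()
    return 0.0, None, text


def route_to_analysis(text: str) -> dict:
    """Build the data handed on to the information analysis module."""
    p4, syntax, content = match_advanced_syntax(text)
    if p4 > FIRST_THRESHOLD:
        # Above the threshold: pass the content part plus the probability.
        return {"content": content, "syntax": syntax, "fourth_probability": p4}
    # Otherwise the raw input is passed through unchanged.
    return {"content": text, "syntax": None, "fourth_probability": None}
```

For an input such as `"weibo: I bought a piece of clothing today"`, this sketch would pass only the text after the colon to the analysis step, while an ordinary query is forwarded as is.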
  • the system further includes: a user configuration module, configured to configure a default behavior mode for the system according to a user's selection, where the default behavior mode includes a default search or a default release.
  • the user configuration identification module is configured to identify the user configuration information, and to search or publish the input information according to the user configuration information.
  • before publishing, the system prompts the user through the display module to obtain the user's confirmation of the prompt.
  • the present invention also provides an information publishing method, comprising the steps of: a. receiving user input information; b. analyzing the input information according to a classifier model to obtain a first probability that the input information has an information publishing requirement, where the first probability describes, from a semantic-feature perspective, the possibility that the input information has the information publishing requirement; c. searching the input information according to the first probability, or publishing the input information to a third-party information publishing platform.
  • the user input information is received from the use interface of the search engine in the step a.
  • the specific form of the use interface includes a WEB page, a WAP page, a combination of a browser with a search plugin and the WEB page, or a combination of a browser with a search plugin and the WAP page.
  • the WEB page or the WAP page includes a search box, an address bar, an input method frame, or an information input interface.
  • the third-party information publishing platform includes a microblog platform, a social network platform, a forum platform, or an electronic bulletin platform.
  • the classifier model is constructed by using a machine learning algorithm according to historical data of offline mining or corpus data provided by the third party information publishing platform.
  • the information distribution requirement includes a specific information distribution requirement or a general information distribution requirement.
  • before the publishing, the user is prompted in order to obtain the user's confirmation of the prompt.
  • the retrieval result of the input information is also returned when the user is prompted.
  • the prompt includes a plurality of prompt information about the third-party information publishing platform.
  • the confirmation information includes selection information or login information for the third party information distribution platform.
  • after the user's confirmation of the prompt is received, the input information is published.
  • the input information is posted to a plurality of the third-party information publishing platforms.
  • the method further comprises step d: obtaining user information of the user in the third-party information publishing platform to obtain a second probability that the input information has the information publishing requirement, where the second probability describes, from a user-information-feature perspective, the possibility that the input information has the information publishing requirement; step c then uses the first probability and the second probability to retrieve the input information or to publish it to a third-party information publishing platform.
  • the user information includes account information of the user or usage frequency information of the user.
  • the manner of obtaining the user information in step d includes verifying the online status of the user in the third-party information publishing platform, calling the usage record of the user's account on the third-party information publishing platform, or receiving the user's input in the display module.
  • the method further comprises step e: analyzing the past behavior of the user to obtain a third probability that the input information has the information publishing requirement, where the third probability describes, from the perspective of the user's historical behavior features, the possibility that the input information has the information publishing requirement; step c then uses the first probability and the third probability to retrieve the input information or to publish it to a third-party information publishing platform.
  • the method further comprises, before step b, step f: performing matching verification on the input information according to a description term library to determine a fourth probability that the user uses the advanced syntax, where the description term library is generated by mining Internet data, extracting from the Internet data the keywords that users use to describe the third-party information publishing platform, and semantically extending the keywords; when the fourth probability is greater than the first threshold, the input information is decomposed into a content portion and a syntax portion, and step b uses the content portion, the fourth probability and the classifier model to obtain the first probability.
  • the method further comprises, before step b, step g: identifying user configuration information, where the user configuration information is a default behavior mode configured for the system according to a user selection, and the default behavior mode is either default search or default publishing. When the system is configured for default search, it only performs retrieval and does not publish; when the system is configured for default publishing, it only publishes and does not retrieve. When the system is recognized as having user configuration information, the input information is retrieved or published according to that information.
  • the user is prompted to obtain confirmation of the prompt by the user prior to the posting.
  • the system can satisfy the different needs of different users by analyzing and judging the user's input information, so that the system gains an information publishing function on top of information retrieval. When the user has an information publishing requirement that does not specify a particular publishing platform, the system can also conveniently publish the information for the user on multiple information publishing platforms, which not only greatly simplifies the publishing process and saves the user's time, but also makes it possible for the information to reach those who need it faster and more accurately.
  • FIG. 1 is a block diagram showing the structure of a first embodiment of a system having an information distribution and search function in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a specific form of an interface used in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of another embodiment of a specific interface used in the embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an embodiment in which the use interface displays the result returned by the system to the user;
  • FIG. 5 is a schematic diagram of an embodiment of prompt information including multiple third-party information publishing platforms in a prompt returned by a display module according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of an embodiment of a prompt returned by a display module in the embodiment of the present invention
  • FIG. 7 is a schematic diagram of an embodiment of a prompt returned by a display module according to an embodiment of the present invention
  • FIG. 8 is a schematic diagram of an embodiment of the notification of successful publication that the display module returns after the publication succeeds;
  • Figure 9 is a block diagram showing the structure of a second embodiment of a system having information distribution and search functions in an embodiment of the present invention.
  • Figure 10 is a block diagram showing the structure of a third embodiment of the system having the information distribution and search function in the embodiment of the present invention.
  • Figure 11 is a block diagram showing the structure of a fourth embodiment of a system having information distribution and search functions in an embodiment of the present invention.
  • FIG. 12 is a schematic flowchart diagram of Embodiment 1 of an information distribution method according to an embodiment of the present invention.
  • FIG. 13 is a schematic flowchart diagram of Embodiment 2 of an information distribution method according to an embodiment of the present invention.
  • FIG. 14 is a schematic flowchart diagram of Embodiment 3 of an information distribution method according to an embodiment of the present invention.
  • FIG. 15 is a schematic flowchart diagram of Embodiment 4 of the information distribution method in the embodiment of the present invention.
  • the system having the information distribution and search function in the present invention may be any search engine system, or other BS (browser-server) structure or CS (client-server) structure system.
  • Figure 1 shows a schematic block diagram of a first embodiment of a system having information publishing and search functions in an embodiment of the invention.
  • the system having information publishing and searching functions includes a display module 101, a classifier building module 102, an information analysis module 103, a comprehensive decision module 104, a publishing module 105, and a retrieval module 106.
  • the display module 101 is configured to provide a user with a usage interface of the system, and the usage interface can receive the input information of the user and display the processing result returned by the search engine to the user.
  • the use interface includes a WEB page and a WAP page, where the WEB page is web page text conforming to the HTML format that an ordinary computer browser can recognize, and the WAP page is web page text better suited to display in a mobile browser.
  • the use interface is not limited to a page; it may also be a combination of a page and a browser that includes a search plugin.
  • the user input information can also be obtained at the search box of the search plugin of the browser.
  • FIG. 4 is a schematic diagram of an embodiment in which the use interface displays the result returned by the system to the user.
  • in the search result page, the use interface adds a guidance prompt inviting the user to publish information.
  • the WEB page or the WAP page may include a search box, an address bar, an input method box or an information input interface, and the information input interface may be used to input various kinds of information to be published, including microblog information, social network information, forum information or electronic bulletin information.
  • the social networks include Kaixin.com and Renren.com.
  • information can be directly input through the information input interface, and the information can be released through the search engine.
  • the classifier building module 102 is configured to construct a classifier model according to the historical data of the offline mining or the corpus data provided by the third-party information publishing platform, using a machine learning algorithm.
  • the third-party information publishing platform refers to an information publishing platform system that is technically associated with the system, including a microblog platform, a social network platform, a forum platform, or an electronic bulletin platform.
  • the historical data obtained by offline mining and the corpus data provided by the third-party information publishing platform are users' query input data, which serve as training samples when the machine learning algorithm is used to construct the classifier model.
  • the following takes the SVM (Support Vector Machine) machine learning algorithm as an example to introduce the construction of the classifier model.
  • the classification principle of SVM can be summarized as follows: find a classification hyperplane that separates the two classes of sample points in the training set while keeping them as far from the hyperplane as possible; for linearly inseparable problems, the data in the low-dimensional input space is mapped by a kernel function to a high-dimensional space, which transforms the linearly inseparable problem in the original low-dimensional space into a linearly separable problem in the high-dimensional space.
  • xi in the sample set (xi, yi) is a feature vector composed of features of the training corpus (i.e., the data obtained by offline mining or provided by the third-party information publishing platform).
  • yi represents one of two classes. A problem with more than two classes can be handled by converting it into multiple two-class problems.
  • the class in the present invention refers to the type of information publishing requirement, such as the publishing requirement for publishing platform one, the publishing requirement for publishing platform two, and so on.
  • the following variables can be used as features: the number and position of the various punctuation marks in the query, the length of the query string, whether the end of the query is a character, whether the query contains a special string, the number of digits in the query, whether the query contains words belonging to the classification vocabulary, the search volume of each word in the query, the number of search results the search engine returns for each word in the query, etc., where the classification vocabulary refers to category tables for fields such as economy, history, astronomy and geography.
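As an illustration of how a query might be turned into the feature vector xi, the following Python sketch computes a handful of the listed features. It is only a sketch under assumptions: the `CLASSIFICATION_VOCAB` word list and the feature names are hypothetical, and features that depend on search-engine statistics (search volume, result counts) are left out because they cannot be computed from the query text alone.

```python
import string
from typing import Dict, Set

# Hypothetical classification vocabulary; the source names categories such as
# economy, history, astronomy and geography but gives no word list.
CLASSIFICATION_VOCAB: Set[str] = {"economy", "history", "astronomy", "geography"}


def extract_features(query: str, vocab: Set[str] = CLASSIFICATION_VOCAB) -> Dict[str, float]:
    """Map a query string to a small dictionary of numeric features."""
    words = query.lower().split()
    # Indices of every ASCII punctuation character in the query.
    punct_positions = [i for i, ch in enumerate(query) if ch in string.punctuation]
    return {
        "length": float(len(query)),
        "punct_count": float(len(punct_positions)),
        # -1.0 marks "no punctuation present".
        "first_punct_pos": float(punct_positions[0]) if punct_positions else -1.0,
        "digit_count": float(sum(ch.isdigit() for ch in query)),
        # 1.0 if any word (with surrounding punctuation stripped) is in the vocabulary.
        "has_vocab_word": float(any(w.strip(string.punctuation) in vocab for w in words)),
    }
```

The resulting values could be placed in a fixed order to form the vector passed to whichever SVM implementation is used for training.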
  • the information analysis module 103 is configured to analyze user input information according to the classifier model constructed by the classifier building module 102, and to output a first probability that the user input information has an information publishing requirement, where the first probability describes, from a semantic-feature perspective, the possibility that the input information has the information publishing requirement.
  • the information publishing requirements include specific information publishing requirements or general information publishing requirements.
  • specific information publishing requirements are requirements to publish on a particular platform such as Weibo or an SNS, while general information publishing requirements are those of users who do not specify a publishing platform. For example, a user may want to transfer a train ticket or rent a house in a certain area. Such requirements do not name a specific publishing platform: the user just wants to publish the information to the Internet and, as long as other users can see it, does not mind which platform it is published on.
  • the information analysis module relies on the classifier model to analyze the semantics of the information input by the user, and to determine the first probability size of the information input by the user from a semantic perspective.
  • for example, if the information input by the user is "a set of three-bedroom apartment in Zhongguancun for rent", the result output by the information analysis module (for rent) is:
  • P(comprehensive) = 0.9
  • P(Weibo) = 0.7
  • P(knowledge quiz community) = 0.2
  • microblogging it means that the user intends to publish information on the microblogging platform, and the corresponding first probability on the microblogging platform is very large, and the first probability on other platforms is very small.
  • the data structure of the above output is only a schematic description for the purpose of illustrating the present invention, and other methods may be adopted as needed in the specific implementation, which is not limited by the present invention.
  • the comprehensive decision module 104 is configured to determine, according to the first probability, whether to retrieve or publish the information input by the user. Determining whether the input information should be published further includes determining whether the user needs to be prompted before publishing. When the user needs to be prompted, the display module returns a prompt to the user, where the prompt may include prompt information about one or more third-party information publishing platforms.
  • FIG. 5 is a schematic diagram of an embodiment of prompting information of a plurality of third-party information publishing platforms in a prompt returned by a display module according to an embodiment of the present invention.
  • the search process of the general search engine can be performed on the information input by the user, and the user input information can be treated as the query information.
  • the user is asked through the display module whether to publish the information, for example with a prompt on the search result page such as "post this message on Sina Weibo: I bought a piece of clothing today"; when the display module receives the confirmation returned by the user, the input information can be published.
  • the prompt may include a login prompt or a selection prompt of the third-party information publishing platform or an account prompt on the third-party information publishing platform.
  • the confirmation information may include login information or selection information of the third-party information publishing platform.
  • the login information includes an account number or a password
  • the selection information includes a selection of a third party information publishing platform or a selection of an account number on a third party information publishing platform.
  • FIG. 6 is a schematic diagram of an embodiment of a prompt returned by a display module according to an embodiment of the present invention
  • FIG. 7 is a schematic diagram of an embodiment of a prompt returned by a display module according to an embodiment of the present invention. It is worth noting that when the user is prompted to publish, a retrieval operation can be performed on the input information at the same time, and the publishing prompt can be returned together with the query result.
  • when the first probability is very large (say, probability > threshold four), the system publishes the piece of information directly. Beyond the strategies above, the system can also choose, according to the probability, among a plain search, a search shown together with a publishing prompt, a publishing prompt alone, or direct publishing.
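The four strategies named above can be pictured as probability bands. The Python sketch below is one possible reading, not the patent's stated rule: the three cut-off values in `THRESHOLDS` are hypothetical (the text names only "threshold four" and gives no numbers), and the mapping of bands to actions is assumed.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    SEARCH_ONLY = "search only"
    SEARCH_WITH_PROMPT = "search results plus publishing prompt"
    PROMPT_ONLY = "publishing prompt only"
    PUBLISH_DIRECTLY = "publish directly"


# Assumed cut-offs separating the four bands; not taken from the source.
THRESHOLDS: Tuple[float, float, float] = (0.3, 0.6, 0.9)


def decide(first_probability: float,
           thresholds: Tuple[float, float, float] = THRESHOLDS) -> Action:
    """Pick an action from the first probability, highest band first."""
    low, mid, high = thresholds
    if first_probability > high:
        return Action.PUBLISH_DIRECTLY
    if first_probability > mid:
        return Action.PROMPT_ONLY
    if first_probability > low:
        return Action.SEARCH_WITH_PROMPT
    return Action.SEARCH_ONLY
```

Keeping the cut-offs in a single tuple makes it easy to tune them, or to replace them with per-platform values, without touching the decision logic.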
  • FIG. 8 is a schematic diagram of an embodiment of a notification that a display module returns a successful release after a successful release in an embodiment of the present invention.
  • the publishing module 105 is configured to invoke a data interface of the third-party information publishing platform, and connect to the Internet to publish the information input by the user to the information publishing platform of the third party.
  • when the comprehensive decision module determines that the user input information needs to be published, it invokes the publishing module to publish the information input by the user to the third-party information publishing platform.
  • the publishing module also publishes the information input by the user to the third-party information publishing platform.
  • the publishing module can publish information with a specific information publishing requirement to publishing platforms such as microblogs, knowledge quiz communities, and social networks. It can also publish information with a general information publishing requirement to classified information publishing platforms such as 58 Tongcheng (58同城).
  • the publishing module can also publish information input by users to multiple third-party information publishing platforms. For example, suppose the user wants to publish a message about transferring a train ticket. In the past, so that more people could find the information as soon as possible, the user would post it on multiple classified information publishing platforms; with the present invention, the user only needs to enter the information in the search box of the search engine, and the system can automatically publish it to multiple information publishing platforms, which greatly simplifies the information publishing process.
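Publishing one message to several platforms can be sketched as a simple fan-out. In the Python sketch below, the `Publisher` type and the `publish_to_all` function are hypothetical names: each publisher stands in for a wrapper around a platform's data interface, whose real form the source does not describe.

```python
from typing import Callable, Dict

# Hypothetical adapter type: takes the text, returns True on success.
Publisher = Callable[[str], bool]


def publish_to_all(text: str, publishers: Dict[str, Publisher]) -> Dict[str, bool]:
    """Send the same text to every platform and record per-platform success."""
    results: Dict[str, bool] = {}
    for name, publish in publishers.items():
        try:
            results[name] = bool(publish(text))
        except Exception:
            # One failing platform should not block the others.
            results[name] = False
    return results
```

Returning a per-platform result lets the display module report which platforms accepted the message, in the spirit of the success notification shown in FIG. 8.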
  • the retrieval module 106 is configured to query the index library according to the user input information, and return the query result to the display module.
  • the comprehensive decision module can retrieve the input information by calling the retrieval module.
  • the retrieval module includes a query sub-module and a sort sub-module. As in an ordinary search engine, the query sub-module queries the index library according to the user input information, and the sort sub-module sorts the query results and returns the sorted result. Since the retrieval module is implemented in the same way as in the prior art, it is not described further here.
  • FIG. 9 is a schematic block diagram showing the structure of a second embodiment of a system having an information distribution and search function according to an embodiment of the present invention.
  • the system having the information distribution and search function further includes a user information acquisition module 107 or a user behavior analysis module 108 or any combination of the above two modules.
  • The user information obtaining module 107 is configured to obtain the user's user information on the third-party information publishing platform, so as to obtain a second probability that the input information has an information publishing requirement. The second probability describes, from the perspective of user information features, the possibility that the input information has an information publishing requirement.
  • The user information on the third-party information publishing platform includes the user's account information or usage frequency information. Account information refers to whether the user has an account on the third-party information publishing platform and what that account is; usage frequency information refers to which third-party information publishing platforms the user commonly uses.
  • the user information obtaining module can poll the multiple third-party information publishing platforms through the user's cookie data or the user's IP address or the computer's MAC address, and obtain the user information of the user on the third-party information publishing platform.
  • the method for obtaining user information further includes verifying the online status of the user on the third-party information publishing platform, calling the usage record of the user's account on the third-party information publishing platform, or receiving the user's input in the display module. For example, in the display module, the account and password window for the user to log in to the microblog are provided, and the user inputs the information in the window.
  • In this case, the integrated decision module decides whether to retrieve or publish the input information by using the first probability and the second probability rather than the first probability alone. Specifically, weights may be preset for the first probability and the second probability to determine the final decision logic.
  • The user behavior analysis module 108 is configured to analyze the user's past behavior to obtain a third probability that the input information has an information publishing requirement. The third probability describes, from the perspective of the user's historical behavior features, the possibility that the input information has an information publishing requirement.
  • The user's past behavior helps to infer the user's current intent. If a user often publishes information in a particular style of language, then when the user again enters information in that style, the user is more likely to want to publish it.
  • The user's past behavior includes behavior on search engines, microblogs, forums, blogs, and so on, expressed in the user's language and behavioral habits (such as frequently asking questions or frequently answering other people's questions).
  • The technical means used by the user behavior analysis module include data mining and machine learning: the user's behavior data is mined from user logs as training samples, and a feature selection algorithm and a machine learning method are used to classify the user's behavior and output the third probability.
  • In this case, the integrated decision module decides whether to retrieve or publish the input information by using the first probability and the third probability rather than the first probability alone. Specifically, weights may be preset for the first probability and the third probability to determine the final decision logic.
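  • One simple way to realize "preset weights" is a weighted average of whichever probabilities are available, compared against a threshold. The sketch below is an assumption for illustration only: the patent gives no weight values, no combination formula, and no threshold, so the defaults `(0.6, 0.2, 0.2)` and `0.5` and the function names are hypothetical.

```python
from typing import Optional, Tuple


def combine_probabilities(
    p1: float,
    p2: Optional[float] = None,
    p3: Optional[float] = None,
    weights: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # hypothetical preset weights
) -> float:
    """Weighted average of the first, second and third probabilities.

    Missing probabilities are skipped and the remaining weights are
    renormalized, so the result always stays in [0, 1].
    """
    pairs = [(p1, weights[0])]
    if p2 is not None:
        pairs.append((p2, weights[1]))
    if p3 is not None:
        pairs.append((p3, weights[2]))
    total_weight = sum(w for _, w in pairs)
    return sum(p * w for p, w in pairs) / total_weight


def decide(
    p1: float,
    p2: Optional[float] = None,
    p3: Optional[float] = None,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> str:
    """Return "publish" when the combined probability exceeds the threshold."""
    return "publish" if combine_probabilities(p1, p2, p3) > threshold else "search"
```

  • With these assumed weights, a strong semantic signal (first probability 0.8) still leads to publishing when the account-based second probability is only 0.4, because the first probability carries most of the weight.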
  • Figure 10 is a block diagram showing the structure of a third embodiment of a system having an information distribution and search function according to an embodiment of the present invention.
  • the system having the information distribution and search function further includes a high-level grammar mining module 109 and a high-level grammar matching module 110.
  • The high-level grammar mining module 109 is configured to mine Internet data, extract from it the keywords that users use to describe the third-party information publishing platform, and perform semantic expansion on the keywords to generate a description word library for the third-party information publishing platform.
  • For example, the Weibo platform is often nicknamed "bib", so description words such as "wb:" or "bib:" can be derived from "Weibo" through semantic expansion as description words for the microblog third-party information publishing platform.
  • Commonly used data mining methods include: neural network method, genetic algorithm, decision tree method, rough set method, statistical analysis method, fuzzy set method, etc., as these are all prior art, and will not be described in detail herein.
  • the high-level grammar matching module 110 is configured to perform matching verification on the input information according to the description word library to determine a fourth probability that the user uses the advanced grammar.
  • When the fourth probability is greater than the threshold X, the input information is decomposed into a content part and a grammar part, and the content part and the fourth probability are passed to the information analysis module. When the fourth probability is not greater than the threshold X, the input information is passed directly to the information analysis module.
  • The so-called advanced grammar refers to an input form that conforms to the way the description words in the description word library are written. For example, if the input information is "wb: catching a big fish today", advanced grammar is being used.
  • The high-level grammar matching module further includes a verification unit and a decomposition unit. The verification unit performs matching verification on the input information according to a predefined policy and generates the fourth probability that the user is using advanced grammar; the decomposition unit decomposes input information whose fourth probability is greater than the threshold X into a content part and a grammar part.
  • The verification unit scans the input information according to a predefined policy to obtain the fourth probability. For example, one policy is to scan the beginning of the input information; if it fully matches a description word, the fourth probability is 0.9.
  • Assuming the threshold X is 0.5, the decomposition unit will decompose "wb: catching a big fish today" into "wb:" and "catching a big fish today", where "wb:" is the grammar part and "catching a big fish today" is the content part.
  • The high-level grammar matching module outputs not only the decomposed content part but also the fourth probability that the user is using advanced grammar.
  • When the verification unit performs matching verification, different policies yield different fourth probabilities. For example, when the beginning of the user input information completely matches a description word in the description word library, the fourth probability is 0.9; when the beginning of the input only partially matches a description word, the fourth probability is 0.5; when the middle of the input matches a description word, the fourth probability is 0.3; and so on.
  • The content part and the fourth probability output by the high-level grammar matching module are passed to the information analysis module, giving it more evidence for calculating the first probability.
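  • The verification and decomposition units can be sketched as below, using the example probabilities 0.9 / 0.5 / 0.3 and the threshold 0.5 from the text. The two-entry description word library, the interpretation of a "partial match" as the description word without its trailing colon, and the function names are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical description word library; the patent only gives "wb:" and "bib:"
# as examples for the Weibo platform.
DESCRIPTION_WORDS = ("wb:", "bib:")


def fourth_probability(text: str, words=DESCRIPTION_WORDS) -> Tuple[float, int]:
    """Verification unit: return (fourth probability, length of the matched prefix)."""
    stripped = text.lstrip()
    # Policy 1: the beginning fully matches a description word.
    for word in words:
        if stripped[: len(word)].lower() == word:
            return 0.9, len(word)
    # Policy 2 (assumed meaning of "partial match"): the word without its colon.
    for word in words:
        stem = word.rstrip(":")
        if stem and stripped[: len(stem)].lower() == stem:
            return 0.5, len(stem)
    # Policy 3: a description word appears somewhere in the middle.
    if any(word in stripped.lower() for word in words):
        return 0.3, 0
    return 0.0, 0


def decompose(text: str, threshold_x: float = 0.5) -> Tuple[Optional[str], str, float]:
    """Decomposition unit: return (grammar part, content part, fourth probability).

    The input is split only when the fourth probability exceeds threshold X and
    the match sits at the beginning; otherwise it is passed through unchanged.
    """
    p4, prefix_len = fourth_probability(text)
    stripped = text.lstrip()
    if p4 > threshold_x and prefix_len > 0:
        return stripped[:prefix_len], stripped[prefix_len:].strip(), p4
    return None, text, p4
```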
  • In this embodiment, the information analysis module outputs the first probability by using the data passed by the high-level grammar matching module together with the classifier model, and may apply a preset strategy. For example, a large fourth probability indicates that the user is very likely using advanced grammar; accordingly, the probability that the user wants to publish information is large, and the first probability is also large.
  • Figure 11 is a block diagram showing the structure of a fourth embodiment of a system having information distribution and search functions according to an embodiment of the present invention.
  • the system having the information distribution and search function further includes: a user configuration module 111 and a user configuration identification module 112.
  • the user configuration module 111 is configured to configure a default behavior mode for the search engine system according to the user's selection.
  • the default behavior mode includes default search or default release.
  • When the system is configured to default to search, it only performs retrieval and does not publish.
  • When the system is configured to default to publish, it only performs publishing and does not retrieve.
  • The default-search and default-publish modes can be further divided into more detailed configurations.
  • For example, in default-publish mode, the system can be configured so that each time user input is received it is either published directly by the publishing module or shown with a prompt through the display module before publishing, or so that it is published to a specific platform, and so on.
  • The user configuration identification module 112 is configured to identify user configuration information and to search or publish the input information according to that configuration. Before publishing, the display module may prompt the user to obtain the user's confirmation.
  • If the system recognizes that the user configuration is default search, the user input information is treated as query information and a retrieval request is issued for it. If the system recognizes that the user configuration is default publish, it further determines from the refined configuration whether the input is to be published directly or with a prompt before publishing. If direct publishing is configured, a direct-publish request is issued for the user input information; otherwise, a request to prompt the user before publishing is issued, and the input information is published once the user's confirmation is received.
  • For example, if the user has configured the system to post the input information directly to Sina Weibo, the user is using the search engine system of the present invention for a specific purpose and is very clear about that purpose. In this case, handling the input according to the user's configuration already meets the user's needs, so there is no need for the search engine to perform other operations.
  • the user input information is output to the next processing module.
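  • The configuration handling described above amounts to a small dispatch step, sketched below. The `UserConfig` fields, the action names, and the rule that an absent configuration hands the input to the next module for analysis are assumptions for illustration, not the patent's data model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UserConfig:
    """Hypothetical representation of the user's default behavior mode."""
    default_mode: str                   # "search" or "publish"
    prompt_before_publish: bool = True  # refined configuration for publish mode
    platform: Optional[str] = None      # optional specific target platform


def dispatch(text: str, config: Optional[UserConfig]) -> Tuple[str, str, Optional[str]]:
    """Return (action, text, platform) according to the user configuration."""
    if config is None:
        # Assumption: without a configuration, the input goes on to the
        # next processing module for probability-based analysis.
        return ("analyze", text, None)
    if config.default_mode == "search":
        return ("search", text, None)
    if config.default_mode == "publish":
        action = "prompt_then_publish" if config.prompt_before_publish else "publish"
        return (action, text, config.platform)
    raise ValueError(f"unknown default mode: {config.default_mode!r}")
```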
  • FIG. 12 is a schematic flowchart diagram of Embodiment 1 of an information distribution method according to an embodiment of the present invention.
  • the information publishing method includes the step 201: receiving user input information.
  • User input information is received from a search engine usage interface, where the usage interface takes the form of a WEB page, a WAP page, a combination of a browser with a search plug-in and a WEB page, or a combination of a browser with a search plug-in and a WAP page.
  • The WEB page or WAP page may include a search box, an address bar, an input method box, or an information input interface. The information input interface may be used to input various kinds of information that need to be published, including Weibo information, social network information, forum information, or electronic bulletin information.
  • Receiving user input information is a prerequisite for subsequent processing.
  • Step 202: Analyze the user input information according to a classifier model to obtain a first probability that the input information has an information publishing requirement, where the first probability describes, from the perspective of semantic features, the possibility that the input information has an information publishing requirement.
  • The classifier model is constructed, using a machine learning algorithm, from historical data mined offline or from corpus data provided by a third-party information publishing platform.
  • The historical data mined offline and the corpus data provided by the third-party information publishing platform refer to users' query input data, which serve as training samples when the machine learning algorithm is used to construct the classifier model.
  • Below, the SVM (Support Vector Machine) machine learning algorithm is taken as an example to introduce the construction of the classifier model.
  • The classification principle of SVM can be summarized as follows: find a separating hyperplane such that the two classes of sample points in the training set are separated and lie as far from the hyperplane as possible. For linearly inseparable problems, a kernel function maps the low-dimensional input data to a high-dimensional space, which transforms the linearly inseparable problem in the original low-dimensional space into a linearly separable problem in the high-dimensional space.
  • In the sample set (xi, yi), xi is a feature vector composed of features of the training corpus (that is, the offline-mined data or the data provided by a third-party information publishing platform), and yi indicates which of the two classes the sample belongs to. If there are more than two classes, the problem can be converted into two-class problems for processing.
  • The classes in the present invention are the types of publishing requirement of the information, including the requirement to publish on publishing platform one, the requirement to publish on publishing platform two, and so on.
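  • For reference, the SVM principle summarized above corresponds to the standard textbook formulation below. The symbols w, b, φ, K and α_i are the usual ones from the SVM literature and do not appear in the patent text, which only names the sample set (xi, yi); the hard-margin form is shown for simplicity.

```latex
% Maximum-margin hyperplane for samples (x_i, y_i) with labels y_i \in \{-1, +1\}
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad y_i\,\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1,
\qquad i = 1,\dots,n

% Resulting decision function, with kernel K(x_i, x) = \phi(x_i)^{\top}\phi(x)
f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{n} \alpha_i\, y_i\, K(x_i, x) + b\Bigr)
```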
  • The following variables can be used as features: the number and position of the various punctuation marks in the query, the length of the query string, whether the query ends with a character, whether the query contains a special string, the number of digits in the query, whether the query contains words belonging to the classification vocabulary, the search volume of each word in the query, the number of search results the search engine returns for each word in the query, and so on. Here the classification vocabulary refers to lists of words for categories such as economy, history, astronomy, and geography.
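  • Several of the features listed above can be computed directly from the query string, as in the sketch below. The feature names, the tiny category vocabulary, and the use of "ends with punctuation" as a stand-in for the end-of-query feature are assumptions; features that need search logs (search volume, result counts) are left out.

```python
import string
from typing import Dict, Union

# Hypothetical classification vocabulary, using the categories named in the text.
CATEGORY_VOCAB = {"economy", "history", "astronomy", "geography"}

# ASCII punctuation plus a few common full-width marks.
PUNCTUATION = set(string.punctuation) | set("，。？！：；、")


def extract_features(query: str) -> Dict[str, Union[int, bool]]:
    """Turn a raw query into a small feature dictionary for a classifier."""
    punct_positions = [i for i, ch in enumerate(query) if ch in PUNCTUATION]
    tokens = [t.lower().strip(string.punctuation) for t in query.split()]
    return {
        "length": len(query),
        "punct_count": len(punct_positions),
        # Position of the first punctuation mark, or -1 when there is none.
        "first_punct_pos": punct_positions[0] if punct_positions else -1,
        "digit_count": sum(ch.isdigit() for ch in query),
        "ends_with_punct": bool(query) and query[-1] in PUNCTUATION,
        "has_category_word": any(t in CATEGORY_VOCAB for t in tokens),
    }
```

  • The resulting dictionary would still have to be converted to a numeric vector xi before being passed to an SVM or any other classifier.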
  • Information publishing requirements include specific information publishing requirements or general information publishing requirements. A specific information publishing requirement is a requirement to publish on a specific publishing platform such as Weibo or an SNS. A general information publishing requirement is one for which the user does not specify a publishing platform; for example, the user may want to transfer a train ticket, or need to rent a house in a certain area.
  • Such information publishing requirements do not specify a particular publishing platform; the user only wants to publish this type of information on the Internet. As long as it can be seen by other users, the user who posted the information does not mind which platform it is published on.
  • The information input by the user can be semantically analyzed to determine, from a semantic perspective, the first probability that the input should be published on each publishing platform.
  • For example, the information input by the user is "a set of three-bedroom apartment in Zhongguancun for rent".
  • If the user inputs "microblogging", it means that the user intends to publish information on the microblog platform; the corresponding first probability for the microblog platform is very large, and the first probability for other platforms is very small.
  • the data structure of the above-mentioned results is only a schematic description for the purpose of illustrating the present invention, and other manners may be adopted as needed in the specific implementation, which is not limited by the present invention.
  • Step 203 Search the input information according to the first probability or publish the input information to a third-party information publishing platform.
  • the publishing the input information to the third-party information publishing platform further includes prompting the user to publish before the publishing.
  • the prompt may include one or more prompt information about a third party information publishing platform.
  • A series of policies may be set in advance to decide whether to perform retrieval or publication on the user input information.
  • The search process of a general search engine can be performed on the information input by the user, in which case the user input information is treated as query information.
  • The user may be prompted whether to publish the information, for example by showing in the search results "This message is posted on Sina Weibo: I bought a dress today."; after the confirmation information returned by the user is received, the input information can be published.
  • The prompt may include a login prompt for a third-party information publishing platform, or a selection prompt for a third-party information publishing platform or for an account on a third-party information publishing platform. Correspondingly, the confirmation information may include login information or selection information for the third-party information publishing platform. The login information includes an account number or a password, and the selection information includes a selection of a third-party information publishing platform or of an account on a third-party information publishing platform. It is worth noting that when the user is prompted to publish, a retrieval operation can also be performed on the input information at the same time, and the publishing prompt can be returned together with the query results.
  • When the second probability is very large (assuming probability > threshold four), the system will directly publish the piece of information. After the information is published successfully, a notification of success can also be returned to the user. In addition to this strategy, depending on the probability, the system may perform search only, search together with a publishing prompt, a publishing prompt only, or direct publishing.
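  • The four outcomes listed above (search only, search with a publishing prompt, publishing prompt only, direct publishing) can be selected with a tiered threshold rule, sketched below. The text only mentions "threshold four" by name, so the three cut-off values and the action names used here are hypothetical.

```python
def choose_action(
    probability: float,
    low: float = 0.5,    # hypothetical: above this, add a publishing prompt to the results
    mid: float = 0.7,    # hypothetical: above this, show only the publishing prompt
    high: float = 0.9,   # hypothetical stand-in for "threshold four": publish directly
) -> str:
    """Map a publishing probability to one of the four strategies."""
    if probability > high:
        return "publish_directly"
    if probability > mid:
        return "prompt_only"
    if probability > low:
        return "search_and_prompt"
    return "search_only"
```

  • Because the checks run from the highest threshold down, each probability falls into exactly one tier as long as the thresholds satisfy low < mid < high.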
  • the information input by the user can be posted to the third-party information publishing platform by calling the data interface of the third-party information publishing platform.
  • Information with a specific information publishing requirement can be posted to a publishing platform such as Weibo, a Q&A community, or a social network, and information with a general information publishing requirement can be published to a classified-information publishing platform such as 58 Tongcheng (58同城).
  • the method of the present invention can also publish information input by a user to a plurality of third party information publishing platforms. For example, the user wants to publish a message for transferring a train ticket.
  • the search process of the ordinary search engine can be performed, and will not be described in detail here.
  • FIG. 13 is a schematic flowchart diagram of Embodiment 2 of an information distribution method according to an embodiment of the present invention. Referring to FIG. 13, in the embodiment, steps 301 and 302 are the same as steps 201 and 202 in the first embodiment, and details are not described herein.
  • Step 303: Acquire the user's user information on the third-party information publishing platform to obtain a second probability that the input information has an information publishing requirement, or analyze the user's past behavior to obtain a third probability that the input information has an information publishing requirement. The second probability describes this possibility from the perspective of user information features, and the third probability describes it from the perspective of the user's historical behavior features.
  • The user information on the third-party information publishing platform includes the user's account information or usage frequency information. Account information refers to whether the user has an account on the third-party information publishing platform and what that account is; usage frequency information refers to which third-party information publishing platforms the user commonly uses.
  • Multiple third-party information publishing platforms can be polled using the user's cookie data, the user's IP address, or the computer's MAC address to obtain the user's information on those platforms.
  • the method for obtaining user information further includes verifying the online status of the user in the third-party information publishing platform, calling the usage record of the user's account on the third-party information publishing platform, or inputting by the user.
  • the user's past behavior has a guiding effect on deriving the user's current behavioral intent. If users often publish information in the same language, then when the user still enters information in this language, it is more likely to publish the information.
  • the user's past behaviors include his behavior on search engines, microblogs, forums, blogs, etc., which are expressed in language, behavioral habits (such as asking questions frequently or answering others' questions frequently).
  • To analyze the user's behavior, the technical means used include data mining and machine learning: the user's behavior data is mined from user logs as training samples, and a feature selection algorithm and a machine learning method are used to classify the user's behavior and output the third probability.
  • Step 304: Retrieve the input information or publish it to the third-party information publishing platform by using the first probability and the second probability, or by using the first probability and the third probability.
  • In this embodiment, retrieving the input information or publishing it to the third-party information publishing platform according to the first probability is implemented by using the first probability in any combination with the second probability and the third probability.
  • the weight of each probability may be preset to determine the final judgment logic.
  • FIG. 14 is a schematic flowchart diagram of Embodiment 3 of a method for distributing information according to an embodiment of the present invention.
  • steps 401, 404, and 405 are the same as steps 301, 303, and 304 in the second embodiment, and details are not described herein.
  • Step 402 Perform matching verification on the user input information according to the description term library to determine a fourth probability that the user uses the advanced grammar.
  • the description term library is generated by extracting keywords of the third-party information publishing platform from the Internet data by mining the Internet data, and semantically expanding the keywords.
  • For example, the Weibo platform is often nicknamed "bib", so "wb:" or "bib:" can be derived from "Weibo" through semantic expansion as description words for the microblog third-party information publishing platform.
  • Commonly used data mining methods include the neural network method, genetic algorithms, the decision tree method, the rough set method, statistical analysis, the fuzzy set method, and so on. As these are all prior art, they are not described in detail here.
  • The so-called advanced grammar refers to an input form that conforms to the way the description words in the description word library are written. For example, if the input information is "wb: catching a big fish today", advanced grammar is being used.
  • Matching verification of the user input information can be done according to a predefined policy. For example, when the beginning of the user input information completely matches a description word in the description word library, the fourth probability is 0.9; when the beginning of the input only partially matches a description word, the fourth probability is 0.5; when the middle of the input matches a description word, the fourth probability is 0.3; and so on.
  • When the fourth probability is greater than the threshold X, step 402 further includes step 4021: decomposing the user input information into a content part and a grammar part. For example, the user enters "wb: catching a big fish today" and the fourth probability obtained is 0.9. Assuming that the threshold X is 0.5, since the fourth probability is greater than the threshold X, step 4021 decomposes "wb: catching a big fish today" into "wb:" and "catching a big fish today", where "wb:" is the grammar part and "catching a big fish today" is the content part.
  • Step 403: Depending on the fourth probability, one of two branches is executed. When the fourth probability is not greater than the threshold X, step 403 analyzes the input information by using the classifier model to obtain the first probability that the input information has an information publishing requirement. When the fourth probability is greater than the threshold X, step 403' analyzes the input information by using the content part, the fourth probability, and the classifier model to obtain the first probability. Because step 403' also uses the fourth probability as a basis for calculating the first probability, it can effectively improve the confidence of the first probability.
  • FIG. 15 is a schematic flowchart diagram of Embodiment 4 of a method for distributing information according to an embodiment of the present invention.
  • Steps 501, 503, 5031, 504 (504'), 505, and 506 are the same as steps 401, 402, 4021, 403 (403'), 404, and 405 in the third embodiment, and are not repeated here.
  • Step 502 Identify user configuration information, where the user configuration information is a default behavior configured for the system according to the user's selection.
  • the default behavior mode includes default search or default release.
  • When the system is configured to default to search, it only performs retrieval and does not publish; when the system is configured to default to publish, it only performs publishing and does not retrieve.
  • The default-search and default-publish modes can be further divided into more detailed configurations. For example, in default-publish mode, the system can be configured so that each time user input is received it is either published directly or shown with a prompt before publishing, or so that it is published to a specific platform, and so on.
  • When user configuration information is identified, step 502 further includes step 5021: searching or publishing the user input information according to the user's configuration information.
  • The user may also be prompted before publication to obtain the user's confirmation of the prompt.
  • For example, if the user has configured the system to post every piece of user input directly to Sina Weibo, the user's purpose in publishing the information is very clear. In this case, handling the input according to the user's configuration already satisfies the user's needs, so there is no need to perform other operations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

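The present invention provides a system having information publishing and search functions and an information publishing method. The system includes: a display module; a classifier construction module, configured to construct a classifier model; an information analysis module, configured to analyze input information according to the classifier model and output a first probability that the input information has an information publishing requirement; an integrated decision module, configured to decide, according to the first probability, whether to retrieve or publish the input information; a publishing module, configured to publish the input information to a third-party information publishing platform; and a retrieval module, configured to return query results to the display module. In this way, the system gains an information publishing function on top of search and can well meet the needs of different users.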

Description

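A system having information publishing and search functions and an information publishing method

This application claims priority to Chinese patent application No. 201110066135.4, filed on March 18, 2011 and entitled "A system having information publishing and search functions and an information publishing method", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of search technologies, and in particular to a system having information publishing and search functions and an information publishing method.
Background
With the development of Internet technology, search has become an indispensable part of people's lives. Through search, people can easily obtain knowledge and information in every field, which has greatly accelerated the dissemination of information. The emergence of search technology has profoundly changed the traditional way people learn: knowledge that once had to be obtained from libraries and educational institutions can now be retrieved from the vast ocean of Internet information with a click of the mouse. As search technology becomes more intelligent, it is also increasingly easy for people to obtain personalized information. However, with the wide adoption of WEB2.0, the Internet has become a platform for communication and exchange: people not only obtain information from the Internet but also frequently contribute information to it. Widely used services such as blogs, microblogs, knowledge Q&A communities, and SNS all give people a platform for publishing information and sharing knowledge. Yet there is currently no system that can judge the user's intent from the information the user inputs and thereby decide whether to return search results to the user or to publish the input information to the platforms mentioned above. In addition, because these platforms are independent of one another and each is built on its own management foundation, a user who has registered with several platforms and wants to publish the same information on each of them has to log in and publish on every platform separately, which is clearly inconvenient.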
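Summary of the Invention
The technical problem to be solved by the present invention is to provide a system having information publishing and search functions and an information publishing method, so as to judge the user's intent from the user input information and to retrieve or publish the input information according to that intent.
The technical solution adopted by the present invention to solve the technical problem is to provide a system having information publishing and search functions, including: a display module, configured to provide the user with a usage interface of the system, the usage interface being used to receive the user's input information and to display the processing results returned by the system to the user; a classifier construction module, configured to construct a classifier model according to historical data mined offline or corpus data provided by a third-party information publishing platform; an information analysis module, configured to analyze the input information according to the classifier model and output a first probability that the input information has an information publishing requirement, the first probability describing, from the perspective of semantic features, the possibility that the input information has the information publishing requirement; an integrated decision module, configured to decide, according to the first probability, whether to retrieve or publish the input information; a publishing module, configured to call the data interface of the third-party information publishing platform and connect to the Internet so as to publish the input information to the third-party information publishing platform; and a retrieval module, configured to query an index library according to the input information and return the query results to the display module.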
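According to a preferred embodiment of the present invention, the system is a search engine system.
According to a preferred embodiment of the present invention, the usage interface takes the form of a WEB page, a WAP page, a combination of a browser with a search plug-in and the WEB page, or a combination of a browser with a search plug-in and the WAP page.
According to a preferred embodiment of the present invention, the WEB page or the WAP page includes a search box, an address bar, an input method box, or an information input interface.
According to a preferred embodiment of the present invention, the third-party information publishing platform includes a microblog platform, a social network platform, a forum platform, or an electronic bulletin platform.
According to a preferred embodiment of the present invention, the classifier model is constructed from the historical data or the corpus data by using a machine learning algorithm.
According to a preferred embodiment of the present invention, the information publishing requirement includes a specific information publishing requirement or a general information publishing requirement.
According to a preferred embodiment of the present invention, before publishing, the system prompts the user through the display module to obtain the user's confirmation information for the prompt.
According to a preferred embodiment of the present invention, when prompting the user, the display module returns the retrieval module's retrieval results for the input information.
According to a preferred embodiment of the present invention, the prompt includes multiple pieces of prompt information about the third-party information publishing platform.
According to a preferred embodiment of the present invention, the confirmation information includes selection information or login information for the third-party information publishing platform.
According to a preferred embodiment of the present invention, after receiving the user's confirmation information for the prompt, the display module publishes the input information through the publishing module.
According to a preferred embodiment of the present invention, the publishing module is further configured to publish the input information on multiple third-party information publishing platforms.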
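According to a preferred embodiment of the present invention, the system further includes: a user information obtaining module, configured to obtain the user's user information on the third-party information publishing platform so as to obtain a second probability that the input information has the information publishing requirement, where the second probability describes, from the perspective of user information features, the possibility that the input information has an information publishing requirement; the integrated decision module uses the first probability and the second probability to decide whether to retrieve or publish the input information.
According to a preferred embodiment of the present invention, the user information includes the user's account information or the user's usage frequency information.
According to a preferred embodiment of the present invention, the user information is obtained by verifying the user's online status on the third-party information publishing platform, by calling the usage record of the user's account on the third-party information publishing platform, or by receiving the user's input in the display module.
According to a preferred embodiment of the present invention, the system further includes: a user behavior analysis module, configured to analyze the user's past behavior so as to obtain a third probability that the input information has the information publishing requirement, where the third probability describes, from the perspective of the user's historical behavior features, the possibility that the input information has the information publishing requirement; the integrated decision module uses the first probability and the third probability to decide whether to retrieve or publish the input information.
According to a preferred embodiment of the present invention, the system further includes: an advanced grammar mining module, configured to mine Internet data, extract from the Internet data the keywords that users use to describe the third-party information publishing platform, and perform semantic expansion on the keywords to generate a description word library for the third-party information publishing platform; and an advanced grammar matching module, configured to perform matching verification on the input information according to the description word library so as to determine a fourth probability that the user is using advanced grammar. When the fourth probability is greater than a first threshold, the advanced grammar matching module further decomposes the input information into a content part and a grammar part and passes the content part and the fourth probability to the information analysis module; when the fourth probability is not greater than the first threshold, the advanced grammar matching module passes the input information directly to the information analysis module. The information analysis module outputs the first probability by using the data passed by the advanced grammar matching module and the classifier model.
According to a preferred embodiment of the present invention, the system further includes: a user configuration module, configured to configure a default behavior mode for the system according to the user's selection, the default behavior mode including default search or default publish, where the system only performs retrieval and does not publish when configured as default search, and only performs publishing and does not retrieve when configured as default publish; and a user configuration identification module, configured to identify user configuration information and to retrieve or publish the input information according to the user configuration information.
According to a preferred embodiment of the present invention, before publishing, the system prompts the user through the display module to obtain the user's confirmation information for the prompt.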
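The present invention also provides an information publishing method, including the steps of: a. receiving user input information; b. analyzing the input information according to a classifier model to obtain a first probability that the input information has an information publishing requirement, the first probability describing, from the perspective of semantic features, the possibility that the input information has the information publishing requirement; c. retrieving the input information or publishing the input information to a third-party information publishing platform according to the first probability.
According to a preferred embodiment of the present invention, in step a the user input information is received from the usage interface of a search engine.
According to a preferred embodiment of the present invention, the usage interface takes the form of a WEB page, a WAP page, a combination of a browser with a search plug-in and the WEB page, or a combination of a browser with a search plug-in and the WAP page.
According to a preferred embodiment of the present invention, the WEB page or the WAP page includes a search box, an address bar, an input method box, or an information input interface.
According to a preferred embodiment of the present invention, the third-party information publishing platform includes a microblog platform, a social network platform, a forum platform, or an electronic bulletin platform.
According to a preferred embodiment of the present invention, the classifier model is constructed, by using a machine learning algorithm, from historical data mined offline or from corpus data provided by the third-party information publishing platform.
According to a preferred embodiment of the present invention, the information publishing requirement includes a specific information publishing requirement or a general information publishing requirement.
According to a preferred embodiment of the present invention, in step c the user is prompted before the publishing so as to obtain the user's confirmation information for the prompt.
According to a preferred embodiment of the present invention, the retrieval results for the input information are returned when the user is prompted.
According to a preferred embodiment of the present invention, the prompt includes multiple pieces of prompt information about the third-party information publishing platform.
According to a preferred embodiment of the present invention, the confirmation information includes selection information or login information for the third-party information publishing platform.
According to a preferred embodiment of the present invention, the input information is published after the user's confirmation information for the prompt is received.
According to a preferred embodiment of the present invention, in step c the input information is published on multiple third-party information publishing platforms.
According to a preferred embodiment of the present invention, before step c the method further includes the step of: d. obtaining the user's user information on the third-party information publishing platform so as to obtain a second probability that the input information has the information publishing requirement, where the second probability describes, from the perspective of user information features, the possibility that the input information has the information publishing requirement; in step c, the first probability and the second probability are used to retrieve the input information or publish the input information to the third-party information publishing platform.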
根据本发明之一优选实施例,所述用户信息包括用户的帐号信息或用户的使用 频率信息。
根据本发明之一优选实施例,所述步骤 d中用户信息的获取方式包括检验用户 在所述第三方信息发布平台的在线状态、 调用用户在所述第三方信息发布平台的帐 号的使用记录或接收用户在所述展示模块的输入。
根据本发明之一优选实施例,所述方法在步骤 c前进一步包括步骤: e. 分析用 户以往的行为, 以得到所述输入信息具有所述信息发布需求的第三概率, 其中所述 第三概率用于描述所述输入信息基于用户历史行为特征角度的具有所述信息发布 需求的可能性; 所述步驟 c中利用所述第一概率与所述第三概率对所述输入信息进 行检索或将所述输入信息发布到第三方信息发布平台。
根据本发明之一优选实施例, 所述方法在步驟 b前进一步包括步骤: f. 根据描 述词语库对所述输入信息进行匹配验证, 以判断用户使用高级语法的第四概率, 其 中所述描述词语库是通过对互联网数据进行挖掘, 从所述互联网数据中提取用户描 述所述第三方信息发布平台的关键词, 并对所述关键词进行语义扩展后生成的; 当 所述第四概率大于第一阈值时, 将所述输入信息分解为内容部分与语法部分, 所述 步骤 b利用所述内容部分与所述第四概率及所述分类器模型得到第一概率。
根据本发明之一优选实施例, 所述方法在步骤 b前进一步包括步驟: g. 识别 用户配置信息, 其中所述用户配置信息是根据用户的选择为系统配置的默认的行为 模式, 所述默认的行为模式包括默认为搜索或默认为发布, 其中当系统被配置成所 述默认为搜索时, 系统只执行检索不执行发布, 系统被配置成所述默认为发布时, 系统只执行发布不执行检索; 当识别出系统具有用户配置信息时, 根据所述用户配 置信息对所述输入信息进行检索或发布。
根据本发明之一优选实施例,在所述发布前对用户进行提示以获取用户对所述 提示的确认信息。
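As can be seen from the above technical solutions, by analyzing and judging the user's input information the system can satisfy the differing needs of different users, so that the system gains an information publishing function on top of information retrieval. When the user has an information publishing need that does not name a specific publishing platform, the system can also readily publish the information on several information publishing platforms on the user's behalf. This greatly simplifies the publishing process, saves the user's time, and makes it possible for the published information to reach those who need it faster and more accurately.
Brief Description of the Drawings
FIG. 1 is a schematic structural block diagram of Embodiment 1 of the system with information publishing and search functions according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of one specific form of the user interface according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another specific form of the user interface according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an embodiment in which the user interface presents the results returned by the system to the user;
FIG. 5 is a schematic diagram of an embodiment in which the prompt returned by the presentation module contains prompt messages for several third-party information publishing platforms;
FIG. 6 is a schematic diagram of an embodiment in which the prompt returned by the presentation module contains a login prompt;
FIG. 7 is a schematic diagram of an embodiment in which the prompt returned by the presentation module contains a selection prompt;
FIG. 8 is a schematic diagram of an embodiment in which the presentation module returns a success notification after publishing succeeds;
FIG. 9 is a schematic structural block diagram of Embodiment 2 of the system with information publishing and search functions according to an embodiment of the present invention;
FIG. 10 is a schematic structural block diagram of Embodiment 3 of the system with information publishing and search functions according to an embodiment of the present invention;
FIG. 11 is a schematic structural block diagram of Embodiment 4 of the system with information publishing and search functions according to an embodiment of the present invention;
FIG. 12 is a schematic flowchart of Embodiment 1 of the information publishing method according to an embodiment of the present invention;
FIG. 13 is a schematic flowchart of Embodiment 2 of the information publishing method according to an embodiment of the present invention;
FIG. 14 is a schematic flowchart of Embodiment 3 of the information publishing method according to an embodiment of the present invention;
FIG. 15 is a schematic flowchart of Embodiment 4 of the information publishing method according to an embodiment of the present invention.
Detailed Description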
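To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The system with information publishing and search functions of the present invention may be any search engine system, or another system with a BS (browser-server) or CS (client-server) architecture. FIG. 1 is a schematic structural block diagram of Embodiment 1 of the system with information publishing and search functions according to an embodiment of the present invention.
As shown in FIG. 1, the system with information publishing and search functions includes a presentation module 101, a classifier construction module 102, an information analysis module 103, a comprehensive decision module 104, a publishing module 105, and a retrieval module 106.
The presentation module 101 is configured to provide the user with a user interface of the system. The user interface receives the user's input information and presents the processing results returned by the search engine to the user.
FIG. 2 is a schematic diagram of one specific form of the user interface. The user interface includes a WEB page and a WAP page, where the WEB page is web text in HTML format that an ordinary computer browser can recognize, and the WAP page is web text better suited to display in a mobile phone browser.
FIG. 3 is a schematic diagram of another specific form of the user interface. In this embodiment the user interface is not merely a page but a combination of a page and a browser containing a search plug-in; the user's input can also be captured at the search box of the browser's search plug-in.
FIG. 4 is a schematic diagram of an embodiment in which the user interface presents the results returned by the system to the user. In this embodiment the user interface adds, to the search results page, a prompt guiding the user to publish information.
The WEB page or WAP page described above may contain a search box, an address bar, an input-method box, or an information input interface. The information input interface can be used to enter any kind of information to be published, including microblog posts, social network posts, forum posts, or electronic bulletin notices. The social networks include websites such as Kaixin (开心网) and Renren (人人网). A user with a clear intention to publish can enter the information directly through the information input interface and have the search engine publish it.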
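The classifier construction module 102 is configured to construct a classifier model, using a machine learning algorithm, from historical data mined offline or from corpus data provided by a third-party information publishing platform. The third-party information publishing platform is an information publishing platform system that has established a technical association with the system, and includes a microblog platform, a social network platform, a forum platform, or an electronic bulletin platform.
The historical data mined offline and the corpus data provided by the third-party information publishing platform both refer to users' query input data; they serve as the training samples when the classifier model is built with a machine learning algorithm. The construction of the classifier model is described below using SVM (support vector machine) as an example of such an algorithm.
The classification principle of SVM can be summarized as follows: find a separating hyperplane such that the two classes of sample points in the training set are separated and lie as far from the hyperplane as possible. For problems that are not linearly separable, a kernel function maps the data from the low-dimensional input space to a high-dimensional space, turning a problem that is not linearly separable in the original low-dimensional space into one that is linearly separable in the high-dimensional space. For a two-class problem, we are given a sample set $(x_i, y_i)$, $x_i \in R^d$, $y_i \in \{+1, -1\}$, $i = 1, 2, \ldots, l$, and a kernel function $K(x_i, x_j) = (\Phi(x_i) \cdot \Phi(x_j))$, where $\Phi$ is a nonlinear mapping function. The learning machine trained by SVM is $f(x) = (w \cdot \Phi(x)) + b$, where $w$ is the weight and $b$ is the bias. For the present invention, $x_i$ in the sample set $(x_i, y_i)$ is a feature vector composed of features of the training corpus (that is, the data mined offline or the data provided by the third-party information publishing platform), and $y_i$ denotes one of the two classes. A multi-class problem can be handled by turning it into several two-class problems. In the present invention the class indicates which kind of publishing need the information belongs to, such as a need to publish on publishing platform one, a need to publish on publishing platform two, and so on.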
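The decision function and the decomposition of a multi-class problem into several two-class problems can be sketched as follows. This is a minimal illustration that assumes a linear kernel (so that Φ is the identity) and already-trained weights; the names `decision_value` and `one_vs_rest_scores` are hypothetical, and the logistic squashing of the score into a value between 0 and 1 is an assumption made here for illustration, since the description does not say how a probability is derived from the SVM output.

```python
import math

def decision_value(w, x, b):
    """Linear-kernel SVM score f(x) = (w . x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def one_vs_rest_scores(models, x):
    """Treat each platform as its own two-class problem.

    `models` maps a platform name to a trained (w, b) pair. The score is
    squashed with a logistic function (an assumption, not from the patent).
    """
    return {
        name: 1.0 / (1.0 + math.exp(-decision_value(w, x, b)))
        for name, (w, b) in models.items()
    }
```

Each platform gets its own binary classifier, so a single input can receive a high value for several platforms at once.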
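It follows that feature selection is a key factor in training a classifier that classifies well, that is, in obtaining good classifier weights w and bias b. In the present invention the following variables may be used as features: the number and positions of the various punctuation marks in the query, the string length of the query, whether the query ends with a character, whether the query contains special strings, the number of digits in the query, whether the query contains words from a category vocabulary, the search volume of each word in the query, the number of search results the search engine returns for each word in the query, and so on. The category vocabulary is a table of categories such as economy, history, astronomy, and geography.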
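A few of the features listed above can be computed directly from the query string, as in the sketch below. The function name `extract_features`, the contents of `CATEGORY_VOCAB`, and the reading of "ends with a character" as "the last position is alphanumeric rather than punctuation" are assumptions made for illustration. Search volume and result counts are omitted because they depend on search engine data that is not available here.

```python
import string

# Hypothetical category vocabulary; the description only names example categories.
CATEGORY_VOCAB = {"economy", "history", "astronomy", "geography"}
# ASCII punctuation plus a few common full-width marks.
PUNCTUATION = set(string.punctuation) | set("，。！？：；、")

def extract_features(query: str) -> dict:
    """Map a raw query string to a small dictionary of numeric features."""
    punct_positions = [i for i, ch in enumerate(query) if ch in PUNCTUATION]
    words = [w.strip(string.punctuation) for w in query.lower().split()]
    return {
        "length": len(query),
        "punct_count": len(punct_positions),
        "first_punct_pos": punct_positions[0] if punct_positions else -1,
        "ends_with_alnum": int(bool(query) and query[-1].isalnum()),
        "digit_count": sum(ch.isdigit() for ch in query),
        "has_category_word": int(any(w in CATEGORY_VOCAB for w in words)),
    }
```

The resulting dictionary would be turned into the feature vector $x_i$ by fixing an order for its keys.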
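It should be emphasized that the features listed above are given only to illustrate this embodiment and do not mean that the present invention is limited to them; any feature used to construct the classifier model falls within the scope of the idea of the present invention.
The information analysis module 103 is configured to analyze the user's input information according to the classifier model built by the classifier construction module 102 and to output a first probability that the input information has an information publishing need, the first probability describing, from the perspective of semantic features, the likelihood that the input information has an information publishing need.
The information publishing need includes a specific information publishing need or a general information publishing need. A specific information publishing need is a need to publish on a particular platform such as a microblog or an SNS. A general information publishing need is one for which the user has not named a publishing platform. For example, the user may want to transfer a train ticket or to rent an apartment in a certain area. Such needs do not name a particular platform; the user simply wants this kind of information to appear on the Internet and, as long as other users can see it, does not mind which platform it is published on.
Relying on the classifier model, the information analysis module can analyze the user's input semantically and determine, from the semantic perspective, the first probability that the input should be published on each publishing platform. For example, if the user enters "Looking to rent a three-bedroom apartment in Zhongguancun", the information analysis module outputs (Looking to rent a three-bedroom apartment in Zhongguancun, P_general = 0.9), (Looking to rent a three-bedroom apartment in Zhongguancun, P_microblog = 0.7), (Looking to rent a three-bedroom apartment in Zhongguancun, P_QA-community = 0.2), and so on, where P denotes the first probability. If the user enters "Where is the best food", the information analysis module outputs (Where is the best food, P_general = 0.2), (Where is the best food, P_microblog = 0.1), (Where is the best food, P_QA-community = 0.1), and so on. Because the first probability is small on every platform, "Where is the best food" is, from the semantic perspective, unlikely to carry an information publishing need and is more likely to be a search query. If instead the user enters "post to microblog", the user clearly intends to publish on a microblog platform, so the first probability for the microblog platform is very large and the first probability for other platforms is very small. The data structure of the output shown above is only a schematic description used to illustrate the present invention; other forms may be adopted in a concrete implementation, and the present invention places no limit on this.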
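The comprehensive decision module 104 is configured to decide, according to the first probability, whether to search with or publish the information entered by the user. When it determines that the input information should be published, it further determines whether the user needs to be prompted before publishing. When a prompt is needed, the presentation module returns the prompt to the user; the prompt may include one or more prompt messages concerning third-party information publishing platforms. See FIG. 5, which is a schematic diagram of an embodiment in which the prompt returned by the presentation module contains prompt messages for several third-party information publishing platforms.
For example, when the first probability is very small (say probability < threshold one), the ordinary search engine retrieval process can be run on the user's input, treating it as a query.
When the first probability falls within a certain interval (say threshold two < probability < threshold three), the presentation module asks the user whether the information should be published, for example by showing on the search results page the prompt "Publish this on Sina Weibo: I bought a piece of clothing today". Once the presentation module receives the user's confirmation, the input information can be published. Further, the prompt may include a login prompt, a prompt to select a third-party information publishing platform, or a prompt for the account on the third-party information publishing platform; correspondingly, the confirmation information may include login information or selection information for the third-party information publishing platform. The login information includes an account or a password, and the selection information includes the choice of a third-party information publishing platform or the choice of an account on that platform. See FIG. 6 and FIG. 7: FIG. 6 is a schematic diagram of an embodiment in which the prompt returned by the presentation module contains a login prompt, and FIG. 7 is a schematic diagram of an embodiment in which the prompt contains a selection prompt. Note that when the publishing prompt is shown, a search can also be run on the input information at the same time, so that the publishing prompt and the query results are returned together.
When the first probability is very large (say probability > threshold four), the system publishes the information directly. Besides the above policy, the system may also, depending on the probability, perform a pure search, a search together with a publishing prompt, a pure publishing prompt, or direct publishing.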
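The threshold policy just described can be sketched as a small decision function. The source names four thresholds without giving values or saying what happens between them, so this sketch assumes that thresholds one and two coincide (`low`) and that thresholds three and four coincide (`high`); the values 0.3 and 0.8 and the names `Action` and `decide` are hypothetical.

```python
from enum import Enum

class Action(Enum):
    SEARCH = "search"                        # treat the input as an ordinary query
    SEARCH_AND_PROMPT = "search_and_prompt"  # return results plus a publishing prompt
    PUBLISH = "publish"                      # publish without asking

def decide(p_first: float, low: float = 0.3, high: float = 0.8) -> Action:
    """Map the first probability to an action using two assumed cut-offs."""
    if not 0.0 <= p_first <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_first < low:
        return Action.SEARCH
    if p_first > high:
        return Action.PUBLISH
    return Action.SEARCH_AND_PROMPT
```

With these assumed cut-offs, the rental example (0.9) would be published directly and the restaurant question (0.2) would be handled as a search.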
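In addition, after publishing succeeds, a success notification can be returned to the user. See FIG. 8, which is a schematic diagram of an embodiment in which the presentation module returns a success notification after publishing succeeds.
The publishing module 105 is configured to call the data interface of a third-party information publishing platform and connect to the Internet so as to publish the information entered by the user to the third-party information publishing platform.
When the comprehensive decision module determines that the user's input should be published, it calls the publishing module to publish the input to the third-party information publishing platform. Likewise, when the presentation module receives the user's confirmation of a publishing prompt, the publishing module publishes the information entered by the user to the third-party information publishing platform.
Depending on the specific need, the publishing module can publish information with a specific publishing need to platforms such as microblogs, Q&A communities, and social networks, and can publish information with a general publishing need to classified-information platforms such as 58.com (58同城). The publishing module can also publish the user's input to several third-party information publishing platforms. For example, a user who wants to post a notice transferring a train ticket would previously have posted it on several classified-information platforms so that more people could find it quickly. With the present invention, the user only needs to type the information into the search box of the search engine, and the system automatically publishes it to several information publishing platforms, which greatly simplifies the publishing process.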
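Publishing one input to several platforms can be sketched as a loop over per-platform adapters. The description does not specify the platforms' data interfaces, so each adapter is modeled here as a plain callable that takes the text and returns whether the post succeeded; `publish_to_all` and the adapter names are hypothetical.

```python
from typing import Callable, Dict, List

# A publisher wraps one platform's data interface (hypothetical adapter).
Publisher = Callable[[str], bool]

def publish_to_all(text: str,
                   publishers: Dict[str, Publisher],
                   targets: List[str]) -> Dict[str, bool]:
    """Try every target platform and record success or failure per platform."""
    results: Dict[str, bool] = {}
    for name in targets:
        publisher = publishers.get(name)
        if publisher is None:
            results[name] = False  # no adapter registered for this platform
            continue
        try:
            results[name] = bool(publisher(text))
        except Exception:
            results[name] = False  # one failing platform must not stop the rest
    return results
```

Returning a per-platform result makes it straightforward for the presentation module to show a success notification for each platform.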
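The retrieval module 106 is configured to query the index library according to the user's input information and return the query results to the presentation module.
Whether the user's input has no publishing need, or has a publishing need but a publishing prompt must be returned, the comprehensive decision module can search with the input information by calling the retrieval module.
The retrieval module includes a query submodule and a ranking submodule. As in an ordinary search engine, the query submodule queries the index library according to the user's input, and the ranking submodule ranks the query results and returns the ranked results. Since the retrieval module is implemented with existing techniques, it is not described further here.
FIG. 9 is a schematic structural block diagram of Embodiment 2 of the system with information publishing and search functions according to an embodiment of the present invention. In this embodiment the system further includes a user information acquisition module 107, a user behavior analysis module 108, or both.
The user information acquisition module 107 is configured to acquire the user's user information on third-party information publishing platforms so as to obtain a second probability that the input information has an information publishing need, the second probability describing that likelihood from the perspective of user information features. The user information on a third-party information publishing platform includes the user's account information or usage frequency information. Account information indicates whether the user has an account on the platform and what that account is; usage frequency information indicates which third-party platform the user uses most.
The user information acquisition module can poll several third-party information publishing platforms using the user's cookie data, the user's IP address, the computer's MAC address, or other means, to obtain the user's information on those platforms. User information can also be acquired by checking the user's online status on a third-party platform, by retrieving the usage records of the user's account on the platform, or by receiving the user's input at the presentation module, for example by showing a window for the microblog account and password in which the user enters the information.
Given the second probability, in one embodiment the comprehensive decision module decides whether to search with or publish the input information by using the first probability together with the second probability. Specifically, preset weights can be assigned to the first and second probabilities to determine the final decision logic.
The user behavior analysis module 108 is configured to analyze the user's past behavior so as to obtain a third probability that the input information has an information publishing need, the third probability describing that likelihood from the perspective of the user's historical behavior features.
A user's past behavior helps in inferring the intent behind the current behavior. If a user often publishes information in a particular style of language, then input written in that same style is more likely to be intended for publishing. The user's past behavior includes behavior on platforms such as search engines, microblogs, forums, and blogs, and shows up as language style and behavioral habits (for example, whether the user tends to ask questions or to answer others' questions).
The user behavior analysis module uses data mining and machine learning: user behavior data is mined from user logs as training samples, and a feature selection algorithm and a machine learning method are used to classify the user's behavior and output the third probability.
Given the third probability, in one embodiment the comprehensive decision module decides whether to search with or publish the input information by using the first probability together with the third probability. Specifically, preset weights can be assigned to the first and third probabilities to determine the final decision logic.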
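Combining the probabilities with preset weights can be sketched as a normalized weighted average. The description does not give the weights or the combination formula, so the weighted mean, the function name `combine`, and the key names `first`, `second`, and `third` are assumptions made for illustration. Weights for probabilities that were not computed in a given embodiment are simply ignored.

```python
def combine(probabilities: dict, weights: dict) -> float:
    """Weighted mean of whichever probabilities are available.

    `probabilities` maps names such as "first" or "second" to values in [0, 1];
    `weights` holds the preset weight for each name.
    """
    used = {name: w for name, w in weights.items() if name in probabilities}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("at least one weighted probability is required")
    return sum(probabilities[name] * w for name, w in used.items()) / total
```

The combined value can then be compared against the same kind of thresholds that the comprehensive decision module applies to the first probability alone.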
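FIG. 10 is a schematic structural block diagram of Embodiment 3 of the system with information publishing and search functions according to an embodiment of the present invention. In this embodiment the system further includes an advanced grammar mining module 109 and an advanced grammar matching module 110.
The advanced grammar mining module 109 is configured to mine Internet data, extract from it the keywords users employ to describe third-party information publishing platforms, and semantically expand the keywords to generate a description lexicon for the third-party information publishing platforms.
For example, the microblog platform is commonly nicknamed "围脖" (a homophone of "weibo"), so "wb:" or "围脖:" can be derived from "microblog" as description terms for the microblog third-party information publishing platform.
Common data mining methods include neural network methods, genetic algorithms, decision tree methods, rough set methods, statistical analysis methods, fuzzy set methods, and so on. Since these are all existing techniques, they are not described in detail here.
The advanced grammar matching module 110 is configured to match and verify the input information against the description lexicon so as to determine a fourth probability that the user is using advanced grammar. When the fourth probability is greater than a threshold X, the module splits the input information into a content part and a grammar part and passes the content part and the fourth probability to the information analysis module; when the fourth probability is not greater than the threshold X, it passes the input information directly to the information analysis module. Advanced grammar here means a way of entering information that follows the forms in the description lexicon; for example, the input "wb: Caught a really big fish today" uses advanced grammar.
The advanced grammar matching module further includes a verification unit and a decomposition unit. The verification unit matches and verifies the input information according to predefined policies and produces the corresponding fourth probability that the user is using advanced grammar; the decomposition unit splits information whose fourth probability is greater than the threshold X into a content part and a grammar part.
For example, when the user enters "wb: Caught a really big fish today", the verification unit scans the input according to the predefined policies to obtain the fourth probability. One such policy assigns a fourth probability of 0.9 when a term that exactly matches a description term in the lexicon is found at the beginning of the input. Assuming the threshold X is 0.5, the fourth probability exceeds the threshold, so the decomposition unit splits "wb: Caught a really big fish today" into "wb:" and "Caught a really big fish today", where "wb:" is the grammar part and "Caught a really big fish today" is the content part.
The advanced grammar matching module outputs not only the content part after decomposition but also the fourth probability that the user is using advanced grammar. Depending on the policy, the verification unit can arrive at different fourth probabilities: for example, 0.9 when the beginning of the input exactly matches a description term in the lexicon, 0.5 when the beginning of the input partially matches a description term, 0.3 when the middle of the input matches a description term, and so on.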
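The verification and decomposition units can be sketched together as one function that returns the fourth probability, the grammar part, and the content part. The lexicon entries and the probability values 0.9, 0.5, and 0.3 come from the description; the function name `match_advanced_grammar`, the normalization of the full-width colon, and the reading of a "partial match" as "the term without its trailing colon" are assumptions made for illustration.

```python
LEXICON = ("wb:", "围脖:")  # description terms for the microblog platform

def match_advanced_grammar(text: str, threshold: float = 0.5):
    """Return (fourth_probability, grammar_part, content_part)."""
    normalized = text.replace("：", ":").strip()  # accept the full-width colon too
    p_fourth, grammar = 0.0, ""
    for term in LEXICON:
        if normalized.startswith(term):                # exact match at the start
            p_fourth, grammar = 0.9, term
            break
        if normalized.startswith(term.rstrip(":")):    # partial match at the start
            p_fourth = max(p_fourth, 0.5)
        elif term in normalized:                       # match in the middle
            p_fourth = max(p_fourth, 0.3)
    if p_fourth > threshold:
        # Decomposition only happens above the threshold.
        return p_fourth, grammar, normalized[len(grammar):].strip()
    return p_fourth, "", normalized
```

Below or at the threshold the input is passed on whole, which mirrors the module handing the input directly to the information analysis module.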
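In this embodiment the content part and the fourth probability output by the advanced grammar matching module are passed to the information analysis module, giving it more evidence for computing the first probability. The information analysis module outputs the first probability using the data passed by the advanced grammar matching module together with the classifier model, which can be implemented with a preset policy. For example, a very large fourth probability means the user is very likely using advanced grammar; correspondingly, the user is very likely to want to publish information, so the first probability is also very large.
FIG. 11 is a schematic structural block diagram of Embodiment 4 of the system with information publishing and search functions according to an embodiment of the present invention. In this embodiment the system further includes a user configuration module 111 and a user configuration identification module 112.
The user configuration module 111 is configured to set a default behavior mode for the search engine system according to the user's choice. The default behavior mode is either default-search or default-publish: a system configured as default-search only searches and does not publish, and a system configured as default-publish only publishes and does not search.
Default-search and default-publish can be refined further. For example, under default-publish the system can be configured so that each time user input is received it is either published directly through the publishing module or preceded by a prompt shown to the user through the presentation module, or so that it is published to one particular platform, and so on.
The user configuration identification module 112 is configured to identify the user configuration information and to search with or publish the input information according to it. Before publishing, the presentation module may also prompt the user to obtain the user's confirmation of the prompt.
If the system finds that the user has configured default-search, it treats all user input as queries and issues a search request for the input. If the system finds that the user has configured default-publish, it checks the refined configuration to determine whether to publish directly or to prompt before publishing. Under the direct-publish configuration it issues a direct publishing request for the input; otherwise it issues a request to prompt the user before publishing, and publishes the input once the user's confirmation is received. For example, if the user has configured input to be published directly to Sina Weibo, the user is using the search engine system of the present invention for a specific purpose that the user understands clearly. In that case, handling the input according to the user's configuration satisfies the user's need, and there is no need for the search engine to perform any other operation.
If the user configuration identification module finds that the user has not configured anything, it passes the user's input to the next processing module.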
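The routing performed by the user configuration identification module can be sketched as below. The configuration is modeled as an optional dictionary; the keys `default_mode` and `publish_directly`, the returned route names, and the function name `route_by_config` are hypothetical and chosen only to mirror the cases described above.

```python
from typing import Optional

def route_by_config(config: Optional[dict]) -> str:
    """Pick the next processing step from the user's stored configuration."""
    if not config:
        return "analyze"  # nothing configured: hand over to the next module
    mode = config.get("default_mode")
    if mode == "search":
        return "search"   # default-search: every input is treated as a query
    if mode == "publish":
        # default-publish: publish directly or ask for confirmation first
        if config.get("publish_directly", False):
            return "publish"
        return "prompt_then_publish"
    return "analyze"      # unrecognized mode: fall back to analysis
```

Prompting is the fallback under default-publish in this sketch, so that nothing is posted without confirmation unless the user explicitly opted in.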
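FIG. 12 is a schematic flowchart of Embodiment 1 of the information publishing method according to an embodiment of the present invention. In this embodiment the information publishing method includes step 201: receiving user input information. In one approach, the user input information is received from the user interface of a search engine, where the user interface includes a WEB page, a WAP page, a combination of a browser with a search plug-in and a WEB page, or a combination of a browser with a search plug-in and a WAP page. The WEB page or WAP page may contain a search box, an address bar, an input-method box, or an information input interface; the information input interface can be used to enter any kind of information to be published, including microblog posts, social network posts, forum posts, or electronic bulletin notices. Receiving the user input information is the prerequisite for the subsequent processing.
Step 202: analyze the user input information according to the classifier model to obtain a first probability that the input information has an information publishing need, the first probability describing, from the perspective of semantic features, the likelihood that the input information has an information publishing need.
The classifier model is constructed, using a machine learning algorithm, from historical data mined offline or from corpus data provided by a third-party information publishing platform. The historical data mined offline and the corpus data provided by the third-party information publishing platform both refer to users' query input data; they serve as the training samples when the classifier model is built with a machine learning algorithm. The construction of the classifier model is described below using SVM (support vector machine) as an example of such an algorithm.
The classification principle of SVM can be summarized as follows: find a separating hyperplane such that the two classes of sample points in the training set are separated and lie as far from the hyperplane as possible. For problems that are not linearly separable, a kernel function maps the data from the low-dimensional input space to a high-dimensional space, turning a problem that is not linearly separable in the original low-dimensional space into one that is linearly separable in the high-dimensional space. For a two-class problem, we are given a sample set $(x_i, y_i)$, $x_i \in R^d$, $y_i \in \{+1, -1\}$, $i = 1, 2, \ldots, l$, and a kernel function $K(x_i, x_j) = (\Phi(x_i) \cdot \Phi(x_j))$, where $\Phi$ is a nonlinear mapping function. The learning machine trained by SVM is $f(x) = (w \cdot \Phi(x)) + b$, where $w$ is the weight and $b$ is the bias. For the present invention, $x_i$ in the sample set $(x_i, y_i)$ is a feature vector composed of features of the training corpus (that is, the data mined offline or the data provided by the third-party information publishing platform), and $y_i$ denotes one of the two classes. A multi-class problem can be handled by turning it into several two-class problems. In the present invention the class indicates which kind of publishing need the information belongs to, such as a need to publish on publishing platform one, a need to publish on publishing platform two, and so on.
It follows that feature selection is a key factor in training a classifier that classifies well, that is, in obtaining good classifier weights w and bias b. In the present invention the following variables may be used as features: the number and positions of the various punctuation marks in the query, the string length of the query, whether the query ends with a character, whether the query contains special strings, the number of digits in the query, whether the query contains words from a category vocabulary, the search volume of each word in the query, the number of search results the search engine returns for each word in the query, and so on. The category vocabulary is a table of categories such as economy, history, astronomy, and geography.
The information publishing need includes a specific information publishing need or a general information publishing need. A specific information publishing need is a need to publish on a particular platform such as a microblog or an SNS. A general information publishing need is one for which the user has not named a publishing platform. For example, the user may want to transfer a train ticket or to rent an apartment in a certain area. Such needs do not name a particular platform; the user simply wants this kind of information to appear on the Internet and, as long as other users can see it, does not mind which platform it is published on.
Relying on the classifier model, the user's input can be analyzed semantically to determine, from the semantic perspective, the first probability that the input should be published on each publishing platform. For example, if the user enters "Looking to rent a three-bedroom apartment in Zhongguancun", the analysis yields (Looking to rent a three-bedroom apartment in Zhongguancun, P_general = 0.9), (Looking to rent a three-bedroom apartment in Zhongguancun, P_microblog = 0.7), (Looking to rent a three-bedroom apartment in Zhongguancun, P_QA-community = 0.2), and so on, where P denotes the first probability. If the user enters "Where is the best food", the analysis yields (Where is the best food, P_general = 0.2), (Where is the best food, P_microblog = 0.1), (Where is the best food, P_QA-community = 0.1), and so on. Because the first probability is small on every platform, "Where is the best food" is, from the semantic perspective, unlikely to carry an information publishing need and is more likely to be a search query. If instead the user enters "post to microblog", the user clearly intends to publish on a microblog platform, so the first probability for the microblog platform is very large and the first probability for other platforms is very small. The data structure of the results shown above is only a schematic description used to illustrate the present invention; other forms may be adopted in a concrete implementation, and the present invention places no limit on this.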
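Step 203: according to the first probability, search with the input information or publish the input information to a third-party information publishing platform. Publishing the input information to a third-party information publishing platform further includes showing the user a publishing prompt before publishing. The prompt may include one or more prompt messages concerning third-party information publishing platforms. A series of policies can be preset to decide, depending on the first probability, whether to search with or publish the user's input.
For example, when the first probability is very small (say probability < threshold one), the ordinary search engine retrieval process can be run on the user's input, treating it as a query.
When the first probability falls within a certain interval (say threshold two < probability < threshold three), the user is asked whether the information should be published, for example by showing on the search results page the prompt "Publish this on Sina Weibo: I bought a piece of clothing today". Once the user's confirmation is received, the input information can be published. Further, the prompt may include a login prompt, a prompt to select a third-party information publishing platform, or a prompt for the account on the third-party information publishing platform; correspondingly, the confirmation information may include login information or selection information for the third-party information publishing platform. The login information includes an account or a password, and the selection information includes the choice of a third-party information publishing platform or the choice of an account on that platform. Note that when the publishing prompt is shown, a search can also be run on the input information at the same time, so that the publishing prompt and the query results are returned together.
When the first probability is very large (say probability > threshold four), the system publishes the information directly. After publishing succeeds, a success notification can be returned to the user. Besides this policy, the system may also, depending on the probability, perform a pure search, a search together with a publishing prompt, a pure publishing prompt, or direct publishing.
When the user's input needs to be published, it can be published to the third-party information publishing platform by calling the platform's data interface. Depending on the specific need, information with a specific publishing need can be published to platforms such as microblogs, Q&A communities, and social networks, and information with a general publishing need can be published to classified-information platforms such as 58.com (58同城). The method of the present invention can also publish the user's input to several third-party information publishing platforms. For example, a user who wants to post a notice transferring a train ticket would previously have posted it on several classified-information platforms so that more people could find it quickly. With the method of the present invention, the user only needs to type the information into the search box of the search engine to have it published on several information publishing platforms, which greatly simplifies the publishing process.
When the user's input needs to be searched with, the ordinary search engine retrieval process can be run; it is not described in detail here.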
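FIG. 13 is a schematic flowchart of Embodiment 2 of the information publishing method according to an embodiment of the present invention. Referring to FIG. 13, in this embodiment steps 301 and 302 are the same as steps 201 and 202 of Embodiment 1, respectively, and are not described again here.
Step 303: acquire the user's user information on third-party information publishing platforms to obtain a second probability that the input information has an information publishing need, or analyze the user's past behavior to obtain a third probability that the input information has an information publishing need. The second probability describes, from the perspective of user information features, the likelihood that the input information has an information publishing need; the third probability describes that likelihood from the perspective of the user's historical behavior features.
The user information on a third-party information publishing platform includes the user's account information or usage frequency information. Account information indicates whether the user has an account on the platform and what that account is; usage frequency information indicates which third-party platform the user uses most. The user's information on third-party platforms is obtained by polling several third-party information publishing platforms using the user's cookie data, the user's IP address, the computer's MAC address, or other means. User information can also be acquired by checking the user's online status on a third-party platform, by retrieving the usage records of the user's account on the platform, or by having the user enter it.
A user's past behavior helps in inferring the intent behind the current behavior. If a user often publishes information in a particular style of language, then input written in that same style is more likely to be intended for publishing. The user's past behavior includes behavior on platforms such as search engines, microblogs, forums, and blogs, and shows up as language style and behavioral habits (for example, whether the user tends to ask questions or to answer others' questions).
User behavior is analyzed using data mining and machine learning: user behavior data is mined from user logs as training samples, and a feature selection algorithm and a machine learning method are used to classify the user's behavior and output the third probability.
Step 304: use the first probability and the second probability, or the first probability and the third probability, to search with the input information or to publish the input information to a third-party information publishing platform.
Given the second and third probabilities, step 304 searches with the input information or publishes it to a third-party information publishing platform by using the first probability in any combination with the second and third probabilities. Specifically, preset weights can be assigned to each probability to determine the final decision logic.
FIG. 14 is a schematic flowchart of Embodiment 3 of the information publishing method according to an embodiment of the present invention. Referring to FIG. 14, in this embodiment steps 401, 404, and 405 are the same as steps 301, 303, and 304 of Embodiment 2, respectively, and are not described again here.
Step 402: match and verify the user input information against the description lexicon to determine a fourth probability that the user is using advanced grammar.
The description lexicon is generated by mining Internet data, extracting from it the keywords users employ to describe the third-party information publishing platform, and semantically expanding the keywords. For example, the microblog platform is commonly nicknamed "围脖" (a homophone of "weibo"), so "wb:" or "围脖:" can be derived from "microblog" as description terms for the microblog third-party information publishing platform. Common data mining methods include neural network methods, genetic algorithms, decision tree methods, rough set methods, statistical analysis methods, fuzzy set methods, and so on; since these are all existing techniques, they are not described in detail here. Advanced grammar means a way of entering information that follows the forms in the description lexicon; for example, the input "wb: Caught a really big fish today" uses advanced grammar.
The user input information can be matched and verified according to predefined policies. For example, the fourth probability is 0.9 when the beginning of the input exactly matches a description term in the lexicon, 0.5 when the beginning of the input partially matches a description term, 0.3 when the middle of the input matches a description term, and so on.
Further, when the fourth probability is greater than the threshold X, step 402 also includes step 4021: split the user input information into a content part and a grammar part. For example, when the user enters "wb: Caught a really big fish today", the fourth probability is 0.9. Assuming the threshold X is 0.5, the fourth probability exceeds the threshold, so step 4021 splits "wb: Caught a really big fish today" into "wb:" and "Caught a really big fish today", where "wb:" is the grammar part and "Caught a really big fish today" is the content part.
Step 403 can follow one of two branches depending on the fourth probability. When the fourth probability is not greater than the first threshold, step 403 analyzes the input information using the classifier model to obtain the first probability that the input information has an information publishing need. When the fourth probability is greater than the first threshold, step 403' analyzes the input information using the content part, the fourth probability, and the classifier model to obtain the first probability. The reason is that when the fourth probability is fairly large, using it as one of the bases for computing the first probability effectively raises the confidence of the first probability.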
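The two branches of step 403 can be sketched as follows. The description only says that the fourth probability serves as one basis for the first probability when it is large, so taking the maximum of the classifier output and the fourth probability is an assumption made here for illustration, as are the function name `first_probability` and the `classify` callable, which stands in for the trained classifier model.

```python
from typing import Callable, Optional

def first_probability(text: str,
                      classify: Callable[[str], float],
                      p_fourth: float = 0.0,
                      content: Optional[str] = None,
                      threshold: float = 0.5) -> float:
    """Compute the first probability along branch 403 or branch 403'."""
    if p_fourth > threshold and content is not None:
        # Branch 403': classify only the content part and let the
        # advanced-grammar evidence raise the result (assumed policy).
        return max(classify(content), p_fourth)
    # Branch 403: classify the whole input, ignoring the fourth probability.
    return classify(text)
```

Any monotone blend of the two values would fit the description equally well; the maximum is simply the smallest rule that makes a large fourth probability imply a large first probability.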
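FIG. 15 is a schematic flowchart of Embodiment 4 of the information publishing method according to an embodiment of the present invention. Referring to FIG. 15, in this embodiment steps 501, 503, 5031, 504 (504'), 505, and 506 are the same as steps 401, 402, 4021, 403 (403'), 404, and 405 of Embodiment 3, and are not described again here.
Step 502: identify the user configuration information, where the user configuration information is the default behavior mode set for the system according to the user's choice.
The default behavior mode is either default-search or default-publish: a system configured as default-search only searches and does not publish, and a system configured as default-publish only publishes and does not search. Default-search and default-publish can be refined further. For example, under default-publish the system can be configured so that each time user input is received it is either published directly or preceded by a prompt, or so that it is published to one particular platform, and so on.
Further, when the system is found to have user configuration information, step 502 also includes step 5021: search with or publish the user input information according to the user's configuration information. Before publishing, the user may also be prompted so as to obtain the user's confirmation of the prompt. For example, if the user has configured every input to be published directly to Sina Weibo, the user's purpose in publishing is very clear. In that case, handling the input according to the user's configuration satisfies the user's need, and there is no need to perform any other operation.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.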

Claims

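1. A system with information publishing and search functions, characterized in that the system comprises:
a presentation module, configured to provide the user with a user interface of the system, the user interface being used to receive the user's input information and to present the processing results of the system to the user;
a classifier construction module, configured to construct a classifier model from historical data mined offline or from corpus data provided by a third-party information publishing platform;
an information analysis module, configured to analyze the input information according to the classifier model and to output a first probability that the input information has an information publishing need, the first probability describing, from the perspective of semantic features, the likelihood that the input information has the information publishing need;
a comprehensive decision module, configured to decide, according to the first probability, whether to search with or publish the input information;
a publishing module, configured to call a data interface of the third-party information publishing platform and connect to the Internet so as to publish the input information to the third-party information publishing platform; and
a retrieval module, configured to query an index library according to the input information and return the query results to the presentation module.
2. The system according to claim 1, characterized in that the system is a search engine system.
3. The system according to claim 1, characterized in that the user interface takes the form of a WEB page, a WAP page, a combination of a browser with a search plug-in and the WEB page, or a combination of a browser with a search plug-in and the WAP page.
4. The system according to claim 3, characterized in that the WEB page or the WAP page includes a search box, an address bar, an input-method box, or an information input interface.
5. The system according to claim 1, characterized in that the third-party information publishing platform includes a microblog platform, a social network platform, a forum platform, or an electronic bulletin platform.
6. The system according to claim 1, characterized in that the classifier model is constructed from the historical data or the corpus data using a machine learning algorithm.
7. The system according to claim 1, characterized in that the information publishing need includes a specific information publishing need or a general information publishing need.
8. The system according to claim 1, characterized in that, before publishing, the system prompts the user through the presentation module to obtain the user's confirmation of the prompt.
9. The system according to claim 8, characterized in that, when prompting the user, the presentation module returns the retrieval module's search results for the input information.
10. The system according to claim 8, characterized in that the prompt includes a plurality of prompt messages concerning the third-party information publishing platforms.
11. The system according to claim 8, characterized in that the confirmation information includes selection information or login information for the third-party information publishing platform.
12. The system according to claim 8, characterized in that, after the presentation module receives the user's confirmation of the prompt, the publishing module publishes the input information.
13. The system according to claim 1, characterized in that the publishing module is further configured to publish the input information to a plurality of the third-party information publishing platforms.
14. The system according to claim 1, characterized in that the system further comprises: a user information acquisition module, configured to acquire the user's user information on the third-party information publishing platform so as to obtain a second probability that the input information has the information publishing need, the second probability describing, from the perspective of user information features, the likelihood that the input information has an information publishing need; the comprehensive decision module uses the first probability and the second probability to decide whether to search with or publish the input information.
15. The system according to claim 14, characterized in that the user information includes the user's account information or the user's usage frequency information.
16. The system according to claim 14, characterized in that the user information is acquired by checking the user's online status on the third-party information publishing platform, by retrieving the usage records of the user's account on the third-party information publishing platform, or by receiving the user's input at the presentation module.
17. The system according to claim 1, characterized in that the system further comprises: a user behavior analysis module, configured to analyze the user's past behavior so as to obtain a third probability that the input information has the information publishing need, the third probability describing, from the perspective of the user's historical behavior features, the likelihood that the input information has the information publishing need; the comprehensive decision module uses the first probability and the third probability to decide whether to search with or publish the input information.
18. The system according to claim 1, characterized in that the system further comprises:
an advanced grammar mining module, configured to mine Internet data, extract from the Internet data the keywords users employ to describe the third-party information publishing platform, and semantically expand the keywords to generate a description lexicon for the third-party information publishing platform; and
an advanced grammar matching module, configured to match and verify the input information against the description lexicon so as to determine a fourth probability that the user is using advanced grammar, wherein when the fourth probability is greater than a first threshold, the advanced grammar matching module further splits the input information into a content part and a grammar part and passes the content part and the fourth probability to the information analysis module, and when the fourth probability is not greater than the first threshold, the advanced grammar matching module passes the input information directly to the information analysis module; the information analysis module outputs the first probability using the data passed by the advanced grammar matching module together with the classifier model.
19. The system according to claim 1, characterized in that the system further comprises:
a user configuration module, configured to set a default behavior mode for the system according to the user's choice, the default behavior mode being either default-search or default-publish, where a system configured as default-search only searches and does not publish, and a system configured as default-publish only publishes and does not search; and
a user configuration identification module, configured to identify the user configuration information and to search with or publish the input information according to the user configuration information.
20. The system according to claim 19, characterized in that, before publishing, the system prompts the user through the presentation module to obtain the user's confirmation of the prompt.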
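21. An information publishing method, characterized in that the method comprises the steps of:
a. receiving user input information;
b. analyzing the input information according to a classifier model to obtain a first probability that the input information has an information publishing need, the first probability describing, from the perspective of semantic features, the likelihood that the input information has the information publishing need;
c. according to the first probability, searching with the input information or publishing the input information to a third-party information publishing platform.
22. The method according to claim 21, characterized in that in step a the user input information is received from the user interface of a search engine.
23. The method according to claim 22, characterized in that the user interface takes the form of a WEB page, a WAP page, a combination of a browser with a search plug-in and the WEB page, or a combination of a browser with a search plug-in and the WAP page.
24. The method according to claim 23, characterized in that the WEB page or the WAP page includes a search box, an address bar, an input-method box, or an information input interface.
25. The method according to claim 21, characterized in that the third-party information publishing platform includes a microblog platform, a social network platform, a forum platform, or an electronic bulletin platform.
26. The method according to claim 21, characterized in that the classifier model is constructed from historical data mined offline or from corpus data provided by the third-party information publishing platform, using a machine learning algorithm.
27. The method according to claim 21, characterized in that the information publishing need includes a specific information publishing need or a general information publishing need.
28. The method according to claim 21, characterized in that in step c the user is prompted before the publishing so as to obtain the user's confirmation of the prompt.
29. The method according to claim 28, characterized in that the search results for the input information are returned when the user is prompted.
30. The method according to claim 28, characterized in that the prompt includes a plurality of prompt messages concerning the third-party information publishing platforms.
31. The method according to claim 28, characterized in that the confirmation information includes selection information or login information for the third-party information publishing platform.
32. The method according to claim 28, characterized in that the input information is published after the user's confirmation of the prompt is received.
33. The method according to claim 21, characterized in that in step c the input information is published to a plurality of the third-party information publishing platforms.
34. The method according to claim 21, characterized in that the method further comprises, before step c, the step of: d. acquiring the user's user information on the third-party information publishing platform so as to obtain a second probability that the input information has the information publishing need, the second probability describing, from the perspective of user information features, the likelihood that the input information has the information publishing need; in step c, the first probability and the second probability are used to search with the input information or to publish the input information to the third-party information publishing platform.
35. The method according to claim 34, characterized in that the user information includes the user's account information or the user's usage frequency information.
36. The method according to claim 34, characterized in that in step d the user information is acquired by checking the user's online status on the third-party information publishing platform, by retrieving the usage records of the user's account on the third-party information publishing platform, or by receiving the user's input at the presentation module.
37. The method according to claim 21, characterized in that the method further comprises, before step c, the step of: e. analyzing the user's past behavior so as to obtain a third probability that the input information has the information publishing need, the third probability describing, from the perspective of the user's historical behavior features, the likelihood that the input information has the information publishing need; in step c, the first probability and the third probability are used to search with the input information or to publish the input information to the third-party information publishing platform.
38. The method according to claim 21, characterized in that the method further comprises, before step b, the step of: f. matching and verifying the input information against a description lexicon so as to determine a fourth probability that the user is using advanced grammar, the description lexicon being generated by mining Internet data, extracting from the Internet data the keywords users employ to describe the third-party information publishing platform, and semantically expanding the keywords; when the fourth probability is greater than a first threshold, the input information is split into a content part and a grammar part, and step b obtains the first probability using the content part, the fourth probability, and the classifier model.
39. The method according to claim 21, characterized in that the method further comprises, before step b, the step of: g. identifying user configuration information, the user configuration information being a default behavior mode set for the system according to the user's choice, the default behavior mode being either default-search or default-publish, where a system configured as default-search only searches and does not publish, and a system configured as default-publish only publishes and does not search; when the system is found to have user configuration information, the input information is searched with or published according to the user configuration information.
40. The method according to claim 39, characterized in that the user is prompted before the publishing so as to obtain the user's confirmation of the prompt.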
PCT/CN2011/083412 2011-03-18 2011-12-03 System with information publishing and search functions and information publishing method WO2012126259A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2011100661354A CN102110170B (zh) 2011-03-18 2011-03-18 System with information publishing and search functions and information publishing method
CN201110066135.4 2011-03-18

Publications (1)

Publication Number Publication Date
WO2012126259A1 true WO2012126259A1 (zh) 2012-09-27

Family

ID=44174331

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/083412 WO2012126259A1 (zh) 2011-03-18 2011-12-03 System with information publishing and search functions and information publishing method

Country Status (2)

Country Link
CN (1) CN102110170B (zh)
WO (1) WO2012126259A1 (zh)


Also Published As

Publication number Publication date
CN102110170B (zh) 2013-07-31
CN102110170A (zh) 2011-06-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11861813

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11861813

Country of ref document: EP

Kind code of ref document: A1