CN110580313A - Data processing method and device and data processing device - Google Patents


Info

Publication number
CN110580313A
Authority
CN
China
Prior art keywords
answer
question
search
search result
answer information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810589724.2A
Other languages
Chinese (zh)
Other versions
CN110580313B (en
Inventor
姚婷
梁素维
周梦瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201810589724.2A priority Critical patent/CN110580313B/en
Publication of CN110580313A publication Critical patent/CN110580313A/en
Application granted granted Critical
Publication of CN110580313B publication Critical patent/CN110580313B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The embodiment of the invention provides a data processing method and device and a device for data processing. The method specifically comprises the following steps: determining question and answer intentions corresponding to the search terms; determining answer information matched with the question-answer intention from a landing page of a search result item corresponding to the search word; and displaying the answer information contained in the landing page in a search result item corresponding to the search word. The embodiment of the invention can shorten the operation path of the user and improve the information acquisition efficiency of the user.

Description

Data processing method and device and data processing device
Technical Field
The present invention relates to the field of search technologies, and in particular to a data processing method and apparatus, and an apparatus for data processing.
Background
At present, the growth of the internet has greatly increased the amount of available information, so users increasingly rely on search engines to filter it. A search engine is a system that collects information from the internet with a specific computer program according to a certain policy, organizes and processes that information, and then provides a search service that displays information related to the user's search.
In the process of using the search engine, a user can input a keyword in a search box provided by the search engine, the search engine queries to obtain a webpage or a document matched with the keyword so as to obtain a search result item, and the ranked search result item is returned to the user by utilizing a certain ranking strategy.
Current search result items typically include title information, links to pages, and summary information, which is used to generally describe the pages to which the search result item corresponds. The search result item can enable a user to judge whether the page corresponding to the search result item contains information required by the user, if so, the user can click the search result item and enter the corresponding page, and the required information is searched from the entered page, namely, the user can obtain the required information only by clicking the search result item, so that the operation path of the user is longer, and the information obtaining efficiency of the user is lower.
Disclosure of Invention
Embodiments of the present invention provide a data processing method and apparatus, and an apparatus for data processing, which can shorten an operation path of a user and improve information acquisition efficiency of the user.
In order to solve the above problem, an embodiment of the present invention discloses a data processing method, including:
determining a question-answer intention corresponding to a search term;
determining answer information matching the question-answer intention from a landing page of a search result item corresponding to the search term; and
displaying the answer information contained in the landing page in the search result item corresponding to the search term.
In another aspect, an embodiment of the present invention discloses a data processing apparatus, including:
a question-answer intention determining module, used to determine the question-answer intention corresponding to a search term;
an answer information determining module, used to determine answer information matching the question-answer intention from the landing page of a search result item corresponding to the search term; and
an answer information display module, used to display the answer information contained in the landing page in the search result item corresponding to the search term.
In yet another aspect, an embodiment of the present invention discloses an apparatus for data processing, including a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for:
determining a question-answer intention corresponding to a search term;
determining answer information matching the question-answer intention from a landing page of a search result item corresponding to the search term; and
displaying the answer information contained in the landing page in the search result item corresponding to the search term.
In yet another aspect, an embodiment of the invention discloses a machine-readable medium having stored thereon instructions, which, when executed by one or more processors, cause an apparatus to perform a data processing method as described in one or more of the preceding.
The embodiment of the invention has the following advantages:
According to the embodiment of the invention, the answer information contained in the landing page of the search result item is displayed in the search result item, and the answer information can be matched with the question and answer intentions corresponding to the search words, so that the answer information can meet the information requirements of users; therefore, the embodiment of the invention directly displays the answer information meeting the information requirement of the user in the search result item, and can enable the user to obtain the required information without page jump, thereby shortening the operation path of the user and improving the information obtaining efficiency of the user.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic representation of an application environment for a data processing method of an embodiment of the present invention;
FIG. 2 is a flow chart of the steps of one data processing method embodiment of the present invention;
FIG. 3 is an illustration of one embodiment of the invention showing search result items;
FIG. 4 is an illustration of one embodiment of the invention showing search result items;
FIG. 5 is an illustration of one embodiment of the invention showing answer information in a search result item;
FIG. 6 is an illustration of displaying answer information in a search result item, in accordance with an embodiment of the present invention;
FIG. 7 is a block diagram of an embodiment of a data processing apparatus of the present invention;
FIG. 8 is a block diagram of an apparatus 800 for data processing of the present invention; and
Fig. 9 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the field of search technology, a SERP (search engine results page) may be provided to a user according to a search term input by the user; the SERP is the result page that a search engine returns in response to a search request. A typical SERP contains a list of search result items. The page corresponding to a search result item may be referred to as a landing page, that is, the first page reached after clicking the search result item, which may be a web page or a document.
Current search result items typically include: title, link, abstract and other information corresponding to the landing page. The title and the abstract enable a user to judge whether the landing page contains information required by the user, and if yes, the user can click the link and enter the corresponding landing page to search the required information from the entered landing page. However, the user needs to click on the search result item to obtain the required information, which results in a long operation path for the user and thus a low information obtaining efficiency for the user.
The embodiment of the invention provides a data processing scheme which can determine question and answer intentions corresponding to search words; determining answer information matched with the question-answer intention from a landing page of a search result item corresponding to the search word; and displaying answer information contained in the landing page in a search result item corresponding to the search word.
According to the embodiment of the invention, the answer information contained in the landing page is displayed in the search result item, and the answer information can be matched with the question and answer intentions corresponding to the search words, so that the answer information can meet the information requirements of the user; therefore, the embodiment of the invention directly displays the answer information meeting the information requirement of the user in the search result item, and can enable the user to obtain the required information without page jump, thereby shortening the operation path of the user and improving the information obtaining efficiency of the user.
In an example of the embodiment of the present invention, assuming that the search term is "what is included in the five insurances and one fund", it may be determined that the question-answer intention corresponding to the search term is "composition of the five insurances and one fund", and answer information matching that intention is determined from the landing page of a search result item corresponding to the search term. Examples of the answer information may include: "The 'five insurances' are five kinds of insurance: endowment insurance, medical insurance, unemployment insurance, work injury insurance, and maternity insurance; the 'one fund' refers to the housing provident fund …".
In another example of the present invention, assuming that the search term is "how to pay the five insurances and one fund", it may be determined that the question-answer intention corresponding to the search term is "payment proportion of the five insurances and one fund", and answer information matching that intention is determined from the landing page of a search result item corresponding to the search term. Examples of the answer information may include: "The payment proportions of the five insurances and one fund differ by region and by the nature of the employer; the payment proportions for each insurance and for the provident fund are shown below. Medical insurance: 2% paid by the individual (8% by the employer); endowment insurance: 8% by the individual (20% by the employer); unemployment insurance: 1% by the individual (2% by the employer); work injury insurance: nothing by the individual (0.5% by the employer); maternity insurance: nothing by the individual (0.8% by the employer); housing provident fund: 7% to 12% by the individual (the proportion differs between employers, and the employer pays the same amount as the individual). …".
In still another example of the present invention, assuming that the search term is "what is the use of the five insurances and one fund", it may be determined that the question-answer intention corresponding to the search term is "use of the five insurances and one fund", and answer information matching that intention is determined from the landing page of a search result item corresponding to the search term. Examples of the answer information may include: "In a certain sense, the five insurances and one fund represent a person's work history in a city. In some special periods, the review of many qualifications is based on social insurance payments, so the five insurances and one fund should not be underestimated, and the longer they have been paid, the better. For example, the current home purchase restrictions are based on two consecutive years of social insurance payments, and even priority in subject study refers to the social insurance payment record, so the five insurances and one fund are very important. …".
The data processing method provided by the embodiment of the present invention can be applied in application environments such as websites and/or apps (application programs) to shorten the user's operation path and improve the user's information acquisition efficiency.
The data processing method provided by the embodiment of the present invention can be applied to the application environment shown in FIG. 1. As shown in FIG. 1, the client 100 and the server 200 are located in a wired or wireless network, through which they exchange data.
In one embodiment of the invention, the client 100 may receive a user's search term and send the search term to the server 200. The server 200 may perform a search according to the search term to obtain a search result corresponding to the search term. The search result may be a web page or a document, and the pages corresponding to the search result may be collectively referred to as a landing page. The search result may be derived from a data source such as a database of a search engine, a database of a vertical website, and the like, and it is understood that the specific source of the search result is not limited by the embodiment of the present invention.
According to one embodiment, the server 200 may determine the question-answer intention corresponding to a search term, determine answer information matching the question-answer intention from the landing page of a search result item corresponding to the search term, and send the search result item to the client 100, where the search result item may include the answer information contained in its landing page. The client 100 may then display that answer information in the search result item corresponding to the search term.
According to an embodiment, the server 200 may send the search result to the client 100, and the client 100 may display answer information included in the landing page in the search result item corresponding to the search term by performing the data processing method according to the embodiment of the present invention.
Optionally, the client 100 may run on a terminal, which includes but is not limited to: smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, car-mounted computers, desktop computers, set-top boxes, smart televisions, wearable devices, and the like.
Method embodiment
Referring to fig. 2, a flowchart illustrating steps of an embodiment of a data processing method according to the present invention is shown, which may specifically include the following steps:
Step 201, determining a question-answer intention corresponding to a search word;
Step 202, determining answer information matched with the question and answer intention from a landing page of a search result item corresponding to the search word;
And step 203, displaying the answer information contained in the landing page in the search result item corresponding to the search word.
At least one step of the embodiment shown in fig. 2 may be performed by a server and/or a client, and of course, the embodiment of the present invention does not limit the specific execution subject of each step.
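The three steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation described by the embodiment: `SearchResultItem`, `process`, `toy_intent`, and `toy_answer` are hypothetical names, and the keyword checks stand in for the determination schemes and answer matching described later.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SearchResultItem:
    """Hypothetical search result item: title, link, summary, optional answer."""
    title: str
    link: str
    summary: str
    landing_page_text: str
    answer: Optional[str] = None


def process(search_term: str,
            items: List[SearchResultItem],
            determine_intent: Callable[[str], Optional[str]],
            find_answer: Callable[[str, str], Optional[str]]) -> List[SearchResultItem]:
    # Step 201: determine the question-answer intention of the search term.
    intent = determine_intent(search_term)
    if intent is None:
        return items  # no question-answer requirement: items stay unchanged
    for item in items:
        # Step 202: look for matching answer information in the landing page.
        item.answer = find_answer(intent, item.landing_page_text)
    # Step 203: the caller renders item.answer inside each result item.
    return items


def toy_intent(term: str) -> Optional[str]:
    # Assumption: a trivial keyword check standing in for intent determination.
    return "composition" if "included" in term else None


def toy_answer(intent: str, page: str) -> Optional[str]:
    # Assumption: ignores the intent and returns the first sentence
    # containing "includ"; a real matcher would use the intent.
    for sentence in page.split("."):
        if "includ" in sentence:
            return sentence.strip() + "."
    return None
```

Passing the two helper functions as parameters keeps the sketch neutral about whether the server or the client executes each step.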
In step 201, a client of the search app or the search website may provide a UI (user interface) so that a user can submit a search term to the client through a search box, a voice interface, and the like on the UI. It is understood that the embodiment of the present invention does not limit the specific manner in which the search term is submitted.
The question-answer intention refers to the intention corresponding to the question-answer requirement expressed by the search term, and a question-answer requirement means that the search term calls for an answer.
In an optional embodiment of the present invention, step 201 may first determine whether the search term corresponds to a question-answer requirement and, if so, determine the question-answer intention corresponding to the search term. For example, the search term "how to deal with a lost bus card" corresponds to an explicit question-answer requirement, the search term "efficacy and effect of liuwei dihuang pills" corresponds to an implicit question-answer requirement, and the search term "star xxx" does not correspond to a question-answer requirement. In practical applications, methods such as syntactic and grammatical analysis can be used to judge whether the search term corresponds to a question-answer requirement; the embodiment of the present invention does not limit the specific method used.
The embodiment of the present invention can provide the following schemes for determining the question-answer intention:
Determination scheme 1
In determination scheme 1, the process of determining the question-answer intention corresponding to the search term in step 201 may include: identifying a current LAT (Lexical Answer Type) directional word from the search term; looking up the current LAT directional word in a preset mapping relationship between LAT directional words and LAT words to obtain the target LAT word corresponding to the current LAT directional word; and obtaining the question-answer intention corresponding to the search term according to the target LAT word.
Determination scheme 1 first identifies the current LAT directional word contained in the search term and then obtains the corresponding target LAT word from the mapping relationship between LAT directional words and LAT words. An LAT directional word is a word that points toward a question-answer intention, and an LAT word represents the question-answer intention. The mapping relationship therefore describes how each LAT directional word points to the LAT word of the corresponding question-answer intention.
According to the embodiment of the present invention, the target LAT word corresponding to the current LAT directional word is obtained from the mapping relationship between LAT directional words and LAT words. Because the target LAT word is derived from the current LAT directional word, the question-answer intention corresponding to the search term can still be derived even if the search term does not contain the target LAT word. Therefore, when the search term does not carry a complete question-answer requirement, the embodiment of the present invention can still obtain the corresponding question-answer intention by derivation, which improves the accuracy of the question-answer intention.
An LAT word represents the text in a question that indicates the type of answer. Optionally, a large number of questions may be collected and statistically analyzed to build an LAT word bank that stores LAT words. For example, the LAT words stored in the LAT word bank may include: emperor, island, mountain peak, event, country, flower, river, etc. It is to be understood that embodiments of the present invention are not limited to specific LAT words.
Optionally, complete questions can be analyzed, LAT directional words mined from the analysis results, and the mined LAT directional words stored in an LAT directional word bank; a mapping relationship between the LAT directional words and the LAT words is then established. Table 1 illustrates a mapping relationship between LAT directional words and LAT words of the present invention. The LAT words shown in Table 1 are only examples; in practice, an LAT word such as "person" may be further subdivided into "emperor", "scientist", "poet", "physicist", etc. The LAT words of the embodiment of the present invention may be any entity type and/or the entity words corresponding to any entity type, and the embodiment of the present invention does not limit the specific mapping relationship between LAT directional words and LAT words.
TABLE 1
In practical applications, the above process of identifying the current LAT directional word from the search words may include: and matching each vocabulary contained in the search word with each LAT directional word in the LAT directional word stock, and if the matching is successful, taking the successfully matched vocabulary contained in the search word as the current LAT directional word. It is to be understood that the specific process of identifying the current LAT directional word from the search words is not limited by the embodiments of the present invention.
In application example 1 of the present invention, assuming that the search term contains "known as" and that "known as" exists in the LAT directional word bank, the target LAT word "person and/or thing" corresponding to "known as" can be obtained by looking up Table 1. Further, assuming that the search term is "known as the father of the CD" and that "father" exists in the LAT directional word bank, the target LAT word "person" corresponding to "father" can be obtained by looking up Table 1, and it can finally be determined that "known as the father of the CD" corresponds to the question-answer intention "person". Similarly, assuming that the search term is "honored as the physical holy sword", it can be determined that the corresponding question-answer intention is "weapon".
In application example 2 of the present invention, assuming that the search term is "location of the Taj Mahal" and that "location" exists in the LAT directional word bank, the target LAT word "geographic location" corresponding to "location" can be obtained by looking up Table 1.
In application example 3 of the present invention, assuming that the search term is "proposer of the mass-energy equation" and that "proposer" exists in the LAT directional word bank, the target LAT word "person" corresponding to "proposer" can be obtained by looking up Table 1.
In application example 4 of the present invention, assuming that the search term is "what is the meaning of the five insurances and one fund" and that "what meaning" exists in the LAT directional word bank, the target LAT word "concept" corresponding to "what meaning" can be obtained by looking up Table 1.
Since the target LAT word may be used as a core word or a focus word of a question corresponding to a search word, and may reflect an answer type of the question corresponding to the search word, the target LAT word may be directly used as a question-answering intention corresponding to the search word, or the target LAT word may be further processed (e.g., a fusion process of a plurality of target LAT words, etc.) to obtain a question-answering intention corresponding to the search word.
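The lookup in determination scheme 1 can be sketched as a dictionary query. The sketch below is only illustrative: the mapping entries are paraphrased from the application examples because Table 1 is not reproduced here, the names `LAT_MAPPING`, `find_lat_directional_words`, and `question_answer_intent` are hypothetical, and plain substring matching is an assumed simplification of the vocabulary matching described above.

```python
from typing import Dict, List, Optional

# Assumed excerpt of the LAT directional word -> LAT word mapping.
# Entries are paraphrased from the application examples, not from Table 1.
LAT_MAPPING: Dict[str, str] = {
    "father": "person",
    "proposer": "person",
    "location": "geographic location",
    "what meaning": "concept",
}


def find_lat_directional_words(search_term: str,
                               mapping: Dict[str, str]) -> List[str]:
    """Return every LAT directional word that occurs in the search term."""
    text = search_term.lower()
    # Longest entries first, so multi-word directional words are preferred.
    # Substring matching is a simplification; a real system would tokenize.
    return [w for w in sorted(mapping, key=len, reverse=True) if w in text]


def question_answer_intent(search_term: str,
                           mapping: Dict[str, str] = LAT_MAPPING) -> Optional[str]:
    """Use the target LAT word directly as the question-answer intention."""
    hits = find_lat_directional_words(search_term, mapping)
    if not hits:
        return None  # no LAT directional word found in the search term
    return mapping[hits[0]]
```

The sketch takes the first hit as the target LAT word; fusing several target LAT words, which the text allows, is left out.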
Determination scheme 2
In determination scheme 2, the process of determining the question-answer intention corresponding to the search term in step 201 may include: performing dependency syntax analysis on the search term to obtain a dependency syntax analysis result; extracting a core semantic unit from the dependency syntax analysis result; and obtaining the question-answer intention corresponding to the search term according to the core semantic unit.
Determination scheme 2 extracts a core semantic unit from the dependency syntax analysis result of the search term and obtains the question-answer intention from that unit. The core semantic unit that characterizes the question-answer intention may include core words and the like.
In practical applications, the dependency syntax analysis result may include a dependency tree, which represents the dependency relationships among the words in the search term; the dependency tree can be analyzed, and the core semantic unit extracted from it according to the analysis result.
In practical applications, the dependency tree may be analyzed according to preset extraction rules, and the core semantic unit may be extracted from the dependency tree according to the analysis result.
Optionally, extracting the core semantic unit from the dependency syntax analysis result may include: if the word immediately following the query word in the dependency tree is a noun or noun phrase, extracting that noun or noun phrase as the core semantic unit that characterizes the question-answer intention. For example, in the search term "which scientist has helped kosher to escape from Germany", the query word "which" is immediately followed by the word "scientist", so "scientist" can be taken as the core semantic unit.
Optionally, extracting the core semantic unit from the dependency syntax analysis result may include: if the query word in the dependency tree is at the end of the search term, extracting the noun or noun phrase closest to the query word as the core semantic unit that characterizes the question-answer intention. For example, suppose the search term is "what is known as the father of the CD", in which the query word falls at the end in the original word order; the noun phrase closest to the query word is "father of the CD", so "father of the CD" can be used as the core semantic unit.
Optionally, extracting the core semantic unit from the dependency syntax analysis result may include: if the word immediately following the query word in the dependency tree is a verb, extracting the last noun or noun phrase in the search term as the core semantic unit that characterizes the question-answer intention. For example, assuming that the search term is "how to fold a paper plane", the query word "how" is followed by the verb "fold", so the last noun phrase, "paper plane", can be used as the core semantic unit. For another example, assuming that the search term is "how to download the complete content of Baidu Wenku documents for free", the query word "how" is followed by the verb "download", so the last noun phrase, "complete content of Baidu Wenku documents", can be used as the core semantic unit.
It should be understood that the preset extraction rule is only an alternative embodiment, and the embodiment of the present invention does not limit the specific extraction rule. Since the core semantic unit may be used as a core word or a focus word of the search word, and may reflect an answer type of the search word, the core semantic unit may be directly used as a question-answer intention corresponding to the search word, or the core semantic unit may be further processed (such as fusion processing of a plurality of core semantic units, etc.) to obtain a question-answer intention corresponding to the search word.
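The three optional extraction rules can be sketched over a pre-tagged token sequence. This is a simplified illustration, not the embodiment's parser: it assumes an upstream dependency parser has already produced the tokens and merged noun phrases, and the tag set ("Q" for query word, "N" for noun or noun phrase, "V" for verb, "O" for other) and the function name `core_semantic_unit` are hypothetical.

```python
from typing import List, Optional, Tuple

# (text, tag) pairs; tags are an assumed simplification of parser output:
# "Q" query word, "N" noun or noun phrase, "V" verb, "O" anything else.
Token = Tuple[str, str]


def core_semantic_unit(tokens: List[Token]) -> Optional[str]:
    """Apply the three example extraction rules to a tagged search term."""
    q = next((i for i, (_, tag) in enumerate(tokens) if tag == "Q"), None)
    if q is None:
        return None  # no query word: the rules do not apply
    nouns = [i for i, (_, tag) in enumerate(tokens) if tag == "N"]
    if not nouns:
        return None
    if q == len(tokens) - 1:
        # Query word at the end: take the noun (phrase) closest to it.
        nearest = min(nouns, key=lambda i: abs(i - q))
        return tokens[nearest][0]
    next_tag = tokens[q + 1][1]
    if next_tag == "N":
        # Query word immediately followed by a noun (phrase): take it.
        return tokens[q + 1][0]
    if next_tag == "V":
        # Query word followed by a verb: take the last noun (phrase).
        return tokens[nouns[-1]][0]
    return None  # none of the example rules matched
```

For instance, the tokens `[("how", "Q"), ("fold", "V"), ("paper plane", "N")]` fall under the verb rule, so the function returns the last noun phrase.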
Determination scheme 3
In determination scheme 3, the process of determining the question-answer intention corresponding to the search term in step 201 may include: performing intention recognition on the search term through a domain identification module and a domain intention recognition module.
The domain identification module can be used to identify the domain to which the search term belongs. Examples of domains may include: "olympic sports", "geographic problems", "computer digital", "laws and regulations", "life", "education science", "economic finance", "emotional family", "social life", "leisure and entertainment", "medical health", "artistic words", "games", etc., although the embodiment of the present invention is not limited to specific domains.
The domain intention recognition module can be used to recognize the question-answer intention corresponding to the search term within the domain.
According to one embodiment, intention recognition can be regarded as a multi-class classification task, so the domain intention recognition module can recognize the question-answer intention corresponding to the search term within the domain through a classifier. The categories of the classifier can be derived from a plurality of question-answer intentions (one question-answer intention can correspond to one category), and the training samples of the classifier are obtained from the question inputs corresponding to those intentions; for example, the training samples can be question corpora labeled with question-answer intention categories, on which the classifier is trained.
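A classifier of this kind can be sketched with a small multinomial naive Bayes model. The choice of naive Bayes, the class name `IntentClassifier`, and whitespace tokenization are all assumptions made for illustration; the text does not name a classifier type, and the sketch presumes the domain has already been identified.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class IntentClassifier:
    """Multinomial naive Bayes with add-one smoothing (one possible choice)."""

    def fit(self, samples: List[Tuple[str, str]]) -> "IntentClassifier":
        # samples: (question text, labeled question-answer intention category)
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in samples:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the intention
            for w in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training on a handful of labeled questions and calling `predict` on a new search term returns the highest-scoring intention category; a production system would need a far larger corpus and a proper tokenizer.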
According to another embodiment, the domain intention recognition module may recognize the question-answer intention corresponding to the search term within the domain through question-answer intention sentence patterns. A question-answer intention sentence pattern represents a sentence pattern corresponding to a question-answer intention in the domain; it may include at least one keyword, and the at least one keyword may conform to a corresponding grammar rule. In this way, the question-answer intention corresponding to the search term can be obtained by matching the search term against the question-answer intention sentence patterns. For example, for one question-answer intention sentence pattern, the question-answer intention may be "an operation (verb) scheme corresponding to a noun". For another example, if a question-answer intention sentence pattern ends with "how to do", the question-answer intention can be "solution to problem".
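Sentence-pattern matching can be sketched with regular expressions. The two patterns below are hypothetical English stand-ins, since the text does not spell out concrete patterns, and `SENTENCE_PATTERNS` and `match_intent` are illustrative names.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical sentence patterns: (regular expression, question-answer intention).
# Order matters: the first pattern that matches wins.
SENTENCE_PATTERNS: List[Tuple[str, str]] = [
    # Pattern ending in "how to do" / "what to do".
    (r"(how|what) to do$", "solution to problem"),
    # "how to" + verb + something: an operation scheme for a noun.
    (r"^how to \w+ .+", "operation scheme for a noun"),
]


def match_intent(search_term: str) -> Optional[str]:
    """Return the intention of the first sentence pattern that matches."""
    text = search_term.strip().lower()
    for pattern, intent in SENTENCE_PATTERNS:
        if re.search(pattern, text):
            return intent
    return None  # no sentence pattern matched
```

Keeping the patterns in an ordered list makes it easy to add domain-specific patterns without touching the matching logic.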
It can be understood that, according to actual application requirements, a person skilled in the art may adopt any one or a combination of the above determination schemes 1 to 3 to determine, in step 201, the question-answer intention corresponding to the search term; the embodiment of the present invention does not limit the specific process of determining the question-answer intention corresponding to the search term in step 201.
Step 202 may determine answer information matched with the question-answer intention from the landing page of the search result item corresponding to the search term, where the answer information may be information included in the landing page of the search result item; that is, in the embodiment of the present invention the answer information may be extracted from the landing page.
In an optional embodiment of the present invention, question-answer pairs may be mined from web pages in advance in an offline state, and, in an online state, the answer information that matches the question-answer intention and is included in the landing page of a search result item is determined by querying the question-answer pairs. Specifically, determining, in step 202, answer information matched with the question-answer intention from the landing page of the search result item corresponding to the search term includes: searching the question-answer pairs corresponding to the landing page of the search result item according to the question-answer intention to obtain answer information matched with the question-answer intention; a question-answer pair may include a question and an answer. Because the question-answer pairs are mined in the offline state and the answer information is determined by querying the question-answer pairs in the online state, the efficiency of determining the answer information can be improved.
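The offline/online split above can be illustrated with a small sketch. It assumes, purely for illustration, that each mined question-answer pair has been labeled with an intention and is stored under the URL of its landing page; `build_index`, `lookup_answer`, and the tuple layout are hypothetical and not specified by the source.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical storage shape: landing page URL -> [(intention, question, answer)].
QAIndex = Dict[str, List[Tuple[str, str, str]]]


def build_index(mined: List[Tuple[str, str, str, str]]) -> QAIndex:
    """Offline step: group mined (url, intention, question, answer) records by URL."""
    index: QAIndex = {}
    for url, intent, question, answer in mined:
        index.setdefault(url, []).append((intent, question, answer))
    return index


def lookup_answer(index: QAIndex, url: str, intent: str) -> Optional[str]:
    """Online step: return the first answer on this landing page matching the intention."""
    for pair_intent, _question, answer in index.get(url, []):
        if pair_intent == intent:
            return answer
    return None
```

Because the online step is only a dictionary lookup plus a short scan, the expensive page analysis stays entirely in the offline stage.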
The embodiment of the invention does not limit the specific mining method used to mine question-answer pairs from web pages. For example, the mining method may include a manual mining method, an automatic mining method, and the like. The automatic mining method may include an extraction-template mining method, in which an extraction template is configured; the extraction template can specify question sentences and the connecting words between question sentences and answers, so that question-answer pairs can be extracted. However, the manual mining method incurs a high labor cost, and the extraction-template mining method depends on effective extraction templates: if no extraction template matches a certain text segment of a web page, a question-answer pair may not be extracted from that text segment.
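As a rough illustration of extraction-template mining, the sketch below uses a single regular expression as the template. The template itself (a question ending in "?", followed by the connecting word "Answer" or "A" and a colon) and the name `extract_pair` are assumptions made for this example; the source does not give concrete templates.

```python
import re
from typing import Optional, Tuple

# Hypothetical extraction template: a question sentence, a connecting word
# ("Answer" or "A") followed by a colon, then the answer text.
TEMPLATE = re.compile(
    r"(?P<question>[^.?!]*\?)\s*(?:Answer|A)\s*[:：]\s*(?P<answer>.+)"
)


def extract_pair(segment: str) -> Optional[Tuple[str, str]]:
    """Return (question, answer) if the segment matches the template, else None."""
    match = TEMPLATE.search(segment)
    if match is None:
        return None
    return match.group("question").strip(), match.group("answer").strip()
```

A segment that does not fit the template returns `None`, which is exactly the coverage limitation noted above.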
In an optional embodiment of the present invention, the question-answer pair corresponding to the landing page of the search result item may be extracted from the landing page of the search result item according to a page structure of the landing page of the search result item. The page structure can refer to the layout of page content, question and answer pairs are extracted according to the page structure, and the extraction is not limited by limited extraction templates, so that the coverage rate of the question and answer pairs can be improved.
In practical applications, the page structure may be determined by the page source code. The page source code may refer to the source code of the page, which may represent the language makeup of the page. Alternatively, the page structure may be characterized by page elements or tags. Alternatively, the page structure may be characterized by a DOM (Document Object Model) tree, and it should be understood that the specific way of characterizing the page structure is not limited in the embodiment of the present invention.
The computer languages corresponding to the page code mainly include HTML (Hypertext Markup Language), VB (Visual Basic), Java, and the like. Among them, HTML is the most common and basic language and is indispensable in a page. Page elements such as the title, frame, background, font, hyperlink, and color of a page may be set through HTML. Of course, the embodiment of the present invention does not limit the specific computer language corresponding to the page code.
The page source code is actually a page file made up of a large number of various page elements, and the browser can usually run the page file directly, for example, as an HTML file. The page elements may serve as the basic objects constituting the page file. The page elements may be defined by tags.
Tags are used to mark HTML elements. The text located between the start tag and the end tag may serve as the content of the page element. In one example, tags are objects enclosed by angle brackets "<" and ">", such as <head> (used to define information about a document), <body> (used to define the body of a document), <table> (used to define tables), <div> (used to define sections), and so on. Some tags appear in pairs, such as <table></table> and <form></form>, where <form> is used to define an HTML form for user input. Of course, there are also tags that do not appear in pairs, such as <br> and <hr>, where <br> is used to define a simple line break and <hr> is used to define a horizontal line. Tags and page elements correspond to each other, so a page element can be represented by its tag.
The page elements also correspond to attributes. Attributes are used to provide additional information for page elements. The attribute may be present in the form of a name-value pair such as "attribute name-attribute value", and the attribute may be defined in a start tag of a page element.
In an optional embodiment of the present invention, the process of extracting and obtaining question-answer pairs from the landing pages of the search result items according to the page structure of the landing pages of the search result items may specifically include: clustering text segments included in the webpage according to a page structure of the webpage to obtain a text segment category; determining candidate problems corresponding to the text segment categories; and extracting answer information corresponding to the candidate question from the text segment corresponding to the candidate question.
The embodiment of the invention can obtain the page structure, such as tag information, corresponding to each text segment of the webpage by analyzing the page structure. The text segments are then clustered according to their page structures, where the clustering aggregates text segments with similar page structures into the same text segment category.
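One simple way to picture this clustering is to group text segments by the chain of tags that encloses them. The sketch below uses Python's standard `html.parser`; treating "identical tag path" as "similar page structure" is a simplifying assumption for illustration, and `SegmentCollector` and `cluster_by_structure` are hypothetical names.

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List

# Tags that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class SegmentCollector(HTMLParser):
    """Collects text segments keyed by the tag path that encloses them."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.clusters: Dict[str, List[str]] = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.clusters["/".join(self.stack)].append(text)


def cluster_by_structure(html: str) -> Dict[str, List[str]]:
    """Return a mapping from tag path to the text segments found under it."""
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.clusters)
```

On a page where every question sits in an `<h2>` and every answer in a sibling `<p>`, the headings end up in one category and the paragraphs in another, regardless of their wording.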
The manner of determining the candidate question corresponding to a text segment category may include a template feature manner and/or a rule scoring manner. The template features may correspond to features of questions, such as word features, sentence features, or phrase features; the rule scoring manner may be used to evaluate candidate questions. Optionally, the connection relationship between the words of a candidate question can be scored according to a language model. According to one embodiment, if a text segment category includes a language unit (word, phrase, or sentence) that matches the template features and whose score exceeds the score threshold, the language unit may be regarded as a candidate question.
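The combination of template features and rule scoring can be sketched as below. The question-word list, the toy scoring rule, and the default threshold are all assumptions chosen for illustration; the source does not specify them, and a real system might use a language model score instead.

```python
from typing import List

# Hypothetical template features: typical question words.
QUESTION_WORDS = ("what", "how", "why", "which", "when", "where")


def matches_template(unit: str) -> bool:
    """Template feature check: ends with '?' or starts with a question word."""
    lowered = unit.lower().strip()
    return lowered.endswith("?") or lowered.startswith(QUESTION_WORDS)


def rule_score(unit: str) -> float:
    """Toy rule score in [0, 1]: favors units of 2 to 15 words with a question word."""
    words = unit.lower().rstrip("?").split()
    if not words:
        return 0.0
    has_question_word = any(word in QUESTION_WORDS for word in words)
    length_score = 1.0 if 2 <= len(words) <= 15 else 0.3
    return length_score * (1.0 if has_question_word else 0.6)


def candidate_questions(units: List[str], threshold: float = 0.5) -> List[str]:
    """Keep language units that match the template and score above the threshold."""
    return [u for u in units if matches_template(u) and rule_score(u) > threshold]
```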
Taking a webpage A whose URL (Uniform Resource Locator) is http:// www.66law.cn/specific/wxyj as an example, the text segments included in the webpage can be clustered according to the page structure of the webpage to obtain text segment categories, and the following candidate questions corresponding to the text segment categories can be determined: "what the five insurances and one fund include", "what the five insurances and one fund are used for", "latest news on the five insurances and one fund changing to four insurances and one fund", "minimum standard for the five insurances and one fund", "payment proportion of the five insurances and one fund", "how to handle the five insurances and one fund after leaving a job", "how to handle the five insurances and one fund by oneself", "consequences of not paying the five insurances and one fund", "how a company handles the five insurances and one fund", "legal provisions on a labor contract signed without the five insurances and one fund", and the like.
Optionally, the type of the answer information may include a step type; for example, the answer information corresponding to a webpage B (https:// zhinan. sougu.com/guide/detail/? id 316512868864) is of the step type.
In an optional embodiment of the present invention, the method may further include: extracting candidate question-answer pairs from the landing page of the search result item according to the page structure of the landing page of the search result item; and filtering the candidate question-answer pairs according to the attribute information of the candidate question-answer pairs. Filtering by attribute information removes the candidate question-answer pairs that do not meet a preset condition and retains those that do, which improves the quality of the question-answer pairs.
Wherein the attribute information may include: semantic representation information and quality information.
The semantic representation information may be used to determine the similarity between candidate question-answer pairs, so that candidate question-answer pairs with higher similarity may be filtered out.
Optionally, the semantic representation information may be obtained by performing semantic analysis on the candidate question-answer pairs. Semantic analysis methods that may be employed include topic model methods, deep learning methods, and the like. The topic model methods may include LDA (Latent Dirichlet Allocation, a document topic generation model) and the like. The deep learning methods may include word embedding, Recurrent Neural Networks, Convolutional Neural Networks, and the like.
The quality information may reflect the quality of the candidate question-answer pairs, so that candidate question-answer pairs with poor quality may be filtered out and candidate question-answer pairs with better quality may be retained.
The quality information may include page quality information and/or site quality information corresponding to the candidate question-answer pairs. Through the quality information, data that do not involve questions and answers, or whose answers do not address the question, can be removed, and data with clear questions and answers, relatively relevant answers, and relatively credible sources can be retained.
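The two filters above (similarity and quality) can be sketched together. As a stand-in for the semantic representation, this example uses plain Jaccard overlap of question tokens rather than LDA or a neural model; the `quality` field, both thresholds, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidatePair:
    question: str
    answer: str
    quality: float  # hypothetical page/site quality score in [0, 1]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for a semantic representation."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def filter_pairs(
    pairs: List[CandidatePair],
    min_quality: float = 0.5,
    max_similarity: float = 0.8,
) -> List[CandidatePair]:
    """Drop low-quality pairs, then drop pairs too similar to a better kept pair."""
    kept: List[CandidatePair] = []
    for pair in sorted(pairs, key=lambda p: p.quality, reverse=True):
        if pair.quality < min_quality:
            continue
        if any(jaccard(pair.question, k.question) > max_similarity for k in kept):
            continue
        kept.append(pair)
    return kept
```

Sorting by quality first means that, among near-duplicate questions, the pair from the more credible source is the one that survives.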
It is understood that the manner of determining answer information by querying question-answer pairs in step 202 is only an optional embodiment, and in fact, the step 202 may also extract answer information matching with question-answer intentions from the text of the landing page of the search result item by using a text extraction technique, and the embodiment of the present invention does not impose any limitation on the specific process of determining answer information matching with question-answer intentions from the landing page of the search result item corresponding to the search term in step 202.
Step 203 displays, in the search result item corresponding to the search term, the answer information contained in the landing page of the search result item.
The role of a traditional search result item is to let the user judge whether the landing page of the search result item contains the information the user needs. Thus, the structure of a traditional search result item typically includes information such as the title, link, and abstract corresponding to the landing page of the search result item.
The embodiment of the invention improves the structure of the search result item by placing in it the answer information contained in the landing page of the search result item. The answer information is thereby brought forward, which shortens the operation path of the user and lets the user obtain the required information without a page jump.
In practical applications, the search result items may be displayed in a search term feedback page, and generally, one search term feedback page may display N search result items, where N is a natural number.
Optionally, a tag corresponding to the answer information may be set in the search result item; for example, the tag may carry text such as "pick" (that is, featured) to prompt the user that the required answer information can be obtained from the content under the tag.
In an embodiment of the present invention, the display manner used in step 203 for displaying, in the search result item corresponding to the search term, the answer information contained in the landing page of the search result item may specifically include:
Display mode 1: if the length of the answer information contained in the landing page of the search result item does not exceed a length threshold, displaying all of the answer information contained in the landing page of the search result item in the search result item corresponding to the search term; or
Display mode 2: if the length of the answer information contained in the landing page of the search result item exceeds the length threshold, displaying a part of the answer information contained in the landing page of the search result item in the search result item corresponding to the search term, and displaying an expansion interface so that the user can view all of the answer information through the expansion interface.
The length threshold may be used to constrain the length of characters corresponding to answer information for the search result items. The length threshold may be determined by those skilled in the art according to the actual application requirements, for example, the above character length may be determined according to the page area occupied by the search result item.
In the display mode 1, when the length of answer information included in the landing page of the search result item does not exceed the length threshold, all of the answer information may be displayed.
In the display mode 2, when the length of answer information included in the landing page of the search result item exceeds the length threshold, a part of the answer information may be displayed to save a page area. And, an expansion interface may also be displayed so that the user views the entirety of the answer information through the expansion interface.
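The choice between display mode 1 and display mode 2 reduces to a comparison against the length threshold. The sketch below measures length in characters; the `AnswerDisplay` structure and the `render_answer` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class AnswerDisplay:
    text: str          # the answer text shown inside the search result item
    show_expand: bool  # whether an expansion interface should be displayed


def render_answer(answer: str, length_threshold: int) -> AnswerDisplay:
    """Mode 1: show the whole answer. Mode 2: show a part plus an expansion interface."""
    if len(answer) <= length_threshold:
        return AnswerDisplay(text=answer, show_expand=False)
    return AnswerDisplay(text=answer[:length_threshold], show_expand=True)
```

In practice the threshold would be derived from the page area available to the search result item, as described above.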
Referring to FIG. 3, a schematic diagram of displaying a search result item according to an embodiment of the present invention is shown, where the search result item may include a title 301 and a "pick tag" 302. The location corresponding to the "pick tag" 302 may display a part of the answer information contained in the landing page of the search result item; an expansion interface 303 may also be displayed to allow the user to view all of the answer information through the expansion interface 303.
Referring to FIG. 4, a schematic diagram of displaying a search result item according to an embodiment of the present invention is shown. If a trigger operation on the expansion interface 303 shown in FIG. 3 is received, the display may jump to the interface shown in FIG. 4, in which all of the answer information included in the landing page of the search result item may be displayed. A collapse interface 304 may also be displayed, so that the user can collapse part of the answer information contained in the landing page of the search result item through the collapse interface 304, thereby saving page area.
In an alternative embodiment of the present invention, the landing page type of the search result item may be a step type, and the answer information may include steps and step contents; in this case, the steps and step contents may be displayed in the search result item. Examples of step-type landing pages are web page B and web page C (https:// zhinan. sogout. com/guide/detail/? id ═ 316512749211). Referring to FIG. 5, a schematic diagram of displaying answer information in a search result item according to an embodiment of the present invention is shown, in which a title 501 and answer information 502 of the search result item may be displayed, and portion 502 may display the answer information in order of step number, where m is a natural number.
In another optional embodiment of the present invention, the landing page type of the search result item may be a title type, and the answer information may include a title and title content. Since the title typically matches the question-answer intention, the title content may be displayed in the search result item in this case; of course, the title and the title content may also both be displayed. Referring to FIG. 6, a schematic diagram of displaying answer information in a search result item according to an embodiment of the present invention is shown, in which a title 601 and answer information 602 of the search result item may be displayed; portion 602 may display the title content included in the answer information. Referring to FIG. 3 and FIG. 4, the position corresponding to the "pick tag" 302 may display the title content included in the answer information.
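The two landing page types can be rendered with two small helpers, sketched below. The "Step i:" prefix, the newline separator, and the function names are presentation assumptions for illustration only.

```python
from typing import List


def format_step_answer(step_contents: List[str]) -> List[str]:
    """Step type: one display line per step, in order of step number."""
    return [f"Step {number}: {content}"
            for number, content in enumerate(step_contents, start=1)]


def format_title_answer(title: str, content: str, include_title: bool = False) -> str:
    """Title type: show the title content, optionally preceded by the title."""
    return f"{title}\n{content}" if include_title else content
```

For example, `format_step_answer(["Open settings", "Tap reset"])` yields two lines numbered in step order, which could then be passed through a length check before display.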
In summary, according to the data processing method of the embodiment of the present invention, answer information included in a landing page of a search result item is displayed in the search result item, and since the answer information can be matched with a question-answer intention corresponding to a search word, the answer information can meet information requirements of a user; therefore, the embodiment of the invention directly displays the answer information meeting the information requirement of the user in the search result item, and can enable the user to obtain the required information without page jump, thereby shortening the operation path of the user and improving the information obtaining efficiency of the user.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments and that the acts involved are not necessarily all required by the invention.
Device embodiment
Referring to FIG. 7, a block diagram of a data processing apparatus according to an embodiment of the present invention is shown, which may specifically include: a question-answer intention determining module 701, an answer information determining module 702, and an answer information display module 703.
The question-answer intention determining module 701 is used for determining a question-answer intention corresponding to the search term;
The answer information determining module 702 is configured to determine answer information that matches the question-answer intention from a landing page of a search result item corresponding to the search term; and
The answer information display module 703 is configured to display the answer information included in the landing page in the search result item corresponding to the search term.
Optionally, the type of landing page is a step type, and the answer information may include: steps and step contents;
Or
The type of the landing page is a title type, and the answer information may include: title and title content.
Optionally, the answer information determining module 702 may include:
The searching submodule is used for searching the question-answer pairs corresponding to the landing page of the search result item according to the question-answer intention so as to obtain answer information matched with the question-answer intention; a question-answer pair may include a question and an answer.
Optionally, the question-answer pairs corresponding to the landing page may be extracted from the landing page according to the page structure of the landing page.
Optionally, the apparatus may further include:
The text segment clustering module is used for clustering the text segments included in the webpage according to the page structure of the webpage so as to obtain the text segment categories;
The candidate question determining module is used for determining candidate questions corresponding to the text segment categories;
And the answer information extraction module is used for extracting answer information corresponding to the candidate question from the text segment corresponding to the candidate question.
Optionally, the apparatus may further include:
The candidate question-answer pair extraction module is used for extracting candidate question-answer pairs from the landing page of the search result item according to the page structure of the landing page of the search result item;
The candidate question-answer pair filtering module is used for filtering the candidate question-answer pairs according to the attribute information of the candidate question-answer pairs;
Wherein the attribute information may include: semantic representation information and quality information.
Optionally, the answer information display module may include:
A first answer information display sub-module, configured to display all the answer information included in the landing page in a search result item corresponding to the search term if the length of the answer information included in the landing page does not exceed a length threshold; or
And the second answer information display sub-module is used for displaying a part of the answer information contained in the landing page in the search result item corresponding to the search word and displaying an expansion interface if the length of the answer information contained in the landing page exceeds a length threshold value, so that a user can view all the answer information through the expansion interface.
Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant points, refer to the corresponding description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present invention provides an apparatus for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for: determining the question-answer intention corresponding to a search term; determining answer information matched with the question-answer intention from a landing page of a search result item corresponding to the search term; and displaying the answer information contained in the landing page in the search result item corresponding to the search term.
Fig. 8 is a block diagram illustrating an apparatus 800 for data processing in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 8, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice data processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; the sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 9 is a schematic diagram of a server in some embodiments of the invention. The server 1900, which may vary widely in configuration or performance, may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the data processing method shown in fig. 2 or 3.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a data processing method, the method comprising: determining question and answer intentions corresponding to the search terms; determining answer information matched with the question-answer intention from a landing page of a search result item corresponding to the search word; and displaying the answer information contained in the landing page in a search result item corresponding to the search word.
An embodiment of the invention discloses A1, a data processing method, the method comprising:
Determining a question-answer intention corresponding to a search word;
Determining answer information matched with the question-answer intention from a landing page of a search result item corresponding to the search word;
And displaying the answer information contained in the landing page in a search result item corresponding to the search word.
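The three steps of A1 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the `SearchResultItem` type, the function names `detect_intent`, `find_answer`, and `render_result_item`, and the question-word rule used for intent detection are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SearchResultItem:
    """Hypothetical result item carrying question-answer pairs from its landing page."""
    title: str
    url: str
    qa_pairs: Dict[str, str] = field(default_factory=dict)  # question -> answer


def detect_intent(search_word: str) -> Optional[str]:
    """Step 1: determine a question-answer intention (toy rule: leading question word)."""
    markers = ("how", "what", "why", "when", "where")
    normalized = search_word.strip().lower()
    return normalized if normalized.startswith(markers) else None


def find_answer(intent: str, item: SearchResultItem) -> Optional[str]:
    """Step 2: look up answer information on the landing page that matches the intent."""
    return item.qa_pairs.get(intent)


def render_result_item(search_word: str, item: SearchResultItem) -> str:
    """Step 3: display the answer information inside the search result item."""
    intent = detect_intent(search_word)
    answer = find_answer(intent, item) if intent else None
    lines = [item.title, item.url]
    if answer:
        lines.append(f"Answer: {answer}")
    return "\n".join(lines)
```

With this layout the user sees the answer in the result list itself and does not need to open the landing page, which is the shortened operation path the abstract refers to.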
A2, the method according to A1,
The type of the landing page is a step type, and the answer information comprises: steps and step contents;
Or
the type of the landing page is a title type, and the answer information comprises: title and title content.
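One way to picture the two landing page types in A2 is a single record holding ordered entries, rendered with step numbers for a step-type page and without them for a title-type page. The sketch below is an assumption for illustration only; the `AnswerInfo` type, the `"step"` and `"title"` labels, and `format_answer` do not come from the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnswerInfo:
    page_type: str                  # "step" or "title" (hypothetical labels)
    entries: List[Tuple[str, str]]  # (step or title, its content)


def format_answer(info: AnswerInfo) -> List[str]:
    """Render entries; step-type pages get an ordinal prefix."""
    lines = []
    for index, (label, content) in enumerate(info.entries, start=1):
        prefix = f"{index}. {label}" if info.page_type == "step" else label
        lines.append(f"{prefix}: {content}")
    return lines
```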
A3, the method according to A1, wherein the determining answer information matching the question-answer intention from the landing page of the search result item corresponding to the search word comprises:
Searching question-answer pairs corresponding to landing pages of the search result items according to the question-answer intentions to obtain answer information matched with the question-answer intentions; the question-answer pairs comprise: questions and answers.
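The lookup in A3 can be sketched as a similarity search over the question side of each pair. The specification does not name a matching technique in this passage, so the token-level Jaccard similarity, the `min_score` threshold, and the function names below are assumptions chosen only to make the idea concrete.

```python
from typing import List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def lookup_answer(intent: str, qa_pairs: List[Tuple[str, str]],
                  min_score: float = 0.5) -> Optional[str]:
    """Return the answer whose question best matches the intent, or None."""
    best_answer, best_score = None, 0.0
    for question, answer in qa_pairs:
        score = jaccard(intent, question)
        # Only accept matches at or above the threshold, and only if strictly better.
        if score >= min_score and score > best_score:
            best_answer, best_score = answer, score
    return best_answer
```

A production system would more plausibly compare semantic representations than raw tokens; the threshold plays the same role either way, keeping unrelated pairs out of the result item.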
A4, the method according to A3, wherein the question-answer pairs corresponding to the landing pages are extracted from the landing pages according to the page structures of the landing pages.
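Extraction according to page structure can be illustrated with Python's standard `html.parser` module. The sketch assumes, purely for illustration, that a question appears as an `<h2>` or `<h3>` heading and its answer as the first `<p>` that follows; real landing pages vary, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingPairExtractor(HTMLParser):
    """Pairs each <h2>/<h3> heading with the first paragraph that follows it."""

    HEADINGS = {"h2", "h3"}

    def __init__(self):
        super().__init__()
        self.pairs = []        # collected (question, answer) tuples
        self._tag = None       # heading or paragraph tag currently being read
        self._question = None  # most recent heading still waiting for an answer
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag in self.HEADINGS:
            self._question = text
        elif self._question and text:
            self.pairs.append((self._question, text))
            self._question = None
        self._tag = None


def extract_qa_pairs(html: str):
    parser = HeadingPairExtractor()
    parser.feed(html)
    return parser.pairs
```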
A5, the method of any one of A1 to A4, the method further comprising:
Clustering text segments included in the webpage according to a page structure of the webpage to obtain a text segment category;
Determining candidate questions corresponding to the text segment categories;
And extracting answer information corresponding to the candidate question from the text segment corresponding to the candidate question.
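The steps of A5 can be sketched by grouping text segments that share a structural position on the page, then deriving one candidate question and its answer per group. The segment representation (a section path plus a role), the use of the heading as the candidate question, and the function names are assumptions made for this example; the specification leaves these choices open in this passage.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (section_path, role, text); role is "heading" or "body" (hypothetical encoding)
Segment = Tuple[str, str, str]


def cluster_segments(segments: List[Segment]) -> Dict[str, List[Segment]]:
    """Group text segments that share the same structural path on the page."""
    clusters = defaultdict(list)
    for segment in segments:
        clusters[segment[0]].append(segment)
    return dict(clusters)


def candidate_qa(clusters: Dict[str, List[Segment]]) -> List[Tuple[str, str]]:
    """Per category: heading text as candidate question, body text as answer."""
    result = []
    for segments in clusters.values():
        headings = [text for _, role, text in segments if role == "heading"]
        bodies = [text for _, role, text in segments if role == "body"]
        if headings and bodies:  # categories without both parts yield no pair
            result.append((headings[0], " ".join(bodies)))
    return result
```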
A6, the method of any one of A1 to A4, the method further comprising:
extracting candidate question-answer pairs from the landing pages of the search result items according to the page structures of the landing pages of the search result items;
Filtering the candidate question-answer pairs according to the attribute information of the candidate question-answer pairs;
Wherein the attribute information includes: semantic representation information and quality information.
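A filter over candidate question-answer pairs that uses both kinds of attribute information might look like the sketch below. It assumes, hypothetically, that the semantic representation is a numeric vector and the quality information is a score in [0, 1]; the thresholds and the near-duplicate rule based on cosine similarity are illustrative choices, not details taken from the specification.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    question: str
    answer: str
    semantic: Sequence[float]  # hypothetical semantic representation (embedding)
    quality: float             # hypothetical quality score in [0, 1]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_candidates(candidates: List[Candidate], min_quality: float = 0.6,
                      max_similarity: float = 0.95) -> List[Candidate]:
    """Keep high-quality pairs, dropping near-duplicates of pairs already kept."""
    kept: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.quality, reverse=True):
        if cand.quality < min_quality:
            continue
        if any(cosine(cand.semantic, k.semantic) > max_similarity for k in kept):
            continue
        kept.append(cand)
    return kept
```

Sorting by quality first means that when two pairs are near-duplicates, the higher-quality one is the one retained.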
A7, the method according to any one of A1 to A4, wherein the displaying the answer information contained in the landing page in the search result item corresponding to the search word comprises:
if the length of answer information contained in the landing page does not exceed a length threshold, displaying all the answer information contained in the landing page in a search result item corresponding to the search word; or
If the length of the answer information contained in the landing page exceeds a length threshold value, displaying a part of the answer information contained in the landing page in a search result item corresponding to the search word, and displaying an expansion interface so that a user can view all the answer information through the expansion interface.
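The length-threshold rule in A7 reduces to a small branch. In this sketch the `AnswerSnippet` type, the default threshold of 120 characters, and the trailing ellipsis are assumptions for illustration; the specification does not fix a threshold value in this passage.

```python
from dataclasses import dataclass


@dataclass
class AnswerSnippet:
    text: str         # text shown inside the search result item
    expandable: bool  # True if an expansion interface should be displayed


def build_snippet(answer: str, length_threshold: int = 120) -> AnswerSnippet:
    """Show the whole answer if it fits; otherwise show a part plus an expander."""
    if len(answer) <= length_threshold:
        return AnswerSnippet(text=answer, expandable=False)
    shown = answer[:length_threshold].rstrip() + "..."
    return AnswerSnippet(text=shown, expandable=True)
```

The front end would render an expand control only when `expandable` is true and reveal the full answer when the user activates it.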
An embodiment of the invention discloses B8, a data processing device, comprising:
The question-answer intention determining module is used for determining the question-answer intention corresponding to the search word;
the answer information determining module is used for determining answer information matched with the question and answer intentions from the landing page of the search result item corresponding to the search word; and
And the answer information display module is used for displaying the answer information contained in the landing page in the search result item corresponding to the search word.
B9, the device according to B8,
the type of the landing page is a step type, and the answer information comprises: steps and step contents;
or
The type of the landing page is a title type, and the answer information comprises: title and title content.
B10, the device according to B8, the answer information determination module comprising:
The searching submodule is used for searching the question and answer pairs corresponding to the landing pages of the search result items according to the question and answer intentions so as to obtain answer information matched with the question and answer intentions; the question-answer pairs comprise: questions and answers.
B11, the device according to B10, wherein the question-answer pairs corresponding to the landing pages are extracted from the landing pages according to the page structures of the landing pages.
B12, the device according to any one of B8 to B11, further comprising:
the text segment clustering module is used for clustering the text segments included in the webpage according to the page structure of the webpage to obtain the text segment categories;
The candidate question determining module is used for determining candidate questions corresponding to the text segment categories;
and the answer information extraction module is used for extracting answer information corresponding to the candidate question from the text segment corresponding to the candidate question.
B13, the device according to any one of B8 to B11, further comprising:
The candidate question-answer pair extraction module is used for extracting candidate question-answer pairs from the landing pages of the search result items according to the page structures of the landing pages of the search result items;
The candidate question-answer pair filtering module is used for filtering the candidate question-answer pairs according to the attribute information of the candidate question-answer pairs;
Wherein the attribute information includes: semantic representation information and quality information.
B14, the device according to any one of B8 to B11, the answer information display module comprising:
a first answer information display sub-module, configured to display all the answer information included in the landing page in a search result item corresponding to the search term if the length of the answer information included in the landing page does not exceed a length threshold; or
and the second answer information display sub-module is used for displaying a part of the answer information contained in the landing page in the search result item corresponding to the search word and displaying an expansion interface if the length of the answer information contained in the landing page exceeds a length threshold value, so that a user can view all the answer information through the expansion interface.
An embodiment of the invention discloses C15, a device for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
Determining a question-answer intention corresponding to a search word;
Determining answer information matched with the question-answer intention from a landing page of a search result item corresponding to the search word;
and displaying the answer information contained in the landing page in a search result item corresponding to the search word.
C16, the device according to C15,
The type of the landing page is a step type, and the answer information comprises: steps and step contents;
or
The type of the landing page is a title type, and the answer information comprises: title and title content.
C17, the device according to C15, wherein the determining answer information matching the question-answer intention from the landing page of the search result item corresponding to the search word comprises:
searching question-answer pairs corresponding to landing pages of the search result items according to the question-answer intentions to obtain answer information matched with the question-answer intentions; the question-answer pairs comprise: questions and answers.
C18, the device according to C17, wherein the question-answer pairs corresponding to the landing pages are extracted from the landing pages according to the page structures of the landing pages.
C19, the device according to any one of C15 to C18, the one or more programs further comprising instructions for:
Clustering text segments included in the webpage according to a page structure of the webpage to obtain a text segment category;
Determining candidate questions corresponding to the text segment categories;
And extracting answer information corresponding to the candidate question from the text segment corresponding to the candidate question.
C20, the device according to any one of C15 to C18, the one or more programs further comprising instructions for:
Extracting candidate question-answer pairs from the landing pages of the search result items according to the page structures of the landing pages of the search result items;
Filtering the candidate question-answer pairs according to the attribute information of the candidate question-answer pairs;
Wherein the attribute information includes: semantic representation information and quality information.
C21, the device according to any one of C15 to C18, wherein the displaying the answer information contained in the landing page in the search result item corresponding to the search word comprises:
if the length of answer information contained in the landing page does not exceed a length threshold, displaying all the answer information contained in the landing page in a search result item corresponding to the search word; or
If the length of the answer information contained in the landing page exceeds a length threshold value, displaying a part of the answer information contained in the landing page in a search result item corresponding to the search word, and displaying an expansion interface so that a user can view all the answer information through the expansion interface.
An embodiment of the invention discloses D22, a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform a data processing method as described in one or more of A1 to A7.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description only illustrates the preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
The data processing method, the data processing device, and the device for data processing provided by the present invention are described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the invention, and the description of the above embodiments is intended only to help in understanding the method and the core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A method of data processing, the method comprising:
Determining a question-answer intention corresponding to a search word;
Determining answer information matched with the question-answer intention from a landing page of a search result item corresponding to the search word;
and displaying the answer information contained in the landing page in a search result item corresponding to the search word.
2. The method of claim 1,
The type of the landing page is a step type, and the answer information comprises: steps and step contents;
or
the type of the landing page is a title type, and the answer information comprises: title and title content.
3. The method according to claim 1, wherein the determining answer information matching the question-answer intention from the landing page of the search result item corresponding to the search word comprises:
Searching question-answer pairs corresponding to landing pages of the search result items according to the question-answer intentions to obtain answer information matched with the question-answer intentions; the question-answer pairs comprise: questions and answers.
4. The method according to claim 3, wherein the question-answer pairs corresponding to the landing pages are extracted from the landing pages according to the page structures of the landing pages.
5. The method according to any one of claims 1 to 4, further comprising:
clustering text segments included in the webpage according to a page structure of the webpage to obtain a text segment category;
Determining candidate questions corresponding to the text segment categories;
and extracting answer information corresponding to the candidate question from the text segment corresponding to the candidate question.
6. The method according to any one of claims 1 to 4, further comprising:
Extracting candidate question-answer pairs from the landing pages of the search result items according to the page structures of the landing pages of the search result items;
Filtering the candidate question-answer pairs according to the attribute information of the candidate question-answer pairs;
Wherein the attribute information includes: semantic representation information and quality information.
7. The method according to any one of claims 1 to 4, wherein the displaying the answer information included in the landing page in the search result item corresponding to the search word comprises:
If the length of answer information contained in the landing page does not exceed a length threshold, displaying all the answer information contained in the landing page in a search result item corresponding to the search word; or
if the length of the answer information contained in the landing page exceeds a length threshold value, displaying a part of the answer information contained in the landing page in a search result item corresponding to the search word, and displaying an expansion interface so that a user can view all the answer information through the expansion interface.
8. A data processing apparatus, comprising:
the question-answer intention determining module is used for determining the question-answer intention corresponding to the search word;
The answer information determining module is used for determining answer information matched with the question and answer intentions from the landing page of the search result item corresponding to the search word; and
And the answer information display module is used for displaying the answer information contained in the landing page in the search result item corresponding to the search word.
9. A device for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
Determining a question-answer intention corresponding to a search word;
determining answer information matched with the question-answer intention from a landing page of a search result item corresponding to the search word;
And displaying the answer information contained in the landing page in a search result item corresponding to the search word.
10. A machine-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform a data processing method as claimed in one or more of claims 1 to 7.
CN201810589724.2A 2018-06-08 2018-06-08 Data processing method and device and data processing device Active CN110580313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810589724.2A CN110580313B (en) 2018-06-08 2018-06-08 Data processing method a treatment method apparatus and apparatus for data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810589724.2A CN110580313B (en) 2018-06-08 2018-06-08 Data processing method a treatment method apparatus and apparatus for data processing

Publications (2)

Publication Number Publication Date
CN110580313A (en) 2019-12-17
CN110580313B (en) 2024-02-02

Family

ID=68808962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810589724.2A Active CN110580313B (en) 2018-06-08 2018-06-08 Data processing method and device and data processing device

Country Status (1)

Country Link
CN (1) CN110580313B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157881A (en) * 2021-03-26 2021-07-23 联想(北京)有限公司 Information processing method and device

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010107111A (en) * 2000-05-25 2001-12-07 서정연 Natural Language Question-Answering System for Integrated Access to Database, FAQ, and Web Site
CN102004794A (en) * 2010-12-09 2011-04-06 百度在线网络技术(北京)有限公司 Search engine system and implementation method thereof
US8412514B1 (en) * 2005-10-27 2013-04-02 At&T Intellectual Property Ii, L.P. Method and apparatus for compiling and querying a QA database
CN103914543A (en) * 2014-04-03 2014-07-09 北京百度网讯科技有限公司 Search result displaying method and device
WO2015058604A1 (en) * 2013-10-21 2015-04-30 北京奇虎科技有限公司 Apparatus and method for obtaining degree of association of question and answer pair and for search ranking optimization
WO2015062482A1 (en) * 2013-11-01 2015-05-07 Tencent Technology (Shenzhen) Company Limited System and method for automatic question answering
CN105653738A (en) * 2016-03-01 2016-06-08 北京百度网讯科技有限公司 Search result broadcasting method and device based on artificial intelligence
CN105786875A (en) * 2014-12-23 2016-07-20 北京奇虎科技有限公司 Method and device for providing question and answer pair data search results
CN105786872A (en) * 2014-12-23 2016-07-20 北京奇虎科技有限公司 Method and device for providing question-answer onebox based on user searches
CN105786871A (en) * 2014-12-23 2016-07-20 北京奇虎科技有限公司 Question-answer search result display method and device based on search terms
CN105786874A (en) * 2014-12-23 2016-07-20 北京奇虎科技有限公司 Method and device for constructing question-answer knowledge base data items based on encyclopedic entries
CN106649760A (en) * 2016-12-27 2017-05-10 北京百度网讯科技有限公司 Question type search work searching method and question type search work searching device based on deep questions and answers
CN106874467A (en) * 2017-02-15 2017-06-20 百度在线网络技术(北京)有限公司 Method and apparatus for providing Search Results

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
TATSUNORI MORI; MITSURU SATO; MADOKA ISHIOROSHI: "Answering Any Class of Japanese Non-factoid Question by Using the Web and Example Q&A Pairs from a Social Q&A Website", 2008 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, pages 59 - 65 *
何贤江;左航;李远红;: "面向移动平台的FAQD自动问答系统", 四川大学学报(自然科学版), no. 03, pages 560 - 564 *
刘庆明;胡艳胜;: "基于WEB搜索引擎的中文问答系统", 科技资讯, no. 04, pages 90 - 91 *
刘秉权;徐振;刘峰;刘铭;孙承杰;王晓龙;: "面向问答社区的答案摘要方法研究综述", 中文信息学报, no. 01, pages 1 - 7 *
李舟军, 李水华: "基于Web的问答系统综述", 计算机科学, vol. 44, no. 6, pages 1 - 7 *

Also Published As

Publication number Publication date
CN110580313B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
US11188711B2 (en) Unknown word predictor and content-integrated translator
US10515147B2 (en) Using statistical language models for contextual lookup
US20200320116A1 (en) Providing a summary of a multimedia document in a session
CN109614482B (en) Label processing method and device, electronic equipment and storage medium
US10592571B1 (en) Query modification based on non-textual resource context
CN106462640B (en) Contextual search of multimedia content
US11861319B2 (en) Chatbot conducting a virtual social dialogue
CN108345612B (en) Problem processing method and device for problem processing
US9613093B2 (en) Using question answering (QA) systems to identify answers and evidence of different medium types
CN110770694A (en) Obtaining response information from multiple corpora
US20150154295A1 (en) Searching method, system and storage medium
US11651015B2 (en) Method and apparatus for presenting information
CN111708943B (en) Search result display method and device for displaying search result
CN112631437A (en) Information recommendation method and device and electronic equipment
CN111538830A (en) French retrieval method, French retrieval device, computer equipment and storage medium
CN107784037B (en) Information processing method and device, and device for information processing
CN110851692A (en) Data processing method and device and data processing device
US20170293683A1 (en) Method and system for providing contextual information
CN110580313B (en) Data processing method and device and data processing device
CN111460177A (en) Method and device for searching film and television expression, storage medium and computer equipment
CN113033163A (en) Data processing method and device and electronic equipment
CN109446406B (en) Data processing method and device and data processing device
CN114610163A (en) Recommendation method, apparatus and medium
CN113177170A (en) Comment display method and device and electronic equipment
CN111966267A (en) Application comment method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant