CN110851692A

CN110851692A - Data processing method and device and data processing device

Info

Publication number: CN110851692A
Application number: CN201810846026.6A
Authority: CN
Inventors: 林建素
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2020-02-28
Anticipated expiration: 2038-07-27
Also published as: CN110851692B

Abstract

The embodiment of the invention provides a data processing method and device and a device for data processing. The method specifically comprises the following steps: determining a text segment sequence corresponding to a search text of a user, the text segment sequence comprising a plurality of text segments arranged in sequence; fusing the query result entity word corresponding to the ith text segment in the text segment sequence with the (i+1)th text segment in the text segment sequence to obtain a query string corresponding to the (i+1)th text segment, where i is a natural number, the query string corresponding to a text segment is used for determining the corresponding query result entity word, and the query string corresponding to the first text segment is the first text segment itself; and obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence. The embodiment of the invention can improve the quality of the search result corresponding to the search text.

Description

Data processing method and device and data processing device

Technical Field

The present invention relates to the field of internet information processing technologies, and in particular, to a data processing method and apparatus, and an apparatus for data processing.

Background

The search engine is a system that collects information from the internet by using a specific computer program according to a certain policy, provides a search service for a user after organizing and processing the information, and displays information related to user search to the user.

Currently, in the process of using a search engine, a user may input a search text in a search box provided by the search engine, the search engine searches to obtain a web page or a document matched with the search text as a search result, and returns the ranked search result to the user by using a certain ranking policy.

At present, for long search texts, a search engine retrieves search results based on text matching or semantic matching against the search text, and the quality of the search results is poor. For example, for the search text "the pope of the smallest country in the world", the titles of the top 3 search results are: "the smallest land area of the world is only 0.44 square kilometer", "the smallest world-Vatican", "the smallest world-Rich oil flow, one sixth of the world want to go!", etc. The topics of these search results do not match the search intention behind the search text; that is, the current search results do not conform to the search intention of the user, so the quality of the search results is poor.

Disclosure of Invention

The embodiment of the invention provides a data processing method and device and a data processing device, which can improve the quality of a search result corresponding to a search text.

In order to solve the above problem, an embodiment of the present invention discloses a data processing method, including:

determining a text segment sequence corresponding to a search text of a user; the text segment sequence comprises: a plurality of text segments arranged in sequence;

fusing a query result entity word corresponding to the ith text segment in the text segment sequence with the (i+1)th text segment in the text segment sequence to obtain a query string corresponding to the (i+1)th text segment; wherein i is a natural number, a query string corresponding to one text segment is used for determining a corresponding query result entity word, and a query string corresponding to a first text segment is the first text segment itself;

and obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence.

On the other hand, the embodiment of the invention discloses a data processing device, which comprises:

a text segment sequence determining module, configured to determine a text segment sequence corresponding to a search text of a user; the text segment sequence comprises: a plurality of text segments arranged in sequence;

a fusion module, configured to fuse a query result entity word corresponding to the ith text segment in the text segment sequence with the (i+1)th text segment in the text segment sequence to obtain a query string corresponding to the (i+1)th text segment; wherein i is a natural number, a query string corresponding to one text segment is used for determining a corresponding query result entity word, and a query string corresponding to a first text segment is the first text segment itself; and

a search result determining module, configured to obtain a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence.

In yet another aspect, an embodiment of the present invention discloses an apparatus for data processing, including a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the steps of the data processing method described above.

In yet another aspect, an embodiment of the invention discloses a machine-readable medium having stored thereon instructions, which, when executed by one or more processors, cause an apparatus to perform a data processing method as described in one or more of the preceding.

The embodiment of the invention has the following advantages:

The embodiment of the invention simplifies the semantic hierarchy by splitting the search text and fusing adjacent text segments in the text segment sequence. Fusing adjacent text segments means fusing the query result entity word corresponding to the ith text segment with the (i+1)th text segment, so that the semantic level corresponding to the ith text segment is merged into the semantic level corresponding to the (i+1)th text segment. This reduces the number of semantic levels: compared with the search text, the query string corresponding to the last text segment can have fewer semantic levels. Because the query result entity word corresponding to the last text segment is obtained from this simpler query string, its quality can be improved, and so can the quality of the search result corresponding to the search text.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.

FIG. 1 is a schematic representation of an application environment for a data processing method of an embodiment of the present invention;

FIG. 2 is a flow chart of steps of a first embodiment of a data processing method of the present invention;

FIG. 3 is a flowchart illustrating steps of a second embodiment of a data processing method according to the present invention;

FIG. 4 is a flowchart of the steps of a third embodiment of a data processing method of the present invention;

FIG. 5 is a block diagram of an embodiment of a data processing apparatus of the present invention;

FIG. 6 is a block diagram of an apparatus 800 for data processing of the present invention; and

FIG. 7 is a schematic diagram of a server in some embodiments of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a data processing scheme, which can comprise the following steps: determining a text segment sequence corresponding to a search text of a user, where the text segment sequence may include a plurality of text segments arranged in sequence; fusing the query result entity word corresponding to the ith text segment in the text segment sequence with the (i+1)th text segment in the text segment sequence to obtain a query string corresponding to the (i+1)th text segment, where i may be a natural number, the query string corresponding to a text segment is used to determine the corresponding query result entity word, and the query string corresponding to the first text segment may be the first text segment itself; and obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence.

The embodiment of the invention is suitable for search texts with many semantic levels. It simplifies the semantic hierarchy by recognizing the semantic levels of the search text and fusing adjacent text segments in the text segment sequence. Fusing adjacent text segments means fusing the query result entity word corresponding to the ith text segment with the (i+1)th text segment, so that the semantic level corresponding to the ith text segment is merged into the semantic level corresponding to the (i+1)th text segment. This reduces the number of semantic levels: compared with the search text, the query string corresponding to the last text segment can have fewer semantic levels. Because the query result entity word corresponding to the last text segment is obtained from this simpler query string, its quality can be improved, and so can the quality of the search result corresponding to the search text.

The entity word is used to describe an entity, which is a distinguishable and independent object or concept of practical significance. Examples of entity words may include: name of person, place name, organization name, product name, etc. According to the embodiment of the application, the query result entity word corresponding to the ith text segment represents the summary of the semantic level corresponding to the ith text segment, and the semantic level can be simplified due to the simple semantic of the query result entity word. Because the entity can be used for describing the object or concept with practical significance, the semantic hierarchy corresponding to the ith text segment can be more accurately described.

In example 1 of the embodiment of the present invention, assume that the search text A is "the pope of the smallest country in the world". An existing search engine retrieves search results for the search text A based on text matching or semantic matching, so the quality of the search results is poor. The embodiment of the invention can determine the following 2 text segments corresponding to the search text A: "the smallest country in the world" and "pope of XXX". The query result entity word "Vatican" corresponding to the 1st text segment may then be fused with the 2nd text segment to obtain the query string "pope of Vatican" corresponding to the 2nd text segment, and the search result corresponding to the search text A, such as the related information of "Francis" or the page corresponding to "Francis", may be obtained according to the query result entity word corresponding to the 2nd text segment.

In example 2 of the embodiment of the present invention, assume that the search text B is "what is the dad of the dad of the dad of dad called". An existing search engine retrieves search results for the search text B based on text matching or semantic matching, so the quality of the search results is poor. The embodiment of the invention can determine the following 3 text segments corresponding to the search text B: "dad of dad", "dad of XXX", and "what is the dad of %%% called". The query result entity word "grandpa" corresponding to the 1st text segment may be fused with the 2nd text segment to obtain the query string "dad of grandpa" corresponding to the 2nd text segment. Correspondingly, the query result entity word "great-grandfather" corresponding to the 2nd text segment may be fused with the 3rd text segment to obtain the query string "what is the dad of great-grandfather called" corresponding to the 3rd text segment. The search result corresponding to the search text B, such as the related information of "great-great-grandfather" or the page corresponding to "great-great-grandfather", may then be obtained according to the query result entity word corresponding to the 3rd text segment.

The data processing method provided by the embodiment of the invention can be applied to application environments such as websites and/or APPs (applications) to improve the quality of search results corresponding to search texts.

The data processing method provided by the embodiment of the invention can be applied to the application environment shown in FIG. 1. As shown in FIG. 1, the client 100 and the server 200 are located in a wired or wireless network, through which the client 100 and the server 200 perform data interaction.

In one embodiment of the invention, the client 100 may receive a search text of a user and transmit the search text to the server 200. The server 200 may obtain a search result corresponding to the search text by executing the data processing method according to the embodiment of the present invention.

In another embodiment of the present invention, the client 100 may receive a search text of a user, and obtain a search result corresponding to the search text by executing the data processing method according to the embodiment of the present invention.

In the embodiment of the present invention, the search result may be in the form of text, a picture, audio, or video. Moreover, the search result may correspond to a web page or a document, and the pages corresponding to search results may be collectively referred to as landing pages. The search result may be derived from a data source such as a database of a search engine or a database of a vertical website; it is understood that the specific source of the search result is not limited by the embodiment of the present invention. For example, in example 1, the page corresponding to the search result "Francis" may include: encyclopedia pages, etc.

Optionally, the client 100 may run on a terminal, which specifically includes but is not limited to: smart phones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, car-mounted computers, desktop computers, set-top boxes, smart televisions, wearable devices, and the like.

Method embodiment one

Referring to fig. 2, a flowchart illustrating steps of a first embodiment of a data processing method according to the present invention is shown, which may specifically include the following steps:

step 201, determining a text segment sequence corresponding to a search text of a user; the text segment sequence may include: a plurality of text segments arranged in sequence;

step 202, fusing a query result entity word corresponding to the ith text segment in the text segment sequence with the (i+1)th text segment in the text segment sequence to obtain a query string corresponding to the (i+1)th text segment;

wherein, i may be a natural number, the query string corresponding to one text segment may be used to determine the corresponding query result entity word, and the query string corresponding to the first text segment may be the first text segment itself;

step 203, obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence.

At least one step of the embodiment shown in fig. 2 may be performed by a server and/or a client, and of course, the embodiment of the present invention does not limit the specific execution subject of each step.
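Steps 201 to 203 can be pictured as a simple loop over the text segment sequence. The following is a minimal Python sketch under stated assumptions, not the implementation of the embodiment: the names `PLACEHOLDERS`, `fuse`, `run_chain` and `TOY_ANSWERS` are hypothetical, the segment wording follows example 2, and the dictionary lookup merely stands in for whatever mechanism actually determines the query result entity word for a query string.

```python
from typing import Callable, List

# Hypothetical preset strings that mark where the previous entity word is inserted.
PLACEHOLDERS = ("XXX", "%%%")

# Toy stand-in for the real entity-word lookup (assumption, illustration only).
TOY_ANSWERS = {
    "dad of dad": "grandpa",
    "dad of grandpa": "great-grandfather",
    "what is the dad of great-grandfather called": "great-great-grandfather",
}


def fuse(entity_word: str, segment: str) -> str:
    """Step 202: fuse the previous entity word into the next text segment."""
    for placeholder in PLACEHOLDERS:
        if placeholder in segment:
            return segment.replace(placeholder, entity_word)
    raise ValueError(f"segment has no placeholder: {segment!r}")


def run_chain(segments: List[str], lookup: Callable[[str], str]) -> str:
    """Return the entity word of the last segment, which feeds step 203."""
    if not segments:
        raise ValueError("empty text segment sequence")
    query = segments[0]  # the first query string is the first segment itself
    entity = lookup(query)
    for segment in segments[1:]:
        query = fuse(entity, segment)  # query string of the (i+1)th segment
        entity = lookup(query)
    return entity


segments_b = ["dad of dad", "dad of XXX", "what is the dad of %%% called"]
print(run_chain(segments_b, TOY_ANSWERS.__getitem__))  # great-great-grandfather
```

Passing the lookup as a function keeps the loop independent of how entity words are actually resolved, which the description leaves open.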

In step 201, a client of the search APP or the search website may provide a UI (user interface) so that a user submits a search text to the client through a search box, a voice interface, and the like on the UI. It is understood that the embodiment of the present invention does not impose a limitation on the specific manner of acquiring the search text.

In an optional embodiment of the present invention, determining the text segment sequence corresponding to the search text of the user in step 201 may specifically include: performing semantic analysis on the search text of the user, so that each text segment in the resulting text segment sequence corresponds to one semantic level. Arranged in sequence may mean that the text segments are arranged in the order of their semantic levels, from front to back.

In practical application, a semantic analysis method can be adopted to determine the semantic hierarchy of the search text. Semantic analysis refers to a method of mining and learning deep concepts such as texts and pictures. The semantic analysis method may include: sentence component analysis method, machine learning method, etc., it can be understood that the embodiment of the present invention does not impose any limitation on the specific semantic analysis method.

For example, the search text a of example 1 may include: 2 semantic levels, so that 2 text segments corresponding to the search text a can be determined, wherein 1 text segment corresponds to one semantic level. As another example, the search text B of example 2 may include: 3 semantic levels, so that 3 text segments corresponding to the search text B can be determined, wherein 1 text segment corresponds to one semantic level.

In another optional embodiment of the present invention, determining the text segment sequence corresponding to the search text of the user in step 201 may specifically include: determining at least one target word unit in the search text of the user, where a target word unit may include a target word and a modifier corresponding to the target word, and the part of speech of the target word may be a preset part of speech; and, for each target word unit, determining a corresponding text segment and the position information of the text segment.

In the embodiment of the present invention, the preset part of speech may be a noun or a pronoun. Here a pronoun refers to a word that stands in for a formal name. For example, "Dukang" can be a substitute term for "wine", and "iron rooster" can be a substitute term for a stingy person.

The modifier is a word, phrase or clause for modifying other components in the sentence, and the noun, adjective clause, participle and the like can be used as the modifier of the noun or pronoun.

A target word unit in the embodiments of the present invention may refer to a semantic unit corresponding to one semantic level, and a target word unit may include: a target word and its corresponding modifier. At least one target word unit in the search text can be obtained according to a semantic analysis method or a syntax analysis method.

For example, in example 1, the target word units in the search text A "the pope of the smallest country in the world" may include, in order from front to back:

target word unit A1: "the smallest country in the world"; and

target word unit A2: "pope of XXX".

Wherein, the noun of the target word unit A1 can be "country", and the modifier can be "smallest in the world"; the noun of the target word unit A2 may be "pope", and the modifier may be "the smallest country in the world", i.e., "XXX" represents "the smallest country in the world".

As another example, in example 2, the target word units in the search text B "what is the dad of the dad of the dad of dad called" can include, in order from front to back:

target word unit B1: "dad of dad";

target word unit B2: "dad of XXX"; and

target word unit B3: "what is the dad of %%% called".

Wherein, the noun of the target word unit B1 can be "dad", and the modifier can be "dad"; the noun of the target word unit B2 can be "dad", and the modifier can be "dad of dad", that is, "XXX" represents "dad of dad"; the noun of the target word unit B3 can be "dad", and the modifier can be "dad of the dad of dad", that is, "%%%" represents "dad of the dad of dad".

The embodiment of the present invention may determine at least one target word unit in the search text of the user according to the foregoing semantic analysis method, and the specific determination process of the target word unit is not limited in the embodiment of the present invention.

In the embodiment of the present invention, one target word unit may correspond to one text fragment.

Optionally, determining, for one target word unit, the position information of the corresponding text segment may specifically include: determining the position information of the text segment according to the position, in the search text, of the target word or the modifier in the target word unit.

Optionally, the position information may be represented by numbers in a sequence from front to back, for example, the position information of the text segment corresponding to the 1 st target word unit is number 1, and the position information of the text segment corresponding to the 2 nd target word unit is number 2 …, and it is understood that the specific representation manner of the position information is not limited in the embodiment of the present invention. The sequence of the target word units in the embodiment of the present invention may be from front to back, and may also be from back to front.

In an optional embodiment of the present invention, the determining, for a target word unit, a text segment corresponding to the target word unit may specifically include:

Determination mode 1: replacing the modifier in a target word unit with a preset character string, and obtaining a text segment from the replaced target word unit; or

Determination mode 2: taking a target word unit as a text segment.

Determination mode 2 may be applied to the first target word unit from front to back. For example, "the smallest country in the world" in example 1 may be directly taken as the 1st text segment; as another example, "dad of dad" in example 2 can be taken directly as the 1st text segment.

Determination mode 1 may be applied to the non-first target word units from front to back. For example, the modifier "the smallest country in the world" in "the pope of the smallest country in the world" in example 1 may be replaced with a preset character string, examples of which may include "XXX", "%%%", etc., to obtain the text segment "pope of XXX". Similarly, the modifier "dad of dad" in "dad of the dad of dad" in example 2 can be replaced with a preset character string to obtain the text segment "dad of XXX"; and the modifier "dad of the dad of dad" in "what is the dad of the dad of the dad of dad called" in example 2 can be replaced with a preset character string to obtain the text segment "what is the dad of %%% called".
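The two determination modes, together with the numeric position information, can be sketched as follows. This is an illustrative Python sketch only: `TargetWordUnit` and `build_segments` are hypothetical names, the unit texts follow example 1, and the sketch assumes the modifier occurs exactly once in the unit text so that a plain string replacement is unambiguous.

```python
from typing import List, NamedTuple, Tuple


class TargetWordUnit(NamedTuple):
    text: str      # full text of the target word unit
    modifier: str  # modifier of the target word (unused for the first unit)


def build_segments(units: List[TargetWordUnit],
                   preset: str = "XXX") -> List[Tuple[int, str]]:
    """Return (position, text segment) pairs, numbered from 1, front to back."""
    segments = []
    for position, unit in enumerate(units, start=1):
        if position == 1:
            # Determination mode 2: the first unit is used as the segment itself.
            segment = unit.text
        else:
            # Determination mode 1: replace the modifier with the preset string.
            if unit.text.count(unit.modifier) != 1:
                raise ValueError("modifier must occur exactly once in the unit")
            segment = unit.text.replace(unit.modifier, preset)
        segments.append((position, segment))
    return segments


units_a = [
    TargetWordUnit("the smallest country in the world", "smallest in the world"),
    TargetWordUnit("pope of the smallest country in the world",
                   "the smallest country in the world"),
]
print(build_segments(units_a))
# [(1, 'the smallest country in the world'), (2, 'pope of XXX')]
```

For units in which the modifier text repeats, as in example 2, a real implementation would need the modifier's span from the syntactic analysis rather than a string search.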

In step 202, adjacent text segments in the text segment sequence may be fused to simplify the semantic hierarchy. Fusing adjacent text segments means fusing the query result entity word corresponding to the ith text segment with the (i+1)th text segment, so that the semantic level corresponding to the ith text segment and the semantic level corresponding to the (i+1)th text segment are merged and the semantic hierarchy is simplified; here i may be a natural number greater than 0.

For example, in example 1, the query result entity word corresponding to the 1 st text segment may be fused with the 2 nd text segment to obtain the query string corresponding to the 2 nd text segment. For another example, in example 2, the query result entity word corresponding to the 1 st text segment may be fused with the 2 nd text segment to obtain a query string corresponding to the 2 nd text segment; and fusing the query result entity word corresponding to the 2 nd text segment with the 3 rd text segment to obtain a query string corresponding to the 3 rd text segment.

In the embodiment of the present invention, a query string corresponding to one text segment may be used to determine a corresponding query result entity word, a query string corresponding to a first text segment may be the first text segment itself, and a query string corresponding to a non-first text segment may be obtained by fusing a query result entity word corresponding to a previous text segment with a current text segment.

In an alternative embodiment of the present invention, the query result entity word corresponding to one text segment may be determined by the following steps:

determining question and answer intentions of a query string corresponding to one text segment;

and determining answer information matched with the question-answer intention from the webpage or the document corresponding to the query string of the text segment, wherein the answer information is used as a query result entity word corresponding to the text segment.
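One way to picture the second step is to filter candidate answers by the question-answer intention. The Python sketch below is a hedged illustration, not the method of the embodiment: it assumes that candidate answers have already been extracted from the web page or document and typed, and the `Candidate` class, the `score` field and the sample candidates are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    text: str          # candidate answer found in a page or document
    entity_type: str   # hypothetical entity type assigned to the candidate
    score: float       # hypothetical relevance score


def pick_entity_word(intent: str, candidates: List[Candidate]) -> Optional[str]:
    """Return the best candidate whose type matches the question-answer intent."""
    matching = [c for c in candidates if c.entity_type == intent]
    if not matching:
        return None  # no answer information matches the intent
    return max(matching, key=lambda c: c.score).text


# Made-up candidates for the query string "the smallest country in the world".
candidates = [
    Candidate("0.44 square kilometers", "area", 0.95),
    Candidate("Vatican", "country", 0.90),
    Candidate("Monaco", "country", 0.40),
]
print(pick_entity_word("country", candidates))  # Vatican
```

Filtering by type before ranking means a highly scored but off-intent candidate, such as the area figure here, cannot become the query result entity word.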

The embodiment of the invention can provide the following determination scheme of the question-answering intention:

Determination scheme 1

In determination scheme 1, the process of determining the question-answer intention of the query string corresponding to one text segment may include: identifying a current LAT (Lexical Answer Type) directional word from the query string; looking up the current LAT directional word in a preset mapping relation between LAT directional words and LAT words to obtain the target LAT word corresponding to the current LAT directional word; and obtaining the question-answer intention corresponding to the query string according to the target LAT word.

The determining scheme 1 can firstly identify a current LAT directional word contained in a query string, and then obtain a target LAT word corresponding to the current LAT directional word according to a mapping relation between the LAT directional word and the LAT word; the LAT directional words can be used for expressing words with directivity to the question-answering intention, and the LAT words can be used for representing the question-answering intention. In this way, the mapping relationship between the LAT directional words and the LAT words can describe the directional relationship from the LAT directional words to the LAT words corresponding to the question-answer intention.

According to the embodiment of the invention, the target LAT word corresponding to the current LAT directional word is obtained according to the mapping relation between LAT directional words and LAT words. Because the target LAT word is derived from the current LAT directional word, the question-answer intention corresponding to the query string can be derived even if the query string does not contain the target LAT word. Therefore, even when the query string does not carry a complete question-answer requirement, the embodiment of the invention can obtain the question-answer intention corresponding to the query string by derivation, so that the accuracy of the question-answer intention can be improved.

LAT words may be used to represent the text in a question that indicates the type of the answer. Optionally, a large number of questions may be collected and statistically analyzed to build a LAT word bank for storing LAT words. For example, the LAT words stored in the LAT word bank may include: emperor, island, mountain peak, event, country, flower, river, church, etc. It is to be understood that embodiments of the present invention are not limited to the specific LAT words.

Optionally, complete questions can be analyzed, LAT directional words can be mined from the corresponding analysis results, and the mined LAT directional words can be stored in a LAT directional word bank; a mapping relation between the LAT directional words and the LAT words is then established. Table 1 illustrates a mapping relation between LAT directional words and LAT words of the present invention. It is understood that the LAT words shown in Table 1 are only examples; in practice, a LAT word such as "person" may be further subdivided into "emperor", "scientist", "poet", "physicist", etc. The LAT words of the embodiment of the present invention may be any entity type and/or entity words corresponding to any entity type, and the embodiment of the present invention does not limit the specific mapping relation between LAT directional words and LAT words.

TABLE 1

In practical applications, the above process of identifying the current LAT directional word from the query string may include: and matching each vocabulary contained in the query string with each LAT directional word in the LAT directional word stock, and if the matching is successful, taking the successfully matched vocabulary contained in the query string as the current LAT directional word. It is to be appreciated that embodiments of the invention are not limited to the particular process of identifying a current LAT directional word from a query string.

In an application example 1 of the present invention, assuming that the query string is "known as" and "known as" exists in the LAT directional lexicon, the "known" target LAT word "person and/or thing" can be obtained by looking up the table 1. Further, assuming that the query string is "known as the parent of CD", and assuming that "the parent" exists in the LAT directional dictionary, the target LAT word "person" corresponding to "the parent" can be obtained from the look-up table 1, and finally it can be determined that "the parent known as CD" corresponds to the question and answer intention "person". Similarly, assuming that the query string is "known as physical holy sword", it can be determined that the corresponding question and answer intention is "weapon".

In an application example 2 of the present invention, assuming that "location" exists in the LAT directional lexicon when the query string is "location of taj jiling, the" location "of the corresponding target LAT word" geographic location "can be obtained by looking up table 1.

In application example 3 of the present invention, assuming that the query string is "proposed mass-energy equation" and that "proposed" exists in the LAT directional lexicon, the target LAT word "person" corresponding to "proposed" can be obtained by looking up Table 1.

In application example 4 of the present invention, assuming that the query string is "what is the meaning of five-risk one-gold" and that "what is the meaning" exists in the LAT directional lexicon, the target LAT word "concept" corresponding to "what is the meaning" can be obtained by looking up Table 1.

Since the target LAT word may be used as a core word or a focus word of the question corresponding to the query string, which may reflect the answer type of the question corresponding to the query string, the target LAT word may be directly used as the question-answer intention corresponding to the query string, or the target LAT word may be further processed (e.g., a fusion process of a plurality of target LAT words, etc.) to obtain the question-answer intention corresponding to the query string.
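The lookup described above can be sketched as follows. This is only a minimal illustration: the lexicon entries are taken from the application examples rather than from Table 1 itself, the names `LAT_LEXICON` and `find_target_lat_words` are hypothetical, and substring matching stands in for the word-by-word matching that a real word segmenter would enable.

```python
# Hypothetical LAT directional lexicon; the entries mirror the application
# examples in the text and are not a reproduction of Table 1.
LAT_LEXICON = {
    "known as": "person and/or thing",
    "location": "geographic location",
    "proposed": "person",
}


def find_target_lat_words(query_string, lexicon=LAT_LEXICON):
    """Return (LAT directional word, target LAT word) pairs found in the query string.

    Pairs are ordered by where the directional word first appears.
    Substring matching is an assumption made to keep the sketch short.
    """
    text = query_string.lower()
    hits = []
    for directional_word, lat_word in lexicon.items():
        position = text.find(directional_word)
        if position != -1:
            hits.append((position, directional_word, lat_word))
    hits.sort()
    return [(word, lat) for _, word, lat in hits]
```

For instance, `find_target_lat_words("proposed mass-energy equation")` yields the single pair `("proposed", "person")`, and the target LAT word of that pair could then be used directly as the question-answer intention.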

Determination scheme 2

In the determining scheme 2, the process of determining the question-answering intention of the query string corresponding to one text segment may include: performing dependency syntax analysis on the query string to obtain a corresponding dependency syntax analysis result; extracting core semantic units from the dependency syntax analysis result; and obtaining the question-answer intention corresponding to the query string according to the core semantic unit.

Determination scheme 2 extracts a core semantic unit from the dependency syntax analysis result corresponding to the query string and obtains the question-answer intention corresponding to the query string according to the core semantic unit. The core semantic unit used for characterizing the question-answer intention may include core words and the like.

In practical applications, the dependency syntax analysis result may include a dependency tree, which can be used for representing the dependency relationships among the words included in the query string; the dependency tree can be analyzed, and the core semantic unit can be extracted from the dependency tree according to the analysis result.

In practical application, the dependency tree may be analyzed according to a preset extraction rule, and the core semantic unit may be extracted from the dependency tree according to the analysis result.

Optionally, extracting the core semantic unit from the dependency syntax analysis result may include: if the word immediately following the interrogative word in the dependency tree is a noun or noun phrase, extracting that noun or noun phrase as the core semantic unit, since it can characterize the question-answer intention. For example, in the query string "which scientist has helped kosher to escape from germany", the interrogative word "which" is immediately followed by the noun "scientist", so "scientist" can be taken as the core semantic unit.

Optionally, extracting the core semantic unit from the dependency syntax analysis result may include: if the interrogative word in the dependency tree is at the end of the query string, extracting the noun or noun phrase closest to the interrogative word as the core semantic unit, since it can characterize the question-answer intention. For example, suppose the query string is "known as the father of the CD is which"; the noun phrase closest to the interrogative word "which" in the query string is "father of the CD", so "father of the CD" can be used as the core semantic unit.

Optionally, extracting the core semantic unit from the dependency syntax analysis result may include: if the word immediately following the interrogative word in the dependency tree is a verb, extracting the last noun or noun phrase in the query string as the core semantic unit, since it can characterize the question-answer intention. For example, assuming that the query string is "how to fold a paper plane", the interrogative word "how" is followed by the verb "fold", so the last noun phrase "paper plane" can be taken as the core semantic unit. For another example, assuming that the query string is "how to download the complete content of a Baidu Wenku document free of charge", the interrogative word "how" is followed by the verb "download", so the last noun phrase "complete content of a Baidu Wenku document" can be used as the core semantic unit.

It should be understood that the preset extraction rule is only an alternative embodiment, and the embodiment of the present invention does not limit the specific extraction rule. Since the core semantic unit may be used as a core word or a focus word of the query string, which may reflect the answer type of the query string, the core semantic unit may be directly used as the question-answer intention corresponding to the query string, or the core semantic unit may be further processed (such as fusion processing of multiple core semantic units, etc.) to obtain the question-answer intention corresponding to the query string.
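The three optional extraction rules above can be sketched as follows. This is a simplified illustration under stated assumptions: instead of a real dependency tree it takes a flat list of pre-tagged tokens in which noun phrases have already been chunked into single tokens, and the tag names ("WH", "NP", "VERB", "OTHER") and the function name `extract_core_unit` are hypothetical.

```python
def extract_core_unit(tokens):
    """Pick a core semantic unit from tagged tokens.

    tokens: list of (text, tag) pairs, tag being one of the assumed labels
    "WH" (interrogative word), "NP" (noun or noun phrase), "VERB", "OTHER".
    Returns the text of the chosen noun phrase, or None if no rule applies.
    """
    wh_index = next((i for i, (_, tag) in enumerate(tokens) if tag == "WH"), None)
    noun_indexes = [i for i, (_, tag) in enumerate(tokens) if tag == "NP"]
    if wh_index is None or not noun_indexes:
        return None

    is_last = wh_index == len(tokens) - 1
    next_tag = None if is_last else tokens[wh_index + 1][1]

    # Rule 1: interrogative word immediately followed by a noun phrase.
    if next_tag == "NP":
        return tokens[wh_index + 1][0]
    # Rule 2: interrogative word at the end, so take the closest noun phrase.
    if is_last:
        nearest = min(noun_indexes, key=lambda i: abs(i - wh_index))
        return tokens[nearest][0]
    # Rule 3: interrogative word followed by a verb, so take the last noun phrase.
    if next_tag == "VERB":
        return tokens[noun_indexes[-1]][0]
    return None
```

With the tokens `[("how", "WH"), ("fold", "VERB"), ("paper plane", "NP")]`, the third rule applies and "paper plane" is returned. A production system would derive the tags and phrase boundaries from the dependency tree rather than receive them as input.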

Determination scheme 3

In determination scheme 3, the process of determining the question-answer intention of the query string corresponding to one text segment may include: performing intention recognition on the query string through a domain identification module and a domain intention identification module.

The domain identification module can be used for identifying the domain to which the query string belongs; examples of fields may include: "olympic sports", "geographic problems", "computer digital", "laws and regulations", "life", "education science", "economic finance", "emotional family", "social life", "leisure and entertainment", "medical health", "artistic words", "games", etc., although the embodiment of the present invention is not limited to specific fields.

The domain intention identifying module can be used for identifying question and answer intentions corresponding to the query strings in the domain.

According to one embodiment, intention recognition may be regarded as a multi-class classification task, so that the domain intention identification module may recognize the question-answer intention corresponding to the query string in the domain through a classifier. The classes of the classifier can be obtained from a plurality of question-answer intentions (one question-answer intention can correspond to one class), and the training samples of the classifier are obtained from the question corpora corresponding to the plurality of question-answer intentions; for example, the training samples can be question corpora labeled with question-answer intention categories, and the classifier is obtained by training on these samples.

According to another embodiment, the domain intention identification module may identify the question-answer intention corresponding to the query string in the domain through a question-answer intention statement pattern. The question-answer intention statement pattern may be used to represent a statement pattern corresponding to a question-answer intention in the domain and may include at least one keyword, and the at least one keyword may conform to a corresponding grammar rule; in this way, the question-answer intention corresponding to the query string can be obtained based on the matching between the query string and the question-answer intention statement pattern. For example, the question-answer intention statement patterns may include "modifier + noun", in which case the question-answer intention can be the concept corresponding to the modifier and the noun.

It can be understood that, according to the actual application requirements, a person skilled in the art may determine the question-answering intention of the query string corresponding to one text segment by using any one or a combination of the determination schemes 1 to 3.

In the embodiment of the present invention, a web page or a document corresponding to a query string of a text fragment may be obtained by querying from a data source according to the query string. The query principle corresponding to the web page or document corresponding to the query string is similar to that of the existing search engine, and therefore, the details are not described herein.

According to the embodiment of the invention, answer information matched with the question-answer intention can be determined from the web page or the document corresponding to the query string and used as the query result entity word corresponding to the text segment. For example, for the 1st text segment "the smallest country in the world" of example 1, the corresponding question-answer intention is "country", and the answer information "Vatican" matching the question-answer intention may be determined from the web page or document corresponding to the query string.

The following describes in detail the process of determining answer information matching the question-answer intention from a web page or document corresponding to the query string.

In an optional embodiment of the present invention, in an offline state, question-answer pairs may be mined from a web page in advance, and in an online state, answer information that is included in a page corresponding to a query string and matches with a question-answer intention may be determined by querying the question-answer pairs. Specifically, the process of determining answer information matched with the question-answer intention from the web page or the document corresponding to the query string may specifically include: searching question-answer pairs corresponding to pages of the query string according to the question-answer intentions to obtain answer information matched with the question-answer intentions; the question-answer pair may include: questions and answers. Because the question-answer pairs are mined in the off-line state and the answer information is determined by inquiring the question-answer pairs in the on-line state, the determination efficiency of the answer information can be improved.

The embodiment of the invention does not limit the specific mining mode adopted for mining question-answer pairs from web pages. For example, the mining mode may include a manual mining mode, an automatic mining mode, and the like. The automatic mining mode may include an extraction template mining mode, in which an extraction template can be configured; the extraction template can specify question sentences and the connecting words between question sentences and answers, so that question-answer pairs can be extracted. However, the manual mining mode incurs a high labor cost, and the extraction template mining mode relies on effective extraction templates: if no extraction template matches a certain text segment of a web page, a question-answer pair may not be extracted from that text segment.

In an optional embodiment of the present invention, the question-answer pair corresponding to the page of the query string may be extracted from the page of the query string according to the page structure of the page of the query string. The page structure can refer to the layout of page content, question and answer pairs are extracted according to the page structure, and the extraction is not limited by limited extraction templates, so that the coverage rate of the question and answer pairs can be improved.

In practical applications, the page structure may be determined by the page source code. The page source code may refer to the source code of the page, which may represent the language makeup of the page. Alternatively, the page structure may be characterized by page elements or tags. Alternatively, the page structure may be characterized by a DOM (Document Object Model) tree, and it should be understood that the specific way of characterizing the page structure is not limited in the embodiment of the present invention.

In an optional embodiment of the present invention, the process of extracting question-answer pairs from the page of the query string according to the page structure of the page may specifically include: clustering the text segments included in the web page according to the page structure of the web page to obtain text segment categories; determining candidate questions corresponding to the text segment categories; and extracting answer information corresponding to a candidate question from the text segment corresponding to the candidate question.

The embodiment of the invention can obtain the page structure, such as label information, corresponding to each text segment of the webpage based on the analysis of the page structure. And clustering the plurality of text segments according to the page structure of the text segments, wherein the clustering can be used for aggregating the text segments with similar page structures into the same text segment category.
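The clustering of text segments by page structure can be sketched as follows. This is only one possible reading: the page structure of a text segment is approximated here by the chain of enclosing tags, text segments with identical chains fall into the same category, and the markup is assumed to be well formed. The class name `TagPathCollector` and the function name `cluster_by_structure` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Record each non-empty text segment together with its enclosing tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, outermost first
        self.segments = []   # (tag path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append(("/".join(self.stack), text))


def cluster_by_structure(html):
    """Group text segments whose enclosing tag paths are identical."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    clusters = defaultdict(list)
    for path, text in collector.segments:
        clusters[path].append(text)
    return dict(clusters)
```

For a page fragment such as `<div><h2>Q1</h2><p>A1</p><h2>Q2</h2><p>A2</p></div>`, the headings end up in one category and the paragraphs in another, so the heading category can then be examined for candidate questions. A fuller implementation might also compare tag attributes or tolerate small differences between paths.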

The manner of determining the candidate questions corresponding to a text segment category may include a template feature mode and/or a rule scoring mode. The template features may correspond to features of questions, such as word features, sentence features, or phrase features; the rule scoring mode may be used to evaluate candidate questions. Optionally, the connection relationship between the words of a candidate question can be scored according to a language model. According to one embodiment, if a text segment category includes a language unit (word, phrase or sentence) that matches a template feature and the corresponding score exceeds a score threshold, the language unit may be regarded as a candidate question.

Taking a web page A whose URL (Uniform Resource Locator) is http://www.66law.cn/specific/wxyj as an example, the text segments included in the web page can be clustered according to the page structure of the web page to obtain text segment categories, and the following candidate questions corresponding to the text segment categories can be determined: 'what the five-risk one-money includes', 'what the five-risk one-money uses', 'the five-risk one-money changes into the four-risk one-money latest message', 'the five-risk one-money minimum standard', 'the five-risk one-money payment proportion', 'how to handle the five-risk one-money after leaving', 'how to handle the five-risk one-money by oneself', 'the consequence of not paying the five-risk one-money', 'how to handle the five-risk one-money by a company', 'no provision is made for the law for signing the five-risk one-money in a labor contract', and the like.

According to the embodiment of the invention, answer information corresponding to a candidate question can be extracted from the text segment corresponding to the candidate question. Optionally, the answer information may be of a title type; for example, the answer information corresponding to web page A is of the title type. Optionally, the answer information may be of a step type; for example, the answer information corresponding to a web page B whose URL is https://zhinan.sogou.com/guide/detail/?id=316512868864 is of the step type. Of course, the embodiment of the present invention does not limit the specific type of answer information.

In an optional embodiment of the present invention, the method of the embodiment of the present invention may further include: extracting candidate question-answer pairs from the pages of the query string according to the page structure of the pages of the query string; and filtering the candidate question-answer pairs according to the attribute information of the candidate question-answer pairs. The embodiment of the invention can filter the candidate question-answer pairs according to the attribute information, can remove the candidate question-answer pairs which do not accord with the preset condition through the filtering, and can reserve the candidate question-answer pairs which accord with the preset condition so as to improve the quality of the question-answer pairs.

Wherein the attribute information may include: semantic representation information and quality information.

The semantic representation information may be used to determine the similarity between candidate question-answer pairs, so that candidate question-answer pairs with higher similarity may be filtered out.

Optionally, the semantic representation information may be obtained by performing semantic analysis on the candidate question-answer pairs. Semantic analysis methods that may be employed include a topic model method, a deep learning method, and the like. The topic model method may include LDA (Latent Dirichlet Allocation, a document topic generation model), etc. The deep learning method may include word embedding, Recurrent Neural Networks, Convolutional Neural Networks, etc.

The quality information may reflect the quality of the candidate question-answer pairs, so that candidate question-answer pairs with poor quality may be filtered out, and candidate question-answer pairs with better quality may be retained.

The quality information may include: page quality information and/or site quality information corresponding to the candidate question-answer pairs. Through the quality information, data that do not involve questions and answers, or whose answers do not address the question, can be removed, and data whose questions and answers are clear, whose answers are relevant, and whose sources are relatively credible can be retained.
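The filtering by attribute information can be sketched as follows. This is a minimal illustration under stated assumptions: a bag-of-words vector with cosine similarity stands in for the semantic representation (a topic model or neural embedding could be swapped in), each candidate carries a single numeric quality score, and the thresholds and the names `filter_candidates`, `min_quality` and `max_similarity` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very rough stand-in for a semantic representation."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_candidates(candidates, min_quality=0.5, max_similarity=0.8):
    """Keep good-quality candidate question-answer pairs and drop near-duplicates.

    candidates: list of dicts with keys "question", "answer" and "quality".
    Candidates are visited from highest to lowest quality, so that of two
    similar pairs the better one is retained.
    """
    kept = []
    kept_vectors = []
    for candidate in sorted(candidates, key=lambda c: c["quality"], reverse=True):
        if candidate["quality"] < min_quality:
            continue
        vector = bag_of_words(candidate["question"] + " " + candidate["answer"])
        if any(cosine_similarity(vector, other) > max_similarity for other in kept_vectors):
            continue
        kept.append(candidate)
        kept_vectors.append(vector)
    return kept
```

Visiting candidates in descending quality order is a design choice of this sketch; it makes the similarity filter and the quality filter cooperate instead of discarding whichever duplicate happens to come first.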

It can be understood that the above-mentioned manner of determining answer information by querying question-answer pairs is only an optional embodiment, and actually, answer information matching with question-answer intentions may also be extracted from the text of the page of the query string by using a text extraction technique.

In this embodiment of the present invention, step 202 may fuse adjacent text segments according to the relationship of semantic hierarchy. It is to be understood that the embodiments of the present invention are not limited to the specific fusion process.

In an optional embodiment of the present invention, the (i+1)th text segment may include a preset character string and a target word, and fusing the query result entity word corresponding to the ith text segment in the text segment sequence with the (i+1)th text segment in the text segment sequence in step 202 may specifically include:

replacing the preset character string in the (i+1)th text segment with the query result entity word corresponding to the ith text segment, and taking the (i+1)th text segment after replacement as the query string corresponding to the (i+1)th text segment.

The query result entity word corresponding to the ith text segment is the summary of the previous semantic level, so that the summary of the previous semantic level and the expression of the current semantic level can be fused.

For example, in example 1, the query result entity word corresponding to the 1st text segment may be "Vatican", and thus "Vatican" may be fused with the 2nd text segment "the church of XXX" to obtain the query string corresponding to the 2nd text segment, "the church of Vatican".

For example, in example 2, the query result entity word corresponding to the 1 st text segment may be "grandpa", and "grandpa" and the 2 nd text segment may be fused to obtain the query string "dad of grandpa" corresponding to the 2 nd text segment; and the query result entity word "great grandfather" corresponding to the 2 nd text segment can be fused with the 3 rd text segment correspondingly to obtain the query string "what dad of great grandfather" corresponding to the 3 rd text segment.

In an example 3 of the embodiment of the present invention, the search text C is "who was given the thrilling female hero step by step", and the search text C is split to obtain the 1 st text fragment "who was given the thrilling female hero step by step" and the 2 nd text fragment "XXX"; further, the query result entity word corresponding to the 1 st text segment may be "rays", and the "rays" may be fused with the 2 nd text segment to obtain the query string corresponding to the 2 nd text segment, "who was followed by the rays".

In example 4 of the embodiment of the present invention, the search text D is "how a starbucks at five road junctions can be paid in a WeChat", and the search text D is split to obtain the 1st text segment "starbucks at five road junctions" and the 2nd text segment "XXX can be paid in a WeChat manner"; further, the query result entity word corresponding to the 1st text segment may be "starbucks (five crossing stores)", and "starbucks (five crossing stores)" may be fused with the 2nd text segment to obtain the query string corresponding to the 2nd text segment, "starbucks (five crossing stores) can be paid in a WeChat manner".

In example 5 of the embodiment of the present invention, the search text E is "what the cultural heritage owned by the most developed countries in the world is", and the search text E is split to obtain the 1st text segment "which the most developed countries in the world" and the 2nd text segment "which the cultural heritage owned by XXX is"; further, the query result entity words corresponding to the 1st text segment may include "sweden", "finland", "switzerland", "australia", "germany", etc., and each query result entity word may be fused with the 2nd text segment, respectively, to obtain a query string corresponding to the 2nd text segment, such as "what the cultural heritage owned by sweden has".
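The replacement-based fusion illustrated by the examples above can be sketched as follows. The preset character string is assumed to be the literal "XXX" used in the examples, and the function name `fuse` is hypothetical; several entity words, as in example 5, simply yield several query strings.

```python
PRESET_STRING = "XXX"  # preset character string, as written in the examples


def fuse(entity_words, next_segment, preset_string=PRESET_STRING):
    """Build the query strings for the (i+1)th text segment.

    entity_words: query result entity words obtained for the ith text segment.
    next_segment: the (i+1)th text segment, containing the preset character string.
    Returns one query string per entity word.
    """
    if preset_string not in next_segment:
        raise ValueError("the text segment does not contain the preset character string")
    return [next_segment.replace(preset_string, word) for word in entity_words]
```

For example 1, `fuse(["Vatican"], "the church of XXX")` gives the single query string "the church of Vatican".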

In step 203, the query result entity word corresponding to the last text segment in the text segment sequence may be used as the basis for obtaining the search result corresponding to the search text.

In an optional embodiment of the present invention, the query result entity word corresponding to the last text segment in the text segment sequence includes: target answer information matched with the question-answer intention corresponding to the last text segment.

Step 203 obtains a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence, which may specifically include:

taking the query result entity word corresponding to the last text segment in the text segment sequence as a search result corresponding to the search text; or

taking the web page or the document corresponding to the query result entity word corresponding to the last text segment in the text segment sequence as the search result corresponding to the search text.

For example, in example 1, the target answer information (i.e., the query result entity word) matching the question-answer intention corresponding to the last text segment may be "Francis", so that "Francis" may be used as the search result corresponding to the search text, or a web page or document corresponding to "Francis" may be used as the search result corresponding to the search text; the web page corresponding to "Francis" may include the encyclopedia page corresponding to "Francis", and the like.

The web page or document corresponding to the target answer information may refer to a web page or document related to the target answer information. According to one embodiment, a target question-answer pair matching the target answer information may be determined, and a web page or a document containing the target question-answer pair may be used as the web page or the document corresponding to the target answer information. According to another embodiment, a web page or a document with the subject term including the target answer information may be used as the web page or the document corresponding to the target answer information. It can be understood that the embodiment of the present invention does not limit the specific determination process of the web page or the document corresponding to the target answer information.

It should be noted that, in the embodiment of the present invention, the search result corresponding to the search text may further include: and searching the obtained search result matched with the search text based on text matching or semantic matching.

In practical applications, the search result obtained in step 203 may be presented for the user to view.

In summary, the data processing method of the embodiment of the present invention realizes semantic level simplification based on the splitting of the search text and the fusion between adjacent text fragments in the text fragment sequence; the fusion between the adjacent text segments may refer to the fusion between the query result entity word corresponding to the ith text segment and the (i +1) th text segment, so that the semantic level corresponding to the ith text segment and the semantic level corresponding to the (i +1) th text segment may be fused, and further, the semantic level may be simplified. The simplification of the semantic level can reduce the semantic level of the search text, namely, compared with the search text, the query string corresponding to the last text segment can have fewer semantic levels; therefore, the query result entity word corresponding to the last text segment is obtained according to the query string corresponding to the last text segment, so that the quality of the query result entity word corresponding to the last text segment can be improved, and the quality of the search result corresponding to the search text can be improved.

Method embodiment two

Referring to fig. 3, a flowchart illustrating steps of a second embodiment of the data processing method of the present invention is shown, which may specifically include the following steps:

step 301, receiving a search text of a user;

step 302, determining a text fragment sequence corresponding to a search text of a user; the text segment sequence may include, in order from front to back: query1, query2, … …, query N, N is a natural number greater than 1;

step 303, taking out the ith text segment query i from the text segment sequence; i is more than or equal to 1 and less than or equal to N;

step 304, determining a query string corresponding to the query i;

step 305, determining query result entity words corresponding to the query i according to the query string corresponding to the query i;

step 306, determining whether i is equal to N; if so, executing step 307; otherwise, setting i = i + 1 and returning to step 303;

step 307, obtaining a search result corresponding to the search text according to the query result entity word corresponding to the query N.

It should be noted that, in the case that i is greater than 1, step 304 may fuse the query result entity word corresponding to query (i-1) with query i to obtain the query string corresponding to query i.

The process of determining the query result entity word corresponding to the query i may include: determining question and answer intentions of query strings corresponding to the query i; and determining answer information matched with the question-answer intention from the webpage or the document corresponding to the query string of the query i, and taking the answer information as a query result entity word corresponding to the query i. Furthermore, answer information matched with the question-answering intention can be fused with query (i +1) to obtain a query string corresponding to query (i + 1).
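The loop of steps 303 to 307 can be sketched as follows. This is an illustration only: the lookup of a query result entity word (question-answer intention determination plus answer matching) is passed in as a caller-supplied function, each step is assumed to yield a single entity word, and the names `run_pipeline` and `lookup_entity_word` as well as the "XXX" preset character string are assumptions.

```python
def run_pipeline(segments, lookup_entity_word, preset_string="XXX"):
    """Iterate over query 1 .. query N and return the final entity word.

    segments: the text segment sequence, in order.
    lookup_entity_word: function mapping a query string to its query result
        entity word; a stand-in for steps 304 and 305.
    Returns (entity word for query N, list of query strings that were issued).
    """
    entity_word = None
    query_strings = []
    for i, segment in enumerate(segments, start=1):
        if i == 1:
            # The query string of the first text segment is the segment itself.
            query_string = segment
        else:
            # Fuse the entity word of query (i-1) with query i.
            query_string = segment.replace(preset_string, entity_word)
        query_strings.append(query_string)
        entity_word = lookup_entity_word(query_string)
    return entity_word, query_strings
```

A toy lookup backed by a dictionary is enough to exercise the loop: with the segments "the smallest country in the world" and "the church of XXX", and a lookup that answers "Vatican" for the first query string, the second query string issued is "the church of Vatican".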

Method embodiment three

Referring to fig. 4, a flowchart illustrating steps of a third embodiment of the data processing method of the present invention is shown, which may specifically include the following steps:

step 401, receiving a search text of a user;

step 402, judging whether a search text of a user corresponds to a plurality of semantic levels, if so, executing step 404;

step 403, judging whether the search text of the user comprises a plurality of target word units, if so, executing step 404;

step 404, determining a text fragment sequence corresponding to a search text of a user; the sequence of text segments may include: a plurality of text segments arranged in sequence;

step 405, fusing a query result entity word corresponding to the ith text segment in the text segment sequence with the (i +1) th text segment in the text segment sequence to obtain a query string corresponding to the (i +1) th text segment;

step 406, obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence.

Before determining the text segment sequence corresponding to the search text of the user, the embodiment of the invention can judge whether semantic hierarchy simplification needs to be carried out on the search text, namely whether the data processing method of the embodiment of the invention needs to be utilized to process the search text.

The judgment of the embodiment of the invention can comprise the following steps: a first judgment of step 402 and a second judgment of step 403.

The first judgment in step 402 is used to judge whether the search text of the user corresponds to multiple semantic levels. Optionally, a semantic analysis method may be adopted to determine whether the search text of the user corresponds to multiple semantic levels.

The second judgment of step 403 is used to judge whether a plurality of target word units are included in the search text of the user. Alternatively, a syntax analysis method may be used to determine whether the search text of the user includes a plurality of target word units.

If the result of the first judgment is yes and/or the result of the second judgment is yes, the text segment sequence corresponding to the search text of the user can be determined, that is, the subsequent data processing flow can be executed. In this way, when the results of the first judgment and the second judgment are both negative, the subsequent data processing flow need not be executed, so that the amount of calculation for data processing can be reduced.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the present invention.

Device embodiment

Referring to fig. 5, a block diagram of a data processing apparatus according to an embodiment of the present invention is shown, which may specifically include: a text segment sequence determination module 501, a fusion module 502 and a search result determination module 503.

a text segment sequence determining module 501, configured to determine a text segment sequence corresponding to a search text of a user; the text segment sequence comprises: a plurality of text segments arranged in sequence;

a fusion module 502, configured to fuse a query result entity word corresponding to an ith text segment in the text segment sequence with an (i+1)th text segment in the text segment sequence to obtain a query string corresponding to the (i+1)th text segment; wherein i is a natural number, a query string corresponding to one text segment is used for determining a corresponding query result entity word, and a query string corresponding to a first text segment is the first text segment itself; and

a search result determining module 503, configured to obtain a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence.
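The cooperation of the three modules can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the function `chained_search`, the `lookup` and `fuse` callables, the `[X]` placeholder and the toy answer table (with fictional entries) are all hypothetical and only stand in for the query step and the fusion step.

```python
from typing import Callable, Dict, List


def chained_search(segments: List[str],
                   lookup: Callable[[str], str],
                   fuse: Callable[[str, str], str]) -> str:
    """Return the query result entity word of the last text segment."""
    if not segments:
        raise ValueError("the text segment sequence must not be empty")
    query = segments[0]            # the first query string is the first segment itself
    entity = lookup(query)         # query result entity word for segment 1
    for segment in segments[1:]:
        query = fuse(entity, segment)   # fuse result i with segment i+1
        entity = lookup(query)
    return entity


# Toy data (fictional): a lookup table standing in for a real search backend.
TOY_ANSWERS: Dict[str, str] = {
    "Alice's employer": "Acme",
    "Acme's headquarters": "Springfield",
}
issued_queries: List[str] = []


def toy_lookup(query: str) -> str:
    issued_queries.append(query)
    return TOY_ANSWERS[query]


def toy_fuse(entity: str, segment: str) -> str:
    # Assumption: "[X]" marks where the previous entity word belongs.
    return segment.replace("[X]", entity)


result = chained_search(["Alice's employer", "[X]'s headquarters"], toy_lookup, toy_fuse)
```

Passing `lookup` and `fuse` as parameters keeps the loop independent of how entity words are actually determined or fused.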

Optionally, the apparatus may further include:

the query result entity word determining module is used for determining a query result entity word corresponding to one text segment;

the query result entity word determination module may include:

the question-answer intention determining module is used for determining the question-answer intention of the query string corresponding to one text segment;

and the answer determining module is used for determining answer information matched with the question-answer intention from the webpage or the document corresponding to the query string of the text segment, and the answer information is used as the query result entity word corresponding to the text segment.

Optionally, the (i+1)th text segment may include: a preset character string and a target word;

the fusion module 502 may include:

and the replacing module is used for replacing the preset character string in the (i +1) th text segment by adopting the query result entity word corresponding to the ith text segment, and taking the (i +1) th text segment after replacement as the query string corresponding to the (i +1) th text segment.
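A replacing module of this kind can be sketched in a few lines. The placeholder value `[X]` and the function name are assumptions made for illustration; the specification does not fix what the preset character string looks like.

```python
PRESET = "[X]"  # hypothetical preset character string


def fuse_by_replacement(entity_word: str, next_segment: str, preset: str = PRESET) -> str:
    """Replace the preset character string in segment i+1 with entity word i."""
    if preset not in next_segment:
        raise ValueError(f"segment {next_segment!r} contains no preset string {preset!r}")
    # Replace only the first occurrence so that the target word itself is never touched.
    return next_segment.replace(preset, entity_word, 1)


query_string = fuse_by_replacement("Acme", "[X]'s headquarters")

try:
    fuse_by_replacement("Acme", "headquarters")
    missing_preset_rejected = False
except ValueError:
    missing_preset_rejected = True
```

Rejecting a segment without the preset string is a design choice of this sketch; an implementation could equally fall back to using the segment unchanged.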

Optionally, the search result determining module 503 may include:

the first search result determining module is used for taking the query result entity word corresponding to the last text segment in the text segment sequence as the search result corresponding to the search text; or

the second search result determining module is used for taking the webpage or the document corresponding to the query result entity word corresponding to the last text segment in the text segment sequence as the search result corresponding to the search text.

Optionally, the text segment sequence determining module 501 may include:

and the semantic analysis module is used for performing semantic analysis on the search text of the user so as to enable one text fragment in the obtained text fragment sequence to correspond to one semantic level.

Optionally, the text segment sequence determining module 501 may include:

the target word unit determining module is used for determining at least one target word unit in the search text of the user; the target word unit may include: a target word and a modifier corresponding to the target word, wherein the part of speech of the target word is a preset part of speech;

and the segment position determining module is used for determining a corresponding text segment and the position information of the text segment in the search text aiming at a target word unit.

Optionally, the segment position determination module may include:

the first segment determining module is used for replacing a modifier in a target word unit by adopting a preset character string and obtaining a text segment according to the replaced target word unit; or

the second segment determining module is used for taking a target word unit as a text segment.

Optionally, the segment position determination module may include:

and the position determining module is used for determining the position information of the text segment corresponding to the target word unit according to the position information, in the search text, of the target word or the modifier in the target word unit.
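The segment and position determination can be sketched as follows, assuming the target word units have already been identified. Everything named here is hypothetical: the `TargetWordUnit` type, the `[X]` preset string, the English possessive form used to join modifier and target word, and the rule that only the first unit keeps its modifier while later units have it replaced.

```python
from dataclasses import dataclass
from typing import List, Tuple

PRESET = "[X]"  # hypothetical preset character string


@dataclass(frozen=True)
class TargetWordUnit:
    modifier: str
    target: str
    target_offset: int  # position of the target word in the search text


def to_segment(unit: TargetWordUnit, replace_modifier: bool) -> str:
    # Either replace the modifier with the preset string, or keep the unit as is.
    modifier = PRESET if replace_modifier else unit.modifier
    return f"{modifier}'s {unit.target}"


def build_sequence(units: List[TargetWordUnit]) -> List[Tuple[int, str]]:
    """Order segments by the position of their target word in the search text."""
    ordered = sorted(units, key=lambda u: u.target_offset)
    return [(u.target_offset, to_segment(u, replace_modifier=(i > 0)))
            for i, u in enumerate(ordered)]


text = "Alice's employer's headquarters"
units = [
    TargetWordUnit("Alice's employer", "headquarters", text.index("headquarters")),
    TargetWordUnit("Alice", "employer", text.index("employer")),
]
sequence = build_sequence(units)
```

Here the position of the target word decides the order, so the units may be supplied in any order and still yield the same text segment sequence.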

Optionally, the apparatus may further include:

the first judging module is used for judging whether the search text of the user corresponds to a plurality of semantic levels before the text segment sequence determining module determines the text segment sequence corresponding to the search text of the user, and if so, the text segment sequence determining module is triggered to determine the text segment sequence corresponding to the search text of the user; or

the second judging module is used for judging whether the search text of the user comprises a plurality of target word units before the text segment sequence determining module determines the text segment sequence corresponding to the search text of the user, and if so, triggering the text segment sequence determining module to determine the text segment sequence corresponding to the search text of the user.

Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

An embodiment of the present invention provides an apparatus for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for: determining a text segment sequence corresponding to a search text of a user; the text segment sequence comprises: a plurality of text segments arranged in sequence; fusing a query result entity word corresponding to the ith text segment in the text segment sequence with the (i+1)th text segment in the text segment sequence to obtain a query string corresponding to the (i+1)th text segment; wherein i is a natural number, a query string corresponding to one text segment is used for determining a corresponding query result entity word, and a query string corresponding to a first text segment is the first text segment itself; and obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence.

Fig. 6 is a block diagram illustrating an apparatus 800 for data processing in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 6, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice data processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 7 is a schematic diagram of a server in some embodiments of the invention. The server 1900, which may vary widely in configuration or performance, may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

A non-transitory computer-readable storage medium is also provided; when instructions in the storage medium are executed by a processor of an apparatus (server or terminal), the apparatus is enabled to perform the data processing method shown in fig. 2, fig. 3 or fig. 4.

A non-transitory computer-readable storage medium is also provided; when instructions in the storage medium are executed by a processor of an apparatus (server or terminal), the apparatus is enabled to perform a data processing method, the method comprising: determining a text segment sequence corresponding to a search text of a user; the text segment sequence comprises: a plurality of text segments arranged in sequence; fusing a query result entity word corresponding to the ith text segment in the text segment sequence with the (i+1)th text segment in the text segment sequence to obtain a query string corresponding to the (i+1)th text segment; wherein i is a natural number, a query string corresponding to one text segment is used for determining a corresponding query result entity word, and a query string corresponding to a first text segment is the first text segment itself; and obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence.

The embodiment of the invention discloses A1, a data processing method, the method comprising:

A2, according to the method A1, determining a query result entity word corresponding to a text segment by the following steps:

A3, the method of A1 or A2, wherein the (i+1)th text segment comprises: a preset character string and a target word;

the fusing the query result entity word corresponding to the ith text segment in the text segment sequence with the (i +1) th text segment in the text segment sequence includes:

A4, according to the method of A1 or A2, obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence, including:

And taking the webpage or the document corresponding to the query result entity word corresponding to the last text segment in the text segment sequence as the search result corresponding to the search text.

A5, the method according to A1 or A2, wherein the determining the text segment sequence corresponding to the user's search text includes:

and carrying out semantic analysis on the search text of the user so as to enable one text fragment in the obtained text fragment sequence to correspond to one semantic level.

A6, the method according to A1 or A2, wherein the determining the text segment sequence corresponding to the user's search text includes:

determining at least one target word unit in the search text of the user; the target word unit comprises: a target word and a modifier corresponding to the target word, wherein the part of speech of the target word is a preset part of speech;

and determining a corresponding text segment and position information of the text segment in the search text aiming at a target word unit.

A7, according to the method of A6, the determining the corresponding text segment for a target word unit includes:

replacing a modifier in a target word unit by adopting a preset character string, and obtaining a text segment according to the replaced target word unit; or

taking a target word unit as a text segment.

A8, according to the method of A6, the determining the position information of the text segment in the search text for a target word unit includes:

and determining the position information of the text segment corresponding to the target word unit according to the position information, in the search text, of the target word or the modifier in the target word unit.

A9, the method according to A1 or A2, wherein before the determining a text segment sequence corresponding to a user's search text, the method further comprises:

judging whether the search text of the user corresponds to a plurality of semantic levels, if so, determining a text fragment sequence corresponding to the search text of the user; or

judging whether the search text of the user comprises a plurality of target word units, and if so, determining a text segment sequence corresponding to the search text of the user.

The embodiment of the invention discloses B10, a data processing device, comprising:

B11, the apparatus of B10, the apparatus further comprising:

the query result entity word determining module comprises:

B12, the apparatus of B10 or B11, the (i+1)th text segment comprising: a preset character string and a target word;

the fusion module includes:

B13, the apparatus of B10 or B11, the search result determination module comprising:

B14, the apparatus of B10 or B11, the text segment sequence determining module comprising:

B15, the apparatus of B10 or B11, the text segment sequence determining module comprising:

the target word unit determining module is used for determining at least one target word unit in the search text of the user; the target word unit comprises: a target word and a modifier corresponding to the target word, wherein the part of speech of the target word is a preset part of speech;

B16, the apparatus of B15, the segment position determination module comprising:

B17, the apparatus of B15, the segment position determination module comprising:

B18, the apparatus of B10 or B11, the apparatus further comprising:

The second judging module is used for judging whether the search text of the user comprises a plurality of target word units or not before the text segment sequence determining module determines the text segment sequence corresponding to the search text of the user, and if so, triggering the text segment sequence determining module to determine the text segment sequence corresponding to the search text of the user.

The embodiment of the invention discloses C19, an apparatus for data processing, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs configured to be executed by the one or more processors comprise instructions for:

C20, the device of C19, the device also configured to execute the one or more programs by one or more processors including instructions for:

C21, the apparatus of C19 or C20, the (i+1)th text segment comprising: a preset character string and a target word;

C22, the apparatus of C19 or C20, wherein the obtaining a search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence includes:

C23, the apparatus according to C19 or C20, the determining a text fragment sequence corresponding to the user's search text, comprising:

C24, the apparatus according to C19 or C20, the determining a text fragment sequence corresponding to the user's search text, comprising:

C25, the apparatus according to C24, the determining the corresponding text segment for a target word unit includes:

A target word unit is taken as a text fragment.

C26, the apparatus according to C24, the determining the position information of the text segment in the search text for a target word unit includes:

C27, the device of C19 or C20, the device also configured to execute the one or more programs by one or more processors including instructions for:

before determining the text fragment sequence corresponding to the search text of the user, judging whether the search text of the user corresponds to a plurality of semantic levels, if so, determining the text fragment sequence corresponding to the search text of the user; or

Before determining the text segment sequence corresponding to the search text of the user, judging whether the search text of the user comprises a plurality of target word units, and if so, determining the text segment sequence corresponding to the search text of the user.

Embodiments of the present invention disclose D28, a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform a data processing method as described in one or more of A1-A9.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

The data processing method, the data processing apparatus and the apparatus for data processing provided by the present invention are described in detail above, and specific examples are applied herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method of data processing, the method comprising:

2. The method of claim 1, wherein the query result entity word corresponding to a text segment is determined by:

3. The method according to claim 1 or 2, wherein the (i+1)th text segment includes: a preset character string and a target word;

4. The method according to claim 1 or 2, wherein obtaining the search result corresponding to the search text according to the query result entity word corresponding to the last text segment in the text segment sequence comprises:

5. The method according to claim 1 or 2, wherein the determining a text segment sequence corresponding to the search text of the user comprises:

6. The method according to claim 1 or 2, wherein the determining a text segment sequence corresponding to the search text of the user comprises:

7. The method of claim 6, wherein the determining the corresponding text segment for a target word unit comprises:

A target word unit is taken as a text fragment.

8. A data processing apparatus, comprising:

9. An apparatus for data processing, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and wherein execution of the one or more programs by one or more processors comprises instructions for:

10. A machine-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform a data processing method as claimed in one or more of claims 1 to 7.