WO2014114175A1 - Method and apparatus for providing search engine tags - Google Patents

Method and apparatus for providing search engine tags

Info

Publication number
WO2014114175A1
Authority
WO
WIPO (PCT)
Prior art keywords
words
attribute
dependency
sentence
word
Prior art date
Application number
PCT/CN2013/091105
Other languages
English (en)
French (fr)
Inventor
沈玮
刘尚堃
Original Assignee
北京京东世纪贸易有限公司
Priority date
2013-01-24
Filing date
2013-12-31
Publication date
2014-07-31
Application filed by 北京京东世纪贸易有限公司
Priority to EP13872347.3A (EP2950223A4)
Priority to SG11201505727PA
Priority to MYPI2015702412A (MY194297A)
Publication of WO2014114175A1
Priority to US14/808,215 (US20150331953A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries

Definitions

  • The present invention relates to a method and apparatus for providing search engine tags.
  • The main technical route is to automatically identify opinion information in the review text and analyze it to obtain users' evaluations of each attribute of a product, and then to associate the extracted evaluations with the product to form search engine tags.
  • Existing search engine technology can then be used to provide users with a search service that includes evaluation data. Because the search engine tags express users' subjective intent, they make it possible to provide a subjective-intent search service.
  • One prior-art method of obtaining such search engine tags first identifies the opinion words in the review text, for example "good", "great", and "not bad", according to a semantic dictionary, and then truncates the context around each opinion word to obtain a short sentence that contains the opinion word.
  • Attribute words, also called "non-predicate adjectives" or "distinguishing words", are a newer word class split off from the nouns, verbs, and adjectives of traditional grammar. Attribute words express only the attributes or characteristics of people and things and serve to distinguish or classify. They can generally be used only as attributives, not as predicates.
  • The extraction of opinion words depends on the dictionary, and opinion words not included in the dictionary are difficult to extract, so the range of tags provided is limited.
  • In addition, before the attribute words are extracted, the text must be truncated around the opinion words, which is inefficient.
  • The present invention provides a method and apparatus for providing search engine tags that can provide search engine tags over a wider range and with a higher processing speed.
  • A method of providing search engine tags comprises: extracting one or more attribute words from a sentence; performing dependency analysis on the sentence to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word; extracting, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence; and composing search engine tags from the attribute words and the opinion words.
  • Before the step of extracting one or more attribute words from the sentence, the method further comprises: filtering text data according to preset rules; and obtaining the sentence from the text data.
  • The step of obtaining the sentence from the text data comprises: dividing the text data into clauses at punctuation marks to obtain short clauses; and taking a short clause as the sentence.
  • The step of performing dependency analysis on the sentence to obtain the dependency path for each attribute word comprises: performing dependency analysis on the sentence to obtain a series of dependency relations of the sentence; obtaining, from the attribute words and the series of dependency relations, for each attribute word, the dependency relations that lead from the one containing that attribute word, via the series of dependency relations, to one containing an opinion word; and traversing the dependency relations containing opinion words to obtain the dependency path.
  • The step of extracting the opinion word corresponding to each attribute word according to the dependency paths comprises: selecting, from the dependency paths, those that occur with higher frequency; deriving dependency rules from the selected dependency paths; and extracting, according to the dependency rules, the opinion word corresponding to each attribute word in the sentence.
  • The method further comprises: merging, according to a synonym table, multiple tags that contain synonymous opinion words into one tag.
  • The apparatus for providing search engine tags comprises: an attribute word extraction module for extracting one or more attribute words from a sentence; a dependency analysis module for performing dependency analysis on the sentence to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word; an opinion word extraction module for extracting, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence; and a search engine tag module for composing search engine tags from the attribute words and the opinion words.
  • A preprocessing module is further included for filtering text data according to preset rules and then obtaining the sentence from the text data.
  • The preprocessing module is further used to divide the text data into clauses at punctuation marks to obtain short clauses, and then to take a short clause as the sentence.
  • The dependency analysis module is further used to: perform dependency analysis on the sentence to obtain a series of dependency relations of the sentence; obtain, from the attribute words and the series of dependency relations, for each attribute word, the dependency relations that lead from the one containing that attribute word, via the series of dependency relations, to one containing an opinion word; and traverse the dependency relations containing opinion words to obtain the dependency path.
  • The opinion word extraction module is further used to: select, from the dependency paths, those that occur with higher frequency; derive dependency rules from the selected dependency paths; and extract, according to the dependency rules, the opinion word corresponding to each attribute word in the sentence.
  • A normalization module is further included for merging, according to a synonym table, multiple tags that contain synonymous opinion words into one tag.
  • Attribute words are mined and the corresponding opinion words are mined according to dependency relations; a mined attribute word can also be filtered out when it has no corresponding opinion word.
  • The technical solution of the embodiment does not depend on a dictionary, which helps provide search engine tags over a wider range, and it does not require truncating the context of sentences, which helps improve processing speed.
  • FIG. 1 is a schematic diagram of a method of providing a search engine tag according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram showing a basic structure of an apparatus for providing a search engine tag according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a method of providing search engine tags according to an embodiment of the present invention. As shown in FIG. 1, the method mainly comprises steps S11 to S14. Step S11: extract one or more attribute words from a sentence.
  • Part-of-speech pattern matching can be used to extract nouns (NN), verbs (VV), and compound forms such as noun + verb (NN+VV) from the review sentence as candidate attribute words.
  • The sentence here is obtained from text data.
  • The text data can first be filtered according to preset rules and then divided into clauses at punctuation marks to obtain short clauses; a short clause is used as the sentence in this step.
  • The filtering amounts to preprocessing the raw reviews crawled from the website: meaningless statements such as marketing advertisements, stop words, and default reviews are filtered out of the review data by rule, and then fields or statements repeated many times within the same review are removed.
  • Step S12: perform dependency analysis on the sentence of step S11 to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word.
  • Specifically, dependency analysis is first performed on the sentence to obtain a series of dependency relations. Then, from the attribute words and this series of dependency relations, the dependency relations that lead from the one containing each attribute word, via the series, to one containing an opinion word are obtained. Finally, the dependency relations containing opinion words are traversed to obtain the dependency path.
  • It can be seen that this step uses multiple transitive dependency relations to form the dependency path, which helps mine opinion words in depth.
  • Step S13: extract the opinion word corresponding to each attribute word in the sentence according to the dependency paths of step S12.
  • If no opinion word is extracted for an attribute word, that attribute word is deleted from the set of attribute words obtained in step S11.
  • Specifically, the dependency paths that occur with higher frequency can first be selected from the above dependency paths, dependency rules are then derived from the selected paths, and the opinion word corresponding to each attribute word in the sentence is extracted according to those rules.
  • Step S14: compose search engine tags from the attribute words and the opinion words.
  • The attribute words here are the set of attribute words remaining after step S13.
  • After this step, tags can be merged by the synonyms of their opinion words, that is, multiple tags that contain synonymous opinion words are merged into one tag according to a synonym table.
  • Step S15: output the search engine tags of step S14.
  • In this step the search engine tags are presented in the human-machine interface of the terminal device the user is using, for example on a web page. When the user clicks such a tag, the tag is supplied to the search engine to start a search, so that the user can filter products by the attribute words displayed on the page.
  • The apparatus 20 for providing search engine tags basically comprises an attribute word extraction module 21, a dependency analysis module 22, an opinion word extraction module 23, and a search engine tag module 24.
  • The attribute word extraction module 21 is used to extract one or more attribute words from the sentence.
  • The dependency analysis module 22 performs dependency analysis on the sentence and obtains, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word.
  • The opinion word extraction module 23 extracts, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence.
  • The search engine tag module 24 composes search engine tags from the attribute words and the opinion words.
  • The apparatus 20 for providing search engine tags may further include a preprocessing module (not shown) for filtering text data according to preset rules and then obtaining the sentence from the text data.
  • The preprocessing module can also be used to divide the text data into clauses at punctuation marks to obtain short clauses, and then to take a short clause as the sentence.
  • The apparatus 20 for providing search engine tags may further include a normalization module (not shown) for merging, according to a synonym table, multiple tags that contain synonymous opinion words into one tag.
  • The dependency analysis module 22 can also be used to: perform dependency analysis on the sentence to obtain a series of dependency relations of the sentence; obtain, from the attribute words and the series of dependency relations, for each attribute word, the dependency relations that lead from the one containing that attribute word, via the series of dependency relations, to one containing an opinion word; and traverse the dependency relations containing opinion words to obtain the dependency path.
  • The opinion word extraction module 23 can also be used to: select, from the dependency paths, those that occur with higher frequency; derive dependency rules from the selected dependency paths; and extract, according to the dependency rules, the opinion word corresponding to each attribute word in the sentence.
  • Attribute words are mined and the corresponding opinion words are mined according to dependency relations; a mined attribute word can also be filtered out when it has no corresponding opinion word.
  • The technical solution of the embodiment does not depend on a dictionary, which helps provide search engine tags over a wider range, and it does not require truncating the context of sentences, which helps improve processing speed.
  • The object of the invention can also be achieved by running a program or a set of programs on any computing device.
  • the computing device can be a well-known general purpose device. Accordingly, the object of the present invention can also be achieved by merely providing a program product comprising program code for implementing the method or apparatus. That is to say, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention.
  • the storage medium may be any well-known storage medium or any storage medium developed in the future. It should also be noted that in the apparatus and method of the present invention, it will be apparent that various components or steps may be decomposed and/or recombined. These decompositions and/or recombinations should be considered as equivalents to the invention. Also, the steps of performing the above-described series of processing may naturally be performed in chronological order in the order illustrated, but need not necessarily be performed in chronological order. Certain steps may be performed in parallel or independently of one another. The above specific embodiments do not constitute a limitation of the scope of the present invention. Those skilled in the art will appreciate that a wide variety of modifications, combinations, sub-combinations and substitutions can occur depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and scope of the invention are intended to be included within the scope of the invention.
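The preprocessing described for step S11 (rule-based filtering followed by clause division at punctuation marks) could look roughly like the sketch below. It is only an illustration: the patent does not list its filter rules, so `AD_PATTERN`, `DEFAULT_COMMENTS`, and the set of punctuation marks are assumptions chosen for the example.

```python
import re

# Hypothetical filter rules; the patent does not specify the actual rules.
AD_PATTERN = re.compile(r"promo code|follow our shop", re.IGNORECASE)
DEFAULT_COMMENTS = {"default positive review"}
# Assumed clause boundaries: common Western and Chinese punctuation marks.
CLAUSE_BOUNDARY = re.compile(r"[,.;!?，。；！？、]+")


def split_clauses(text):
    """Divide a text at punctuation marks into short, non-empty clauses."""
    return [c.strip() for c in CLAUSE_BOUNDARY.split(text) if c.strip()]


def preprocess(comment):
    """Filter one raw review and return its de-duplicated short clauses."""
    if comment.strip().lower() in DEFAULT_COMMENTS:
        return []  # a default review carries no opinion
    clauses = []
    seen = set()
    for clause in split_clauses(comment):
        if AD_PATTERN.search(clause):
            continue  # drop marketing clauses
        if clause in seen:
            continue  # drop statements repeated within the same review
        seen.add(clause)
        clauses.append(clause)
    return clauses
```

Each returned clause would then serve as one "sentence" for attribute word extraction and dependency analysis.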

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

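The present invention provides a method and apparatus for providing search engine tags that can provide search engine tags over a wider range and with a higher processing speed. The method comprises: extracting one or more attribute words from a sentence; performing dependency analysis on the sentence to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word; extracting, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence; and composing search engine tags from the attribute words and the opinion words.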
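The first step of the method, extracting candidate attribute words by part-of-speech pattern matching, can be sketched as follows. The sketch assumes the sentence has already been segmented and tagged by some external tagger (not shown) into `(word, tag)` pairs, and that nouns are tagged `NN` and verbs `VV`; the function name and the sample words are illustrative only.

```python
# Part-of-speech patterns for candidate attribute words, longest first,
# so that a noun + verb compound is preferred over a single noun.
PATTERNS = [("NN", "VV"), ("NN",), ("VV",)]


def candidate_attribute_words(tagged):
    """Return candidate attribute words from a list of (word, tag) pairs."""
    candidates = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            window = tagged[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                # Chinese words are joined without a separator.
                candidates.append("".join(word for word, _ in window))
                i += len(pattern)
                break
        else:
            i += 1  # no pattern starts at this position
    return candidates


# Example input for "性价比 很 高" (value for money / very / high).
print(candidate_attribute_words([("性价比", "NN"), ("很", "AD"), ("高", "VA")]))
```

Candidates found this way are only provisional: the method later discards any attribute word for which no opinion word is found.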

Description

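Method and apparatus for providing search engine tags

Technical Field

The present invention relates to a method and apparatus for providing search engine tags.

Background

At present, when users search for products on an e-commerce website, they can filter only by objective product attributes such as color and size. A search with a subjective element, for example the query "性价比好的相机" ("camera with good value for money"), usually returns no results. For such subjective semantic searches, users generally have to find some product models through a general-purpose search engine first and then search for the specific products on the e-commerce website. This adds steps for the user, and analysis shows that the results given by general-purpose search engines are mostly based on user reviews on BBS and similar sites.

E-commerce websites themselves hold abundant user review data, so the prior art also obtains search engine tags from the user review data of e-commerce websites. The main technical route is to automatically identify opinion information in the review text and analyze it to obtain users' evaluations of each attribute of a product, and then to associate the mined evaluations with the product to form search engine tags. Once the tags are available, existing search engine technology can be used to provide users with a search service that includes evaluation data. Because such tags express users' subjective intent, they make it possible to provide a subjective-intent search service.

One prior-art method of obtaining such tags first identifies the opinion words in the review text, for example 好 ("good"), 棒 ("great"), and 不错 ("not bad"), according to a semantic dictionary. It then truncates the context around each opinion word to obtain a short sentence of suitable length that contains the opinion word and is relatively complete semantically, and analyzes that short sentence with a semantic analysis tool such as the Stanford parser to obtain a series of dependency relations. Finally, it analyzes these dependency relations to extract the object modified by the opinion word, namely the attribute word, for example 性价比 ("value for money") or 外观 ("appearance"). Attribute words, also called "non-predicate adjectives" or "distinguishing words", are a newer word class split off from the nouns, verbs, and adjectives of traditional grammar. They express only the attributes or characteristics of people and things and serve to distinguish or classify. They can generally be used only as attributives, not as predicates.

In the above method, the extraction of opinion words depends on the dictionary, and opinion words not included in the dictionary are difficult to extract, so the range of tags provided is limited. In addition, the text must be truncated around the opinion words before the attribute words are extracted, which is inefficient.

Summary of the Invention

In view of this, the present invention provides a method and apparatus for providing search engine tags that can provide search engine tags over a wider range and with a higher processing speed.

To achieve the above object, according to one aspect of the present invention, a method of providing search engine tags is provided.

The method of providing search engine tags of the present invention comprises: extracting one or more attribute words from a sentence; performing dependency analysis on the sentence to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word; extracting, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence; and composing search engine tags from the attribute words and the opinion words.

Optionally, before the step of extracting one or more attribute words from the sentence, the method further comprises: filtering text data according to preset rules; and obtaining the sentence from the text data.

Optionally, the step of obtaining the sentence from the text data comprises: dividing the text data into clauses at punctuation marks to obtain short clauses; and taking a short clause as the sentence.

Optionally, the step of performing dependency analysis on the sentence to obtain the dependency path for each attribute word comprises: performing dependency analysis on the sentence to obtain a series of dependency relations of the sentence; obtaining, from the attribute words and the series of dependency relations, for each attribute word, the dependency relations that lead from the one containing that attribute word, via the series of dependency relations, to one containing an opinion word; and traversing the dependency relations containing opinion words to obtain the dependency path.

Optionally, the step of extracting the opinion word corresponding to each attribute word according to the dependency paths comprises: selecting, from the dependency paths, those that occur with higher frequency; deriving dependency rules from the selected dependency paths; and extracting, according to the dependency rules, the opinion word corresponding to each attribute word in the sentence.

Optionally, after the step of composing search engine tags from the attribute words and the opinion words, the method further comprises: merging, according to a synonym table, multiple tags that contain synonymous opinion words into one tag.

According to another aspect of the present invention, an apparatus for providing search engine tags is provided.

The apparatus for providing search engine tags of the present invention comprises: an attribute word extraction module for extracting one or more attribute words from a sentence; a dependency analysis module for performing dependency analysis on the sentence to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word; an opinion word extraction module for extracting, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence; and a search engine tag module for composing search engine tags from the attribute words and the opinion words.

Optionally, the apparatus further comprises a preprocessing module for filtering text data according to preset rules and then obtaining the sentence from the text data.

Optionally, the preprocessing module is further used to divide the text data into clauses at punctuation marks to obtain short clauses, and then to take a short clause as the sentence.

Optionally, the dependency analysis module is further used to: perform dependency analysis on the sentence to obtain a series of dependency relations of the sentence; obtain, from the attribute words and the series of dependency relations, for each attribute word, the dependency relations that lead from the one containing that attribute word, via the series of dependency relations, to one containing an opinion word; and traverse the dependency relations containing opinion words to obtain the dependency path.

Optionally, the opinion word extraction module is further used to: select, from the dependency paths, those that occur with higher frequency; derive dependency rules from the selected dependency paths; and extract, according to the dependency rules, the opinion word corresponding to each attribute word in the sentence.

Optionally, the apparatus further comprises a normalization module for merging, according to a synonym table, multiple tags that contain synonymous opinion words into one tag.

According to the technical solution of the present invention, attribute words are mined and the corresponding opinion words are mined according to dependency relations; a mined attribute word can also be filtered out when it has no corresponding opinion word. The technical solution of this embodiment does not depend on a dictionary, which helps provide search engine tags over a wider range, and it does not require truncating the context of sentences, which helps improve processing speed.

Brief Description of the Drawings

The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:

FIG. 1 is a schematic diagram of a method of providing search engine tags according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the basic structure of an apparatus for providing search engine tags according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the drawings. The description includes various details of the embodiments to aid understanding, and these should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

FIG. 1 is a schematic diagram of a method of providing search engine tags according to an embodiment of the present invention. As shown in FIG. 1, the method mainly comprises steps S11 to S14.

Step S11: extract one or more attribute words from a sentence. Part-of-speech pattern matching can be used to extract nouns (NN), verbs (VV), and compound forms such as noun + verb (NN+VV) from the review sentence as candidate attribute words. The sentence here is obtained from text data: the text data can first be filtered according to preset rules and then divided into clauses at punctuation marks to obtain short clauses, and a short clause is used as the sentence in this step. Taking product review information from an e-commerce website as the text data, the filtering amounts to preprocessing the raw reviews crawled from the website: meaningless statements such as marketing advertisements, stop words, and default reviews are filtered out of the review data by rule, and then fields or statements repeated many times within the same review are removed.

Step S12: perform dependency analysis on the sentence of step S11 to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word. Specifically, dependency analysis is first performed on the sentence to obtain a series of dependency relations. Then, from the attribute words and this series of dependency relations, the dependency relations that lead from the one containing each attribute word, via the series, to one containing an opinion word are obtained. Finally, the dependency relations containing opinion words are traversed to obtain the dependency path. It can be seen that this step uses multiple transitive dependency relations to form the dependency path, which helps mine opinion words in depth.

Step S13: extract the opinion word corresponding to each attribute word in the sentence according to the dependency paths of step S12. If no opinion word is extracted for an attribute word, that attribute word is deleted from the set of attribute words obtained in step S11. Specifically, the dependency paths that occur with higher frequency can first be selected from the above dependency paths, dependency rules are then derived from the selected paths, and the opinion word corresponding to each attribute word in the sentence is extracted according to those rules.

Step S14: compose search engine tags from the attribute words and the opinion words. The attribute words here are the set of attribute words remaining after step S13. After this step, tags can be merged by the synonyms of their opinion words, that is, multiple tags that contain synonymous opinion words are merged into one tag according to a synonym table. For example, "性价比好" ("good value for money"), "性价比高" ("high value for money"), and "性价比无敌" ("unbeatable value for money") are merged into "性价比高".

The tags can be used to index products for users to search. In some cases, however, the search terms that users enter themselves may not be attribute words obtained by the steps shown in FIG. 1, so step S15 can follow.

Step S15: output the search engine tags of step S14. In this step the search engine tags are presented in the human-machine interface of the terminal device the user is using, for example on a web page. When the user clicks such a tag, the tag is supplied to the search engine to start a search, so that the user can filter products by the attribute words displayed on the page.

FIG. 2 is a schematic diagram of the basic structure of an apparatus for providing search engine tags according to an embodiment of the present invention. As shown in FIG. 2, the apparatus 20 for providing search engine tags basically comprises an attribute word extraction module 21, a dependency analysis module 22, an opinion word extraction module 23, and a search engine tag module 24. The attribute word extraction module 21 is used to extract one or more attribute words from the sentence. The dependency analysis module 22 performs dependency analysis on the sentence and obtains, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word. The opinion word extraction module 23 extracts, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence. The search engine tag module 24 composes search engine tags from the attribute words and the opinion words.

The apparatus 20 for providing search engine tags may further include a preprocessing module (not shown) for filtering text data according to preset rules and then obtaining the sentence from the text data. The preprocessing module can also be used to divide the text data into clauses at punctuation marks to obtain short clauses, and then to take a short clause as the sentence.

The apparatus 20 for providing search engine tags may further include a normalization module (not shown) for merging, according to a synonym table, multiple tags that contain synonymous opinion words into one tag.

The dependency analysis module 22 can also be used to: perform dependency analysis on the sentence to obtain a series of dependency relations of the sentence; obtain, from the attribute words and the series of dependency relations, for each attribute word, the dependency relations that lead from the one containing that attribute word, via the series of dependency relations, to one containing an opinion word; and traverse the dependency relations containing opinion words to obtain the dependency path.

The opinion word extraction module 23 can also be used to: select, from the dependency paths, those that occur with higher frequency; derive dependency rules from the selected dependency paths; and extract, according to the dependency rules, the opinion word corresponding to each attribute word in the sentence.

According to the technical solution of the embodiment of the present invention, attribute words are mined and the corresponding opinion words are mined according to dependency relations; a mined attribute word can also be filtered out when it has no corresponding opinion word. The technical solution of this embodiment does not depend on a dictionary, which helps provide search engine tags over a wider range, and it does not require truncating the context of sentences, which helps improve processing speed.

The basic principles of the present invention have been described above with reference to specific embodiments. It should be noted, however, that those of ordinary skill in the art will understand that all or any of the steps or components of the method and apparatus of the present invention can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including processors, storage media, and the like) or network of computing devices. Those of ordinary skill in the art can accomplish this with their basic programming skills after reading the description of the present invention.

The object of the present invention can therefore also be achieved by running a program or a set of programs on any computing device. The computing device may be a well-known general-purpose device. The object of the present invention can thus also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is to say, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium may be any well-known storage medium or any storage medium developed in the future.

It should also be noted that, in the apparatus and method of the present invention, the components or steps can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. Moreover, the steps of the above series of processes can naturally be performed in chronological order in the order described, but they need not be. Some steps may be performed in parallel or independently of one another.

The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can occur depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.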
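Steps S12 and S13 of the embodiment (forming a dependency path through transitive dependency relations, keeping the frequent paths as rules, and dropping attribute words that yield no opinion word) can be sketched as below. This is a simplified illustration under stated assumptions, not the patented implementation: dependency relations are assumed to arrive as `(relation, head, dependent)` triples from some external parser, edge direction is ignored, words double as node identifiers (so a word repeated in one sentence is not handled), and the relation labels, the `min_count` threshold, and all function names are hypothetical.

```python
from collections import Counter, deque


def build_graph(deps):
    """Index (relation, head, dependent) triples as an undirected graph."""
    graph = {}
    for rel, head, dep in deps:
        graph.setdefault(head, []).append((rel, dep))
        graph.setdefault(dep, []).append((rel, head))
    return graph


def relation_path(deps, start, goal):
    """Breadth-first search for the relation labels linking start to goal."""
    graph = build_graph(deps)
    queue = deque([(start, ())])
    visited = {start}
    while queue:
        word, path = queue.popleft()
        if word == goal:
            return path
        for rel, neighbor in graph.get(word, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + (rel,)))
    return None  # the two words are not connected


def frequent_rules(paths, min_count=2):
    """Keep the relation paths seen at least min_count times as rules."""
    counts = Counter(paths)
    return {path for path, n in counts.items() if n >= min_count}


def follow(deps, start, rule):
    """Return the words reached from start by following the rule's labels."""
    graph = build_graph(deps)
    frontier = [(start, {start})]
    for label in rule:
        frontier = [(nxt, seen | {nxt})
                    for word, seen in frontier
                    for rel, nxt in graph.get(word, [])
                    if rel == label and nxt not in seen]
    return [word for word, _ in frontier]


def extract_tags(deps, attributes, rules):
    """Pair attribute words with opinion words; unmatched ones are dropped."""
    tags = []
    for attribute in attributes:
        for rule in sorted(rules):
            for opinion in follow(deps, attribute, rule):
                tags.append((attribute, opinion))
    return tags
```

In this sketch, `relation_path` would be run over sentences to collect observed paths, `frequent_rules` turns the common ones into rules, and `extract_tags` applies those rules, so an attribute word with no matching rule simply produces no tag.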

Claims

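1. A method of providing search engine tags, characterized by comprising:
extracting one or more attribute words from a sentence;
performing dependency analysis on the sentence to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word;
extracting, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence; and
composing search engine tags from the attribute words and the opinion words.
2. The method according to claim 1, characterized in that, before the step of extracting one or more attribute words from the sentence, the method further comprises:
filtering text data according to preset rules; and
obtaining the sentence from the text data.
3. The method according to claim 2, characterized in that the step of obtaining the sentence from the text data comprises:
dividing the text data into clauses at punctuation marks to obtain short clauses; and
taking a short clause as the sentence.
4. The method according to claim 1, characterized in that the step of performing dependency analysis on the sentence to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word comprises:
performing dependency analysis on the sentence to obtain a series of dependency relations of the sentence;
obtaining, from the attribute words and the series of dependency relations, for each attribute word, the dependency relations that lead from the one containing that attribute word, via the series of dependency relations, to one containing an opinion word; and
traversing the dependency relations containing opinion words to obtain the dependency path.
5. The method according to claim 1 or 4, characterized in that the step of extracting, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence comprises:
selecting, from the dependency paths, those that occur with higher frequency;
deriving dependency rules from the selected dependency paths; and
extracting, according to the dependency rules, the opinion word corresponding to each attribute word in the sentence.
6. The method according to any one of claims 1 to 4, characterized in that, after the step of composing search engine tags from the attribute words and the opinion words, the method further comprises: merging, according to a synonym table, multiple tags that contain synonymous opinion words into one tag.
7. An apparatus for providing search engine tags, characterized by comprising:
an attribute word extraction module for extracting one or more attribute words from a sentence;
a dependency analysis module for performing dependency analysis on the sentence to obtain, for each attribute word, a dependency path from the dependency relation containing that attribute word to a dependency relation containing an opinion word;
an opinion word extraction module for extracting, according to the dependency paths, the opinion word corresponding to each attribute word in the sentence; and
a search engine tag module for composing search engine tags from the attribute words and the opinion words.
8. The apparatus according to claim 7, characterized by further comprising a preprocessing module for filtering text data according to preset rules and then obtaining the sentence from the text data.
9. The apparatus according to claim 8, characterized in that the preprocessing module is further used to divide the text data into clauses at punctuation marks to obtain short clauses, and then to take a short clause as the sentence.
10. The apparatus according to claim 7, characterized in that the dependency analysis module is further used to:
perform dependency analysis on the sentence to obtain a series of dependency relations of the sentence;
obtain, from the attribute words and the series of dependency relations, for each attribute word, the dependency relations that lead from the one containing that attribute word, via the series of dependency relations, to one containing an opinion word; and
traverse the dependency relations containing opinion words to obtain the dependency path.
11. The apparatus according to claim 7 or 10, characterized in that the opinion word extraction module is further used to:
select, from the dependency paths, those that occur with higher frequency;
derive dependency rules from the selected dependency paths; and
extract, according to the dependency rules, the opinion word corresponding to each attribute word in the sentence.
12. The apparatus according to any one of claims 7 to 10, further comprising a normalization module for merging, according to a synonym table, multiple tags that contain synonymous opinion words into one tag.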
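The synonym-based merging recited in claims 6 and 12 can be illustrated with a minimal sketch. The synonym table below is a hypothetical example built from the description's own case, where "性价比好", "性价比高", and "性价比无敌" are merged into "性价比高"; the table format (a mapping from an opinion word to its canonical form) and the function name are assumptions.

```python
# Hypothetical synonym table: opinion word -> canonical opinion word.
SYNONYMS = {"好": "高", "无敌": "高"}


def merge_tags(tags, synonyms):
    """Merge (attribute, opinion) tags whose opinion words are synonymous."""
    merged = []
    seen = set()
    for attribute, opinion in tags:
        # Opinion words missing from the table are kept unchanged.
        canonical = (attribute, synonyms.get(opinion, opinion))
        if canonical not in seen:
            seen.add(canonical)
            merged.append(canonical)
    return merged
```

A single flat table is the simplest choice for the sketch; in practice synonymy may depend on the attribute word, in which case the table could be keyed on the (attribute, opinion) pair instead.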
PCT/CN2013/091105 2013-01-24 2013-12-31 Method and apparatus for providing search engine tags WO2014114175A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP13872347.3A EP2950223A4 (en) 2013-01-24 2013-12-31 METHOD AND DEVICE FOR PROVIDING SEARCH RESULT DAYS
SG11201505727PA SG11201505727PA (en) 2013-01-24 2013-12-31 Method and apparatus for providing search engine tags
MYPI2015702412A MY194297A (en) 2013-01-24 2013-12-31 A method and device for providing search engine label
US14/808,215 US20150331953A1 (en) 2013-01-24 2015-07-24 Method and device for providing search engine label

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2013100273112A CN103150331A (zh) 2013-01-24 2013-01-24 Method and apparatus for providing search engine tags
CN201310027311.2 2013-01-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/808,215 Continuation US20150331953A1 (en) 2013-01-24 2015-07-24 Method and device for providing search engine label

Publications (1)

Publication Number Publication Date
WO2014114175A1 (zh) 2014-07-31

Family

ID=48548409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/091105 WO2014114175A1 (zh) 2013-01-24 2013-12-31 Method and apparatus for providing search engine tags

Country Status (6)

Country Link
US (1) US20150331953A1 (zh)
EP (1) EP2950223A4 (zh)
CN (1) CN103150331A (zh)
MY (1) MY194297A (zh)
SG (1) SG11201505727PA (zh)
WO (1) WO2014114175A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536778A (zh) * 2020-04-14 2021-10-22 北京沃东天骏信息技术有限公司 Title generation method, apparatus, and computer-readable storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
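CN103150331A (zh) * 2013-01-24 2013-06-12 北京京东世纪贸易有限公司 Method and apparatus for providing search engine tags
CN105183847A (zh) * 2015-09-07 2015-12-23 北京京东尚科信息技术有限公司 Method and apparatus for collecting feature information from online review data
US10642912B2 (en) * 2016-08-17 2020-05-05 Adobe Inc. Control of document similarity determinations by respective nodes of a plurality of computing devices
CN109726384B (zh) * 2017-10-31 2023-08-25 北京国双科技有限公司 Method for generating evaluation relations and related apparatus
CN108153856B (zh) * 2017-12-22 2022-09-06 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN108399158B (zh) * 2018-02-05 2021-05-14 华南理工大学 Attribute sentiment classification method based on dependency trees and an attention mechanism
CN109710852A (zh) * 2018-12-27 2019-05-03 丹翰智能科技(上海)有限公司 Method and device for determining tag information of financial information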

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
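CN102279894A (zh) * 2011-09-19 2011-12-14 嘉兴亿言堂信息科技有限公司 Semantics-based method and search system for finding, integrating, and providing review information
CN102436496A (zh) * 2011-11-14 2012-05-02 百度在线网络技术(北京)有限公司 Method and apparatus for providing personalized search tags
CN102737013A (zh) * 2011-04-02 2012-10-17 三星电子(中国)研发中心 Device and method for identifying sentence sentiment based on dependency relations
CN103150331A (zh) * 2013-01-24 2013-06-12 北京京东世纪贸易有限公司 Method and apparatus for providing search engine tags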

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050086219A1 (en) * 2003-03-25 2005-04-21 Claria Corporation Generation of keywords for searching in a computer network
US7930302B2 (en) * 2006-11-22 2011-04-19 Intuit Inc. Method and system for analyzing user-generated content
CN101996208B (zh) * 2009-08-31 2014-04-02 国际商业机器公司 用于数据库语义查询回答的方法及系统
US8874432B2 (en) * 2010-04-28 2014-10-28 Nec Laboratories America, Inc. Systems and methods for semi-supervised relationship extraction
JP5816936B2 (ja) * 2010-09-24 2015-11-18 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation 質問に対する解答を自動的に生成するための方法、システム、およびコンピュータ・プログラム
US9037452B2 (en) * 2012-03-16 2015-05-19 Afrl/Rij Relation topic construction and its application in semantic relation extraction
US20140067370A1 (en) * 2012-08-31 2014-03-06 Xerox Corporation Learning opinion-related patterns for contextual and domain-dependent opinion detection
US20140136503A1 (en) * 2012-11-09 2014-05-15 International Business Machines Corporation Personalized search result re-rank based on relationship bond strength alteration among different keywords

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737013A (zh) * 2011-04-02 2012-10-17 三星电子(中国)研发中心 基于依存关系来识别语句情感的设备和方法
CN102279894A (zh) * 2011-09-19 2011-12-14 嘉兴亿言堂信息科技有限公司 基于语义的查找、集成和提供评论信息的方法及搜索系统
CN102436496A (zh) * 2011-11-14 2012-05-02 百度在线网络技术(北京)有限公司 一种提供个性化搜索标签的方法及其装置
CN103150331A (zh) * 2013-01-24 2013-06-12 北京京东世纪贸易有限公司 一种提供搜索引擎标签的方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2950223A4 *

Also Published As

Publication number Publication date
US20150331953A1 (en) 2015-11-19
CN103150331A (zh) 2013-06-12
MY194297A (en) 2022-11-27
EP2950223A4 (en) 2016-06-01
SG11201505727PA (en) 2015-09-29
EP2950223A1 (en) 2015-12-02

Similar Documents

Publication Publication Date Title
US10019515B2 (en) Attribute-based contexts for sentiment-topic pairs
WO2014114175A1 (zh) Method and apparatus for providing search engine tags
US10042896B2 (en) Providing search recommendation
WO2017107457A1 (zh) Query recommendation method and apparatus
CN111460787A (zh) Topic extraction method, apparatus, terminal device, and storage medium
US20160299955A1 (en) Text mining system and tool
US9818080B2 (en) Categorizing a use scenario of a product
CN107544988B (zh) Method and apparatus for obtaining public opinion data
US20110047138A1 (en) Method and Apparatus for Identifying Synonyms and Using Synonyms to Search
US10740406B2 (en) Matching of an input document to documents in a document collection
WO2018026489A1 (en) Surfacing unique facts for entities
CN110852095A (zh) Sentence hotspot extraction method and system
US20220365956A1 (en) Method and apparatus for generating patent summary information, and electronic device and medium
Tuarob et al. A product feature inference model for mining implicit customer preferences within large scale social media networks
JP2014219872A (ja) Utterance selection device, method, and program, and dialogue device and method
WO2016067334A1 (ja) Document retrieval system, debate system, and document retrieval method
Pasarate et al. Comparative study of feature extraction techniques used in sentiment analysis
US11841883B2 (en) Resolving queries using structured and unstructured data
CN110705285B (zh) Method, apparatus, server, and readable storage medium for constructing a government-affairs text topic lexicon
Saravanan et al. Extraction of Core Web Content from Web Pages using Noise Elimination.
US11507593B2 (en) System and method for generating queryeable structured document from an unstructured document using machine learning
Quarteroni et al. Evaluating Multi-focus Natural Language Queries over Data Services.
TWI534640B (zh) Chinese network information monitoring and analysis system and its method
Priyanka et al. Recommender System Using Machine Learning
Santosh et al. Automatic machine recognition of features and sentiments from online reviews

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13872347

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: IDP00201504670

Country of ref document: ID

WWE Wipo information: entry into national phase

Ref document number: 2013872347

Country of ref document: EP