CN102193929B - Method and apparatus for searching by using word information entropies - Google Patents

Method and apparatus for searching by using word information entropies

Info

Publication number
CN102193929B
Authority
CN
China
Prior art keywords
word
information entropy
searching request
word information
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010101205640A
Other languages
English (en)
Chinese (zh)
Other versions
CN102193929A (zh)
Inventor
金凯民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN2010101205640A (CN102193929B)
Priority to US12/932,643 (US8566303B2)
Priority to EP11753707.6A (EP2545439A4)
Priority to JP2012557039A (JP5450842B2)
Priority to PCT/US2011/000401 (WO2011112238A1)
Publication of CN102193929A
Priority to HK12100205.7A (HK1159813B)
Application granted
Publication of CN102193929B
Priority to US14/024,431 (US9342627B2)
Current legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Creation or modification of classes or clusters
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
CN2010101205640A 2010-03-08 2010-03-08 Method and apparatus for searching by using word information entropies Expired - Fee Related CN102193929B (zh)

Priority Applications (7)

Application Number Priority Date Filing Date Title
CN2010101205640A CN102193929B (zh) 2010-03-08 2010-03-08 Method and apparatus for searching by using word information entropies
US12/932,643 US8566303B2 (en) 2010-03-08 2011-03-01 Determining word information entropies
JP2012557039A JP5450842B2 (ja) 2010-03-08 2011-03-02 Determining word information entropies
PCT/US2011/000401 WO2011112238A1 (en) 2010-03-08 2011-03-02 Determining word information entropies
EP11753707.6A EP2545439A4 (en) 2010-03-08 2011-03-02 Determining word information entropies
HK12100205.7A HK1159813B (en) 2012-01-09 Method and apparatus for searching by using word information entropies
US14/024,431 US9342627B2 (en) 2010-03-08 2013-09-11 Determining word information entropies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101205640A CN102193929B (zh) 2010-03-08 2010-03-08 Method and apparatus for searching by using word information entropies

Publications (2)

Publication Number Publication Date
CN102193929A CN102193929A (zh) 2011-09-21
CN102193929B CN102193929B (zh) 2013-03-13

Family

ID=44532194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101205640A Expired - Fee Related CN102193929B (zh) 2010-03-08 2010-03-08 Method and apparatus for searching by using word information entropies

Country Status (5)

Country Link
US (2) US8566303B2 (en)
EP (1) EP2545439A4 (en)
JP (1) JP5450842B2 (en)
CN (1) CN102193929B (en)
WO (1) WO2011112238A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8938466B2 (en) * 2010-01-15 2015-01-20 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for ranking documents
US9110986B2 (en) * 2011-01-31 2015-08-18 Vexigo, Ltd. System and method for using a combination of semantic and statistical processing of input strings or other data content
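CN103116572B (zh) * 2013-02-02 2015-10-21 深圳先进技术研究院 Method and device for identifying the production period of literary works
CN103106192B (zh) * 2013-02-02 2016-02-03 深圳先进技术研究院 Method and device for identifying the author of literary works
CN103678274A (zh) * 2013-04-15 2014-03-26 南京邮电大学 Text classification feature extraction method based on improved mutual information and entropy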
US20140350919A1 (en) * 2013-05-27 2014-11-27 Tencent Technology (Shenzhen) Company Limited Method and apparatus for word counting
CN104009970A (zh) * 2013-09-17 2014-08-27 宁波公众信息产业有限公司 Network information collection method
US10042936B1 (en) * 2014-07-11 2018-08-07 Google Llc Frequency-based content analysis
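CN104199832B (zh) * 2014-08-01 2017-08-22 西安理工大学 Information-entropy-based method for discovering abnormal trading communities in financial networks
CN105224695B (zh) * 2015-11-12 2018-04-20 中南大学 Information-entropy-based text feature quantification method and device, and text classification method and device
CN106649868B (zh) * 2016-12-30 2019-03-26 首都师范大学 Question-answer matching method and device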
US10607604B2 (en) * 2017-10-27 2020-03-31 International Business Machines Corporation Method for re-aligning corpus and improving the consistency
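CN108256070B (zh) * 2018-01-17 2022-07-15 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108664470B (zh) * 2018-05-04 2022-06-17 武汉斗鱼网络科技有限公司 Method for measuring the information content of video titles, readable storage medium, and electronic device
CN110750986B (zh) * 2018-07-04 2023-10-10 普天信息技术有限公司 Neural network word segmentation system based on minimum information entropy, and training method
JP6948425B2 (ja) * 2020-03-19 2021-10-13 ヤフー株式会社 Determination device, determination method, and determination program
CN112765975B (zh) * 2020-12-25 2023-08-04 北京百度网讯科技有限公司 Word segmentation ambiguity processing method, apparatus, device, and medium
JP7045515B1 (ja) * 2021-07-19 2022-03-31 ヤフー株式会社 Information processing device, information processing method, and information processing program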
US12153619B2 (en) * 2022-09-20 2024-11-26 Adobe Inc. Generative prompt expansion for image generation
US12314309B2 (en) * 2022-09-23 2025-05-27 Adobe Inc. Zero-shot entity-aware nearest neighbors retrieval
CN115858478B (zh) * 2023-02-24 2023-05-12 山东中联翰元教育科技有限公司 Rapid data compression method for an interactive smart teaching platform

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2825814B1 (fr) * 2001-06-07 2003-09-19 Commissariat Energie Atomique Method for automatically creating an image database that can be queried by its semantic content
US6836777B2 (en) * 2001-11-15 2004-12-28 Ncr Corporation System and method for constructing generic analytical database applications
US6941297B2 (en) * 2002-07-31 2005-09-06 International Business Machines Corporation Automatic query refinement
CN1629833A (zh) * 2003-12-17 2005-06-22 国际商业机器公司 Method and apparatus for implementing question-and-answer functionality and computer-aided writing
US20080077570A1 (en) * 2004-10-25 2008-03-27 Infovell, Inc. Full Text Query and Search Systems and Method of Use
JP2006343925A (ja) * 2005-06-08 2006-12-21 Fuji Xerox Co Ltd Related-word dictionary creation device, related-word dictionary creation method, and computer program
US20070250501A1 (en) * 2005-09-27 2007-10-25 Grubb Michael L Search result delivery engine
CN101535945A (zh) * 2006-04-25 2009-09-16 英孚威尔公司 Full-text query and search systems and methods of use
CN101122909B (zh) * 2006-08-10 2010-06-16 株式会社日立制作所 Text information retrieval device and text information retrieval method
US7392250B1 (en) * 2007-10-22 2008-06-24 International Business Machines Corporation Discovering interestingness in faceted search
US7860885B2 (en) * 2007-12-05 2010-12-28 Palo Alto Research Center Incorporated Inbound content filtering via automated inference detection
US7877389B2 (en) * 2007-12-14 2011-01-25 Yahoo, Inc. Segmentation of search topics in query logs
US8190541B2 (en) * 2008-02-25 2012-05-29 Atigeo Llc Determining relevant information for domains of interest
CN101510221B (zh) * 2009-02-17 2012-05-30 北京大学 Query statement analysis method and system for information retrieval
US9928296B2 (en) * 2010-12-16 2018-03-27 Microsoft Technology Licensing, Llc Search lexicon expansion

Also Published As

Publication number Publication date
EP2545439A1 (en) 2013-01-16
JP2013522720A (ja) 2013-06-13
US8566303B2 (en) 2013-10-22
JP5450842B2 (ja) 2014-03-26
US20110219004A1 (en) 2011-09-08
WO2011112238A1 (en) 2011-09-15
CN102193929A (zh) 2011-09-21
US20140074884A1 (en) 2014-03-13
US9342627B2 (en) 2016-05-17
EP2545439A4 (en) 2017-03-08
HK1159813A1 (en) 2012-08-03

Similar Documents

Publication Publication Date Title
CN102193929B (zh) Method and apparatus for searching by using word information entropies
US10546005B2 (en) Perspective data analysis and management
US8990241B2 (en) System and method for recommending queries related to trending topics based on a received query
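CN102760138B (zh) Method and device for classifying user network behavior, and corresponding search method and device
WO2022127543A1 (zh) Advertisement information processing method, apparatus, device, and storage medium
CN110390052B (zh) Search recommendation method, and training method, apparatus, and device for a CTR prediction model
CN104102639B (zh) Promotion triggering method and device based on text classification
CN106796578A (zh) Knowledge automation system
CN103020049A (zh) Search method and search system
CN105183733A (zh) Method and device for matching text information and pushing business objects
CN114330329A (zh) Business content search method and apparatus, electronic device, and storage medium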
US10346496B2 (en) Information category obtaining method and apparatus
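CN104978332B (zh) Method and device for generating tag data for user-generated content, and related methods and devices
JP2012533819A (ja) Method and system for document indexing and data querying
CN103744887B (zh) Method, apparatus, and computer device for person search
CN103365904A (zh) Advertisement information search method and system
CN111191111A (zh) Content recommendation method, apparatus, and storage medium
CN115017200B (zh) Search result ranking method and apparatus, electronic device, and storage medium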
US10055478B2 (en) Perspective data analysis and management
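CN104077707A (zh) Method and device for optimizing the presentation of promotions
CN103942232A (zh) Method and device for mining intent
CN103186650B (zh) Search method and device
CN104252487A (zh) Method and device for generating entry information
CN111694929B (zh) Search method based on a data graph, intelligent terminal, and readable storage medium
CN107133321B (zh) Method and device for analyzing the search characteristics of a page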

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 1159813
Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code
Ref country code: HK
Ref legal event code: GR
Ref document number: 1159813
Country of ref document: HK

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130313