HK1167028A1 - A method and device for providing word segmentation result with multiple granularities

A method and device for providing word segmentation result with multiple granularities

Info

Publication number
HK1167028A1
Authority
HK
Hong Kong
Prior art keywords
word segmentation
segmentation result
providing word
multiple granularities
granularities
Application number
HK12107731.5A
Other languages
English (en)
Chinese (zh)
Inventor
孫健
侯磊
唐晶明
初敏
廖曉玲
許冰婧
彭仁剛
楊揚
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集團控股有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-11-22
Filing date
2012-08-07
Publication date
2012-11-16
Application filed by Alibaba Group Holding Limited (阿里巴巴集團控股有限公司)
Publication of HK1167028A1


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
              • G06F40/279 Recognition of textual entities
                • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
            • G06F40/30 Semantic analysis
            • G06F40/40 Processing or translation of natural language
              • G06F40/53 Processing of non-Latin text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010555763.4A CN102479191B (zh) 2010-11-22 2010-11-22 提供多粒度分词结果的方法及其装置 (Method and device for providing multi-granularity word segmentation results)

Publications (1)

Publication Number Publication Date
HK1167028A1 (en) 2012-11-16

Family ID: 46065146

Family Applications (1)

Application Number Title Priority Date Filing Date
HK12107731.5A HK1167028A1 (en) 2010-11-22 2012-08-07 A method and device for providing word segmentation result with multiple granularities

Country Status (7)

Country Link
US (3) US8892420B2
EP (1) EP2643770A4
JP (1) JP5788015B2
CN (1) CN102479191B
HK (1) HK1167028A1
TW (1) TWI512507B
WO (1) WO2012095696A2

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9721238B2 (en) 2009-02-13 2017-08-01 Visa U.S.A. Inc. Point of interaction loyalty currency redemption in a transaction
US9031859B2 (en) 2009-05-21 2015-05-12 Visa U.S.A. Inc. Rebate automation
US8463706B2 (en) 2009-08-24 2013-06-11 Visa U.S.A. Inc. Coupon bearing sponsor account transaction authorization
CN102479191B (zh) 2010-11-22 2014-03-26 阿里巴巴集团控股有限公司 提供多粒度分词结果的方法及其装置 (Method and device for providing multi-granularity word segmentation results)
US8782042B1 (en) * 2011-10-14 2014-07-15 Firstrain, Inc. Method and system for identifying entities
US10360578B2 (en) 2012-01-30 2019-07-23 Visa International Service Association Systems and methods to process payments based on payment deals
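FR2986882A1 (fr) * 2012-02-09 2013-08-16 Mining Essential Procede d'identification d'un ensemble de phrases d'un document numerique, procede de generation d'un document numerique, dispositif associe (Method for identifying a set of sentences in a digital document, method for generating a digital document, and associated device)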
US9460436B2 (en) 2012-03-16 2016-10-04 Visa International Service Association Systems and methods to apply the benefit of offers via a transaction handler
US8880431B2 (en) * 2012-03-16 2014-11-04 Visa International Service Association Systems and methods to generate a receipt for a transaction
US9922338B2 (en) 2012-03-23 2018-03-20 Visa International Service Association Systems and methods to apply benefit of offers
US9495690B2 (en) 2012-04-04 2016-11-15 Visa International Service Association Systems and methods to process transactions and offers via a gateway
CN103425691B (zh) * 2012-05-22 2016-12-14 阿里巴巴集团控股有限公司 一种搜索方法和系统 (A search method and system)
US9864988B2 (en) 2012-06-15 2018-01-09 Visa International Service Association Payment processing for qualified transaction items
US9626678B2 (en) 2012-08-01 2017-04-18 Visa International Service Association Systems and methods to enhance security in transactions
US10438199B2 (en) 2012-08-10 2019-10-08 Visa International Service Association Systems and methods to apply values from stored value accounts to payment transactions
US10685367B2 (en) 2012-11-05 2020-06-16 Visa International Service Association Systems and methods to provide offer benefits based on issuer identity
US10629186B1 (en) * 2013-03-11 2020-04-21 Amazon Technologies, Inc. Domain and intent name feature identification and processing
US10592980B1 (en) 2013-03-15 2020-03-17 Intuit Inc. Systems methods and computer program products for identifying financial accounts utilized for business purposes
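CN103400579B (zh) * 2013-08-04 2015-11-18 徐华 一种语音识别系统和构建方法 (A speech recognition system and construction method)
CN104679738B (zh) * 2013-11-27 2018-02-27 北京拓尔思信息技术股份有限公司 互联网热词挖掘方法及装置 (Method and device for mining Internet hot words)
CN103942347B (zh) * 2014-05-19 2017-04-05 焦点科技股份有限公司 一种基于多维度综合词库的分词方法 (A word segmentation method based on a multi-dimensional comprehensive lexicon)
CN104050294A (zh) * 2014-06-30 2014-09-17 北京奇虎科技有限公司 互联网稀有资源的挖掘方法及装置 (Method and device for mining rare Internet resources)
CN104317882B (zh) * 2014-10-21 2017-05-10 北京理工大学 一种决策级中文分词融合方法 (A decision-level Chinese word segmentation fusion method)
CN104598573B (zh) * 2015-01-13 2017-06-16 北京京东尚科信息技术有限公司 一种用户的生活圈提取方法及系统 (A method and system for extracting a user's life circle)
CN104965818B (zh) * 2015-05-25 2018-01-05 中国科学院信息工程研究所 一种基于自学习规则的项目名实体识别方法及系统 (A project-name entity recognition method and system based on self-learning rules)
CN106649249A (zh) * 2015-07-14 2017-05-10 比亚迪股份有限公司 检索方法和检索装置 (Retrieval method and retrieval device)
CN106547743B (zh) * 2015-09-23 2020-03-27 阿里巴巴集团控股有限公司 一种进行翻译的方法及其系统 (A method and system for performing translation)
CN105550170B (zh) * 2015-12-14 2018-10-12 北京锐安科技有限公司 一种中文分词方法及装置 (A Chinese word segmentation method and device)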
US10224034B2 (en) * 2016-02-03 2019-03-05 Hua Xu Voice recognition system and construction method thereof
CN107291684B (zh) * 2016-04-12 2021-02-09 华为技术有限公司 语言文本的分词方法和系统 (Word segmentation method and system for language text)
US20170371850A1 (en) * 2016-06-22 2017-12-28 Google Inc. Phonetics-based computer transliteration techniques
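CN106202039B (zh) * 2016-06-30 2019-06-11 昆明理工大学 基于条件随机场的越南语组合词消歧方法 (Vietnamese compound-word disambiguation method based on conditional random fields)
CN106202464B (zh) * 2016-07-18 2019-12-17 上海轻维软件有限公司 一种基于变异回溯算法的数据识别方法 (A data recognition method based on a mutation backtracking algorithm)
CN106227719B (zh) * 2016-07-26 2018-10-23 北京智能管家科技有限公司 中文分词歧义消除方法和系统 (Method and system for resolving ambiguity in Chinese word segmentation)
CN106484677B (zh) * 2016-09-30 2019-02-12 北京林业大学 一种基于最小信息量的汉语快速分词系统及方法 (A fast Chinese word segmentation system and method based on minimum information)
CN106569997B (zh) * 2016-10-19 2019-12-10 中国科学院信息工程研究所 一种基于隐式马尔科夫模型的科技类复合短语识别方法 (A method for recognizing scientific and technical compound phrases based on a hidden Markov model)
CN108073566B (zh) * 2016-11-16 2022-01-18 北京搜狗科技发展有限公司 分词方法和装置、用于分词的装置 (Word segmentation method and apparatus, and apparatus for word segmentation)
TWI656450B (zh) * 2017-01-06 2019-04-11 香港商光訊網絡科技有限公司 從中文語料庫提取知識的方法和系統 (Method and system for extracting knowledge from a Chinese corpus)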
US10176889B2 (en) 2017-02-09 2019-01-08 International Business Machines Corporation Segmenting and interpreting a document, and relocating document fragments to corresponding sections
US10169325B2 (en) 2017-02-09 2019-01-01 International Business Machines Corporation Segmenting and interpreting a document, and relocating document fragments to corresponding sections
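CN107168992A (zh) * 2017-03-29 2017-09-15 北京百度网讯科技有限公司 基于人工智能的文章分类方法及装置、设备与可读介质 (Artificial-intelligence-based article classification method and apparatus, device, and readable medium)
CN110945514B (zh) 2017-07-31 2023-08-25 北京嘀嘀无限科技发展有限公司 用于分割句子的系统和方法 (System and method for segmenting sentences)
CN107818079A (zh) * 2017-09-05 2018-03-20 苏州大学 多粒度分词标注数据自动获取方法及系统 (Method and system for automatically acquiring multi-granularity word segmentation annotation data)
CN107729312B (zh) * 2017-09-05 2021-04-20 苏州大学 基于序列标注建模的多粒度分词方法及系统 (Multi-granularity word segmentation method and system based on sequence-labeling modeling)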
US11750897B2 (en) * 2017-09-07 2023-09-05 Studeo Realty Marketing Inc. Generating sequential visual narratives
CN108304373B (zh) * 2017-10-13 2021-07-09 腾讯科技(深圳)有限公司 语义词典的构建方法、装置、存储介质和电子装置 (Semantic dictionary construction method, apparatus, storage medium, and electronic device)
US10607604B2 (en) * 2017-10-27 2020-03-31 International Business Machines Corporation Method for re-aligning corpus and improving the consistency
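CN108052500B (zh) * 2017-12-13 2021-06-22 北京数洋智慧科技有限公司 一种基于语义分析的文本关键信息提取方法及装置 (A method and device for extracting key text information based on semantic analysis)
CN109635157B (zh) * 2018-10-30 2021-05-25 北京奇艺世纪科技有限公司 模型生成方法、视频搜索方法、装置、终端及存储介质 (Model generation method, video search method, apparatus, terminal, and storage medium)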
US10885282B2 (en) * 2018-12-07 2021-01-05 Microsoft Technology Licensing, Llc Document heading detection
WO2020167586A1 (en) * 2019-02-11 2020-08-20 Db Cybertech, Inc. Automated data discovery for cybersecurity
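JP7293767B2 (ja) * 2019-03-19 2023-06-20 株式会社リコー テキストセグメンテーション装置、テキストセグメンテーション方法、テキストセグメンテーションプログラム、及びテキストセグメンテーションシステム (Text segmentation apparatus, text segmentation method, text segmentation program, and text segmentation system)
CN110210034A (zh) * 2019-05-31 2019-09-06 腾讯科技(深圳)有限公司 信息查询方法、装置、终端及存储介质 (Information query method, apparatus, terminal, and storage medium)
CN110457551B (zh) * 2019-08-14 2021-04-23 梁冰 自然语言的语义递归表示系统的构造方法 (Method for constructing a semantic recursive representation system for natural language)
CN111104800B (zh) * 2019-12-24 2024-01-23 东软集团股份有限公司 一种实体识别方法、装置、设备、存储介质和程序产品 (An entity recognition method, apparatus, device, storage medium, and program product)
CN111274353B (zh) * 2020-01-14 2023-08-01 百度在线网络技术(北京)有限公司 文本切词方法、装置、设备和介质 (Text word segmentation method, apparatus, device, and medium)
CN111931034B (zh) * 2020-08-24 2024-01-26 腾讯科技(深圳)有限公司 数据搜索方法、装置、设备及存储介质 (Data search method, apparatus, device, and storage medium)
CN112017773B (zh) * 2020-08-31 2024-03-26 吾征智能技术(北京)有限公司 一种基于噩梦的疾病认知模型构建方法及疾病认知系统 (A nightmare-based method for constructing a disease cognition model, and a disease cognition system)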
US11373041B2 (en) 2020-09-18 2022-06-28 International Business Machines Corporation Text classification using models with complementary granularity and accuracy
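CN112784574B (zh) * 2021-02-02 2023-09-15 网易(杭州)网络有限公司 一种文本分割方法、装置、电子设备及介质 (A text segmentation method, apparatus, electronic device, and medium)
CN114386407B (zh) * 2021-12-23 2023-04-11 北京金堤科技有限公司 文本的分词方法及装置 (Text word segmentation method and device)
CN116186698A (zh) * 2022-12-16 2023-05-30 广东技术师范大学 一种基于机器学习的安全数据处理方法、介质及设备 (A machine-learning-based secure data processing method, medium, and device)
CN116991980B (zh) * 2023-09-27 2024-01-19 腾讯科技(深圳)有限公司 文本筛选模型训练方法及相关方法、装置、介质及设备 (Text filtering model training method and related methods, apparatus, medium, and device)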

Family Cites Families (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01234975A (ja) * 1988-03-11 1989-09-20 Internatl Business Mach Corp <Ibm> 日本語文章分割装置 (Japanese text segmentation device)
JPH04262460A (ja) 1991-02-15 1992-09-17 Ricoh Co Ltd 情報検索装置 (Information retrieval device)
US6202058B1 (en) 1994-04-25 2001-03-13 Apple Computer, Inc. System for ranking the relevance of information objects accessed by computer users
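JP3617096B2 (ja) 1994-05-25 2005-02-02 富士ゼロックス株式会社 関係表現抽出装置および関係表現検索装置、関係表現抽出方法、関係表現検索方法 (Relational expression extraction device and relational expression retrieval device, relational expression extraction method, and relational expression retrieval method)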
US7133835B1 (en) 1995-08-08 2006-11-07 Cxn, Inc. Online exchange market system with a buyer auction and a seller auction
JP3565239B2 (ja) 1996-09-03 2004-09-15 日本電信電話株式会社 情報検索装置 (Information retrieval device)
AU2399301A (en) 1999-12-21 2001-07-03 Matsushita Electric Industrial Co., Ltd. Vector index creating method, similar vector searching method, and devices for them
US7092871B2 (en) 2000-07-20 2006-08-15 Microsoft Corporation Tokenizer for a natural language processing system
US20020157116A1 (en) 2000-07-28 2002-10-24 Koninklijke Philips Electronics N.V. Context and content based information processing for multimedia segmentation and indexing
US7403938B2 (en) * 2001-09-24 2008-07-22 Iac Search & Media, Inc. Natural language query processing
US7805302B2 (en) * 2002-05-20 2010-09-28 Microsoft Corporation Applying a structured language model to information extraction
US7756847B2 (en) 2003-03-03 2010-07-13 Koninklijke Philips Electronics N.V. Method and arrangement for searching for strings
US7424421B2 (en) * 2004-03-03 2008-09-09 Microsoft Corporation Word collection method and system for use in word-breaking
JP4754247B2 (ja) * 2004-03-31 2011-08-24 オセ−テクノロジーズ ビーブイ 複合語を構成する単語を割り出す装置及びコンピュータ化された方法 (Apparatus and computerized method for determining the words that make up a compound word)
US20080077570A1 (en) 2004-10-25 2008-03-27 Infovell, Inc. Full Text Query and Search Systems and Method of Use
US8200687B2 (en) 2005-06-20 2012-06-12 Ebay Inc. System to generate related search queries
US20070067098A1 (en) 2005-09-19 2007-03-22 Zelentsov Oleg U Method and system for identification of geographic location
US8255383B2 (en) 2006-07-14 2012-08-28 Chacha Search, Inc Method and system for qualifying keywords in query strings
US8510298B2 (en) 2006-08-04 2013-08-13 Thefind, Inc. Method for relevancy ranking of products in online shopping
JP2008287406A (ja) * 2007-05-16 2008-11-27 Sony Corp 情報処理装置および情報処理方法、プログラム、並びに、記録媒体 (Information processing apparatus, information processing method, program, and recording medium)
TW200926033A (en) * 2007-07-18 2009-06-16 Steven Kays Adaptive electronic design
EP2191401A1 (en) 2007-08-27 2010-06-02 Google, Inc. Distinguishing accessories from products for ranking search results
US8301633B2 (en) * 2007-10-01 2012-10-30 Palo Alto Research Center Incorporated System and method for semantic search
US8019748B1 (en) 2007-11-14 2011-09-13 Google Inc. Web search refinement
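JP5343861B2 (ja) 2007-12-27 2013-11-13 日本電気株式会社 テキスト分割装置とテキスト分割方法およびプログラム (Text segmentation device, text segmentation method, and program)
CN101246472B (zh) * 2008-03-28 2010-10-06 腾讯科技(深圳)有限公司 一种汉语文本的大、小粒度切分实现方法和装置 (A method and device for implementing coarse- and fine-grained segmentation of Chinese text)
JP4979637B2 (ja) 2008-06-06 2012-07-18 ヤフー株式会社 複合語の区切り位置を推定する複合語区切り推定装置、方法、およびプログラム (Compound-word boundary estimation device, method, and program for estimating boundary positions within compound words)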
US8862989B2 (en) * 2008-06-25 2014-10-14 Microsoft Corporation Extensible input method editor dictionary
EP2259252B1 (en) 2009-06-02 2012-08-01 Nuance Communications, Inc. Speech recognition method for selecting a combination of list elements via a speech input
CN101655838B (zh) * 2009-09-10 2011-12-14 复旦大学 一种粒度可量化的话题提取方法 (A topic extraction method with quantifiable granularity)
US20110093331A1 (en) 2009-10-19 2011-04-21 Donald Metzler Term Weighting for Contextual Advertising
US9348892B2 (en) 2010-01-27 2016-05-24 International Business Machines Corporation Natural language interface for faceted search/analysis of semistructured data
EP2534585A4 (en) 2010-02-12 2018-01-24 Google LLC Compound splitting
CN102236663B (zh) 2010-04-30 2014-04-09 阿里巴巴集团控股有限公司 一种基于垂直搜索的查询方法、系统和装置 (A query method, system, and device based on vertical search)
US8515968B1 (en) 2010-08-13 2013-08-20 Google Inc. Tie breaking rules for content item matching
CN102479191B (zh) 2010-11-22 2014-03-26 阿里巴巴集团控股有限公司 提供多粒度分词结果的方法及其装置 (Method and device for providing multi-granularity word segmentation results)
CA2721498C (en) 2010-11-25 2011-08-02 Microsoft Corporation Efficient use of exceptions in text segmentation
US20120191745A1 (en) 2011-01-24 2012-07-26 Yahoo!, Inc. Synthesized Suggestions for Web-Search Queries
US20120317088A1 (en) 2011-06-07 2012-12-13 Microsoft Corporation Associating Search Queries and Entities

Also Published As

Publication number Publication date
EP2643770A4 (en) 2017-12-27
CN102479191B (zh) 2014-03-26
EP2643770A2 (en) 2013-10-02
TWI512507B (zh) 2015-12-11
US8892420B2 (en) 2014-11-18
US20160132492A1 (en) 2016-05-12
CN102479191A (zh) 2012-05-30
JP5788015B2 (ja) 2015-09-30
US9223779B2 (en) 2015-12-29
TW201222291A (en) 2012-06-01
WO2012095696A3 (en) 2012-11-08
JP2014500547A (ja) 2014-01-09
WO2012095696A2 (en) 2012-07-19
US20150100307A1 (en) 2015-04-09
US20120130705A1 (en) 2012-05-24

Similar Documents

Publication Publication Date Title
HK1167028A1 (en) A method and device for providing word segmentation result with multiple granularities
EP2725764A4 (en) DATA STORAGE METHOD AND DATA STORAGE DEVICE
EP2767880A4 (en) METHOD AND DEVICE FOR STORING DATA
EP2685389A4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING AND COMPUTER PROGRAM
EP2667514A4 (en) DATA PROCESSING DEVICE AND DATA PROCESSING METHOD
EP2688210A4 (en) DATA PROCESSING DEVICE AND DATA PROCESSING METHOD
EP2731332A4 (en) INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
EP2675068A4 (en) DATA PROCESSING DEVICE AND DATA PROCESSING METHOD
EP2717186A4 (en) INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
EP2801912A4 (en) MESSAGE-BASED MEMORY ACCESS DEVICE AND ACCESS METHOD THEREOF
HUE042122T2 (hu) Szegmens tartalom lokalizálása és visszakeresése (Localization and retrieval of segment content)
EP2613443A4 (en) DATA PROCESSING DEVICE AND DATA PROCESSING METHOD
EP2680446A4 (en) DATA PROCESSING DEVICE AND DATA PROCESSING METHOD
EP2618491A4 (en) DATA PROCESSING DEVICE AND DATA PROCESSING METHOD
EP2667513A4 (en) DATA PROCESSING DEVICE AND DATA PROCESSING METHOD
EP2549390A4 (en) DATA PROCESSING DEVICE AND DATA PROCESSING METHOD
HK1159792A1 (zh) 種數據分析方法和設備 (A data analysis method and device)
HK1169867A1 (en) Method and device for providing information
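HK1173819A1 (zh) 種基於單字索引系統的檢索方法和裝置 (A retrieval method and device based on a single-character index system)
PT2700234T (pt) Método e dispositivo para codificação com compressão com perda de dados (Method and device for lossy data compression coding)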
SG10201404266YA (en) A data structure and a method for using the data structure
EP2824893A4 (en) DATA STORAGE METHOD AND DEVICE
EP2570937A4 (en) DATA SEARCH DEVICE, DATA SEARCH METHOD, AND PROGRAM
EP2782125A4 (en) METHOD OF MANAGING DATA ASSOCIATED WITH A WAFER AND DEVICE FOR CREATING DATA ASSOCIATED WITH A WAFER
EP2437462A4 (en) DEVICE AND METHOD FOR PROCESSING ACCESS TO DATA

Legal Events

Date Code Title Description
2022-11-20 PC Patent ceased (i.e. patent has lapsed due to the failure to pay the renewal fee)