SG11202002266YA - Method, device, and apparatus for word vector processing based on clusters

Info

Publication number
SG11202002266YA
Authority
SG
Singapore
Prior art keywords
clusters
processing based
word vector
vector processing
word
Prior art date
Application number
SG11202002266YA
Inventor
Shaosheng Cao
Xinxing Yang
Jun Zhou
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-11-14
Filing date
2018-09-17
Publication date
2020-04-29
Application filed by Alibaba Group Holding Ltd
Publication of SG11202002266YA

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)
  • Devices For Executing Special Programs (AREA)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711123278.8A CN108170663A (en) 2017-11-14 2017-11-14 Term vector processing method, device and equipment based on cluster
PCT/CN2018/105959 WO2019095836A1 (en) 2017-11-14 2018-09-17 Method, device, and apparatus for word vector processing based on clusters

Publications (1)

Publication Number Publication Date
SG11202002266YA (en) 2020-04-29

Family

ID=62527339

Family Applications (1)

Application Number Priority Date Filing Date Title
SG11202002266YA SG11202002266YA (en) 2017-11-14 2018-09-17 Method, device, and apparatus for word vector processing based on clusters

Country Status (6)

Country Link
US (1) US10846483B2 (en)
EP (1) EP3657360A4 (en)
CN (1) CN108170663A (en)
SG (1) SG11202002266YA (en)
TW (1) TW201923620A (en)
WO (1) WO2019095836A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107957989B9 (en) 2017-10-23 2021-01-12 创新先进技术有限公司 Cluster-based word vector processing method, device and equipment
CN108170663A (en) * 2017-11-14 2018-06-15 阿里巴巴集团控股有限公司 Term vector processing method, device and equipment based on cluster
CN111079945B (en) * 2019-12-18 2021-02-05 北京百度网讯科技有限公司 End-to-end model training method and device

Family Cites Families (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5325298A (en) 1990-11-07 1994-06-28 Hnc, Inc. Methods for generating or revising context vectors for a plurality of word stems
US5317507A (en) 1990-11-07 1994-05-31 Gallant Stephen I Method for document retrieval and for word sense disambiguation using neural networks
US5233681A (en) 1992-04-24 1993-08-03 International Business Machines Corporation Context-dependent speech recognizer using estimated next word context
US7251637B1 (en) 1993-09-20 2007-07-31 Fair Isaac Corporation Context vector generation and retrieval
US5619709A (en) 1993-09-20 1997-04-08 Hnc, Inc. System and method of context vector generation and retrieval
US5828999A (en) 1996-05-06 1998-10-27 Apple Computer, Inc. Method and system for deriving a large-span semantic language model for large-vocabulary recognition systems
US6137911A (en) 1997-06-16 2000-10-24 The Dialog Corporation Plc Test classification system and method
US6882990B1 (en) 1999-05-01 2005-04-19 Biowulf Technologies, Llc Methods of identifying biological patterns using multiple data sets
US6574632B2 (en) 1998-11-18 2003-06-03 Harris Corporation Multiple engine information retrieval and visualization system
US6317707B1 (en) 1998-12-07 2001-11-13 At&T Corp. Automatic clustering of tokens from a corpus for grammar acquisition
US6922699B2 (en) 1999-01-26 2005-07-26 Xerox Corporation System and method for quantitatively representing data objects in vector space
US6904405B2 (en) 1999-07-17 2005-06-07 Edwin A. Suominen Message recognition using shared language model
US7376618B1 (en) 2000-06-30 2008-05-20 Fair Isaac Corporation Detecting and measuring risk with predictive models using content mining
US7340674B2 (en) 2002-12-16 2008-03-04 Xerox Corporation Method and apparatus for normalizing quoting styles in electronic mail messages
US7280957B2 (en) 2002-12-16 2007-10-09 Palo Alto Research Center, Incorporated Method and apparatus for generating overview information for hierarchically related information
US7007069B2 (en) 2002-12-16 2006-02-28 Palo Alto Research Center Inc. Method and apparatus for clustering hierarchically related information
EP1894125A4 (en) 2005-06-17 2015-12-02 Nat Res Council Canada Means and method for adapted language translation
US9600568B2 (en) 2006-01-23 2017-03-21 Veritas Technologies Llc Methods and systems for automatic evaluation of electronic discovery review and productions
US9275129B2 (en) 2006-01-23 2016-03-01 Symantec Corporation Methods and systems to efficiently find similar and near-duplicate emails and files
US20080109454A1 (en) 2006-11-03 2008-05-08 Willse Alan R Text analysis techniques
US8027938B1 (en) 2007-03-26 2011-09-27 Google Inc. Discriminative training in machine learning
US7877258B1 (en) 2007-03-29 2011-01-25 Google Inc. Representing n-gram language models for compact storage and fast retrieval
US8756229B2 (en) 2009-06-26 2014-06-17 Quantifind, Inc. System and methods for units-based numeric information retrieval
US8719257B2 (en) 2011-02-16 2014-05-06 Symantec Corporation Methods and systems for automatically generating semantic/concept searches
US8488916B2 (en) 2011-07-22 2013-07-16 David S Terman Knowledge acquisition nexus for facilitating concept capture and promoting time on task
US9519858B2 (en) 2013-02-10 2016-12-13 Microsoft Technology Licensing, Llc Feature-augmented neural networks and applications of same
WO2015081128A1 (en) 2013-11-27 2015-06-04 Ntt Docomo, Inc. Automatic task classification based upon machine learning
JP6588449B2 (en) 2014-01-31 2019-10-09 グーグル エルエルシー Generating a vector representation of a document
US20160070748A1 (en) 2014-09-04 2016-03-10 Crimson Hexagon, Inc. Method and apparatus for improved searching of digital content
US9779085B2 (en) 2015-05-29 2017-10-03 Oracle International Corporation Multilingual embeddings for natural language processing
CN105095444A (en) * 2015-07-24 2015-11-25 百度在线网络技术(北京)有限公司 Information acquisition method and device
US20170139899A1 (en) 2015-11-18 2017-05-18 Le Holdings (Beijing) Co., Ltd. Keyword extraction method and electronic device
CN107102981B (en) * 2016-02-19 2020-06-23 腾讯科技(深圳)有限公司 Word vector generation method and device
CN107133622B (en) 2016-02-29 2022-08-26 阿里巴巴集团控股有限公司 Word segmentation method and device
CN108431794B (en) 2016-03-18 2022-06-21 微软技术许可有限责任公司 Method and apparatus for training a learning machine
CN105786782B (en) * 2016-03-25 2018-10-19 北京搜狗信息服务有限公司 A kind of training method and device of term vector
US10789545B2 (en) * 2016-04-14 2020-09-29 Oath Inc. Method and system for distributed machine learning
JP6671020B2 (en) 2016-06-23 2020-03-25 パナソニックIpマネジメント株式会社 Dialogue act estimation method, dialogue act estimation device and program
JP6199461B1 (en) * 2016-09-13 2017-09-20 ヤフー株式会社 Information processing apparatus, information processing method, and program
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
CN106897265B (en) * 2017-01-12 2020-07-10 北京航空航天大学 Word vector training method and device
CN106802888B (en) 2017-01-12 2020-01-24 北京航空航天大学 Word vector training method and device
CN107239443A (en) 2017-05-09 2017-10-10 清华大学 The training method and server of a kind of term vector learning model
US10303681B2 (en) 2017-05-19 2019-05-28 Microsoft Technology Licensing, Llc Search query and job title proximity computation via word embedding
US10380259B2 (en) 2017-05-22 2019-08-13 International Business Machines Corporation Deep embedding for natural language content based on semantic dependencies
CN107247704B (en) * 2017-06-09 2020-09-08 阿里巴巴集团控股有限公司 Word vector processing method and device and electronic equipment
CN107273355B (en) * 2017-06-12 2020-07-14 大连理工大学 Chinese word vector generation method based on word and phrase joint training
CN107957989B9 (en) 2017-10-23 2021-01-12 创新先进技术有限公司 Cluster-based word vector processing method, device and equipment
CN108170663A (en) * 2017-11-14 2018-06-15 阿里巴巴集团控股有限公司 Term vector processing method, device and equipment based on cluster
US10678830B2 (en) 2018-05-31 2020-06-09 Fmr Llc Automated computer text classification and routing using artificial intelligence transfer learning

Also Published As

Publication number Publication date
CN108170663A (en) 2018-06-15
TW201923620A (en) 2019-06-16
EP3657360A1 (en) 2020-05-27
US10846483B2 (en) 2020-11-24
US20200167527A1 (en) 2020-05-28
WO2019095836A1 (en) 2019-05-23
EP3657360A4 (en) 2020-08-05

Similar Documents

Publication Title
SG11202004838WA (en) Blockchain data processing method, apparatus, device, and system
ZA201902729B (en) Blockchain data processing method and apparatus
SG11202100872VA (en) Data processing method, apparatus, and device
SG11202002560PA (en) Data processing method and apparatus
EP3893180C0 (en) Service data processing method and apparatus
SG11201710576XA (en) Data processing method and apparatus, and flash device
SG11202001204RA (en) Cluster-based word vector processing method, device, and apparatus
SG11201703410YA (en) Data processing method, apparatus, and system
EP3700142A4 (en) Data processing method, apparatus and device
EP3547672A4 (en) Data processing method, device, and apparatus
EP3496368A4 (en) Data processing method, apparatus, and system
EP3618432A4 (en) Test data processing device, test data processing method and test apparatus
SG10202107782UA (en) Device configuration method, apparatus and system
SG11202001210YA (en) Data processing method, apparatus, and equipment
EP3404538A4 (en) Data processing method, and data processing apparatus
EP3591868A4 (en) Information processing method, apparatus and device
EP3442151A4 (en) Data processing method, apparatus and system
EP3324583A4 (en) Data processing method, apparatus and system
ZA202000917B (en) Method and device for processing data
EP3220565A4 (en) Data processing method, apparatus and device
SG11202002266YA (en) Method, device, and apparatus for word vector processing based on clusters
GB201704320D0 (en) Data processing apparatus and methods
EP3306474A4 (en) Clock task processing method, apparatus and device
EP3435727A4 (en) Access method, apparatus, device and system
GB201807821D0 (en) System, device, apparatus and method