CN108170663A - Cluster-based word vector processing method, apparatus, and device - Google Patents
Cluster-based word vector processing method, apparatus, and device
- Publication number
- CN108170663A (application number CN201711123278.8A)
- Authority
- CN
- China
- Prior art keywords
- gradient
- term vector
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/273—Asynchronous replication or reconciliation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Machine Translation (AREA)
- Devices For Executing Special Programs (AREA)
Claims (15)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711123278.8A CN108170663A (zh) | 2017-11-14 | 2017-11-14 | Cluster-based word vector processing method, apparatus, and device |
TW107131853A TW201923620A (zh) | 2017-11-14 | 2018-09-11 | Cluster-based word vector processing method, apparatus, and device |
EP18877958.1A EP3657360A4 (en) | 2017-11-14 | 2018-09-17 | METHOD, DEVICE AND APPARATUS FOR PROCESSING VECTOR OF WORDS ON THE BASIS OF CLUSTERS |
PCT/CN2018/105959 WO2019095836A1 (zh) | 2017-11-14 | 2018-09-17 | Cluster-based word vector processing method, apparatus, and device |
SG11202002266YA SG11202002266YA (en) | 2017-11-14 | 2018-09-17 | Method, device, and apparatus for word vector processing based on clusters |
US16/776,456 US10846483B2 (en) | 2017-11-14 | 2020-01-29 | Method, device, and apparatus for word vector processing based on clusters |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711123278.8A CN108170663A (zh) | 2017-11-14 | 2017-11-14 | Cluster-based word vector processing method, apparatus, and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108170663A (zh) | 2018-06-15 |
Family
ID=62527339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711123278.8A Pending CN108170663A (zh) | Cluster-based word vector processing method, apparatus, and device | 2017-11-14 | 2017-11-14 |
Country Status (6)
Country | Link |
---|---|
US (1) | US10846483B2 (zh) |
EP (1) | EP3657360A4 (zh) |
CN (1) | CN108170663A (zh) |
SG (1) | SG11202002266YA (zh) |
TW (1) | TW201923620A (zh) |
WO (1) | WO2019095836A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019095836A1 (zh) * | 2017-11-14 | 2019-05-23 | Alibaba Group Holding Limited | Cluster-based word vector processing method, apparatus, and device |
US10769383B2 (en) | 2017-10-23 | 2020-09-08 | Alibaba Group Holding Limited | Cluster-based word vector processing method, device, and apparatus |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079945B (zh) * | 2019-12-18 | 2021-02-05 | Beijing Baidu Netcom Science and Technology Co., Ltd. | End-to-end model training method and apparatus |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
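CN105095444A (zh) * | 2015-07-24 | 2015-11-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Information acquisition method and apparatus |
CN105786782A (zh) * | 2016-03-25 | 2016-07-20 | Beijing Sogou Technology Development Co., Ltd. | Word vector training method and apparatus |
CN107247704A (zh) * | 2017-06-09 | 2017-10-13 | Alibaba Group Holding Limited | Word vector processing method, apparatus, and electronic device |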
Family Cites Families (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5317507A (en) | 1990-11-07 | 1994-05-31 | Gallant Stephen I | Method for document retrieval and for word sense disambiguation using neural networks |
US5325298A (en) | 1990-11-07 | 1994-06-28 | Hnc, Inc. | Methods for generating or revising context vectors for a plurality of word stems |
US5233681A (en) | 1992-04-24 | 1993-08-03 | International Business Machines Corporation | Context-dependent speech recognizer using estimated next word context |
US5619709A (en) | 1993-09-20 | 1997-04-08 | Hnc, Inc. | System and method of context vector generation and retrieval |
US7251637B1 (en) | 1993-09-20 | 2007-07-31 | Fair Isaac Corporation | Context vector generation and retrieval |
US5828999A (en) | 1996-05-06 | 1998-10-27 | Apple Computer, Inc. | Method and system for deriving a large-span semantic language model for large-vocabulary recognition systems |
US6137911A (en) | 1997-06-16 | 2000-10-24 | The Dialog Corporation Plc | Test classification system and method |
US6574632B2 (en) | 1998-11-18 | 2003-06-03 | Harris Corporation | Multiple engine information retrieval and visualization system |
US6317707B1 (en) | 1998-12-07 | 2001-11-13 | At&T Corp. | Automatic clustering of tokens from a corpus for grammar acquisition |
US6922699B2 (en) | 1999-01-26 | 2005-07-26 | Xerox Corporation | System and method for quantitatively representing data objects in vector space |
US6714925B1 (en) | 1999-05-01 | 2004-03-30 | Barnhill Technologies, Llc | System for identifying patterns in biological data using a distributed network |
US6904405B2 (en) | 1999-07-17 | 2005-06-07 | Edwin A. Suominen | Message recognition using shared language model |
US7376618B1 (en) | 2000-06-30 | 2008-05-20 | Fair Isaac Corporation | Detecting and measuring risk with predictive models using content mining |
US7007069B2 (en) | 2002-12-16 | 2006-02-28 | Palo Alto Research Center Inc. | Method and apparatus for clustering hierarchically related information |
US7280957B2 (en) | 2002-12-16 | 2007-10-09 | Palo Alto Research Center, Incorporated | Method and apparatus for generating overview information for hierarchically related information |
US7340674B2 (en) | 2002-12-16 | 2008-03-04 | Xerox Corporation | Method and apparatus for normalizing quoting styles in electronic mail messages |
EP1894125A4 (en) | 2005-06-17 | 2015-12-02 | Nat Res Council Canada | MEANS AND METHOD FOR ADAPTED LANGUAGE TRANSLATION |
US9600568B2 (en) | 2006-01-23 | 2017-03-21 | Veritas Technologies Llc | Methods and systems for automatic evaluation of electronic discovery review and productions |
US9275129B2 (en) | 2006-01-23 | 2016-03-01 | Symantec Corporation | Methods and systems to efficiently find similar and near-duplicate emails and files |
US20080109454A1 (en) | 2006-11-03 | 2008-05-08 | Willse Alan R | Text analysis techniques |
US8027938B1 (en) | 2007-03-26 | 2011-09-27 | Google Inc. | Discriminative training in machine learning |
US7877258B1 (en) | 2007-03-29 | 2011-01-25 | Google Inc. | Representing n-gram language models for compact storage and fast retrieval |
US8756229B2 (en) | 2009-06-26 | 2014-06-17 | Quantifind, Inc. | System and methods for units-based numeric information retrieval |
US8719257B2 (en) | 2011-02-16 | 2014-05-06 | Symantec Corporation | Methods and systems for automatically generating semantic/concept searches |
US8488916B2 (en) | 2011-07-22 | 2013-07-16 | David S Terman | Knowledge acquisition nexus for facilitating concept capture and promoting time on task |
US9519858B2 (en) | 2013-02-10 | 2016-12-13 | Microsoft Technology Licensing, Llc | Feature-augmented neural networks and applications of same |
JP6440732B2 (ja) | 2013-11-27 | 2018-12-19 | NTT Docomo, Inc. | Automatic task classification based on machine learning |
WO2015116909A1 (en) | 2014-01-31 | 2015-08-06 | Google Inc. | Generating vector representations of documents |
US20160070748A1 (en) | 2014-09-04 | 2016-03-10 | Crimson Hexagon, Inc. | Method and apparatus for improved searching of digital content |
US9779085B2 (en) | 2015-05-29 | 2017-10-03 | Oracle International Corporation | Multilingual embeddings for natural language processing |
US20170139899A1 (en) | 2015-11-18 | 2017-05-18 | Le Holdings (Beijing) Co., Ltd. | Keyword extraction method and electronic device |
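CN107102981B (zh) * | 2016-02-19 | 2020-06-23 | Tencent Technology (Shenzhen) Co., Ltd. | Word vector generation method and apparatus |
CN107133622B (zh) | 2016-02-29 | 2022-08-26 | Alibaba Group Holding Limited | Word segmentation method and apparatus |
CN108431794B (zh) | 2016-03-18 | 2022-06-21 | Microsoft Technology Licensing, LLC | Method and apparatus for training a learning machine |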
US10789545B2 (en) * | 2016-04-14 | 2020-09-29 | Oath Inc. | Method and system for distributed machine learning |
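JP6671020B2 (ja) | 2016-06-23 | 2020-03-25 | Panasonic Intellectual Property Management Co., Ltd. | Dialogue act estimation method, dialogue act estimation apparatus, and program |
JP6199461B1 (ja) * | 2016-09-13 | 2017-09-20 | Yahoo Japan Corporation | Information processing apparatus, information processing method, and program |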
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
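CN106802888B (zh) | 2017-01-12 | 2020-01-24 | Beihang University | Word vector training method and apparatus |
CN106897265B (zh) * | 2017-01-12 | 2020-07-10 | Beihang University | Word vector training method and apparatus |
CN107239443A (zh) | 2017-05-09 | 2017-10-10 | Tsinghua University | Training method and server for a word vector learning model |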
US10303681B2 (en) | 2017-05-19 | 2019-05-28 | Microsoft Technology Licensing, Llc | Search query and job title proximity computation via word embedding |
US10380259B2 (en) | 2017-05-22 | 2019-08-13 | International Business Machines Corporation | Deep embedding for natural language content based on semantic dependencies |
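CN107273355B (zh) * | 2017-06-12 | 2020-07-14 | Dalian University of Technology | Chinese word vector generation method based on joint character-word training |
CN107957989B9 (zh) | 2017-10-23 | 2021-01-12 | Advanced New Technologies Co., Ltd. | Cluster-based word vector processing method, apparatus, and device |
CN108170663A (zh) * | 2017-11-14 | 2018-06-15 | Alibaba Group Holding Limited | Cluster-based word vector processing method, apparatus, and device |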
US10678830B2 (en) | 2018-05-31 | 2020-06-09 | Fmr Llc | Automated computer text classification and routing using artificial intelligence transfer learning |
2017
- 2017-11-14: CN application CN201711123278.8A, published as CN108170663A (zh), status: Pending
2018
- 2018-09-11: TW application TW107131853A, published as TW201923620A (zh), status: unknown
- 2018-09-17: WO application PCT/CN2018/105959, published as WO2019095836A1 (zh), status: unknown
- 2018-09-17: EP application EP18877958.1A, published as EP3657360A4 (en), status: Pending
- 2018-09-17: SG application SG11202002266YA, published as SG11202002266YA (en), status: unknown
2020
- 2020-01-29: US application US16/776,456, published as US10846483B2 (en), status: Active
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10769383B2 (en) | 2017-10-23 | 2020-09-08 | Alibaba Group Holding Limited | Cluster-based word vector processing method, device, and apparatus |
WO2019095836A1 (zh) * | 2017-11-14 | 2019-05-23 | Alibaba Group Holding Limited | Cluster-based word vector processing method, apparatus, and device |
US10846483B2 (en) | 2017-11-14 | 2020-11-24 | Advanced New Technologies Co., Ltd. | Method, device, and apparatus for word vector processing based on clusters |
Also Published As
Publication number | Publication date |
---|---|
TW201923620A (zh) | 2019-06-16 |
EP3657360A1 (en) | 2020-05-27 |
SG11202002266YA (en) | 2020-04-29 |
US20200167527A1 (en) | 2020-05-28 |
EP3657360A4 (en) | 2020-08-05 |
US10846483B2 (en) | 2020-11-24 |
WO2019095836A1 (zh) | 2019-05-23 |
Similar Documents
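Publication | Title |
---|---|
TWI721310B (zh) | Cluster-based word vector processing method, apparatus, and device |
TWI701588B (zh) | Word vector processing method, apparatus, and device |
TWI685761B (zh) | Word vector processing method and apparatus |
TWI689831B (zh) | Word vector generation method, apparatus, and device |
CN109658455A (zh) | Image processing method and processing device |
CN107450972A (zh) | Scheduling method, apparatus, and electronic device |
TWI686713B (zh) | Word vector generation method, apparatus, and device |
CN109086961A (zh) | Information risk monitoring method and apparatus |
CN110019903A (zh) | Generation method and search method for an image processing engine component, and terminal and system |
CN108874765A (zh) | Word vector processing method and apparatus |
CN108021610A (zh) | Random walk, and distributed-system-based random walk method, apparatus, and device |
CN108170663A (zh) | Cluster-based word vector processing method, apparatus, and device |
CN108073687A (zh) | Random walk, and cluster-based random walk method, apparatus, and device |
CN109271587A (zh) | Page generation method and apparatus |
CN107423269A (zh) | Word vector processing method and apparatus |
CN109656946A (zh) | Multi-table join query method, apparatus, and device |
CN107247704A (zh) | Word vector processing method, apparatus, and electronic device |
CN107562716A (zh) | Word vector processing method, apparatus, and electronic device |
WO2019174392A1 (zh) | Vector processing for RPC information |
CN108519986A (zh) | Web page generation method, apparatus, and device |
CN107562715A (zh) | Word vector processing method, apparatus, and electronic device |
CN107844472A (zh) | Word vector processing method, apparatus, and electronic device |
CN107577659A (zh) | Word vector processing method, apparatus, and electronic device |
CN109635147A (zh) | Method and apparatus for generating and querying graph embedding vectors of vertices |
CN106802952A (zh) | Processing method, extraction method, and processing apparatus for massive data |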
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1255563; Country of ref document: HK |
TA01 | Transfer of patent application right | Effective date of registration: 20201020; Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands; Applicant after: Innovative advanced technology Co.,Ltd.; Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands; Applicant before: Advanced innovation technology Co.,Ltd. |
TA01 | Transfer of patent application right | Effective date of registration: 20201020; Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands; Applicant after: Advanced innovation technology Co.,Ltd.; Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands; Applicant before: Alibaba Group Holding Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180615 |