SG11201906524TA - Word vector processing method and apparatus - Google Patents

Word vector processing method and apparatus

Info

Publication number
SG11201906524TA
Authority
SG
Singapore
Prior art keywords
word
international
vectors
words
stroke
Application number
SG11201906524TA
Inventor
Shaosheng Cao
Xiaolong Li
Original Assignee
Alibaba Group Holding Ltd
Priority date
2017-01-22
Filing date
2018-01-22
Publication date
2019-08-27
Application filed by Alibaba Group Holding Ltd
Publication of SG11201906524TA


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/279 Recognition of textual entities
                • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
            • G06F40/30 Semantic analysis
            • G06F40/40 Processing or translation of natural language
              • G06F40/53 Processing of non-Latin text
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/08 Learning methods
                • G06N3/084 Backpropagation, e.g. using gradient descent
          • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)
  • Document Processing Apparatus (AREA)

Abstract

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(43) International Publication Date: 26 July 2018 (26.07.2018)
(10) International Publication Number: WO 2018/136870 A1
(51) International Patent Classification: G06F 17/28 (2006.01); G06F 17/27 (2006.01)
(21) International Application Number: PCT/US2018/014680
(22) International Filing Date: 22 January 2018 (22.01.2018)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 201710045459.7, 22 January 2017 (22.01.2017), CN; 15/874,725, 18 January 2018 (18.01.2018), US
(71) Applicant: ALIBABA GROUP HOLDING LIMITED [ /US]; Fourth Floor, One Capital Place, P.O. Box 847, George Town, Grand Cayman (KY).
(72) Inventors: CAO, Shaosheng; c/o Ants Patent Team, 17F Building B, Huanglong Times Plaza, No.18 Wantang Road, Hangzhou, 310099 (CN). LI, Xiaolong; c/o Ants Patent Team, 17F Building B, Huanglong Times Plaza, No.18 Wantang Road, Hangzhou, 310099 (CN).
(74) Agent: STALFORD, Terry, J.; Fish & Richardson P.C., P.O. Box 1022, Minneapolis, MN 55440-1022 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).
Published: with international search report (Art. 21(3))
(54) Title: WORD VECTOR PROCESSING METHOD AND APPARATUS
(57) Abstract: Embodiments of the present application disclose a word vector processing method and apparatus. The method includes: performing word segmentation on a corpus to obtain words; determining the n-gram strokes corresponding to the words, where an n-gram stroke represents n successive strokes of the corresponding word; establishing and initializing word vectors of the words and stroke vectors of the n-gram strokes corresponding to the words; and training the word vectors and the stroke vectors according to the segmented corpus, the word vectors, and the stroke vectors. With the embodiments of the present application, the features of a word can be represented more precisely by using the n-gram strokes corresponding to the word, thus improving the accuracy of word vectors for Chinese words and achieving a desirable practical effect.
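The four steps in the abstract (segmentation, n-gram stroke extraction, vector initialization, joint training) can be illustrated with a minimal Python sketch. It is not the patented implementation: the stroke dictionary `STROKES`, the five stroke-type codes, the n range of 3 to 5, and the skip-gram-style objective with negative sampling inside the hypothetical `train` function are all assumptions chosen for illustration, and the corpus is assumed to be segmented already.

```python
import math
import random

# Hypothetical stroke dictionary (illustrative only, not taken from the patent).
# Assumed stroke-type codes: 1 horizontal, 2 vertical, 3 left-falling,
# 4 right-falling or dot, 5 turning.
STROKES = {
    "大": "134",
    "人": "34",
    "天": "1134",
    "木": "1234",
}


def ngram_strokes(word, n_min=3, n_max=5):
    """Return every run of n successive strokes in `word`, for n_min <= n <= n_max."""
    seq = "".join(STROKES[ch] for ch in word)
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(seq) - n + 1):
            grams.append(seq[i:i + n])
    return grams


def init_vectors(keys, dim, rng):
    """Give each key (a word or an n-gram stroke) a small random vector."""
    # Keys are sorted so the random draws do not depend on set iteration order.
    return {k: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for k in sorted(keys)}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(sentences, dim=8, window=2, negatives=2, lr=0.05, epochs=5, seed=0):
    """Jointly train word vectors and stroke vectors on a segmented corpus.

    Assumed objective (skip-gram style with negative sampling): the score of a
    word w against a context word c is the sum over w's n-gram strokes q of
    dot(stroke_vec[q], word_vec[c]).
    """
    rng = random.Random(seed)
    vocab = sorted({w for sent in sentences for w in sent})
    grams = {w: ngram_strokes(w) for w in vocab}
    word_vec = init_vectors(vocab, dim, rng)
    stroke_vec = init_vectors({q for qs in grams.values() for q in qs}, dim, rng)

    for _ in range(epochs):
        for sent in sentences:
            for i, w in enumerate(sent):
                qs = grams[w]
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j == i:
                        continue
                    # One true context word (label 1) plus random negatives (label 0);
                    # for simplicity, negatives are not filtered against the context.
                    samples = [(sent[j], 1.0)]
                    samples += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                    for c, label in samples:
                        s_sum = [sum(stroke_vec[q][k] for q in qs) for k in range(dim)]
                        score = sum(a * b for a, b in zip(s_sum, word_vec[c]))
                        # Gradient ascent on label*log sigmoid(score)
                        # + (1 - label)*log sigmoid(-score).
                        g = lr * (label - sigmoid(score))
                        old_c = list(word_vec[c])
                        for k in range(dim):
                            word_vec[c][k] += g * s_sum[k]
                        for q in qs:
                            for k in range(dim):
                                stroke_vec[q][k] += g * old_c[k]
    return word_vec, stroke_vec


if __name__ == "__main__":
    # A toy corpus that is already word-segmented.
    corpus = [["大人", "天"], ["木", "大"], ["天", "大人", "木"]]
    print(ngram_strokes("大人"))  # ['134', '343', '434', '1343', '3434', '13434']
    words, strokes = train(corpus)
    print(sorted(strokes))
```

Because each word's score is built from its n-gram stroke vectors, words that share stroke sequences (here 大, 天 and 大人 all contain the n-gram `134`) receive coupled updates through the shared stroke vector, which matches the intuition the abstract gives for representing word features more precisely. A word whose stroke sequence is shorter than `n_min` (人 on its own) has no n-grams in this sketch and so contributes a constant score of zero.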

Applications Claiming Priority (3)

Application Number | Publication | Priority Date | Filing Date | Title
CN201710045459.7A | CN108345580B (en) | 2017-01-22 | 2017-01-22 | Word vector processing method and device
US15/874,725 | US10430518B2 (en) | 2017-01-22 | 2018-01-18 | Word vector processing for foreign languages
PCT/US2018/014680 | WO2018136870A1 (en) | 2017-01-22 | 2018-01-22 | Word vector processing method and apparatus

Publications (1)

Publication Number | Publication Date
SG11201906524TA (en) | 2019-08-27

Family

ID=62906491

Family Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
SG11201906524TA | SG11201906524TA (en) | 2017-01-22 | 2018-01-22 | Word vector processing method and apparatus

Country Status (9)

Country Link
US (2) US10430518B2 (en)
EP (1) EP3559823A1 (en)
JP (1) JP6742653B2 (en)
KR (1) KR102117799B1 (en)
CN (2) CN108345580B (en)
PH (1) PH12019501675A1 (en)
SG (1) SG11201906524TA (en)
TW (1) TWI685761B (en)
WO (1) WO2018136870A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108345580B (en) | 2017-01-22 | 2020-05-15 | Advanced New Technologies Co., Ltd. | Word vector processing method and device
CN110119507A (en) * | 2018-02-05 | 2019-08-13 | Alibaba Group Holding Ltd. | Word vector generation method, apparatus and device
CN109271622B (en) * | 2018-08-08 | 2021-05-14 | Shanxi University | Low-dimensional word feature learning method based on frequency distribution correction
CN110929508B (en) * | 2018-09-20 | 2023-05-02 | Alibaba Group Holding Ltd. | Word vector generation method, device and system
CN110956034B (en) * | 2018-09-21 | 2023-04-11 | Alibaba Group Holding Ltd. | Word acquisition method and device, and commodity search method
CN111274793B (en) * | 2018-11-19 | 2023-04-28 | Alibaba Group Holding Ltd. | Text processing method and device, and computing device
CN110059155A (en) * | 2018-12-18 | 2019-07-26 | Alibaba Group Holding Ltd. | Text similarity calculation and intelligent customer service system implementation method and device
CN109657062A (en) * | 2018-12-24 | 2019-04-19 | Wonders Information Co., Ltd. | Closed-loop method for parsing electronic health record text based on big data technology
CN111353016B (en) * | 2018-12-24 | 2023-04-18 | Alibaba Group Holding Ltd. | Text processing method and device
CN109933686B (en) * | 2019-03-18 | 2023-02-03 | Advanced New Technologies Co., Ltd. | Song label prediction method, device, server and storage medium
CN110222144B (en) * | 2019-04-17 | 2023-03-28 | OneConnect Smart Technology Co., Ltd. (Shenzhen) | Text content extraction method and device, electronic equipment and storage medium
CA3061432A1 (en) * | 2019-04-25 | 2019-07-18 | Alibaba Group Holding Limited | Identifying entities in electronic medical records
CN110334196B (en) * | 2019-06-28 | 2023-06-27 | Tongji University | Neural network Chinese question generation system based on strokes and a self-attention mechanism
US10909317B2 (en) * | 2019-07-26 | 2021-02-02 | Advanced New Technologies Co., Ltd. | Blockchain-based text similarity detection method, apparatus and electronic device
CN110619120B (en) * | 2019-08-12 | 2021-03-02 | Beihang University | Language model training method and device
CN110765230B (en) * | 2019-09-03 | 2022-08-09 | Ping An Technology (Shenzhen) Co., Ltd. | Legal text storage method and device, readable storage medium and terminal equipment
CN111221960A (en) * | 2019-10-28 | 2020-06-02 | Alipay (Hangzhou) Information Technology Co., Ltd. | Text detection method, similarity calculation method, model training method and device
EP4127967A4 (en) | 2020-03-23 | 2024-05-01 | Sorcero, Inc. | Feature engineering with question generation
JP7416665B2 (en) | 2020-06-12 | 2024-01-17 | Hitachi, Ltd. | Dialogue system and control method for dialogue system
RU2763921C1 (en) * | 2021-02-10 | 2022-01-11 | AO Kaspersky Lab | System and method for creating heuristic rules for detecting fraudulent emails attributed to the category of BEC attacks
CN114997162B (en) * | 2022-05-26 | 2024-06-14 | Industrial and Commercial Bank of China Ltd. | Training data extraction method and device
TWI827409B (en) * | 2022-12-20 | 2023-12-21 | 綺源碼有限公司 | Automatic organization value range mapping method, electronic apparatus and computer readable medium

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5577135A (en) * | 1994-03-01 | 1996-11-19 | Apple Computer, Inc. | Handwriting signal processing front-end for handwriting recognizers
CN1061449C (en) | 1997-11-26 | 2001-01-31 | Zhang Lilong | Four-bit keyboard and its input method
CN1187677C (en) * | 2002-03-18 | 2005-02-02 | Zheng Fang | Method for inputting whole Chinese words into computers using partial strokes
CN1203389C (en) * | 2002-05-24 | 2005-05-25 | Zheng Fang | Initial four-stroke Chinese sentence input method for computers
US8392446B2 (en) | 2007-05-31 | 2013-03-05 | Yahoo! Inc. | System and method for providing vector terms related to a search query
CN101593270B (en) * | 2008-05-29 | 2012-01-25 | Hanwang Technology Co., Ltd. | Method and device for recognizing hand-drawn shapes
US8175389B2 (en) * | 2009-03-30 | 2012-05-08 | Synaptics Incorporated | Recognizing handwritten words
US8909514B2 (en) * | 2009-12-15 | 2014-12-09 | Microsoft Corporation | Unsupervised learning using global features, including for log-linear model word segmentation
KR101252397B1 (en) | 2011-06-02 | 2013-04-08 | POSTECH Academy-Industry Foundation | Information searching method using the web and spoken dialogue method using the same
CN103164865B (en) * | 2011-12-12 | 2016-01-27 | Beijing Samsung Telecom R&D Center | Method and apparatus for beautifying handwriting input
CN102750556A (en) * | 2012-06-01 | 2012-10-24 | Shandong University | Offline handwritten Chinese character recognition method
CN103970798B (en) * | 2013-02-04 | 2019-05-28 | Business Objects Software Ltd. | Data search and matching
CN103390358B (en) * | 2013-07-03 | 2015-08-19 | Guangdong Genius Technology Co., Ltd. | Method and device for judging normativity of character writing operation of electronic equipment
JPWO2015145981A1 (en) | 2014-03-28 | 2017-04-13 | NEC Corporation | Multilingual document similarity learning device, multilingual document similarity determining device, multilingual document similarity learning method, multilingual document similarity determining method, and multilingual document similarity learning program
US9524440B2 (en) | 2014-04-04 | 2016-12-20 | MyScript | System and method for superimposed handwriting recognition technology
CN103971097B (en) * | 2014-05-15 | 2015-05-13 | 武汉睿智视讯科技有限公司 | Vehicle license plate recognition method and system based on multiscale stroke models
KR102396250B1 (en) | 2015-07-31 | 2022-05-09 | Samsung Electronics Co., Ltd. | Apparatus and method for determining target word
US10387464B2 (en) * | 2015-08-25 | 2019-08-20 | Facebook, Inc. | Predicting labels using a deep-learning model
CN105183844A (en) * | 2015-09-06 | 2015-12-23 | National Geomatics Center of China | Method for obtaining a library of rarely used Chinese characters in basic geographic information data
US20170139899A1 (en) * | 2015-11-18 | 2017-05-18 | Le Holdings (Beijing) Co., Ltd. | Keyword extraction method and electronic device
CN105488031B (en) * | 2015-12-09 | 2018-10-19 | Beijing Qihoo Technology Co., Ltd. | Method and device for detecting similar short messages
US9792534B2 (en) * | 2016-01-13 | 2017-10-17 | Adobe Systems Incorporated | Semantic natural language vector space
CN105678339B (en) * | 2016-01-15 | 2018-10-02 | Hefei University of Technology | Offline handwritten Chinese character recognition method with an imitative feedback adjustment mechanism
CN105740349B (en) * | 2016-01-25 | 2019-03-08 | Chongqing University of Posts and Telecommunications | Sentiment classification method combining Doc2vec and convolutional neural networks
CN105786782B (en) * | 2016-03-25 | 2018-10-19 | Beijing Sogou Information Service Co., Ltd. | Word vector training method and device
CN106095736A (en) * | 2016-06-07 | 2016-11-09 | East China Normal University | Method for extracting new domain-specific words
US9645998B1 (en) * | 2016-06-12 | 2017-05-09 | Apple Inc. | Learning new words
CN106295796B (en) * | 2016-07-22 | 2018-12-25 | Zhejiang University | Entity linking method based on deep learning
CN108345580B (en) | 2017-01-22 | 2020-05-15 | Advanced New Technologies Co., Ltd. | Word vector processing method and device

Also Published As

Publication number Publication date
TW201828105A (en) 2018-08-01
WO2018136870A1 (en) 2018-07-26
KR102117799B1 (en) 2020-06-02
US10430518B2 (en) 2019-10-01
JP6742653B2 (en) 2020-08-19
EP3559823A1 (en) 2019-10-30
JP2020507155A (en) 2020-03-05
CN108345580B (en) 2020-05-15
US20180210876A1 (en) 2018-07-26
CN111611798A (en) 2020-09-01
CN108345580A (en) 2018-07-31
US20200134262A1 (en) 2020-04-30
CN111611798B (en) 2023-05-16
PH12019501675A1 (en) 2020-03-02
KR20190107033A (en) 2019-09-18
TWI685761B (en) 2020-02-21
US10878199B2 (en) 2020-12-29

Similar Documents

Publication Title
SG11201906524TA (en) Word vector processing method and apparatus
SG11201909950QA (en) Identifying entities in electronic medical records
SG11201903895XA (en) Blockchain data processing method and apparatus
SG11201903137XA (en) Three-dimensional graphical user interface for informational input in virtual reality environment
SG11201906476TA (en) Login information processing method and device
SG11201903141QA (en) Business processing method and apparatus
SG11201907679TA (en) Business verification method and apparatus
SG11201903310UA (en) Service control and user identity authentication based on virtual reality
SG11201907320YA (en) Trusted login method, server, and system
SG11201901138XA (en) Facial recognition-based authentication
SG11201903108UA (en) Order information determination method and apparatus
SG11201903582UA (en) Settlement method, entrance control method, and apparatus
SG11201810678WA (en) Glucocorticoid receptor agonist and immunoconjugates thereof
SG11201806541RA (en) Image classification and labeling
SG11201903286RA (en) User identity authentication using virtual reality
SG11201907912YA (en) An appliance operation signal processing system and method
SG11201901550WA (en) Method and apparatus for data processing
SG11201908886TA (en) Consensus node selection method and apparatus, and server
SG11201906395PA (en) Blockchain based data processing method and device
SG11201907394UA (en) Two-dimensional code generation method and device, and two-dimensional code recognition method and device
SG11201907243UA (en) Parallel execution of transactions in a blockchain network based on smart contract whitelists
SG11201906755VA (en) Digital certificate management method, apparatus, and system
SG11201809343RA (en) Systems and methods for correcting error in a first classifier by evaluating classifier output in parallel
SG11201903452SA (en) User location determination based on augmented reality
SG11201906240RA (en) Narrowband time-division duplex frame structure for narrowband communications