SG11201906524TA - Word vector processing method and apparatus - Google Patents
Word vector processing method and apparatus
- Publication number
- SG11201906524TA
- Authority
- SG
- Singapore
- Prior art keywords
- word
- international
- vectors
- words
- stroke
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/53—Processing of non-Latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Machine Translation (AREA)
- Character Discrimination (AREA)
- Document Processing Apparatus (AREA)
Abstract
INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(43) International Publication Date: 26 July 2018 (26.07.2018)
(10) International Publication Number: WO 2018/136870 A1
(51) International Patent Classification: G06F 17/28 (2006.01), G06F 17/27 (2006.01)
(21) International Application Number: PCT/US2018/014680
(22) International Filing Date: 22 January 2018 (22.01.2018)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 201710045459.7, 22 January 2017 (22.01.2017), CN; 15/874,725, 18 January 2018 (18.01.2018), US
(71) Applicant: ALIBABA GROUP HOLDING LIMITED [ /US]; Fourth Floor, One Capital Place, P.O. Box 847, George Town, Grand Cayman (KY).
(72) Inventors: CAO, Shaosheng; c/o Ants Patent Team, 17F Building B, Huanglong Times Plaza, No.18 Wantang Road, Hangzhou, 310099 (CN). LI, Xiaolong; c/o Ants Patent Team, 17F Building B, Huanglong Times Plaza, No.18 Wantang Road, Hangzhou, 310099 (CN).
(74) Agent: STALFORD, Terry, J.; Fish & Richardson P.C., P.O. Box 1022, Minneapolis, MN 55440-1022 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).
Published: with international search report (Art. 21(3))
(54) Title: WORD VECTOR PROCESSING METHOD AND APPARATUS
(57) Abstract: Embodiments of the present application disclose a word vector processing method and apparatus. The method includes: performing word segmentation on a corpus to obtain words; determining n-gram strokes corresponding to the words, the n-gram stroke representing n successive strokes of a corresponding word; establishing and initializing word vectors of the words and stroke vectors of the n-gram strokes corresponding to the words; and training the word vectors and the stroke vectors according to the corpus obtained after the word segmentation, the word vectors, and the stroke vectors. With the embodiments of the present application, features of a word can be shown more precisely by using n-gram strokes corresponding to the word, thus enhancing accuracy of word vectors of Chinese words and achieving a desirable practical effect.
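The four steps named in the abstract (segmentation, n-gram stroke extraction, initialization, training) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from this record: the tiny `STROKES` table, the n range of 3 to 5, the initialization range, and the training objective (a logistic loss on the summed stroke-vector and context-word-vector dot product, with label 1 for observed pairs and 0 for sampled negatives). The corpus is assumed to be already segmented, since the abstract does not name a segmentation tool.

```python
import math
import random

# Hypothetical stroke table (illustrative only): character -> ordered strokes.
STROKES = {
    "大": ["horizontal", "left-falling", "right-falling"],
    "人": ["left-falling", "right-falling"],
}


def stroke_ngrams(word, n_min=3, n_max=5):
    """Return every run of n successive strokes of `word`, n_min <= n <= n_max."""
    strokes = [s for ch in word for s in STROKES[ch]]
    return [
        tuple(strokes[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(strokes) - n + 1)
    ]


def init_vectors(keys, dim, rng):
    """Create one small random vector per key (initialization range is an assumption)."""
    return {k: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for k in keys}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_pair(word, context, label, stroke_vecs, word_vecs, lr=0.05):
    """One gradient step on a (word, context) pair; returns the score before the update.

    label is 1 for a pair observed in the corpus and 0 for a sampled negative.
    """
    grams = stroke_ngrams(word)
    ctx = word_vecs[context]
    dim = len(ctx)
    # Sum of the word's stroke vectors; the score is its dot product with the context vector.
    summed = [sum(stroke_vecs[g][i] for g in grams) for i in range(dim)]
    score = sum(s * c for s, c in zip(summed, ctx))
    step = lr * (label - sigmoid(score))
    old_ctx = list(ctx)  # keep the pre-update context vector for the stroke updates
    for i in range(dim):
        ctx[i] += step * summed[i]
    for g in grams:
        vec = stroke_vecs[g]
        for i in range(dim):
            vec[i] += step * old_ctx[i]
    return score


rng = random.Random(0)
corpus = ["大人", "大", "人"]  # assumed output of word segmentation
vocab = sorted(set(corpus))
grams = sorted({g for w in vocab for g in stroke_ngrams(w)})
word_vecs = init_vectors(vocab, dim=8, rng=rng)
stroke_vecs = init_vectors(grams, dim=8, rng=rng)

before = train_pair("大人", "大", 1, stroke_vecs, word_vecs)
after = train_pair("大人", "大", 1, stroke_vecs, word_vecs)
print(len(stroke_ngrams("大人")), after > before)
```

In this sketch a word is represented only through its stroke n-grams on the input side, so two words that share stroke runs also share parameters. A word with fewer strokes than `n_min` (such as "人" here) yields no n-grams, which a fuller implementation would need to handle.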
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710045459.7A CN108345580B (en) | 2017-01-22 | 2017-01-22 | Word vector processing method and device |
US15/874,725 US10430518B2 (en) | 2017-01-22 | 2018-01-18 | Word vector processing for foreign languages |
PCT/US2018/014680 WO2018136870A1 (en) | 2017-01-22 | 2018-01-22 | Word vector processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
SG11201906524TA (en) | 2019-08-27 |
Family
ID=62906491
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG11201906524TA (en) | 2017-01-22 | 2018-01-22 | Word vector processing method and apparatus |
Country Status (9)
Country | Link |
---|---|
US (2) | US10430518B2 (en) |
EP (1) | EP3559823A1 (en) |
JP (1) | JP6742653B2 (en) |
KR (1) | KR102117799B1 (en) |
CN (2) | CN108345580B (en) |
PH (1) | PH12019501675A1 (en) |
SG (1) | SG11201906524TA (en) |
TW (1) | TWI685761B (en) |
WO (1) | WO2018136870A1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345580B (en) | 2017-01-22 | 2020-05-15 | 创新先进技术有限公司 | Word vector processing method and device |
CN110119507A (en) * | 2018-02-05 | 2019-08-13 | 阿里巴巴集团控股有限公司 | Term vector generation method, device and equipment |
CN109271622B (en) * | 2018-08-08 | 2021-05-14 | 山西大学 | Low-dimensional word feature learning method based on frequency distribution correction |
CN110929508B (en) * | 2018-09-20 | 2023-05-02 | 阿里巴巴集团控股有限公司 | Word vector generation method, device and system |
CN110956034B (en) * | 2018-09-21 | 2023-04-11 | 阿里巴巴集团控股有限公司 | Word acquisition method and device and commodity search method |
CN111274793B (en) * | 2018-11-19 | 2023-04-28 | 阿里巴巴集团控股有限公司 | Text processing method and device and computing equipment |
CN110059155A (en) * | 2018-12-18 | 2019-07-26 | 阿里巴巴集团控股有限公司 | The calculating of text similarity, intelligent customer service system implementation method and device |
CN109657062A (en) * | 2018-12-24 | 2019-04-19 | 万达信息股份有限公司 | A kind of electronic health record text resolution closed-loop policy based on big data technology |
CN111353016B (en) * | 2018-12-24 | 2023-04-18 | 阿里巴巴集团控股有限公司 | Text processing method and device |
CN109933686B (en) * | 2019-03-18 | 2023-02-03 | 创新先进技术有限公司 | Song label prediction method, device, server and storage medium |
CN110222144B (en) * | 2019-04-17 | 2023-03-28 | 深圳壹账通智能科技有限公司 | Text content extraction method and device, electronic equipment and storage medium |
CA3061432A1 (en) * | 2019-04-25 | 2019-07-18 | Alibaba Group Holding Limited | Identifying entities in electronic medical records |
CN110334196B (en) * | 2019-06-28 | 2023-06-27 | 同济大学 | Neural network Chinese problem generation system based on strokes and self-attention mechanism |
US10909317B2 (en) * | 2019-07-26 | 2021-02-02 | Advanced New Technologies Co., Ltd. | Blockchain-based text similarity detection method, apparatus and electronic device |
CN110619120B (en) * | 2019-08-12 | 2021-03-02 | 北京航空航天大学 | Language model training method and device |
CN110765230B (en) * | 2019-09-03 | 2022-08-09 | 平安科技(深圳)有限公司 | Legal text storage method and device, readable storage medium and terminal equipment |
CN111221960A (en) * | 2019-10-28 | 2020-06-02 | 支付宝(杭州)信息技术有限公司 | Text detection method, similarity calculation method, model training method and device |
EP4127967A4 (en) | 2020-03-23 | 2024-05-01 | Sorcero, Inc. | Feature engineering with question generation |
JP7416665B2 (en) | 2020-06-12 | 2024-01-17 | 株式会社日立製作所 | Dialogue system and control method for dialogue system |
RU2763921C1 (en) * | 2021-02-10 | 2022-01-11 | Акционерное общество "Лаборатория Касперского" | System and method for creating heuristic rules for detecting fraudulent emails attributed to the category of bec attacks |
CN114997162B (en) * | 2022-05-26 | 2024-06-14 | 中国工商银行股份有限公司 | Training data extraction method and device |
TWI827409B (en) * | 2022-12-20 | 2023-12-21 | 綺源碼有限公司 | Automatic organization value range mapping method, electronic apparatus and computer readable medium |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5577135A (en) * | 1994-03-01 | 1996-11-19 | Apple Computer, Inc. | Handwriting signal processing front-end for handwriting recognizers |
CN1061449C (en) | 1997-11-26 | 2001-01-31 | 张立龙 | Four-bit keyboard and its entering method |
CN1187677C (en) * | 2002-03-18 | 2005-02-02 | 郑方 | Method for inputting Chinese holophrase into computers by using partial stroke |
CN1203389C (en) * | 2002-05-24 | 2005-05-25 | 郑方 | Initial four-stroke Chinese sentence input method for computer |
US8392446B2 (en) | 2007-05-31 | 2013-03-05 | Yahoo! Inc. | System and method for providing vector terms related to a search query |
CN101593270B (en) * | 2008-05-29 | 2012-01-25 | 汉王科技股份有限公司 | Method for recognizing hand-painted shapes and device thereof |
US8175389B2 (en) * | 2009-03-30 | 2012-05-08 | Synaptics Incorporated | Recognizing handwritten words |
US8909514B2 (en) * | 2009-12-15 | 2014-12-09 | Microsoft Corporation | Unsupervised learning using global features, including for log-linear model word segmentation |
KR101252397B1 (en) | 2011-06-02 | 2013-04-08 | 포항공과대학교 산학협력단 | Information Searching Method Using WEB and Spoken Dialogue Method Using The Same |
CN103164865B (en) * | 2011-12-12 | 2016-01-27 | 北京三星通信技术研究有限公司 | A kind of method and apparatus that handwriting input is beautified |
CN102750556A (en) * | 2012-06-01 | 2012-10-24 | 山东大学 | Off-line handwritten form Chinese character recognition method |
CN103970798B (en) * | 2013-02-04 | 2019-05-28 | 商业对象软件有限公司 | The search and matching of data |
CN103390358B (en) * | 2013-07-03 | 2015-08-19 | 广东小天才科技有限公司 | Method and device for judging normativity of character writing operation of electronic equipment |
JPWO2015145981A1 (en) | 2014-03-28 | 2017-04-13 | 日本電気株式会社 | Multilingual document similarity learning device, multilingual document similarity determining device, multilingual document similarity learning method, multilingual document similarity determining method, and multilingual document similarity learning program |
US9524440B2 (en) | 2014-04-04 | 2016-12-20 | Myscript | System and method for superimposed handwriting recognition technology |
CN103971097B (en) * | 2014-05-15 | 2015-05-13 | 武汉睿智视讯科技有限公司 | Vehicle license plate recognition method and system based on multiscale stroke models |
KR102396250B1 (en) | 2015-07-31 | 2022-05-09 | 삼성전자주식회사 | Apparatus and Method for determining target word |
US10387464B2 (en) * | 2015-08-25 | 2019-08-20 | Facebook, Inc. | Predicting labels using a deep-learning model |
CN105183844A (en) * | 2015-09-06 | 2015-12-23 | 国家基础地理信息中心 | Method for obtaining rarely-used Chinese character library in basic geographic information data |
US20170139899A1 (en) * | 2015-11-18 | 2017-05-18 | Le Holdings (Beijing) Co., Ltd. | Keyword extraction method and electronic device |
CN105488031B (en) * | 2015-12-09 | 2018-10-19 | 北京奇虎科技有限公司 | A kind of method and device detecting similar short message |
US9792534B2 (en) * | 2016-01-13 | 2017-10-17 | Adobe Systems Incorporated | Semantic natural language vector space |
CN105678339B (en) * | 2016-01-15 | 2018-10-02 | 合肥工业大学 | A kind of Off-line Handwritten Chinese Character cognitive approach with imitative feedback adjustment mechanism |
CN105740349B (en) * | 2016-01-25 | 2019-03-08 | 重庆邮电大学 | A kind of sensibility classification method of combination Doc2vec and convolutional neural networks |
CN105786782B (en) * | 2016-03-25 | 2018-10-19 | 北京搜狗信息服务有限公司 | A kind of training method and device of term vector |
CN106095736A (en) * | 2016-06-07 | 2016-11-09 | 华东师范大学 | A kind of method of field neologisms extraction |
US9645998B1 (en) * | 2016-06-12 | 2017-05-09 | Apple Inc. | Learning new words |
CN106295796B (en) * | 2016-07-22 | 2018-12-25 | 浙江大学 | entity link method based on deep learning |
CN108345580B (en) | 2017-01-22 | 2020-05-15 | 创新先进技术有限公司 | Word vector processing method and device |
-
2017
- 2017-01-22 CN CN201710045459.7A patent/CN108345580B/en active Active
- 2017-01-22 CN CN202010459596.7A patent/CN111611798B/en active Active
- 2017-11-10 TW TW106138932A patent/TWI685761B/en active
-
2018
- 2018-01-18 US US15/874,725 patent/US10430518B2/en active Active
- 2018-01-22 WO PCT/US2018/014680 patent/WO2018136870A1/en unknown
- 2018-01-22 JP JP2019539241A patent/JP6742653B2/en active Active
- 2018-01-22 KR KR1020197021351A patent/KR102117799B1/en active IP Right Grant
- 2018-01-22 SG SG11201906524TA patent/SG11201906524TA/en unknown
- 2018-01-22 EP EP18702885.7A patent/EP3559823A1/en not_active Withdrawn
-
2019
- 2019-07-19 PH PH12019501675A patent/PH12019501675A1/en unknown
- 2019-09-30 US US16/587,676 patent/US10878199B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
TW201828105A (en) | 2018-08-01 |
WO2018136870A1 (en) | 2018-07-26 |
KR102117799B1 (en) | 2020-06-02 |
US10430518B2 (en) | 2019-10-01 |
JP6742653B2 (en) | 2020-08-19 |
EP3559823A1 (en) | 2019-10-30 |
JP2020507155A (en) | 2020-03-05 |
CN108345580B (en) | 2020-05-15 |
US20180210876A1 (en) | 2018-07-26 |
CN111611798A (en) | 2020-09-01 |
CN108345580A (en) | 2018-07-31 |
US20200134262A1 (en) | 2020-04-30 |
CN111611798B (en) | 2023-05-16 |
PH12019501675A1 (en) | 2020-03-02 |
KR20190107033A (en) | 2019-09-18 |
TWI685761B (en) | 2020-02-21 |
US10878199B2 (en) | 2020-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
SG11201906524TA (en) | | Word vector processing method and apparatus |
SG11201909950QA (en) | | Identifying entities in electronic medical records |
SG11201903895XA (en) | | Blockchain data processing method and apparatus |
SG11201903137XA (en) | | Three-dimensional graphical user interface for informational input in virtual reality environment |
SG11201906476TA (en) | | Login information processing method and device |
SG11201903141QA (en) | | Business processing method and apparatus |
SG11201907679TA (en) | | Business verification method and apparatus |
SG11201903310UA (en) | | Service control and user identity authentication based on virtual reality |
SG11201907320YA (en) | | Trusted login method, server, and system |
SG11201901138XA (en) | | Facial recognition-based authentication |
SG11201903108UA (en) | | Order information determination method and apparatus |
SG11201903582UA (en) | | Settlement method, entrance control method, and apparatus |
SG11201810678WA (en) | | Glucocorticoid receptor agonist and immunoconjugates thereof |
SG11201806541RA (en) | | Image classification and labeling |
SG11201903286RA (en) | | User identity authentication using virtual reality |
SG11201907912YA (en) | | An appliance operation signal processing system and method |
SG11201901550WA (en) | | Method and apparatus for data processing |
SG11201908886TA (en) | | Consensus node selection method and apparatus, and server |
SG11201906395PA (en) | | Blockchain based data processing method and device |
SG11201907394UA (en) | | Two-dimensional code generation method and device, and two-dimensional code recognition method and device |
SG11201907243UA (en) | | Parallel execution of transactions in a blockchain network based on smart contract whitelists |
SG11201906755VA (en) | | Digital certificate management method, apparatus, and system |
SG11201809343RA (en) | | Systems and methods for correcting error in a first classifier by evaluating classifier output in parallel |
SG11201903452SA (en) | | User location determination based on augmented reality |
SG11201906240RA (en) | | Narrowband time-division duplex frame structure for narrowband communications |