CN108062302B - Method and device for identifying text information - Google Patents
Method and device for identifying text information
- Publication number
- CN108062302B (application number CN201610983648.4A)
- Authority
- CN
- China
- Prior art keywords
- text information
- text
- character
- vectorization
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 58
- 230000007704 transition Effects 0.000 claims abstract description 42
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 29
- 238000012546 transfer Methods 0.000 claims description 32
- 238000003860 storage Methods 0.000 claims description 14
- 230000011218 segmentation Effects 0.000 claims description 11
- 230000002123 temporal effect Effects 0.000 claims description 11
- 230000008859 change Effects 0.000 claims description 10
- 230000008901 benefit Effects 0.000 claims description 3
- 235000013399 edible fruits Nutrition 0.000 claims 1
- 238000005516 engineering process Methods 0.000 abstract description 3
- 238000012545 processing Methods 0.000 description 15
- 238000004590 computer program Methods 0.000 description 10
- 238000010586 diagram Methods 0.000 description 8
- 238000012549 training Methods 0.000 description 8
- 230000008569 process Effects 0.000 description 7
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 5
- 230000006870 function Effects 0.000 description 5
- 230000003287 optical effect Effects 0.000 description 3
- 238000013473 artificial intelligence Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 230000006399 behavior Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000009434 installation Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000000155 melt Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011017 operating method Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (12)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610983648.4A CN108062302B (zh) | 2016-11-08 | 2016-11-08 | Method and device for identifying text information |
US16/347,860 US11010554B2 (en) | 2016-11-08 | 2017-11-08 | Method and device for identifying specific text information |
PCT/CN2017/109841 WO2018086519A1 (zh) | 2016-11-08 | 2017-11-08 | Method and device for identifying specific text information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610983648.4A CN108062302B (zh) | 2016-11-08 | 2016-11-08 | Method and device for identifying text information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108062302A (zh) | 2018-05-22 |
CN108062302B (zh) | 2019-03-26 |
Family
ID=62109352
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610983648.4A (Active) CN108062302B (zh) | 2016-11-08 | 2016-11-08 | Method and device for identifying text information |
Country Status (3)
Country | Link |
---|---|
US (1) | US11010554B2 (zh) |
CN (1) | CN108062302B (zh) |
WO (1) | WO2018086519A1 (zh) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
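CN109446508B (zh) * | 2018-10-19 | 2023-06-02 | iFLYTEK Co., Ltd. | Text normalization method, apparatus, device, and readable storage medium |
CN109657231B (zh) * | 2018-11-09 | 2023-04-07 | Guangdong Power Grid Co., Ltd. | Method and system for condensing long SMS messages |
CN111368838A (zh) * | 2018-12-26 | 2020-07-03 | Zhuhai Kingsoft Online Game Technology Co., Ltd. | Method and device for identifying report screenshots |
CN111488757B (zh) * | 2019-01-25 | 2023-06-23 | Fujitsu Limited | Method and apparatus for segmenting the recognition result of an image, and storage medium |
CN112560470A (zh) * | 2019-09-06 | 2021-03-26 | Fujitsu Limited | Method and apparatus for generating a finite-state automaton, and recognition method |
CN112925837B (zh) * | 2019-12-06 | 2022-08-02 | Shanghai Goldway Intelligent Transportation System Co., Ltd. | Text structuring method and device |
CN111125329B (zh) * | 2019-12-18 | 2023-07-21 | Neusoft Corporation | Text information screening method, apparatus, and device |
US20220092452A1 (en) * | 2020-09-18 | 2022-03-24 | Tibco Software Inc. | Automated machine learning tool for explaining the effects of complex text on predictive results |
CN112926587B (zh) * | 2021-02-19 | 2024-03-29 | Beijing Dami Future Technology Co., Ltd. | Text recognition method, apparatus, readable storage medium, and electronic device |
CN116912845B (zh) * | 2023-06-16 | 2024-03-19 | Foshan Power Supply Bureau of Guangdong Power Grid Co., Ltd. | Intelligent content recognition and analysis method and device based on NLP and AI |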
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
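CN1922610A (zh) * | 2004-02-24 | 2007-02-28 | Koninklijke Philips Electronics N.V. | Method and device for locating program content |
CN1924995A (zh) * | 2005-08-31 | 2007-03-07 | Institute of Acoustics, Chinese Academy of Sciences | SMS question answering system based on content analysis, and implementation method |
CN101488927A (zh) * | 2009-02-19 | 2009-07-22 | Tencent Technology (Shenzhen) Co., Ltd. | Method for managing text information in an instant messaging device, and instant messaging device |
CN103294664A (zh) * | 2013-07-04 | 2013-09-11 | Tsinghua University | Method and system for discovering new words in the open domain |
CN103425691A (zh) * | 2012-05-22 | 2013-12-04 | Alibaba Group Holding Limited | Search method and system |
CN104331438A (zh) * | 2014-10-24 | 2015-02-04 | Beijing Qihoo Technology Co., Ltd. | Method and device for selectively extracting novel web page content |
CN104866478A (zh) * | 2014-02-21 | 2015-08-26 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for detecting and identifying malicious text |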
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI256562B (en) * | 2002-05-03 | 2006-06-11 | Ind Tech Res Inst | Method for named-entity recognition and verification |
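CN101075228B (zh) * | 2006-05-15 | 2012-05-23 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for recognizing named entities in natural language |
CN102314417A (zh) * | 2011-09-22 | 2012-01-11 | Xidian University | Web named entity recognition method based on a statistical model |
CN102360383B (zh) * | 2011-10-15 | 2013-07-31 | Xi'an Jiaotong University | Text-oriented method for extracting domain terms and term relations |
CN103164426B (zh) | 2011-12-13 | 2015-10-28 | Peking University Founder Group Co., Ltd. | Named entity recognition method and device |
CN106021227B (zh) * | 2016-05-16 | 2018-08-21 | Nanjing University | Chinese chunk analysis method based on state transition and neural network |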
2016
- 2016-11-08: CN application CN201610983648.4A (patent CN108062302B), status Active
2017
- 2017-11-08: US application US16/347,860 (patent US11010554B2), status Active
- 2017-11-08: WO application PCT/CN2017/109841 (publication WO2018086519A1), status Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN108062302A (zh) | 2018-05-22 |
US11010554B2 (en) | 2021-05-18 |
US20190272319A1 (en) | 2019-09-05 |
WO2018086519A1 (zh) | 2018-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
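CN108062302B (zh) | | Method and device for identifying text information |
CN108596882B (zh) | | Method and device for recognizing pathological images |
CN109816032B (zh) | | Unbiased-mapping zero-shot classification method and device based on generative adversarial networks |
CN108733837B (zh) | | Natural language structuring method and device for medical record text |
CN110287477A (zh) | | Entity sentiment analysis method and related device |
CN110196908A (zh) | | Data classification method, apparatus, computer device, and storage medium |
CN110287961A (zh) | | Chinese word segmentation method, electronic device, and readable storage medium |
CN111914944A (zh) | | Object detection method and system based on dynamic sample selection and loss consistency |
CN112990222B (zh) | | Guided semantic segmentation method based on image boundary knowledge transfer |
CN109886554A (zh) | | Violation behavior discrimination method, apparatus, computer device, and storage medium |
CN109344346A (zh) | | Web page information extraction method and device |
CN109918658A (zh) | | Method and system for obtaining target vocabulary from text |
CN113763371A (zh) | | Cell nucleus segmentation method and device for pathological images |
CN114490926A (zh) | | Method, apparatus, storage medium, and terminal for determining similar questions |
CN116719748B (zh) | | Scene generation method, apparatus, and medium for a ship system |
JP2019160252A (ja) | | Learning classification device and learning classification method |
JP2019160256A (ja) | | Learning classification device and learning classification method |
CN117351273A (zh) | | Causal-knowledge-guided method for diagnosing partial discharge faults in power equipment |
CN110929516A (zh) | | Text sentiment analysis method, apparatus, electronic device, and readable storage medium |
CN115936003A (zh) | | Neural-network-based method, apparatus, device, and medium for detecting duplicate software function points |
CN110263163A (zh) | | Method and device for obtaining a text summary |
CN112419098B (zh) | | Method for screening and expanding power grid security and stability simulation samples based on security information entropy |
CN109492086A (zh) | | Answer output method, apparatus, electronic device, and storage medium |
CN108108371A (zh) | | Text classification method and device |
CN109558582B (zh) | | Perspective-based sentence sentiment analysis method and device |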
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CP02 | Change in the address of a patent holder | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing; Patentee after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.; Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing; Patentee before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd. |
| PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Method and device for identifying text information; Effective date of registration: 20190531; Granted publication date: 20190326; Pledgee: Shenzhen Black Horse World Investment Consulting Co.,Ltd.; Pledgor: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.; Registration number: 2019990000503 |
| PP01 | Preservation of patent right | Effective date of registration: 20240604; Granted publication date: 20190326 |