CN111401004B - A machine-learning-based method for segmenting articles into sentences (一种基于机器学习的文章断句方法) - Google Patents
- Publication number
- CN111401004B (application CN202010232911.2A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- text
- segmentation model
- feature
- symbol
- Prior art date
- 2020-03-28
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Concepts (machine-extracted)
ID | Concept | Domain | Sections | Count |
---|---|---|---|---|
238000000034 | method | Methods | title, claims, abstract, description | 15 |
238000010801 | machine learning | Methods | title, claims, abstract, description | 12 |
230000011218 | segmentation | Effects | claims, abstract, description | 47 |
238000000926 | separation method | Methods | claims, abstract, description | 16 |
238000012216 | screening | Methods | claims, abstract, description | 4 |
238000012549 | training | Methods | claims, description | 20 |
238000012937 | correction | Methods | claims, description | 10 |
125000004122 | cyclic group | Chemical group | claims, description | 4 |
238000012795 | verification | Methods | claims, description | 3 |
238000012545 | processing | Methods | abstract, description | 7 |
239000000126 | substance | Substances | abstract, description | 7 |
238000000605 | extraction | Methods | abstract, description | 4 |
238000007781 | pre-processing | Methods | abstract, description | 3 |
238000011160 | research | Methods | description | 4 |
238000003058 | natural language processing | Methods | description | 3 |
239000000284 | extract | Substances | description | 1 |
238000012986 | modification | Methods | description | 1 |
230000004048 | modification | Effects | description | 1 |
238000004064 | recycling | Methods | description | 1 |
238000007619 | statistical method | Methods | description | 1 |
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010232911.2A (CN111401004B, zh) | 2020-03-28 | 2020-03-28 | A machine-learning-based method for segmenting articles into sentences |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010232911.2A (CN111401004B, zh) | 2020-03-28 | 2020-03-28 | A machine-learning-based method for segmenting articles into sentences |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111401004A (zh) | 2020-07-10 |
CN111401004B (zh) | 2023-12-22 |
Family ID: 71433685
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202010232911.2A (CN111401004B, zh) | Active | 2020-03-28 | 2020-03-28 | A machine-learning-based method for segmenting articles into sentences |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111401004B (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
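CN112632988A (zh) * | 2020-12-29 | 2021-04-09 | 文思海辉智科科技有限公司 | Method and apparatus for breaking text segments into sentences, and electronic device |
CN112949261A (zh) * | 2021-02-04 | 2021-06-11 | 维沃移动通信有限公司 | Text restoration method and apparatus, and electronic device |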
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
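CN103268314A (zh) * | 2013-05-02 | 2013-08-28 | 百度在线网络技术(北京)有限公司 | Method and device for obtaining Thai sentence-segmentation rules |
CN103902524A (zh) * | 2012-12-28 | 2014-07-02 | 新疆电力信息通信有限责任公司 | Uyghur sentence boundary recognition method |
CN107491439A (zh) * | 2017-09-07 | 2017-12-19 | 成都信息工程大学 | A sentence segmentation method for medical Classical Chinese based on Bayesian statistical learning |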
Worldwide Applications
2020
- 2020-03-28 CN CN202010232911.2A (CN111401004B, zh) Active
Non-Patent Citations (1)
Title |
---|
黄成哲 et al. Automatic recognition of English sentence boundaries. 微处理机 (Microprocessors), 2003, pp. 30-34. * |
Also Published As
Publication number | Publication date |
---|---|
CN111401004A (zh) | 2020-07-10 |
Similar Documents
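Publication | Title |
---|---|
CN108664474B (zh) | A deep-learning-based résumé parsing method |
CN109635279B (zh) | A neural-network-based Chinese named entity recognition method |
CN108959242B (zh) | Target entity recognition method and device based on part-of-speech features of Chinese characters |
CN111198948B (zh) | Text classification correction method, apparatus, device, and computer-readable storage medium |
CN109933796B (zh) | Method and device for extracting key information from announcement texts |
CN110175334B (zh) | Text knowledge extraction system and method based on a custom knowledge-slot structure |
CN107943911A (zh) | Data extraction method, apparatus, computer device, and readable storage medium |
CN112836052B (zh) | Opinion mining method, device, and storage medium for automobile review texts |
CN109948120B (zh) | A binarization-based résumé parsing method |
CN111401004B (zh) | A machine-learning-based method for segmenting articles into sentences |
CN111046660B (zh) | Method and device for identifying technical terms in text |
CN106372053B (zh) | Method and apparatus for syntactic analysis |
CN112016320A (zh) | Data-augmentation-based method, system, and device for adding English punctuation |
CN111737623A (zh) | Web page information extraction method and related device |
CN111008526A (zh) | A named entity recognition method based on a dual-channel neural network |
CN114880468A (zh) | Building code review method and system based on BiLSTM and knowledge graphs |
CN112380864A (zh) | A back-translation-based method for augmenting annotated text-triple samples |
CN112395392A (zh) | Intent recognition method and apparatus, and readable storage medium |
CN115618883A (zh) | Business semantic recognition method and device |
CN112784601B (zh) | Key information extraction method, apparatus, electronic device, and storage medium |
CN112101003B (zh) | Method, apparatus, device, and computer-readable storage medium for segmenting sentence text |
CN107451215B (zh) | Feature text extraction method and device |
CN110362803B (zh) | A text template generation method based on domain-feature lexical combinations |
TW201117024A (en) | A unified machine learning-based Chinese word segmentation and part-of-speech tagging algorithm |
CN111460834B (zh) | Method and device for semantic annotation of legal provisions based on an LSTM network |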
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2024-05-30. Patentee before: Suzhou machine digital core Micro Technology Co.,Ltd. (Room 505-3, Building 1, Suzhou Nano City, No. 99 Jinjihu Avenue, Suzhou Industrial Park, Suzhou City, Jiangsu Province 215000, China). Patentee after: Hefei Jiqian Quantum Technology Co.,Ltd. (Room 1102-A009, 11th Floor, Zhongxin Wang'an Building, northeast corner of the intersection of Chuangxin Avenue and Wangjiang West Road, High-tech Zone, Hefei City, Anhui Province 230088, China). |