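CN116187163A - Method and system for constructing a pre-trained model for patent document processing - Google Patents
Method and system for constructing a pre-trained model for patent document processing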
- Publication number
- CN116187163A (application number CN202211640990.6A)
- Authority
- CN
- China
- Prior art keywords
- training
- model
- tasks
- prediction
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Concepts
Concept ID | Name | Type | Query match | Sections | Count |
---|---|---|---|---|---|
238000012549 | training | Methods | 0.000 | title, claims, abstract, description | 214 |
238000012545 | processing | Methods | 0.000 | title, claims, abstract, description | 25 |
238000010276 | construction | Methods | 0.000 | title, claims, abstract, description | 19 |
238000007781 | pre-processing | Methods | 0.000 | claims, abstract, description | 29 |
238000000034 | method | Methods | 0.000 | claims, abstract, description | 17 |
238000004458 | analytical method | Methods | 0.000 | claims, abstract, description | 16 |
238000003860 | storage | Methods | 0.000 | claims, abstract, description | 6 |
230000007246 | mechanism | Effects | 0.000 | claims, description | 18 |
238000003062 | neural network model | Methods | 0.000 | claims, description | 13 |
230000000873 | masking effect | Effects | 0.000 | claims, description | 12 |
239000013598 | vector | Substances | 0.000 | claims, description | 9 |
238000006243 | chemical reaction | Methods | 0.000 | claims, description | 7 |
238000004140 | cleaning | Methods | 0.000 | claims, description | 7 |
239000013604 | expression vector | Substances | 0.000 | claims, description | 7 |
238000013528 | artificial neural network | Methods | 0.000 | claims, description | 6 |
238000013145 | classification model | Methods | 0.000 | claims, description | 6 |
238000009826 | distribution | Methods | 0.000 | claims, description | 6 |
238000002372 | labelling | Methods | 0.000 | claims, description | 6 |
230000009466 | transformation | Effects | 0.000 | claims, description | 5 |
238000004364 | calculation method | Methods | 0.000 | claims, description | 3 |
230000006870 | function | Effects | 0.000 | description | 20 |
239000000284 | extract | Substances | 0.000 | description | 6 |
230000004048 | modification | Effects | 0.000 | description | 5 |
238000012986 | modification | Methods | 0.000 | description | 5 |
238000009966 | trimming | Methods | 0.000 | description | 4 |
230000007547 | defect | Effects | 0.000 | description | 2 |
238000012217 | deletion | Methods | 0.000 | description | 2 |
230000037430 | deletion | Effects | 0.000 | description | 2 |
238000010586 | diagram | Methods | 0.000 | description | 2 |
230000002349 | favourable effect | Effects | 0.000 | description | 2 |
239000004973 | liquid crystal related substance | Substances | 0.000 | description | 2 |
238000003058 | natural language processing | Methods | 0.000 | description | 2 |
230000011218 | segmentation | Effects | 0.000 | description | 2 |
ORILYTVJVMAKLC-UHFFFAOYSA-N | Adamantane (C1C(C2)CC3CC1CC2C3) | Natural products | 0.000 | description | 1 |
230000004913 | activation | Effects | 0.000 | description | 1 |
230000009286 | beneficial effect | Effects | 0.000 | description | 1 |
238000012512 | characterization method | Methods | 0.000 | description | 1 |
238000005520 | cutting process | Methods | 0.000 | description | 1 |
230000000694 | effects | Effects | 0.000 | description | 1 |
239000012634 | fragment | Substances | 0.000 | description | 1 |
238000013507 | mapping | Methods | 0.000 | description | 1 |
239000011159 | matrix material | Substances | 0.000 | description | 1 |
230000008569 | process | Effects | 0.000 | description | 1 |
238000011160 | research | Methods | 0.000 | description | 1 |
238000013179 | statistical model | Methods | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/02—Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Geometry (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211640990.6A CN116187163B (zh) | 2022-12-20 | 2022-12-20 | Method and system for constructing a pre-trained model for patent document processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211640990.6A CN116187163B (zh) | 2022-12-20 | 2022-12-20 | Method and system for constructing a pre-trained model for patent document processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116187163A (zh) | 2023-05-30 |
CN116187163B (zh) | 2024-02-20 |
Family
ID=86435502
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211640990.6A Active CN116187163B (zh) | Method and system for constructing a pre-trained model for patent document processing | 2022-12-20 | 2022-12-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116187163B (zh) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
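CN116795789A (zh) * | 2023-08-24 | 2023-09-22 | 卓望信息技术(北京)有限公司 | Method and device for automatically generating a patent search report |
CN116912047A (zh) * | 2023-09-13 | 2023-10-20 | 湘潭大学 | Patent structure-aware similarity detection method |
CN117172323A (zh) * | 2023-11-02 | 2023-12-05 | 知呱呱(天津)大数据技术有限公司 | Patent multi-domain knowledge extraction method and system based on feature alignment |
CN117576710A (zh) * | 2024-01-15 | 2024-02-20 | 西湖大学 | Method and device for generating natural language text from graphs for big data analysis |
CN117609902A (zh) * | 2024-01-18 | 2024-02-27 | 知呱呱(天津)大数据技术有限公司 | Patent IPC classification method and system based on image-text multimodal hyperbolic embedding |
CN117851373A (zh) * | 2024-03-08 | 2024-04-09 | 南京数策信息科技有限公司 | Hierarchical management method for knowledge documents, storage medium, and management system |
CN117851373B (zh) * | 2024-03-08 | 2024-06-11 | 南京数策信息科技有限公司 | Hierarchical management method for knowledge documents, storage medium, and management system |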
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
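CN111382569A (zh) * | 2018-12-27 | 2020-07-07 | 深圳市优必选科技有限公司 | Method, device, and computer equipment for recognizing entities in dialogue corpora |
CN112784051A (zh) * | 2021-02-05 | 2021-05-11 | 北京信息科技大学 | Patent term extraction method |
CN113609267A (zh) * | 2021-07-21 | 2021-11-05 | 上海交通大学 | Discourse relation recognition method and system based on the GCNDT-MacBERT neural network framework |
CN113761890A (zh) * | 2021-08-17 | 2021-12-07 | 汕头市同行网络科技有限公司 | Multi-level semantic information retrieval method based on BERT context awareness |
US20210390127A1 (en) * | 2020-06-16 | 2021-12-16 | Virginia Tech Intellectual Properties, Inc. | Methods and systems for generating summaries given documents with questions and answers |
CN113868422A (zh) * | 2021-10-11 | 2021-12-31 | 国家电网有限公司客户服务中心 | Method and device for multi-label tracing and identification of problems in inspection work orders |
CN114841173A (zh) * | 2022-07-04 | 2022-08-02 | 北京邮电大学 | Method, system, and storage medium for extracting semantic features from academic text based on a pre-trained model |
CN114974463A (zh) * | 2022-05-24 | 2022-08-30 | 中国科学院重庆绿色智能技术研究院 | Knowledge representation learning method for nanopore single-molecule sensing signals |
CN115048511A (zh) * | 2022-04-19 | 2022-09-13 | 南京烽火星空通信发展有限公司 | BERT-based passport layout analysis method |
CN115062140A (zh) * | 2022-05-27 | 2022-09-16 | 电子科技大学 | Long-document summary generation method for supply chain ecological zones fusing BERT SUM and PGN |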
2022
- 2022-12-20: CN application CN202211640990.6A, patent CN116187163B (zh), status: Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
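CN111382569A (zh) * | 2018-12-27 | 2020-07-07 | 深圳市优必选科技有限公司 | Method, device, and computer equipment for recognizing entities in dialogue corpora |
US20210390127A1 (en) * | 2020-06-16 | 2021-12-16 | Virginia Tech Intellectual Properties, Inc. | Methods and systems for generating summaries given documents with questions and answers |
CN112784051A (zh) * | 2021-02-05 | 2021-05-11 | 北京信息科技大学 | Patent term extraction method |
CN113609267A (zh) * | 2021-07-21 | 2021-11-05 | 上海交通大学 | Discourse relation recognition method and system based on the GCNDT-MacBERT neural network framework |
CN113761890A (zh) * | 2021-08-17 | 2021-12-07 | 汕头市同行网络科技有限公司 | Multi-level semantic information retrieval method based on BERT context awareness |
CN113868422A (zh) * | 2021-10-11 | 2021-12-31 | 国家电网有限公司客户服务中心 | Method and device for multi-label tracing and identification of problems in inspection work orders |
CN115048511A (zh) * | 2022-04-19 | 2022-09-13 | 南京烽火星空通信发展有限公司 | BERT-based passport layout analysis method |
CN114974463A (zh) * | 2022-05-24 | 2022-08-30 | 中国科学院重庆绿色智能技术研究院 | Knowledge representation learning method for nanopore single-molecule sensing signals |
CN115062140A (zh) * | 2022-05-27 | 2022-09-16 | 电子科技大学 | Long-document summary generation method for supply chain ecological zones fusing BERT SUM and PGN |
CN114841173A (zh) * | 2022-07-04 | 2022-08-02 | 北京邮电大学 | Method, system, and storage medium for extracting semantic features from academic text based on a pre-trained model |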
Non-Patent Citations (2)
Title |
---|
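李舟军 et al.: "A survey of pre-training techniques for natural language processing", 《计算机科学》 (Computer Science), vol. 47, no. 03, pages 162-173 * |
赵旸 et al.: "Classification of Chinese medical literature based on the BERT model", 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery), vol. 4, no. 08, pages 41-49 * |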
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
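CN116795789A (zh) * | 2023-08-24 | 2023-09-22 | 卓望信息技术(北京)有限公司 | Method and device for automatically generating a patent search report |
CN116795789B (zh) * | 2023-08-24 | 2024-04-19 | 卓望信息技术(北京)有限公司 | Method and device for automatically generating a patent search report |
CN116912047A (zh) * | 2023-09-13 | 2023-10-20 | 湘潭大学 | Patent structure-aware similarity detection method |
CN116912047B (zh) * | 2023-09-13 | 2023-11-28 | 湘潭大学 | Patent structure-aware similarity detection method |
CN117172323A (zh) * | 2023-11-02 | 2023-12-05 | 知呱呱(天津)大数据技术有限公司 | Patent multi-domain knowledge extraction method and system based on feature alignment |
CN117172323B (zh) * | 2023-11-02 | 2024-01-23 | 知呱呱(天津)大数据技术有限公司 | Patent multi-domain knowledge extraction method and system based on feature alignment |
CN117576710A (zh) * | 2024-01-15 | 2024-02-20 | 西湖大学 | Method and device for generating natural language text from graphs for big data analysis |
CN117576710B (zh) * | 2024-01-15 | 2024-05-28 | 西湖大学 | Method and device for generating natural language text from graphs for big data analysis |
CN117609902A (zh) * | 2024-01-18 | 2024-02-27 | 知呱呱(天津)大数据技术有限公司 | Patent IPC classification method and system based on image-text multimodal hyperbolic embedding |
CN117609902B (zh) * | 2024-01-18 | 2024-04-05 | 北京知呱呱科技有限公司 | Patent IPC classification method and system based on image-text multimodal hyperbolic embedding |
CN117851373A (zh) * | 2024-03-08 | 2024-04-09 | 南京数策信息科技有限公司 | Hierarchical management method for knowledge documents, storage medium, and management system |
CN117851373B (zh) * | 2024-03-08 | 2024-06-11 | 南京数策信息科技有限公司 | Hierarchical management method for knowledge documents, storage medium, and management system |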
Also Published As
Publication number | Publication date |
---|---|
CN116187163B (zh) | 2024-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
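CN116187163B (zh) | | Method and system for constructing a pre-trained model for patent document processing |
CN110348016B (zh) | | Text summary generation method based on a sentence-association attention mechanism |
Cao et al. | | A joint model for word embedding and word morphology |
CN112101028B (zh) | | Multi-feature bidirectional gated domain-expert entity extraction method and system |
CN112541337B (zh) | | Method and system for automatic document template generation based on a recurrent neural network language model |
CN111061882A (zh) | | Knowledge graph construction method |
CN111209749A (zh) | | Method for applying deep learning to Chinese word segmentation |
CN111814477B (zh) | | Method, device, and terminal for discovering dispute focuses based on dispute-focus entities |
CN110442880B (zh) | | Translation method, device, and storage medium for machine-translated text |
CN114218389A (zh) | | Long-text classification method for the chemical preparation domain based on graph neural networks |
CN115688776A (zh) | | Relation extraction method for Chinese financial text |
CN113065349A (zh) | | Named entity recognition method based on conditional random fields |
CN115438709A (zh) | | Code similarity detection method based on code property graphs |
CN114860942A (zh) | | Text intent classification method, apparatus, device, and storage medium |
CN112764762B (zh) | | Method and system for automatically converting normative text into computable logic rules |
CN114356924A (zh) | | Method and device for extracting data from structured documents |
CN115840815A (zh) | | Automatic summary generation method based on pointer key information |
CN115481636A (zh) | | Technology-effect matrix construction method for technical literature |
CN114611489A (zh) | | AI model construction method, extraction method, and system for extracting logical conditions from text |
Gouws | | Deep unsupervised feature learning for natural language processing |
CN110414002B (zh) | | Intelligent Chinese word segmentation method based on statistics and deep learning |
CN116304062B (zh) | | Fair competition review method based on a cascaded deep learning model |
CN111241827B (zh) | | Attribute extraction method based on a sentence retrieval mode |
CN113255342B (zh) | | Method and system for recognizing 5G mobile service product names |
Yu et al. | | Abstractive Text Summarization With Semantic Dependency Graph |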
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Country or region after: China; Address after: No. 401-1, 4th floor, podium, building 3 and 4, No. 11, Changchun Bridge Road, Haidian District, Beijing 100089; Applicant after: Beijing Zhiguagua Technology Co., Ltd.; Applicant after: Zhiguagua (Tianjin) Big Data Technology Co., Ltd.; Address before: No. 401-1, 4th floor, podium, building 3 and 4, No. 11, Changchun Bridge Road, Haidian District, Beijing 100089; Applicant before: Beijing Zhiguquan Technology Service Co., Ltd.; Country or region before: China; Applicant before: Zhiguagua (Tianjin) Big Data Technology Co., Ltd. |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: No. 401-1, 4th floor, podium, building 3 and 4, No. 11, Changchun Bridge Road, Haidian District, Beijing 100089; Patentee after: Beijing Xinghe Zhiyuan Technology Co., Ltd.; Country or region after: China; Patentee after: Zhiguagua (Tianjin) Big Data Technology Co., Ltd.; Address before: No. 401-1, 4th floor, podium, building 3 and 4, No. 11, Changchun Bridge Road, Haidian District, Beijing 100089; Patentee before: Beijing Zhiguagua Technology Co., Ltd.; Country or region before: China; Patentee before: Zhiguagua (Tianjin) Big Data Technology Co., Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20240508; Address after: No. 401-1, 4th floor, podium, building 3 and 4, No. 11, Changchun Bridge Road, Haidian District, Beijing 100089; Patentee after: Beijing Xinghe Zhiyuan Technology Co., Ltd.; Country or region after: China; Address before: No. 401-1, 4th floor, podium, building 3 and 4, No. 11, Changchun Bridge Road, Haidian District, Beijing 100089; Patentee before: Beijing Xinghe Zhiyuan Technology Co., Ltd.; Country or region before: China; Patentee before: Zhiguagua (Tianjin) Big Data Technology Co., Ltd. |