CN110543475A - Method for automatic recognition and analysis of financial statement data based on machine learning - Google Patents
- Publication number
- CN110543475A (application number CN201910820809.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- information
- financial
- text
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concept ID | Name | Category | Query match | Sections | Count |
---|---|---|---|---|---|
238000004458 | analytical method | Methods | 0.000 | title, claims, abstract, description | 38 |
238000010801 | machine learning | Methods | 0.000 | title, claims, abstract, description | 20 |
238000013075 | data extraction | Methods | 0.000 | claims, abstract, description | 14 |
230000002776 | aggregation | Effects | 0.000 | claims, abstract, description | 11 |
238000004220 | aggregation | Methods | 0.000 | claims, abstract, description | 11 |
238000012549 | training | Methods | 0.000 | claims, abstract, description | 10 |
238000007405 | data analysis | Methods | 0.000 | claims, abstract, description | 6 |
238000000034 | method | Methods | 0.000 | claims, description | 51 |
230000008569 | process | Effects | 0.000 | claims, description | 30 |
238000002372 | labelling | Methods | 0.000 | claims, description | 9 |
238000013528 | artificial neural network | Methods | 0.000 | claims, description | 6 |
238000004422 | calculation algorithm | Methods | 0.000 | claims, description | 5 |
238000012937 | correction | Methods | 0.000 | claims, description | 5 |
238000012545 | processing | Methods | 0.000 | claims, description | 5 |
238000007477 | logistic regression | Methods | 0.000 | claims, description | 4 |
230000004931 | aggregating effect | Effects | 0.000 | claims, description | 3 |
238000000605 | extraction | Methods | 0.000 | claims, description | 3 |
238000002347 | injection | Methods | 0.000 | claims, description | 3 |
239000007924 | injection | Substances | 0.000 | claims, description | 3 |
238000007689 | inspection | Methods | 0.000 | claims, description | 3 |
238000001914 | filtering | Methods | 0.000 | claims | 1 |
238000004140 | cleaning | Methods | 0.000 | abstract, description | 4 |
230000010365 | information processing | Effects | 0.000 | abstract, description | 2 |
238000012216 | screening | Methods | 0.000 | abstract, description | 2 |
238000010586 | diagram | Methods | 0.000 | description | 4 |
238000005516 | engineering process | Methods | 0.000 | description | 3 |
239000000284 | extract | Substances | 0.000 | description | 3 |
239000000243 | solution | Substances | 0.000 | description | 3 |
239000011159 | matrix material | Substances | 0.000 | description | 2 |
238000012986 | modification | Methods | 0.000 | description | 2 |
230000004048 | modification | Effects | 0.000 | description | 2 |
230000009286 | beneficial effect | Effects | 0.000 | description | 1 |
238000004364 | calculation method | Methods | 0.000 | description | 1 |
238000006243 | chemical reaction | Methods | 0.000 | description | 1 |
238000004590 | computer program | Methods | 0.000 | description | 1 |
230000003203 | everyday effect | Effects | 0.000 | description | 1 |
230000006872 | improvement | Effects | 0.000 | description | 1 |
238000010606 | normalization | Methods | 0.000 | description | 1 |
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
Abstract
Description
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820809.1A CN110543475A (zh) | 2019-08-29 | 2019-08-29 | Method for automatic recognition and analysis of financial statement data based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820809.1A CN110543475A (zh) | 2019-08-29 | 2019-08-29 | Method for automatic recognition and analysis of financial statement data based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110543475A (zh) | 2019-12-06 |
Family
ID=68711330
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910820809.1A Pending CN110543475A (zh) | Method for automatic recognition and analysis of financial statement data based on machine learning | 2019-08-29 | 2019-08-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110543475A (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
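CN111190973A (zh) * | 2019-12-31 | 2020-05-22 | 税友软件集团股份有限公司 | Method, apparatus, device, and storage medium for classifying declaration forms |
CN111814000A (zh) * | 2020-07-10 | 2020-10-23 | 东软集团(上海)有限公司 | Heterogeneous data analysis method and system based on template filtering |
CN112733505A (zh) * | 2020-12-30 | 2021-04-30 | 科大讯飞股份有限公司 | Document generation method and apparatus, electronic device, and storage medium |
CN112785399A (zh) * | 2021-01-12 | 2021-05-11 | 四川天行健穗金科技有限公司 | Cleaning method and system for finance and tax data |
WO2022037573A1 (zh) * | 2020-08-17 | 2022-02-24 | 北京市商汤科技开发有限公司 | Form recognition method, apparatus, device, and computer-readable storage medium |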
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102508860A (zh) * | 2011-09-29 | 2012-06-20 | 广州中浩控制技术有限公司 | Data mining method based on XBRL instance documents |
US20160300075A1 (en) * | 2013-11-14 | 2016-10-13 | 3M Innovative Properties Company | Systems and method for obfuscating data using dictionary |
CN106611375A (zh) * | 2015-10-22 | 2017-05-03 | 北京大学 | Credit risk assessment method and apparatus based on text analysis |
CN107943785A (zh) * | 2017-11-06 | 2018-04-20 | 广东广业开元科技有限公司 | PDF document processing method and apparatus based on big data |
CN108334501A (zh) * | 2018-03-21 | 2018-07-27 | 王欣 | Electronic document analysis system and method based on machine learning |
CN108563783A (zh) * | 2018-04-25 | 2018-09-21 | 张艳 | Financial analysis management system and method based on big data |
CN109117479A (zh) * | 2018-08-13 | 2019-01-01 | 数据地平线(广州)科技有限公司 | Intelligent verification method, apparatus, and storage medium for financial documents |
CN109376202A (zh) * | 2018-10-30 | 2019-02-22 | 青岛理工大学 | NLP-based method for automatic extraction and analysis of enterprise supply relationships |
KR20190064749A (ko) * | 2017-12-01 | 2019-06-11 | 신한금융투자 주식회사 | Intelligent securities investment decision support method and apparatus |
- 2019-08-29: CN application CN201910820809.1A (CN110543475A, zh), status: Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
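CN111190973A (zh) * | 2019-12-31 | 2020-05-22 | 税友软件集团股份有限公司 | Method, apparatus, device, and storage medium for classifying declaration forms |
CN111814000A (zh) * | 2020-07-10 | 2020-10-23 | 东软集团(上海)有限公司 | Heterogeneous data analysis method and system based on template filtering |
WO2022037573A1 (zh) * | 2020-08-17 | 2022-02-24 | 北京市商汤科技开发有限公司 | Form recognition method, apparatus, device, and computer-readable storage medium |
CN112733505A (zh) * | 2020-12-30 | 2021-04-30 | 科大讯飞股份有限公司 | Document generation method and apparatus, electronic device, and storage medium |
CN112733505B (zh) * | 2020-12-30 | 2024-04-26 | 中国科学技术大学 | Document generation method and apparatus, electronic device, and storage medium |
CN112785399A (zh) * | 2021-01-12 | 2021-05-11 | 四川天行健穗金科技有限公司 | Cleaning method and system for finance and tax data |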
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110543475A (zh) | Method for automatic recognition and analysis of financial statement data based on machine learning | |
Kleber et al. | Cvl-database: An off-line database for writer retrieval, writer identification and word spotting | |
CN109886270B (zh) | Case element recognition method for transcript text in electronic case files | |
CN104123550A (zh) | Text scanning and recognition method based on cloud computing | |
CN112434691A (zh) | HS code matching and display method, system, and storage medium based on intelligent parsing and recognition | |
US11010543B1 (en) | Systems and methods for table extraction in documents | |
CN111488458B (zh) | Automatic recognition and processing method and system for international trade commodity codes | |
CN111783710B (zh) | Information extraction method and system for pharmaceutical photocopies | |
CN110909123A (zh) | Data extraction method, apparatus, terminal device, and storage medium | |
CN112307741A (zh) | Intelligent document parsing method and apparatus for the insurance industry | |
Rahman et al. | Bn-htrd: A benchmark dataset for document level offline bangla handwritten text recognition (HTR) and line segmentation | |
CN113111869B (zh) | Method and system for extracting text images and their descriptions | |
Kumar et al. | Line based robust script identification for Indian languages | |
Shetty et al. | Recognition of handwritten digits and English texts using MNIST and EMNIST datasets | |
Chazalon et al. | A Simple and Uniform Way to Introduce Complimentary Asynchronous Interaction Models in an Existing Document Analysis System | |
CN110096574B (zh) | Dataset construction, subsequent optimization, and expansion scheme for e-commerce review classification tasks | |
Saxena et al. | Text extraction systems for printed images: a review | |
Clausner et al. | Unearthing the recent past: digitising and understanding statistical information from census tables | |
Kodada et al. | Unconstrained Handwritten Kannada Numeral Recognition | |
Labarga et al. | An Extensible System for Optical Character Recognition of Maintenance Documents | |
CN111507236B (zh) | File processing method, system, apparatus, and medium | |
Wu et al. | Accr: Auto-labeling for ancient chinese handwritten characters recognition on cnn | |
EP3955130A1 (en) | Template-based document extraction | |
Balasooriya | Improving and Measuring OCR Accuracy for Sinhala with Tesseract OCR Engine | |
Poonja et al. | Hindi Text to Speech Conversion |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
TA01 | Transfer of patent application right | Effective date of registration: 20200628. Address after: Room 2103, International Chamber of Commerce Center, Fuhua 3rd Road, Futian Street, Futian District, Shenzhen City, Guangdong Province. Applicant after: Shenzhen Origin Parameter Information Technology Co., Ltd. Address before: 518033 Room 2103, International Chamber of Commerce Center, Fuhua Third Road, Futian Street, Futian District, Shenzhen City, Guangdong Province. Applicant before: Shenzhen Origin Parameter Technology Co., Ltd. |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191206 |