JP2022543052A - Document processing method, document processing apparatus, document processing device, computer-readable storage medium, and computer program - Google Patents

Info

Publication number
JP2022543052A
Authority
JP
Japan
Prior art keywords
document
features
processed
type
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2022506431A
Other languages
English (en)
Japanese (ja)
Inventor
明捷 ▲セン▼
厳 許
鼎 梁
学博 劉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Publication of JP2022543052A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
JP2022506431A 2020-06-29 2021-06-11 Document processing method, document processing apparatus, document processing device, computer-readable storage medium, and computer program Pending JP2022543052A (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN202010610080.8 2020-06-29
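CN202010610080.8A CN111782808A (zh) 2020-06-29 2020-06-29 Document processing method, apparatus, device, and computer-readable storage medium
PCT/CN2021/099799 WO2022001637A1 (zh) 2020-06-29 2021-06-11 Document processing method, apparatus, device, and computer-readable storage medium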

Publications (1)

Publication Number Publication Date
JP2022543052A (ja) 2022-10-07

Family

ID=72760274

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2022506431A Pending JP2022543052A (ja) Document processing method, document processing apparatus, document processing device, computer-readable storage medium, and computer program 2020-06-29 2021-06-11

Country Status (4)

Country Link
JP (1) JP2022543052A (zh)
KR (1) KR20220031097A (zh)
CN (1) CN111782808A (zh)
WO (1) WO2022001637A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
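CN111782808A (zh) * 2020-06-29 2020-10-16 北京市商汤科技开发有限公司 Document processing method, apparatus, device, and computer-readable storage medium
CN112861757B (zh) * 2021-02-23 2022-11-22 天津汇智星源信息技术有限公司 Intelligent transcript review method based on text semantic understanding, and electronic device
CN113051396B (zh) * 2021-03-08 2023-11-17 北京百度网讯科技有限公司 Document classification and recognition method, apparatus, and electronic device
CN113297951A (zh) * 2021-05-20 2021-08-24 北京市商汤科技开发有限公司 Document processing method, apparatus, device, and computer-readable storage medium
CN113742483A (zh) * 2021-08-27 2021-12-03 北京百度网讯科技有限公司 Document classification method, apparatus, electronic device, and storage medium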

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
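JP2000285190A (ja) * 1999-03-31 2000-10-13 Toshiba Corp Form identification method, form identification apparatus, and storage medium
JP2015111467A (ja) * 2015-03-12 2015-06-18 株式会社東芝 Handwritten character search apparatus, method, and program
WO2019052403A1 (zh) * 2017-09-12 2019-03-21 腾讯科技(深圳)有限公司 Training method for image-text matching model, bidirectional search method, and related apparatus
CN110298338A (zh) * 2019-06-20 2019-10-01 北京易道博识科技有限公司 Document image classification method and apparatus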
WO2020113468A1 (en) * 2018-12-05 2020-06-11 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for grounding a target video clip in a video

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10354009B2 (en) * 2016-08-24 2019-07-16 Microsoft Technology Licensing, Llc Characteristic-pattern analysis of text
US10936970B2 (en) * 2017-08-31 2021-03-02 Accenture Global Solutions Limited Machine learning document processing
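CN110390094B (zh) * 2018-04-20 2023-05-23 伊姆西Ip控股有限责任公司 Method for classifying documents, electronic device, and computer program product
CN109033478B (zh) * 2018-09-12 2022-08-19 重庆工业职业技术学院 Text information pattern analysis method and system for search engines
CN109344815B (zh) * 2018-12-13 2021-08-13 深源恒际科技有限公司 Document image classification method
CN110008944B (zh) * 2019-02-20 2024-02-13 平安科技(深圳)有限公司 OCR recognition method and apparatus based on template matching, and storage medium
CN110866116A (zh) * 2019-10-25 2020-03-06 远光软件股份有限公司 Policy document processing method, apparatus, storage medium, and electronic device
CN111782808A (zh) * 2020-06-29 2020-10-16 北京市商汤科技开发有限公司 Document processing method, apparatus, device, and computer-readable storage medium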

Also Published As

Publication number Publication date
KR20220031097A (ko) 2022-03-11
WO2022001637A1 (zh) 2022-01-06
CN111782808A (zh) 2020-10-16

Similar Documents

Publication Publication Date Title
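JP2022543052A (ja) Document processing method, document processing apparatus, document processing device, computer-readable storage medium, and computer program
CN107209861B (zh) Optimizing multi-class multimedia data classification using negative data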
US10558885B2 (en) Determination method and recording medium
Kalsum et al. Emotion recognition from facial expressions using hybrid feature descriptors
US9864928B2 (en) Compact and robust signature for large scale visual search, retrieval and classification
Kouw et al. Feature-level domain adaptation
US10013637B2 (en) Optimizing multi-class image classification using patch features
Oliveira et al. Automatic graphic logo detection via fast region-based convolutional networks
US8606022B2 (en) Information processing apparatus, method and program
US20200065573A1 (en) Generating variations of a known shred
Gao et al. The labeled multiple canonical correlation analysis for information fusion
CN105631466B (zh) Image classification method and apparatus
US20170076152A1 (en) Determining a text string based on visual features of a shred
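CN111324874B (zh) Method and apparatus for verifying the authenticity of an identity document
CN111340057B (zh) Method and apparatus for training a classification model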
Sharma et al. Multimodal classification using feature level fusion and SVM
JP2004178569A (ja) Data classification apparatus, object recognition apparatus, data classification method, and object recognition method
Duan Characters recognition of binary image using KNN
CN112380369B (zh) Training method, apparatus, device, and storage medium for image retrieval model
Barbosa et al. Automatic voice recognition system based on multiple Support Vector Machines and mel-frequency cepstral coefficients
Kim et al. An improved license plate recognition technique in outdoor image
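CN113297951A (zh) Document processing method, apparatus, device, and computer-readable storage medium
CN110852206A (zh) Scene recognition method and apparatus combining global and local features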
US20140119641A1 (en) Character recognition apparatus, character recognition method, and computer-readable medium
JP2007188190A (ja) Pattern recognition apparatus, pattern recognition method, pattern recognition program, and recording medium

Legal Events

Date Code Title Description
2022-01-31 A521 Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
2022-01-31 A621 Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2022-11-15 A131 Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2023-06-13 A02 Decision of refusal (JAPANESE INTERMEDIATE CODE: A02)