JP6968270B2 - Method and apparatus for positioning description statement in image, electronic device, and storage medium - Google Patents

Method and apparatus for positioning description statement in image, electronic device, and storage medium

Info

Publication number
JP6968270B2
Authority
JP
Japan
Prior art keywords
image
analyzed
sample
subject
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2020517564A
Other languages
English (en)
Japanese (ja)
Other versions
JP2021509979A (ja)
JP2021509979A5
Inventor
Xihui Liu
Jing Shao
Zihao Wang
Hongsheng Li
Xiaogang Wang
Original Assignee
Beijing SenseTime Technology Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SenseTime Technology Development Co., Ltd.
Publication of JP2021509979A
Publication of JP2021509979A5
Application granted
Publication of JP6968270B2
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5854 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/30 Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1916 Validation; Performance evaluation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
JP2020517564A 2018-11-30 2019-05-09 Method and apparatus for positioning description statement in image, electronic device, and storage medium Active JP6968270B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
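CN201811459428.7A CN109614613B (zh) 2018-11-30 2018-11-30 Method and apparatus for positioning description statement in image, electronic device, and storage medium
CN201811459428.7 2018-11-30
PCT/CN2019/086274 WO2020107813A1 (zh) 2018-11-30 2019-05-09 Method and apparatus for positioning description statement in image, electronic device, and storage medium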

Publications (3)

Publication Number Publication Date
JP2021509979A (ja) 2021-04-08
JP2021509979A5 2021-05-20
JP6968270B2 (ja) 2021-11-17

Family

ID=66006570

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2020517564A Active JP6968270B2 (ja) 2018-11-30 2019-05-09 Method and apparatus for positioning description statement in image, electronic device, and storage medium

Country Status (7)

Country Link
US (1) US11455788B2
JP (1) JP6968270B2
KR (1) KR102454930B1
CN (1) CN109614613B
SG (1) SG11202003836YA
TW (1) TWI728564B
WO (1) WO2020107813A1

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
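CN109614613B (zh) * 2018-11-30 2020-07-31 北京市商汤科技开发有限公司 Method and apparatus for positioning description statement in image, electronic device, and storage medium
CN110096707B (zh) * 2019-04-29 2020-09-29 北京三快在线科技有限公司 Method, apparatus, and device for generating natural language, and readable storage medium
CN110263755B (zh) * 2019-06-28 2021-04-27 上海鹰瞳医疗科技有限公司 Fundus image recognition model training method, fundus image recognition method, and device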
US11308492B2 (en) 2019-07-03 2022-04-19 Sap Se Anomaly and fraud detection with fake event detection using pixel intensity testing
US20210004795A1 (en) * 2019-07-03 2021-01-07 Sap Se Anomaly and fraud detection using duplicate event detector
US12039615B2 (en) 2019-07-03 2024-07-16 Sap Se Anomaly and fraud detection with fake event detection using machine learning
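CN110413819B (zh) * 2019-07-12 2022-03-29 深兰科技(上海)有限公司 Method and apparatus for acquiring picture description information
CN110516677A (zh) * 2019-08-23 2019-11-29 上海云绅智能科技有限公司 Neural network recognition model, and target recognition method and system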
US11461613B2 (en) * 2019-12-05 2022-10-04 Naver Corporation Method and apparatus for multi-document question answering
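CN111277759B (zh) * 2020-02-27 2021-08-31 Oppo广东移动通信有限公司 Composition prompting method and apparatus, storage medium, and electronic device
CN111738186B (zh) * 2020-06-28 2024-02-02 香港中文大学(深圳) Target positioning method and apparatus, electronic device, and readable storage medium
CN111859005B (zh) * 2020-07-01 2022-03-29 江西理工大学 Image description method based on cross-layer multi-model feature fusion and convolutional decoding
KR102451299B1 (ko) * 2020-09-03 2022-10-06 고려대학교 세종산학협력단 Caption generation system based on situation awareness of animals
CN112084319B (zh) * 2020-09-29 2021-03-16 四川省人工智能研究院(宜宾) Action-based relation network video question answering system and method
WO2022130509A1 (ja) * 2020-12-15 2022-06-23 日本電信電話株式会社 Object detection device, object detection method, and object detection program
CN113298083B (zh) * 2021-02-25 2025-03-07 阿里巴巴集团控股有限公司 Data processing method and apparatus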
US12147497B2 (en) * 2021-05-19 2024-11-19 Baidu Usa Llc Systems and methods for cross-lingual cross-modal training for multimodal retrieval
CN113761153B (zh) * 2021-05-19 2023-10-24 腾讯科技(深圳)有限公司 Picture-based question answering processing method and apparatus, readable medium, and electronic device
US12482464B2 (en) 2021-12-07 2025-11-25 Deepmind Technologies Limited Controlling interactive agents using multi-modal inputs
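CN117911727A (zh) * 2022-07-08 2024-04-19 鸿海精密工业股份有限公司 Recognition method and electronic device thereof
JP7835304B2 (ja) * 2022-11-14 2026-03-25 Ntt株式会社 Action recognition learning device, action recognition estimation device, action recognition learning method, and action recognition learning program
CN116012835B (zh) * 2023-02-20 2026-02-24 张国栋 Two-stage scene text erasing method based on text segmentation
CN118037888B (zh) * 2024-02-01 2024-10-01 嘉达鼎新信息技术(苏州)有限公司 AI image generation method and system based on image analysis and language description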

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10007897C1 (de) 2000-02-21 2001-06-28 Siemens Ag Method for distributing mail items
US7181054B2 (en) * 2001-08-31 2007-02-20 Siemens Medical Solutions Health Services Corporation System for processing image representative data
DE602006021408D1 (de) 2005-04-27 2011-06-01 Univ Leiden Medical Ct Treatment of HPV-induced intraepithelial anogenital neoplasia
US7835820B2 (en) * 2005-10-11 2010-11-16 Vanderbilt University System and method for image mapping and visual attention
WO2008017430A1 (de) * 2006-08-07 2008-02-14 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Method for producing scalable image matrices
TWI464604B (zh) * 2010-11-29 2014-12-11 Ind Tech Res Inst Data clustering method and device, data processing apparatus, and image processing apparatus
US8428363B2 (en) * 2011-04-29 2013-04-23 Mitsubishi Electric Research Laboratories, Inc. Method for segmenting images using superpixels and entropy rate clustering
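CN103106239A (zh) * 2012-12-10 2013-05-15 江苏乐买到网络科技有限公司 Method and apparatus for recognizing an object in an image
TWI528197B (zh) * 2013-09-26 2016-04-01 財團法人資訊工業策進會 Photo grouping system, photo grouping method, and computer-readable recording medium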
US9477908B2 (en) * 2014-04-10 2016-10-25 Disney Enterprises, Inc. Multi-level framework for object detection
US9965705B2 (en) * 2015-11-03 2018-05-08 Baidu Usa Llc Systems and methods for attention-based configurable convolutional neural networks (ABC-CNN) for visual question answering
GB2545661A (en) * 2015-12-21 2017-06-28 Nokia Technologies Oy A method for analysing media content
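CN106777999A (zh) * 2016-12-26 2017-05-31 上海联影医疗科技有限公司 Image processing method, system, and apparatus
CN108229518B (zh) * 2017-02-15 2020-07-10 北京市商汤科技开发有限公司 Sentence-based image detection method, apparatus, and system
CN108229272B (zh) * 2017-02-23 2020-11-27 北京市商汤科技开发有限公司 Visual relationship detection method and apparatus, and visual relationship detection training method and apparatus
CN108694398B (zh) * 2017-04-06 2020-10-30 杭州海康威视数字技术股份有限公司 Image analysis method and apparatus
CN108228686B (zh) * 2017-06-15 2021-03-23 北京市商汤科技开发有限公司 Method, apparatus, and electronic device for implementing image-text matching
CN109658455B (zh) * 2017-10-11 2023-04-18 阿里巴巴集团控股有限公司 Image processing method and processing device
CN108171254A (zh) * 2017-11-22 2018-06-15 北京达佳互联信息技术有限公司 Image label determination method, apparatus, and terminal
CN108108771A (zh) * 2018-01-03 2018-06-01 华南理工大学 Image question answering method based on multi-scale deep learning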
US10643112B1 (en) * 2018-03-27 2020-05-05 Facebook, Inc. Detecting content items violating policies of an online system using machine learning based model
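CN108549850B (zh) * 2018-03-27 2021-07-16 联想(北京)有限公司 Image recognition method and electronic device
CN108764083A (zh) * 2018-05-17 2018-11-06 淘然视界(杭州)科技有限公司 Target detection method based on natural language expression, electronic device, and storage medium
CN108874360B (zh) * 2018-06-27 2023-04-07 百度在线网络技术(北京)有限公司 Panoramic content positioning method and apparatus
CN109614613B (zh) * 2018-11-30 2020-07-31 北京市商汤科技开发有限公司 Method and apparatus for positioning description statement in image, electronic device, and storage medium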

Also Published As

Publication number Publication date
KR102454930B1 (ko) 2022-10-14
WO2020107813A1 (zh) 2020-06-04
JP2021509979A (ja) 2021-04-08
TW202022561A (zh) 2020-06-16
KR20200066617A (ko) 2020-06-10
SG11202003836YA (en) 2020-07-29
CN109614613A (zh) 2019-04-12
CN109614613B (zh) 2020-07-31
TWI728564B (zh) 2021-05-21
US20200226410A1 (en) 2020-07-16
US11455788B2 (en) 2022-09-27

Similar Documents

Publication Publication Date Title
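JP6968270B2 (ja) Method and apparatus for positioning description statement in image, electronic device, and storage medium
JP7041284B2 (ja) Image processing method, image processing apparatus, electronic device, storage medium, and computer program
CN111310764B (zh) Network training and image processing methods and apparatuses, electronic device, and storage medium
CN110084775B (zh) Image processing method and apparatus, electronic device, and storage medium
CN110210535B (zh) Neural network training method and apparatus, and image processing method and apparatus
JP6916970B2 (ja) Video processing method and apparatus, electronic device, and storage medium
CN111753822B (zh) Text recognition method and apparatus, electronic device, and storage medium
CN109614876B (zh) Keypoint detection method and apparatus, electronic device, and storage medium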
US20210089799A1 (en) Pedestrian Recognition Method and Apparatus and Storage Medium
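CN110598504B (zh) Image recognition method and apparatus, electronic device, and storage medium
JP7061191B2 (ja) Image processing method and apparatus, electronic device, and storage medium
KR20210102180A (ko) Image processing method and apparatus, electronic device, and storage medium
KR102454515B1 (ko) Network optimization method and apparatus, image processing method and apparatus, and storage medium
CN109543537B (zh) Incremental training method and apparatus for re-identification model, electronic device, and storage medium
JP2021533430A (ja) Image processing method, image processing apparatus, electronic device, storage medium, and computer program
JP2022522551A (ja) Image processing method and apparatus, electronic device, and storage medium
CN109934275B (zh) Image processing method and apparatus, electronic device, and storage medium
CN111523599B (zh) Target detection method and apparatus, electronic device, and storage medium
CN109858614B (zh) Neural network training method and apparatus, electronic device, and storage medium
CN114338083A (zh) Controller area network bus anomaly detection method and apparatus, and electronic device
CN108960283A (zh) Incremental processing method and apparatus for classification tasks, electronic device, and storage medium
CN110633715B (zh) Image processing method, network training method and apparatus, and electronic device
CN113506324B (zh) Image processing method and apparatus, electronic device, and storage medium
CN111178115B (zh) Training method and system for object recognition network
CN110019928B (zh) Method and apparatus for optimizing video titles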

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20200326

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20200326

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20210420

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20210720

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20210720

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20211012

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20211026

R150 Certificate of patent or registration of utility model

Ref document number: 6968270

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250