GB2546360B - Image captioning with weak supervision - Google Patents

Image captioning with weak supervision

Info

Publication number
GB2546360B
Authority
GB
United Kingdom
Prior art keywords
weak supervision
image captioning
captioning
image
supervision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
GB1618932.6A
Other versions
GB2546360A (en)
Inventor
Wang Zhaowen
You Quanzeng
Jin Hailin
Fang Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Adobe Inc
Original Assignee
Adobe Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-01-13
Filing date
2016-11-09
Publication date
2020-08-19
Priority claimed from US14/995,042 (US9792534B2)
Priority claimed from US14/995,032 (US9811765B2)
Application filed by Adobe Inc
Publication of GB2546360A
Application granted
Publication of GB2546360B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24143 Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
GB1618932.6A 2016-01-13 2016-11-09 Image captioning with weak supervision Active GB2546360B (en)

Applications Claiming Priority (2)

Application Number  Publication       Priority Date  Filing Date  Title
US14/995,042        US9792534B2 (en)  2016-01-13     2016-01-13   Semantic natural language vector space
US14/995,032        US9811765B2 (en)  2016-01-13     2016-01-13   Image captioning with weak supervision

Publications (2)

Publication Number  Publication Date
GB2546360A (en)     2017-07-19
GB2546360B (en)     2020-08-19

Family

ID=59078284

Family Applications (2)

Application Number  Status  Publication      Priority Date  Filing Date  Title
GB1618936.7A        Active  GB2547068B (en)  2016-01-13     2016-11-09   Semantic natural language vector space
GB1618932.6A        Active  GB2546360B (en)  2016-01-13     2016-11-09   Image captioning with weak supervision

Family Applications Before (1)

Application Number  Status  Publication      Priority Date  Filing Date  Title
GB1618936.7A        Active  GB2547068B (en)  2016-01-13     2016-11-09   Semantic natural language vector space

Country Status (2)

Country Link
DE (2) DE102016013372A1 (en)
GB (2) GB2547068B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205684B (en) * 2017-04-25 2022-02-11 北京市商汤科技开发有限公司 Image disambiguation method, device, storage medium and electronic equipment
CN107608943B (en) * 2017-09-08 2020-07-28 中国石油大学(华东) Image subtitle generating method and system fusing visual attention and semantic attention
CN108108351B (en) * 2017-12-05 2020-05-22 华南理工大学 Text emotion classification method based on deep learning combination model
CN108230413B (en) * 2018-01-23 2021-07-06 北京市商汤科技开发有限公司 Image description method and device, electronic equipment and computer storage medium
CN108921764B (en) * 2018-03-15 2022-10-25 中山大学 Image steganography method and system based on generation countermeasure network
CN108959512B (en) * 2018-06-28 2022-04-29 清华大学 Image description network and technology based on attribute enhanced attention model
CN109086405B (en) * 2018-08-01 2021-09-14 武汉大学 Remote sensing image retrieval method and system based on significance and convolutional neural network
CN109858487B (en) * 2018-10-29 2023-01-17 温州大学 Weak supervision semantic segmentation method based on watershed algorithm and image category label
US11704487B2 (en) * 2019-04-04 2023-07-18 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for fashion attributes extraction
CN110191096B (en) * 2019-04-30 2023-05-09 安徽工业大学 Word vector webpage intrusion detection method based on semantic analysis
CN110288665B (en) * 2019-05-13 2021-01-15 中国科学院西安光学精密机械研究所 Image description method based on convolutional neural network, computer-readable storage medium and electronic device
CN110276001B (en) * 2019-06-20 2021-10-08 北京百度网讯科技有限公司 Checking page identification method and device, computing equipment and medium
JP6830514B2 (en) 2019-07-26 2021-02-17 zro株式会社 How visual and non-visual semantic attributes are associated with visuals and computing devices
CN110472642B (en) * 2019-08-19 2022-02-01 齐鲁工业大学 Fine-grained image description method and system based on multi-level attention
CN110750669B (en) * 2019-09-19 2023-05-23 深思考人工智能机器人科技(北京)有限公司 Method and system for generating image captions
CN110851644A (en) * 2019-11-04 2020-02-28 泰康保险集团股份有限公司 Image retrieval method and device, computer-readable storage medium and electronic device
CN111275110B (en) * 2020-01-20 2023-06-09 北京百度网讯科技有限公司 Image description method, device, electronic equipment and storage medium
CN111444367B (en) * 2020-03-24 2022-10-14 哈尔滨工程大学 Image title generation method based on global and local attention mechanism
CN111986730A (en) * 2020-07-27 2020-11-24 中国科学院计算技术研究所苏州智能计算产业技术研究院 Method for predicting siRNA silencing efficiency
CN112580362B (en) * 2020-12-18 2024-02-20 西安电子科技大学 Visual behavior recognition method, system and computer readable medium based on text semantic supervision
CN113128410A (en) * 2021-04-21 2021-07-16 湖南大学 Weak supervision pedestrian re-identification method based on track association learning
CN115186655A (en) * 2022-07-06 2022-10-14 重庆软江图灵人工智能科技有限公司 Character semantic recognition method, system, medium and device based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090112020A (en) * 2008-04-23 2009-10-28 엔에이치엔(주) System and method for extracting caption candidate and system and method for extracting image caption using text information and structural information of document
CN105389326A (en) * 2015-09-16 2016-03-09 中国科学院计算技术研究所 Image annotation method based on weak matching probability canonical correlation model
WO2016070098A2 (en) * 2014-10-31 2016-05-06 Paypal, Inc. Determining categories for weakly labeled images

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101354704B (en) * 2007-07-23 2011-01-12 夏普株式会社 Apparatus for making grapheme characteristic dictionary and document image processing apparatus having the same
CN104572940B (en) * 2014-12-30 2017-11-21 中国人民解放军海军航空工程学院 A kind of image automatic annotation method based on deep learning and canonical correlation analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xi, Su Mei, and Young Im Cho. "Image caption automatic generation method based on weighted feature." Control, Automation and Systems (ICCAS), 2013 13th International Conference on. IEEE, 2013 *

Also Published As

Publication number Publication date
GB2546360A (en) 2017-07-19
GB2547068B (en) 2019-06-19
GB2547068A (en) 2017-08-09
DE102016013487A1 (en) 2017-07-13
DE102016013372A1 (en) 2017-07-13
