EP3500978A1 - Method and apparatus for zero-shot learning

Method and apparatus for zero-shot learning

Info

Publication number
EP3500978A1
Authority
EP
European Patent Office
Prior art keywords
features
dictionary
multimedia content
visual
model
Prior art date
Legal status
Withdrawn
Application number
EP16913114.1A
Other languages
English (en)
French (fr)
Other versions
EP3500978A4 (de)
Inventor
Yunlong YU
Current Assignee
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of EP3500978A1
Publication of EP3500978A4


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle

Definitions

  • Embodiments of the present disclosure generally relate to information processing, and more particularly to a method, apparatus and computer program product for Zero-Shot Learning (ZSL).
  • ZSL refers to a learning process in which no training samples are available for discriminating new classes (also called unseen classes). It aims at improving the scalability of conventional classification methods. The problem arises frequently in practice because of the enormous number of real-world object classes, which are moreover constantly changing; it would be too time-consuming and expensive to obtain human-annotated labels for each of these classes.
  • ZSL can be widely used in applications such as natural scene understanding, object recognition, autonomous vehicles, virtual reality, and so on. For example, in the application of autonomous vehicles, surrounding objects need to be recognized. Conventional recognition methods predefine a set of classes and then train a model to recognize objects in those classes. However, if an object belongs to an unseen class, the model will fail to recognize it. ZSL is proposed to solve this problem: with ZSL, the model can recognize objects not only in the seen classes but also in the unseen classes.
  • Conventional methods for ZSL generally apply one transformation matrix to embed the visual features of testing samples into a semantic space, or two transformation matrices to embed both the visual features and the semantic features of the testing samples into the semantic space. In this way, a connection between the visual features and the semantic features is bridged, and the class of a testing sample from an unseen class can be inferred by using the nearest neighborhood method.
  • However, the conventional methods for ZSL cannot reflect intrinsic structures in the semantic space, leading to unsatisfactory performance.
  • example embodiments of the present disclosure include a method, apparatus and computer program product for ZSL.
  • a method comprises: constructing a dictionary model based on visual features and semantic features of multimedia content of seen classes, the semantic features corresponding to the visual features; reconstructing visual features of multimedia content of unseen classes using the dictionary model and semantic features of multimedia content of unseen classes; and determining a class of a testing sample based on comparison of a visual feature of the testing sample and reconstructed visual features.
  • determining the class of the testing sample comprises: in response to the visual feature of the testing sample being closest to one of the reconstructed visual features, designating the class of the testing sample to be a class associated with the one of the reconstructed visual features.
  • constructing the dictionary model comprises: randomly initializing model parameters for the dictionary model; and updating the model parameters so as to obtain a minimum of an objective function for the dictionary model, the objective function being defined at least by the model parameters.
  • the model parameters include at least one of the following: a dictionary matrix, a dictionary coefficient matrix and a transformation matrix.
  • the objective function for the dictionary model is formulated as set out in Equation (1) below.
  • the semantic features of the multimedia content include at least one of the following: semantic attributes and distributed text representations of the multimedia content.
  • the visual features of the multimedia content include at least one of the following: color features, texture features, motion features and Convolutional Neural Network features of the multimedia content.
  • an apparatus comprising at least one processor and at least one memory including computer program code.
  • the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: construct a dictionary model based on visual features and semantic features of multimedia content of seen classes, the semantic features corresponding to the visual features; reconstruct visual features of multimedia content of unseen classes using the dictionary model and semantic features of multimedia content of unseen classes; and determine a class of a testing sample based on comparison of a visual feature of the testing sample and reconstructed visual features.
  • an apparatus comprising means for performing the method in the first aspect of the present disclosure.
  • a computer program product comprises at least one computer-readable non-transitory memory medium having program code stored thereon which, when executed by an apparatus, causes the apparatus to perform the method in the first aspect of the present disclosure.
  • Fig. 1 schematically shows an architecture in which embodiments of the present disclosure can be implemented.
  • Fig. 2 is a flowchart of a method in accordance with embodiments of the present disclosure.
  • Fig. 3 shows a block diagram of an example computer system suitable for implementing embodiments of the present invention.
  • the term “includes” and its variants are to be read as open terms that mean “includes, but is not limited to.”
  • the term “based on” is to be read as “based at least in part on.”
  • the terms “one embodiment” and “an embodiment” are to be read as “at least one embodiment.”
  • the term “another embodiment” is to be read as “at least one other embodiment.”
  • Other definitions, explicit and implicit, may be included below.
  • a dictionary model is constructed by using visual features and semantic features of multimedia content of seen classes.
  • semantic features of multimedia content of unseen classes are embedded into the visual space.
  • a class of a testing sample of multimedia content of unseen classes is determined based on comparison of a visual feature of the testing sample and reconstructed visual features. Details of the embodiments of the present disclosure will be described with reference to Figs. 1 to 3.
  • FIG. 1 schematically shows an architecture 100 in which embodiments of the present disclosure can be implemented. It is to be understood that the structure and functionality of the architecture 100 are described only for the purpose of illustration without suggesting any limitations as to the scope of the present disclosure described herein. The present disclosure described herein can be embodied with a different structure and/or functionality.
  • the architecture 100 includes a training system 110 and a testing system 120.
  • the training system 110 is configured to receive visual features 112 of multimedia content of seen classes and semantic features 114 of multimedia content of seen classes.
  • the semantic features 114 correspond to the visual features 112.
  • the training system 110 is further configured to construct a dictionary model based on the visual features 112 and the semantic features 114.
  • Examples of multimedia content include, but are not limited to, images, video and the like.
  • Examples of the visual features 112 include, but are not limited to, color features, texture features, motion features, Convolutional Neural Network (CNN) features and the like.
  • Examples of the semantic features 114 include, but are not limited to, semantic attributes of the multimedia content, distributed text representations of the multimedia content and the like.
  • the testing system 120 is configured to receive the dictionary model from the training system 110, semantic features 126 of multimedia content of unseen classes, and a visual feature 128 of a testing sample.
  • the testing system 120 is further configured to output a classification result of the testing sample.
  • the testing system 120 includes a reconstructing unit 122 and a classifier 124.
  • the reconstructing unit 122 is configured to reconstruct visual features of multimedia content of unseen classes using the dictionary model and the semantic features 126 of multimedia content of unseen classes.
  • the classifier 124 is configured to receive the reconstructed visual features of the unseen classes from the reconstructing unit 122 and the visual feature 128 of the testing sample. The classifier 124 is further configured to determine a class of the testing sample based on comparison of the visual feature of the testing sample and the reconstructed visual features. The classifier 124 is further configured to output the classification result of the testing sample.
  • Fig. 2 shows a flowchart of a method 200 for ZSL in accordance with embodiments of the present disclosure.
  • the method 200 may be implemented in the architecture 100 as shown in Fig. 1.
  • the method 200 is entered in step 210, where the training system 110 constructs a dictionary model based on the visual features 112 and semantic features 114 of multimedia content of seen classes.
  • any known feature extraction methods may be used to extract the visual features 112 and the corresponding semantic features 114 from training samples of the seen classes, and the description thereof is omitted for the purpose of conciseness.
  • the dictionary model may be associated with one or more model parameters.
  • the dictionary model may be constructed by training the model parameters with the training system 110.
  • the model parameters comprise at least one of a dictionary matrix, a dictionary coefficient matrix and a transformation matrix.
  • an objective function for the dictionary model may be predetermined and the objective function may be defined at least by the model parameters for the dictionary model.
  • the objective function of the dictionary model may be formulated as below:

        min_{D, P, C} ||X − DC||_F^2 + λ||PY − C||_F^2        (1)

  • where ||·||_F represents an operation of solving an F-norm, and F may be in the range of 2 to 4; X ∈ R^(d_x × N) represents the visual features of the training samples of seen classes; Y ∈ R^(d_y × N) represents the semantic features of the training samples corresponding to the visual features; N represents the number of the training samples of seen classes; d_x and d_y represent the dimensionalities of the matrices X and Y, respectively; D ∈ R^(d_x × d) represents a dictionary matrix; P ∈ R^(d × d_y) represents a transformation matrix; C ∈ R^(d × N) represents a dictionary coefficient matrix, and d represents the dimensionality of the dictionary coefficient matrix C; and λ represents a predetermined constant for balancing the importance of the two terms in Equation (1), which may be in the range of 0.001 to 1000.
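As a purely illustrative aid, the following NumPy sketch evaluates the two balanced terms of Equation (1) for given model parameters. The function name, the use of the squared Frobenius norm (i.e. F = 2), and the columns-as-samples matrix layout are assumptions for illustration, not part of the patent disclosure.

```python
import numpy as np

def zsl_objective(X, Y, D, P, C, lam=1.0):
    """Value of the dictionary-model objective of Equation (1).

    X   : d_x x N visual features of seen-class training samples
    Y   : d_y x N semantic features corresponding to X
    D   : d_x x d dictionary matrix
    P   : d x d_y transformation matrix
    C   : d x N dictionary coefficient matrix
    lam : constant balancing the two terms (0.001 to 1000 per the description)
    """
    reconstruction_term = np.linalg.norm(X - D @ C, "fro") ** 2  # visual reconstruction error
    alignment_term = np.linalg.norm(P @ Y - C, "fro") ** 2       # semantic-to-coefficient alignment
    return reconstruction_term + lam * alignment_term
```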
  • constructing the dictionary model comprises randomly initializing model parameters for the dictionary model, and updating the model parameters so as to obtain a minimum of an objective function for the dictionary model.
  • the model parameters are optimized so as to obtain a minimum of an objective function for the dictionary model.
  • the dictionary matrix D, the dictionary coefficient matrix C and the transformation matrix P may be optimized so as to obtain a minimum of the objective function denoted by Equation (1) as below:
  • d_i represents an i-th base vector in the dictionary matrix D, i ∈ {1, 2, ..., N}, and I represents an identity matrix.
  • a joint optimization process may be used for optimizing the dictionary matrix D, the dictionary coefficient matrix C and the transformation matrix P.
  • the dictionary matrix D and the transformation matrix P may be randomly initialized, respectively.
  • the dictionary coefficient matrix C may be optimized by using Equation (3) as below:
  • the optimized dictionary coefficient matrix C may be represented as:
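The closed-form expression itself is not written out above. Purely as an illustration: with D and P fixed, Equation (1) is a ridge-regression-style least-squares problem in C, and a minimizer of that form is sketched below. The formula and the function name are assumptions consistent with Equation (1), not a quotation of the patent.

```python
import numpy as np

def update_C(X, Y, D, P, lam):
    """Illustrative closed-form coefficient update with D and P fixed.

    Minimizes ||X - D C||_F^2 + lam * ||P Y - C||_F^2 over C, giving
    C = (D^T D + lam I)^{-1} (D^T X + lam P Y).
    """
    d = D.shape[1]
    lhs = D.T @ D + lam * np.eye(d)   # d x d, positive definite for lam > 0
    rhs = D.T @ X + lam * (P @ Y)     # d x N
    return np.linalg.solve(lhs, rhs)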
  • the dictionary matrix D and the dictionary coefficient matrix C may be fixed.
  • the transformation matrix P may be optimized by using Equation (5) as below:
  • the optimized transformation matrix P may be represented as:
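Again as an illustration only: with D and C fixed, only the second term of Equation (1) depends on P, so the update reduces to an ordinary least-squares fit of C onto Y. The closed form sketched below (with a small ridge added for numerical stability) is an assumption consistent with that structure, not the patent's verbatim formula.

```python
import numpy as np

def update_P(C, Y, eps=1e-8):
    """Illustrative closed-form transformation update with D and C fixed.

    Minimizes ||P Y - C||_F^2 over P, giving P = C Y^T (Y Y^T)^{-1}.
    A tiny ridge eps * I guards against a singular Y Y^T.
    """
    d_y = Y.shape[0]
    gram = Y @ Y.T + eps * np.eye(d_y)            # d_y x d_y
    # Solve P (Y Y^T) = C Y^T for P without forming an explicit inverse.
    return np.linalg.solve(gram, (C @ Y.T).T).T
```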
  • the transformation matrix P and the dictionary coefficient matrix C may be fixed.
  • the dictionary matrix D may be optimized by using Equation (7) as below:
  • Equation (7) may be solved by the known Alternating Direction Method of Multipliers (ADMM) and the description thereof is omitted for the purpose of conciseness.
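The ADMM iterations are not spelled out in this text, so the following sketch is one standard ADMM formulation for the dictionary subproblem, offered as an assumption rather than the patent's procedure: D is split into a constrained copy Z, the D-step is an unconstrained least-squares solve, the Z-step projects each dictionary column onto the unit l2 ball, and a scaled dual variable U couples the two.

```python
import numpy as np

def update_D(X, C, rho=1.0, n_iters=50):
    """Illustrative ADMM for min_D ||X - D C||_F^2  s.t.  ||d_i||_2^2 <= 1."""
    d_x, d = X.shape[0], C.shape[0]
    D = np.zeros((d_x, d))
    Z = np.zeros((d_x, d))   # copy of D carrying the column-norm constraint
    U = np.zeros((d_x, d))   # scaled dual variable
    CCt = C @ C.T
    XCt = X @ C.T
    for _ in range(n_iters):
        # D-step: minimize ||X - D C||_F^2 + rho * ||D - Z + U||_F^2 (closed form)
        D = np.linalg.solve((CCt + rho * np.eye(d)).T, (XCt + rho * (Z - U)).T).T
        # Z-step: project each column of D + U onto the unit l2 ball
        V = D + U
        Z = V / np.maximum(np.linalg.norm(V, axis=0), 1.0)
        # Dual update
        U = U + D - Z
    return Z  # feasible dictionary: all columns satisfy the norm constraint
```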
  • the reconstructing unit 122 reconstructs visual features of multimedia content of unseen classes using the dictionary model and semantic features 126 of multimedia content of unseen classes.
  • the semantic features 126 may be represented by y_v, v ∈ {1, 2, ..., m}, where m represents the number of unseen classes.
  • the visual features may be reconstructed by multiplying the optimized dictionary matrix D and the optimized transformation matrix P by the semantic features y_v. That is, the reconstructed visual features of multimedia content of unseen classes may be represented as DPy_v, v ∈ {1, 2, ..., m}.
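A minimal sketch of this reconstruction step, assuming the optimized D and P from training and a d_y x m matrix whose columns are the semantic features y_v of the m unseen classes; the function and argument names are illustrative.

```python
import numpy as np

def reconstruct_unseen_prototypes(D, P, Y_unseen):
    """Reconstructed visual features D P y_v for all unseen classes.

    D        : d_x x d dictionary matrix optimized on seen classes
    P        : d x d_y transformation matrix optimized on seen classes
    Y_unseen : d_y x m semantic features of the m unseen classes
    Returns a d_x x m matrix whose v-th column is D P y_v.
    """
    return D @ (P @ Y_unseen)
```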
  • the classifier 124 determines a class of a testing sample based on comparison of a visual feature of the testing sample and reconstructed visual features.
  • the classifier 124 may determine the class of the testing sample by using the nearest neighborhood method.
  • determining the class of the testing sample comprises: in response to the visual feature of the testing sample being closest to one of the reconstructed visual features, designating the class of the testing sample to be a class associated with the one of the reconstructed visual features.
  • the nearest neighborhood method is described by way of example without suggesting any limitation to the scope of the present disclosure.
  • the classifier 124 may determine the class of the testing sample by using other suitable methods than the nearest neighborhood method.
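A minimal nearest-neighborhood sketch consistent with the rule described above; the use of Euclidean distance and the wiring to the previously sketched helpers are illustrative assumptions, and other suitable distance measures or classifiers could be substituted as noted.

```python
import numpy as np

def classify(x_test, prototypes):
    """Assign the testing sample to the unseen class whose reconstructed
    visual feature (a column of `prototypes`) is closest to it.

    x_test     : length-d_x visual feature vector of the testing sample
    prototypes : d_x x m reconstructed visual features D P y_v
    Returns the index v of the predicted unseen class.
    """
    distances = np.linalg.norm(prototypes - x_test[:, None], axis=0)
    return int(np.argmin(distances))

# Illustrative end-to-end wiring of the sketches above:
#   1. alternately call update_C, update_P and update_D on seen-class X, Y
#   2. prototypes = reconstruct_unseen_prototypes(D, P, Y_unseen)
#   3. label = classify(x_test, prototypes)
```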
  • the dictionary model is constructed based on the visual features and semantic features of multimedia content of seen classes.
  • the dictionary model is learned from both the visual space and the semantic space.
  • the dictionary model may reflect the intrinsic structures in the semantic space, leading to better classification performance.
  • Since the model parameters for the dictionary model are jointly optimized, better classification performance may also be guaranteed. Further, in the embodiments of the present disclosure, because no sparse constraints are imposed on the dictionary coefficient matrix C, the optimization process may be performed very quickly.
  • Fig. 3 shows a block diagram of an example computer system suitable for implementing embodiments of the present invention.
  • the computer system 300 comprises a central processing unit (CPU) 301 which is capable of performing various processes in accordance with a program stored in a read only memory (ROM) 302 or a program loaded from a storage unit 308 to a random access memory (RAM) 303.
  • In the RAM 303, data required when the CPU 301 performs the various processes or the like is also stored as required.
  • the CPU 301, the ROM 302 and the RAM 303 are connected to one another via a bus 304.
  • An input/output (I/O) interface 305 is also connected to the bus 304.
  • the following components are connected to the I/O interface 305: an input unit 306 including a keyboard, a mouse, or the like; an output unit 307 including a display such as a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, and a loudspeaker or the like; the storage unit 308 including a hard disk or the like; and a communication unit 309 including a network interface card such as a LAN card, a modem, or the like. The communication unit 309 performs a communication process via the network such as the internet.
  • a drive 310 is also connected to the I/O interface 305 as required.
  • embodiments of the present invention comprise a computer program product including a computer program tangibly embodied on a machine readable medium, the computer program including program code for performing the method 200.
  • the computer program may be downloaded and installed from the network via the communication unit 309, and/or installed from the removable medium 311.
  • various example embodiments of the present disclosure may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While various aspects of the example embodiments of the present disclosure are illustrated and described as block diagrams, flowcharts, or using some other pictorial representation, it will be appreciated that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • various blocks shown in the flowcharts may be viewed as method steps, and/or as operations that result from operation of computer program code, and/or as a plurality of coupled logic circuit elements constructed to carry out the associated function(s).
  • embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program containing program codes configured to carry out the methods as described above.
  • a machine readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine readable medium may be a machine readable signal medium or a machine readable storage medium.
  • a machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • Computer program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer program codes may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor of the computer or other programmable data processing apparatus, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on a computer, partly on the computer, as a stand-alone software package, partly on the computer and partly on a remote computer or entirely on the remote computer or server.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
EP16913114.1A 2016-08-16 2016-08-16 Method and apparatus for zero-shot learning Withdrawn EP3500978A4 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/095512 WO2018032354A1 (en) 2016-08-16 2016-08-16 Method and apparatus for zero-shot learning

Publications (2)

Publication Number Publication Date
EP3500978A1 true EP3500978A1 (de) 2019-06-26
EP3500978A4 EP3500978A4 (de) 2020-01-22

Family

ID=61196222

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16913114.1A Withdrawn EP3500978A4 (de) 2016-08-16 2016-08-16 Verfahren und vorrichtung für zero-shot-lernen

Country Status (3)

Country Link
EP (1) EP3500978A4 (de)
CN (1) CN109643384A (zh)
WO (1) WO2018032354A1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051909A (zh) * 2023-03-06 2023-05-02 University of Science and Technology of China Unseen-class image classification method, device and medium based on transductive zero-shot learning

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580501B (zh) * 2019-08-20 2023-04-25 Tianjin University Zero-shot image classification method based on a variational auto-encoding adversarial network
CN112418257B (zh) * 2019-08-22 2023-04-18 Sichuan University Effective zero-shot learning method based on latent visual attribute mining
CN110826638B (zh) * 2019-11-12 2023-04-18 Fuzhou University Zero-shot image classification model based on a repeated attention network, and method thereof
CN111914903B (zh) * 2020-07-08 2022-10-25 Xi'an Jiaotong University Generalized zero-shot object classification method and apparatus based on out-of-distribution sample detection, and related device
CN112380374B (zh) * 2020-10-23 2022-11-18 South China University of Technology Zero-shot image classification method based on semantic expansion
CN114627312B (zh) * 2022-05-17 2022-09-06 University of Science and Technology of China Zero-shot image classification method, system, device and storage medium
CN116109877B (zh) * 2023-04-07 2023-06-20 University of Science and Technology of China Compositional zero-shot image classification method, system, device and storage medium
CN117541882B (zh) * 2024-01-05 2024-04-19 Nanjing University of Information Science and Technology Instance-based multi-view visual fusion transductive zero-shot classification method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164713B (zh) * 2011-12-12 2016-04-06 Alibaba Group Holding Ltd. Image classification method and apparatus
US20130251340A1 (en) * 2012-03-21 2013-09-26 Wei Jiang Video concept classification using temporally-correlated grouplets
CN103400160B (zh) * 2013-08-20 2017-03-01 Institute of Automation, Chinese Academy of Sciences Zero-training-sample action recognition method
CN103646256A (zh) * 2013-12-17 2014-03-19 Shanghai Dianji University Image classification method based on sparse reconstruction of image features
CN105184260B (zh) * 2015-09-10 2019-03-08 Peking University Image feature extraction method, and pedestrian detection method and apparatus
CN105512679A (zh) * 2015-12-02 2016-04-20 Tianjin University Zero-shot classification method based on an extreme learning machine
CN105740879B (zh) * 2016-01-15 2019-05-21 Tianjin University Zero-shot image classification method based on multimodal discriminant analysis
CN105701514B (zh) * 2016-01-15 2019-05-21 Tianjin University Multimodal canonical correlation analysis method for zero-shot classification
CN105718940B (zh) * 2016-01-15 2019-03-29 Tianjin University Zero-shot image classification method based on multiple inter-group factor analysis


Also Published As

Publication number Publication date
CN109643384A (zh) 2019-04-16
WO2018032354A1 (en) 2018-02-22
EP3500978A4 (de) 2020-01-22

Similar Documents

Publication Publication Date Title
WO2018032354A1 (en) Method and apparatus for zero-shot learning
US10991074B2 (en) Transforming source domain images into target domain images
US20200327363A1 (en) Image retrieval method and apparatus
US11062453B2 (en) Method and system for scene parsing and storage medium
KR102318772B1 (ko) Domain separation neural networks
US11074479B2 (en) Learning of detection model using loss function
WO2020006961A1 (zh) Method and apparatus for extracting an image
CN109272043B (zh) Training data generation method and system for optical character recognition, and electronic device
CN108427927B (zh) Object re-identification method and apparatus, electronic device, program and storage medium
CN108154222B (zh) Deep neural network training method and system, and electronic device
CN108230346B (zh) Method and apparatus for segmenting semantic features of an image, and electronic device
US20180130203A1 (en) Automated skin lesion segmentation using deep side layers
KR20220122566A (ko) Training method for a text recognition model, and text recognition method and apparatus
CN113139628B (zh) Sample image recognition method, apparatus, device and readable storage medium
CN113822428A (zh) Neural network training method and apparatus, and image segmentation method
US20220092407A1 (en) Transfer learning with machine learning systems
US11164004B2 (en) Keyframe scheduling method and apparatus, electronic device, program and medium
US20190362226A1 (en) Facilitate Transfer Learning Through Image Transformation
CN108154153B (zh) Scene analysis method and system, and electronic device
CN111667027A (zh) Segmentation model training method for multimodal images, and image processing method and apparatus
CN108268629B (zh) Keyword-based image description method and apparatus, device and medium
JP2022185143A (ja) Text detection method, text recognition method and apparatus
US20170364740A1 (en) Signal processing
CN117315758A (zh) Facial expression detection method and apparatus, electronic device and storage medium
US20230021551A1 (en) Using training images and scaled training images to train an image segmentation model

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20190130

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: NOKIA TECHNOLOGIES OY

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20200103

RIC1 Information provided on ipc code assigned before grant

Ipc: G06K 9/46 20060101ALI20191218BHEP

Ipc: G06K 9/66 20060101AFI20191218BHEP

Ipc: G06K 9/62 20060101ALI20191218BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20200801