CN107077625A - Hierarchical Deep Convolutional Neural Networks - Google Patents

Hierarchical Deep Convolutional Neural Networks

Info

Publication number
CN107077625A
Authority
CN
China
Prior art keywords
cnn
fine
classification
coarse
training
Prior art date
Legal status
Pending
Application number
CN201580058248.6A
Other languages
Chinese (zh)
Inventor
Robinson Piramuthu
Zhicheng Yan
Vignesh Jagadeesh
Wei Di
Dennis DeCoste
Current Assignee
eBay Inc
Original Assignee
eBay Inc
Priority date
Filing date
Publication date
Application filed by eBay Inc
Publication of CN107077625A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

A hierarchically branched deep convolutional neural network (HD-CNN) improves on existing convolutional neural network (CNN) technology. In an HD-CNN, classes that can be easily distinguished are classified by a higher-layer coarse-category CNN, while the most difficult classifications are completed by lower-layer fine-category CNNs. In training the HD-CNN, a multinomial logistic loss and a novel temporal sparsity penalty may be used. The use of the multinomial logistic loss and the temporal sparsity penalty causes each branch component to handle a different subset of classes.

Description

Hierarchical Deep Convolutional Neural Networks

Priority Claim

This application claims priority to U.S. Patent Application No. 14/582,059, filed December 23, 2014, entitled "Hierarchical Deep Convolutional Neural Network For Image Classification," which claims priority to U.S. Patent Application No. 62/068,883, filed October 27, 2014, entitled "Hierarchical Deep Convolutional Neural Network For Image Classification," each of which is incorporated herein by reference.

Technical Field

The subject matter disclosed herein generally relates to classifying data using hierarchical deep convolutional neural networks. In particular, the present disclosure relates to systems and methods for generating and using hierarchical deep convolutional neural networks for image classification.

Background

A deep convolutional neural network (CNN) is trained as an N-way classifier to distinguish between N classes of data. CNN classifiers are used to classify images, detect objects, estimate poses, recognize faces, and perform other classification tasks. Typically, the structure of the CNN (e.g., the number of layers, the types of layers, the connectivity between the layers, and so on) is selected by a designer, and the parameters of each layer are then determined through training.

Multiple classifiers can be combined by averaging. In model averaging, multiple separate models are used. Each model is capable of classifying the full set of categories, and each model is trained independently. The main sources of variation in their predictions include different initializations, different subsets of the full training set, and so on. The output of the combined model is the average of the outputs of the individual models.

Brief Description of the Drawings

Some embodiments are illustrated by way of example and not limitation in the accompanying drawings.

FIG. 1 is a network diagram illustrating a network environment suitable for creating and using a hierarchical deep CNN for image classification, according to some example embodiments.

FIG. 2 is a block diagram illustrating components of a hierarchical deep CNN server suitable for image classification, according to some example embodiments.

FIG. 3 is a block diagram illustrating components of a device suitable for image classification using hierarchical deep CNN techniques, according to some example embodiments.

FIG. 4 is a set of image groups illustrating classified images, according to some example embodiments.

FIG. 5 is a block diagram illustrating relationships between components of a server configured to identify a fine category of an image, according to some example embodiments.

FIG. 6 is a flowchart illustrating operations of a server in performing a process of identifying coarse categories, according to some example embodiments.

FIG. 7 is a flowchart illustrating operations of a server in performing a process of generating a hierarchical deep CNN for classifying images, according to some example embodiments.

FIG. 8 is a block diagram illustrating an example of a software architecture that may be installed on a machine, according to some example embodiments.

FIG. 9 shows a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, according to an example embodiment.

Detailed Description

Example methods and systems are directed to hierarchical deep CNNs for image classification. The examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.

A hierarchical deep CNN (HD-CNN) follows a coarse-to-fine classification strategy and a modular design principle. For any given class label, a set of easy classes and a set of confusing classes can be defined. Accordingly, an initial coarse-classifier CNN can separate the easily separable classes from one another. The challenging classes are then routed to downstream fine CNNs that focus only on the confusing classes. In some example embodiments, the HD-CNN improves classification performance over a standard deep CNN model. As with a CNN, the structure of the HD-CNN (e.g., the structure of each component CNN, the number of fine categories, and so on) may be determined by the designer, while the parameters of each layer of each CNN may be determined through training.

Benefits may be gained by pretraining the HD-CNN as compared with training the HD-CNN from scratch. For example, compared with a standard deep CNN model, the HD-CNN has additional free parameters from the shared shallow branch layers as well as from the C′ independent deep branch layers. Relative to a standard CNN, this greatly increases the number of free parameters within the HD-CNN. Accordingly, if the same amount of training data is used, overfitting is more likely to occur in the HD-CNN. Pretraining can help overcome the difficulty of insufficient training data.

Another potential benefit of pretraining is that a good selection of coarse categories benefits the training of the branch components, allowing each branch to focus on a consistent subset of easily confused fine categories. For example, branch component 1 may be good at distinguishing apples from oranges, while branch component 2 may be more capable of distinguishing buses from trains. Accordingly, a set of coarse categories is identified, and the coarse-category component is pretrained to classify that set of coarse categories.

Some training data sets include information regarding coarse categories and the relationships between fine categories and coarse categories. However, many training data sets do not. Such training data sets provide only a fine category for each item in the data set, without identifying coarse categories. Accordingly, a process for dividing fine categories into coarse categories is described below with respect to FIG. 6.

FIG. 1 is a network diagram illustrating a network environment 100 suitable for creating and using a hierarchical deep CNN for image classification, according to some example embodiments. The network environment 100 includes e-commerce servers 120 and 140, an HD-CNN server 130, and devices 150A, 150B, and 150C, all communicatively coupled to one another via a network 170. The devices 150A, 150B, and 150C may be collectively referred to as "devices 150," or generically referred to as "a device 150." The e-commerce servers 120 and 140 and the HD-CNN server 130 may be part of a network-based system 110. Alternatively, the devices 150 may connect to the HD-CNN server 130 directly, or over a local network distinct from the network 170 used to connect to the e-commerce server 120 or 140. As described below with respect to FIGS. 8-9, the e-commerce servers 120 and 140, the HD-CNN server 130, and the devices 150 may each be implemented in a computer system, in whole or in part.

The e-commerce servers 120 and 140 provide an electronic commerce application to other machines (e.g., the devices 150) via the network 170. The e-commerce servers 120 and 140 may also be connected directly to, or integrated with, the HD-CNN server 130. In some example embodiments, one e-commerce server 120 and the HD-CNN server 130 are part of the network-based system 110, while other e-commerce servers (e.g., the e-commerce server 140) are separate from the network-based system 110. The electronic commerce application may provide a way for users to buy and sell items directly from and to each other, to buy items from and sell items to the electronic commerce application provider, or both.

The HD-CNN server 130 creates an HD-CNN for classifying images, uses an HD-CNN to classify images, or both. For example, the HD-CNN server 130 may create an HD-CNN for classifying images based on a training set, or a pre-existing HD-CNN may be loaded onto the HD-CNN server 130. The HD-CNN server 130 can also respond to requests for classification of images by providing a fine category for the image. The HD-CNN server 130 may provide data to other machines (e.g., the e-commerce servers 120 and 140 or the devices 150) via the network 170 or another network. The HD-CNN server 130 may receive data from other machines (e.g., the e-commerce servers 120 and 140 or the devices 150) via the network 170 or another network. In some example embodiments, the functions of the HD-CNN server 130 described herein are performed on a user device, such as a personal computer, tablet computer, or smart phone.

Also shown in FIG. 1 is a user 160. The user 160 may be a human user (e.g., a human being), a machine user (e.g., a computer configured by a software program to interact with the devices 150 and the HD-CNN server 130), or any suitable combination thereof (e.g., a human assisted by a machine, or a machine supervised by a human). The user 160 is not part of the network environment 100, but is associated with the devices 150 and may be a user of the devices 150. For example, the device 150 may be a sensor, a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, or a smart phone belonging to the user 160.

In some example embodiments, the HD-CNN server 130 receives data regarding an item of interest to a user. For example, a camera attached to the device 150A can take an image of an item the user 160 wishes to sell and transmit the image over the network 170 to the HD-CNN server 130. The HD-CNN server 130 categorizes the item based on the image. The category can be sent to the e-commerce server 120 or 140, to the device 150A, or any combination thereof. The category can be used by the e-commerce server 120 or 140 to aid in generating a listing of the item for sale. Similarly, the image may be of an item of interest to the user 160, and the category can be used by the e-commerce server 120 or 140 to aid in selecting listings of items to show to the user 160.

Any of the machines, databases, or devices shown in FIG. 1 may be implemented in a general-purpose computer modified (e.g., configured or programmed) by software to be a special-purpose computer to perform the functions described herein for that machine, database, or device. For example, a computer system able to implement any one or more of the methodologies described herein is discussed below with respect to FIGS. 8-9. As used herein, a "database" is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof. Moreover, any two or more of the machines, databases, or devices illustrated in FIG. 1 may be combined into a single machine, and the functions described herein for any single machine, database, or device may be subdivided among multiple machines, databases, or devices.

The network 170 may be any network that enables communication between or among machines, databases, and devices (e.g., the HD-CNN server 130 and the devices 150). Accordingly, the network 170 may be a wired network, a wireless network (e.g., a mobile or cellular network), or any suitable combination thereof. The network 170 may include one or more portions that constitute a private network, a public network (e.g., the Internet), or any suitable combination thereof.

FIG. 2 is a block diagram illustrating components of the HD-CNN server 130, according to some example embodiments. The HD-CNN server 130 is shown as including a communication module 210, a coarse category identification module 220, a pretraining module 230, a fine-tuning module 240, a classification module 250, and a storage module 260, all configured to communicate with each other (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.

The communication module 210 is configured to send and receive data. For example, the communication module 210 may receive image data over the network 170 and send the received data to the classification module 250. As another example, the classification module 250 may identify a category for an item, and the category of the item may be transmitted by the communication module 210 over the network 170 to the e-commerce server 120.

The coarse category identification module 220 is configured to identify coarse categories for a given data set. The coarse category identification module 220 determines related fine categories and groups them into coarse categories. For example, a provided data set may have C fine categories, and the HD-CNN designer may determine a desired number of coarse categories, C′. The coarse category identification module 220 identifies a mapping of the C fine categories into the C′ coarse categories. The fine categories may be grouped into coarse categories using the process 600 of FIG. 6, described below.

The pretraining module 230 and the fine-tuning module 240 are configured to determine the parameters of the HD-CNN. The pretraining module 230 pretrains the coarse-category CNN and the fine-category CNNs to reduce the overlap between the fine-category CNNs. After pretraining is complete, the fine-tuning module 240 provides additional adjustments to the HD-CNN. The pretraining and fine-tuning may be performed using the process 700 of FIG. 7, described below.

The classification module 250 is configured to receive and process image data. The image data may be a two-dimensional image, a frame from a continuous video stream, a three-dimensional image, a depth image, an infrared image, a binocular image, or any suitable combination thereof. For example, an image may be received from a camera. To illustrate, a camera may take a picture and send it to the classification module 250. The classification module 250 determines a fine category for the image by using the HD-CNN (e.g., by determining a coarse category or coarse-category weights using the coarse-category CNN, and determining the fine category using one or more fine-category CNNs). The HD-CNN may have been generated using the pretraining module 230, the fine-tuning module 240, or both. Alternatively, the HD-CNN may have been provided from an external source.

The storage module 260 is configured to store and retrieve data generated and used by the coarse category identification module 220, the pretraining module 230, the fine-tuning module 240, and the classification module 250. For example, the HD-CNN generated by the pretraining module 230 can be stored by the storage module 260 for retrieval by the fine-tuning module 240. Information regarding the categorization of an image, generated by the classification module 250, can also be stored by the storage module 260. The e-commerce server 120 or 140 can request the category of an image (e.g., by providing an image identifier), which can be retrieved from storage by the storage module 260 and sent over the network 170 using the communication module 210.

FIG. 3 is a block diagram illustrating components of the device 150, according to some example embodiments. The device 150 is shown as including an input module 310, a camera module 320, and a communication module 330, all configured to communicate with each other (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.

The input module 310 is configured to receive input from a user via a user interface. For example, the user may enter their username and password into the input module, configure a camera, select an image to use as the basis for a listing or an item search, or any suitable combination thereof.

The camera module 320 is configured to capture image data. For example, an image may be received from a camera, a depth image may be received from an infrared camera, a pair of images may be received from a binocular camera, and so on.

The communication module 330 is configured to communicate data received by the input module 310 or the camera module 320 to the HD-CNN server 130, the e-commerce server 120, or the e-commerce server 140. For example, the input module 310 may receive a selection of an image taken with the camera module 320 and an indication that the image depicts an item the user (e.g., the user 160) wishes to sell. The communication module 330 may transmit the image and the indication to the e-commerce server 120. The e-commerce server 120 may send the image to the HD-CNN server 130 to request classification of the image, generate a listing template based on the category, and cause the listing template to be presented to the user via the communication module 330 and the input module 310.

FIG. 4 is a set of image groups illustrating classified images, according to some example embodiments. In FIG. 4, twenty-seven images have been correctly classified as depicting apples (group 410), oranges (group 420), or buses (group 430). The groups 410-430 are referred to herein as apples, oranges, and buses. By inspection, distinguishing members of apples from members of buses is relatively easy, while distinguishing members of apples from members of oranges is more difficult. Images from apples and oranges are likely to have similar shapes, textures, and colors, so distinguishing between them correctly is harder. By contrast, images from buses generally have a visual appearance different from apples, so easier classification can be expected. In fact, the two categories apples and oranges can be defined as belonging to the same coarse category, while buses belong to a different coarse category. For example, in the CIFAR-100 data set (discussed in "Learning Multiple Layers of Features from Tiny Images," Krizhevsky (2009)), apples and oranges are subcategories within "fruit and vegetables," while buses are a subcategory within "vehicles 1." The CIFAR-100 data set consists of 100 classes of natural images. There are 50,000 training images and 10,000 test images in the CIFAR-100 data set.

FIG. 5 is a block diagram illustrating relationships between components of the classification module 250, according to some example embodiments. A single standard deep CNN can be used as a building block for the fine prediction components of the HD-CNN. As shown in FIG. 5, a coarse-category CNN 520 predicts probabilities over the coarse categories. Multiple branch CNNs 540-550 are added independently. In some example embodiments, the branch CNNs 540-550 share branch shallow layers 530. The coarse-category CNN 520 and the multiple branch CNNs 540-550 each receive the input image and operate on it in parallel. Although each branch CNN 540-550 receives the input image and gives a probability distribution over the full set of fine categories, the results of each branch CNN 540-550 are valid only for a subset of the categories. The multiple full predictions from the branch CNNs 540-550 are linearly combined by a probability averaging layer 560 to form the final fine-category prediction, weighted by the corresponding coarse-category probabilities.

The following notation is used in the discussion below. The data set includes N_t training samples {x_i, y_i}, where i ranges from 1 to N_t, and N_s test samples {x_i, y_i}, where i ranges from 1 to N_s. Here, x_i and y_i denote the image data and the image label, respectively. The image label corresponds to the fine category of the image. There are C predefined fine categories {S_k} in the data set, where k ranges from 1 to C. There are C′ coarse categories.

Like a standard deep CNN model, the HD-CNN performs end-to-end classification. Whereas a standard deep CNN model consists of a single CNN, the HD-CNN primarily comprises three parts: a single coarse-category component B (corresponding to the coarse-category CNN 520), multiple branch fine-category components {F_j}, where j ranges from 1 to C′ (corresponding to the branch CNNs 540-550), and a single probability averaging layer (corresponding to the probability averaging layer 560). The single coarse-category CNN 520 receives raw image pixel data as input and outputs a probability distribution over the coarse categories. The coarse-category probabilities are used by the probability averaging layer 560 to assign weights to the full predictions made by the branch CNNs 540-550.

FIG. 5 also shows the set of branch CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branch CNNs 540-550 share parameters in the shallow layers 530 but have independent deep layers. The shallow layers are the layers of a CNN closest to the original input, while the deep layers are the layers closer to the final output. Sharing parameters in the shallow layers can provide the following benefits. First, in the shallow layers, each CNN can extract primitive low-level features (e.g., blobs, corners) that are useful for classifying all fine categories. Accordingly, the shallow layers can be shared among the branch components even though each branch component focuses on a different set of fine categories. Second, sharing the parameters in the shallow layers greatly reduces the total number of parameters in the HD-CNN, which can contribute to successful training of the HD-CNN model. If each branch fine-category component were trained completely independently of the others, the number of free parameters in the HD-CNN would scale linearly with the number of coarse categories. An excessively large number of parameters in the model would increase the likelihood of overfitting. Third, the computational cost and memory consumption of the HD-CNN are also reduced by sharing the shallow layers, which is of practical importance for deploying the HD-CNN in real applications.

The probability averaging layer 560 receives the predictions from all of the branch CNNs 540-550 as well as the prediction of the coarse-category CNN 520, and produces a weighted average as the final prediction p(x_i) for image i, as shown in the following equation:

    p(x_i) = \sum_{j=1}^{C'} B_{ij} \, p_j(x_i)

In this equation, B_ij is the probability of coarse category j for image i as predicted by the coarse-category CNN 520. The fine-category prediction made by the j-th branch component F_j for image i is p_j(x_i).
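
By way of illustration only (this sketch is not part of the original disclosure, and the function and variable names are assumptions introduced here), the probability averaging layer 560 can be written in Python/NumPy roughly as follows:

    import numpy as np

    def probability_averaging(coarse_probs, branch_preds):
        """Weighted average of the branch predictions.

        coarse_probs: (n, C') array of B[i, j], the coarse-category probabilities
                      from the coarse-category CNN 520.
        branch_preds: (C', n, C) array where branch_preds[j, i, :] is p_j(x_i),
                      the fine-category distribution from branch CNN j.
        Returns an (n, C) array: p(x_i) = sum_j B[i, j] * p_j(x_i).
        """
        # Sum over the branch axis j, weighting each branch's fine-category
        # distribution by the corresponding coarse-category probability.
        return np.einsum('ij,jik->ik', coarse_probs, branch_preds)

    # Example: 2 images, 3 coarse categories (branches), 5 fine categories.
    B = np.random.dirichlet(np.ones(3), size=2)        # shape (2, 3)
    P = np.random.dirichlet(np.ones(5), size=(3, 2))   # shape (3, 2, 5)
    print(probability_averaging(B, P).sum(axis=1))     # each row sums to 1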

Both the coarse-category CNN 520 and the branch CNNs 540-550 can be implemented as any end-to-end deep CNN model that takes a raw image as input and returns a probability prediction over categories as output.

Using a temporal sparsity penalty term in the multinomial logistic loss function used to train the fine-category components encourages each branch to focus on a subset of the fine categories. The modified loss function that incorporates the temporal sparsity penalty term is shown in the following equation:

    E = -\frac{1}{n}\sum_{i=1}^{n} \log\bigl(p_{y_i}(x_i)\bigr) + \lambda \sum_{j=1}^{C'} \Bigl(t_j - \frac{1}{n}\sum_{i=1}^{n} B_{ij}\Bigr)^2

In this equation, n is the size of the training mini-batch, y_i is the ground truth label of image i, and λ is a regularization constant. In some example embodiments, a value of 5 is used for λ. B_ij is the probability of coarse category j for image i as predicted by the coarse-category CNN 520. The target temporal sparsity of branch j is denoted t_j.
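
For illustration only (not part of the original disclosure; the names below are assumptions introduced here), this loss can be evaluated for one mini-batch as in the following Python/NumPy sketch:

    import numpy as np

    def hd_cnn_loss(final_probs, labels, coarse_probs, target_sparsity, lam=5.0):
        """Multinomial logistic loss plus the temporal sparsity penalty.

        final_probs:     (n, C)  final fine-category predictions p(x_i)
        labels:          (n,)    ground-truth fine-category labels y_i
        coarse_probs:    (n, C') coarse-category probabilities B[i, j]
        target_sparsity: (C',)   target temporal sparsities t_j
        lam:             regularization constant (the description uses 5)
        """
        n = final_probs.shape[0]
        # Multinomial logistic (cross-entropy) term averaged over the mini-batch.
        logistic_loss = -np.log(final_probs[np.arange(n), labels] + 1e-12).mean()
        # Penalize branches whose mean coarse probability deviates from t_j.
        branch_activity = coarse_probs.mean(axis=0)            # shape (C',)
        sparsity_penalty = np.sum((target_sparsity - branch_activity) ** 2)
        return logistic_loss + lam * sparsity_penalty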

In combination with the initialization of the branches, the temporal sparsity term can ensure that each branch component focuses on classifying a different subset of the fine categories, and prevents a small number of branches from receiving the majority of the coarse-category probability mass.

FIG. 6 is a flowchart illustrating operations of the HD-CNN server 130 in performing a process 600 of identifying coarse categories, according to some example embodiments. The process 600 includes operations 610, 620, 630, 640, and 650. By way of example only and not limitation, the operations 610-650 are described as being performed by the modules 210-260.

In operation 610, the coarse category identification module 220 divides the set of training samples into a training set and an evaluation set. For example, the data set {x_i, y_i} of N_t training samples, where i ranges from 1 to N_t, is divided into two parts, train_train and train_val. This can be done by selecting a desired distribution of samples between train_train and train_val, for example a 70%/30% split. Once the distribution is selected, samples can be chosen at random in the appropriate proportions for each set. In operation 620, a deep CNN model is trained on train_train by the pretraining module 230 using standard training techniques. For example, the back-propagation training algorithm is one option for training the deep CNN model.
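
Purely as an illustration (the function name and helper choices are assumptions, not part of the original disclosure), the random 70%/30% split of operation 610 can be sketched as:

    import numpy as np

    def split_training_set(images, labels, train_fraction=0.7, seed=0):
        """Randomly divide the N_t training samples into train_train and train_val."""
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(labels))
        cut = int(train_fraction * len(labels))
        train_idx, val_idx = order[:cut], order[cut:]
        train_train = (images[train_idx], labels[train_idx])
        train_val = (images[val_idx], labels[val_idx])
        return train_train, train_val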

In operation 630, the coarse category identification module 220 plots a confusion matrix based on train_val. The size of the confusion matrix is C×C. The columns of the matrix correspond to the predicted fine categories, and the rows of the matrix correspond to the actual fine categories in train_val. For example, if every prediction were correct, only the cells on the main diagonal of the matrix would be non-zero. Conversely, if every prediction were incorrect, the cells on the main diagonal of the matrix would all be zero.

The coarse category identification module 220 generates a distance matrix D by subtracting each element of the confusion matrix from 1 and zeroing the diagonal elements of D. The distance matrix is made symmetric by averaging D with D^T (the transpose of D). After these operations are performed, each element D_ij measures how easily category i is distinguished from category j.
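
A minimal sketch of this construction of D (assuming the confusion matrix holds per-class prediction rates in the range 0 to 1; the function name is an assumption introduced here):

    import numpy as np

    def confusion_to_distance(confusion):
        """Turn a C x C confusion matrix into a symmetric distance matrix D."""
        D = 1.0 - confusion          # low confusion between classes => large distance
        np.fill_diagonal(D, 0.0)     # zero the diagonal elements
        return 0.5 * (D + D.T)       # symmetrize by averaging D with its transpose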

In operation 640, a low-dimensional feature representation {f_i} of the fine categories is obtained, where i ranges from 1 to C. For example, a Laplacian eigenmap can be used for this purpose. The low-dimensional feature representation preserves local neighborhood information on a low-dimensional manifold and is used to cluster the fine categories into coarse categories. In an example embodiment, the k nearest neighbors are used to construct an adjacency graph. For example, a value of 3 can be used for k. The weights of the adjacency graph are set using a heat kernel (e.g., with a width parameter t = 0.95). In some example embodiments, the dimensionality of {f_i} is 3.

The coarse category identification module 220 (in operation 650) clusters the C fine categories into C′ coarse categories. The clustering can be performed using affinity propagation, k-means clustering, or another clustering algorithm. Affinity propagation can automatically induce the number of coarse categories and may lead to clusters that are more balanced in size than those produced by other clustering methods. Balanced clusters help ensure that each branch component handles a similar number of fine categories and therefore has a similar workload. The damping factor λ in affinity propagation can affect the number of resulting clusters. In some example embodiments, λ is set to 0.98. The result of the clustering is a mapping P(y) = y′ from fine categories y to coarse categories y′.
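
Operations 640 and 650 can be sketched with scikit-learn, purely as an illustration and without limiting the embodiments to any particular library; for brevity the heat-kernel affinity is used directly and the k-nearest-neighbor sparsification of the adjacency graph is omitted:

    import numpy as np
    from sklearn.manifold import SpectralEmbedding
    from sklearn.cluster import AffinityPropagation

    def coarse_categories_from_distance(D, n_dims=3, t=0.95, damping=0.98):
        """Embed the fine categories with a Laplacian eigenmap, then cluster them.

        D is the symmetric C x C distance matrix from operation 630.  Returns an
        array of length C giving the coarse label of each fine category, i.e. the
        mapping P(y) = y'.
        """
        W = np.exp(-(D ** 2) / t)    # heat-kernel affinities with width parameter t
        np.fill_diagonal(W, 0.0)
        features = SpectralEmbedding(
            n_components=n_dims, affinity='precomputed').fit_transform(W)
        return AffinityPropagation(
            damping=damping, random_state=0).fit_predict(features)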

For example, the 100 classes of the CIFAR-100 data set can be divided into coarse categories by training a deep CNN model based on the 50,000 training images and 10,000 test images of the data set. The number of coarse categories can be provided as an input (e.g., four coarse categories may be selected), and the process 600 is used to divide the fine categories among the coarse categories. In one example embodiment, the 100 classes of the CIFAR-100 data set are divided into four coarse categories, as shown in the table below.

FIG. 7 is a flowchart illustrating operations of the HD-CNN server 130 in performing a process 700 of generating an HD-CNN for classifying images, according to some example embodiments. The process 700 includes operations 710, 720, 730, 740, 750, and 760. By way of example only and not limitation, the operations 710-760 are described as being performed by the modules 210-260.

In operation 710, the pretraining module 230 trains a coarse-category CNN on the set of coarse categories. For example, the set of coarse categories may have been identified using the process 600. Using the mapping P(y) = y′, the fine categories of the training data set are replaced with the coarse categories. In an example embodiment, the data set {x_i, y′_i}, where i ranges from 1 to N_t, is used to train a standard deep CNN model. The trained model becomes the coarse-category component of the HD-CNN (e.g., the coarse-category CNN 520).

In an example embodiment, a network consisting of three convolutional layers, one fully connected layer, and one SOFTMAX layer is used. Each convolutional layer has 64 filters. Rectified linear units (ReLU) are used as the activation units. Pooling layers and response normalization layers are also used between the convolutional layers. A complete example architecture is defined in the Example 1 table below. Another example architecture is defined in the Example 2 table below.

In the table above, a filter uses the indicated number of inputs (e.g., pixel values). For example, a 5×5 filter looks at the 25 pixels in a 5×5 grid to determine a single value. The 5×5 filter considers each 5×5 grid in the input image. Accordingly, a layer with 64 5×5 filters generates 64 outputs for each input pixel, each of those values being based on the 5×5 grid of pixels centered on that input pixel. A MAX pool takes multiple inputs for a set of pixels and provides a single output, the maximum of those inputs. For example, a 3×3 MAX pool layer outputs one value for each 3×3 block of pixels, namely the maximum value of those nine pixels. An AVG pool takes multiple inputs for a set of pixels and provides a single output, the average (e.g., the mean) of those inputs. A normalization layer normalizes the values output from the previous layer. A cccp layer provides a non-linear component to the CNN. The SOFTMAX function is a normalized exponential function that provides a non-linear variant of multinomial logistic regression. In some example embodiments, the SOFTMAX function takes a K-dimensional vector of values and outputs a K-dimensional vector of values such that the elements of the output vector sum to 1 and lie in the range of 0 to 1. For example, the following equation can be used to generate the output vector y from an input vector z:

    y_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}

where j = 1, ..., K.
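
A direct implementation of this SOFTMAX function (a numerically stabilized illustration only, not part of the original disclosure):

    import numpy as np

    def softmax(z):
        """Normalized exponential: y_j = exp(z_j) / sum_k exp(z_k)."""
        z = z - np.max(z)          # subtracting the max leaves the result unchanged
        e = np.exp(z)              # but avoids overflow for large inputs
        return e / e.sum()

    print(softmax(np.array([1.0, 2.0, 3.0])))   # elements sum to 1, each in (0, 1)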

In operation 720, the pretraining module 230 also trains a prototype fine-category component. For example, the data set {x_i, y_i}, where i ranges from 1 to N_t, is used to train a standard deep CNN model, which becomes the prototype fine-category component. In an example embodiment, the CIFAR-100 data set is used to train the CNN serving as the prototype fine-category component.

In operation 730, a loop begins to process each of the C′ fine-category components. Accordingly, operations 740 and 750 are performed for each fine-category component. For example, when four coarse categories are identified, the loop iterates over each of the four fine-category components.

In operation 740, the pretraining module 230 makes a copy of the prototype fine-category component for the fine-category component. Accordingly, all fine-category components are initialized to the same state. The fine-category component is further trained on the portion of the data set corresponding to the coarse category of that fine-category component. For example, the subset of the data set {x_i, y_i} for which P(y_i) is the coarse category can be used. Once all of the fine-category components and the coarse-category component have been trained, the HD-CNN is constructed.

The shallow layers of the CNN for a fine-category component can be kept fixed while the deep layers are allowed to change during training. For example, using the structure of Example 1 above, for each fine-category component the shallow layers conv1, pool1, and norm1 can be kept unchanged, while the deep layers conv2, pool2, norm2, conv3, pool3, ip1, and prob are modified during the training of each fine-category component. In some example embodiments, the structure of the shallow layers is kept fixed, but the values used within the shallow layers are allowed to change. With respect to the structure of Example 2 above, for each fine-category component the shallow layers conv1, cccp1, cccp2, pool1, and conv2 can be kept unchanged, while the deep layers cccp3, cccp4, pool2, conv3, cccp5, cccp6, pool3, and prob are modified during the training of each fine-category component.
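
The per-branch initialization and training described above (operations 740-750) can be sketched as follows, purely as an illustration; the parameter-dictionary representation, the coarse_of mapping argument, and the grad_fn helper are assumptions introduced here, and the frozen layer names follow the Example 1 structure:

    import copy

    SHALLOW_LAYERS = {'conv1', 'pool1', 'norm1'}   # kept fixed (Example 1 structure)

    def make_branch_component(prototype_params, coarse_id, data, coarse_of,
                              grad_fn, lr=0.01):
        """Copy the prototype fine-category component, then train it only on
        samples whose fine label maps to coarse category `coarse_id`."""
        params = copy.deepcopy(prototype_params)    # all branches start identical
        subset = [(x, y) for x, y in data if coarse_of[y] == coarse_id]
        for x, y in subset:
            grads = grad_fn(params, x, y)           # assumed backpropagation step
            for name, g in grads.items():
                if name not in SHALLOW_LAYERS:      # shallow layers stay unchanged
                    params[name] -= lr * g
        return params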

In operation 760, the fine-tuning module 240 fine-tunes the constructed HD-CNN. The fine-tuning can be performed using the multinomial logistic loss function with the temporal sparsity penalty. The target temporal sparsities {t_j}, where j ranges from 1 to C′, can be defined using the mapping P. For example, the following equation can be used, where S_k is the set of images from fine category k:

    t_j = \frac{\sum_{k : P(k) = j} |S_k|}{\sum_{k=1}^{C} |S_k|}
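
Under that definition (t_j is the fraction of training images whose fine category maps to coarse category j), the targets can be computed as in the following sketch; the names are assumptions introduced for illustration only:

    import numpy as np

    def target_sparsity(fine_labels, coarse_of, n_coarse):
        """t_j = |{images whose fine category maps to coarse category j}| / N_t."""
        t = np.zeros(n_coarse)
        for y in fine_labels:
            t[coarse_of[y]] += 1
        return t / len(fine_labels)

    # Example: 6 images in 4 fine categories mapped onto 2 coarse categories.
    labels = [0, 0, 1, 2, 3, 3]
    coarse_of = {0: 0, 1: 0, 2: 1, 3: 1}
    print(target_sparsity(labels, coarse_of, 2))    # -> [0.5 0.5]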

The batch size used for fine-tuning can be selected based on the computation time and the desired amount of learning per iteration. For example, a batch size of 250 can be used. After each batch, the training error can be measured. If the rate of improvement of the training error is below a threshold, the learning rate can be reduced (e.g., reduced by 10%, cut in half, or reduced by another amount). The threshold can be modified when the learning rate is reduced. The fine-tuning process stops after a minimum learning rate is reached (e.g., when the learning rate drops below 50% of its original value), after a predetermined number of batches have been used for fine-tuning, or any suitable combination thereof.
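
This batch-wise schedule can be sketched as follows, as an illustration only; the hd_cnn_step helper (assumed to run one batch and return its training error) and the specific thresholds are assumptions consistent with the description above:

    def fine_tune(hd_cnn_step, max_batches=10000, lr=0.01,
                  min_lr=0.005, improve_threshold=1e-3):
        """Fine-tune until the learning rate falls below min_lr or the batch
        budget is exhausted."""
        prev_error = None
        for _ in range(max_batches):
            error = hd_cnn_step(lr)                  # one batch (e.g., 250 images)
            if prev_error is not None and prev_error - error < improve_threshold:
                lr *= 0.5                            # e.g., cut the learning rate in half
                improve_threshold *= 0.5             # the threshold may be modified too
            if lr < min_lr:                          # e.g., below 50% of the original value
                break
            prev_error = error
        return lr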

According to various example embodiments, one or more of the methodologies described herein may facilitate generating an HD-CNN for image classification. Moreover, one or more of the methodologies described herein may facilitate classifying images with a higher success rate relative to a standard deep CNN. Furthermore, one or more of the methodologies described herein may facilitate training the HD-CNN for a user more quickly and with less computing power than previous methods. Similarly, one or more of the methodologies described herein may facilitate training an HD-CNN to a given resolution quality with fewer training samples than would be required to train a CNN to the same resolution quality.

When these effects are considered in aggregate, one or more of the methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in generating or using an HD-CNN for image classification. Effort expended by a user in ordering an item of interest may also be reduced by one or more of the methodologies described herein. For example, accurately identifying the category of an item of interest to a user from an image may reduce the amount of time or effort the user spends in creating an item listing or finding an item to purchase. Computing resources used by one or more machines, databases, or devices (e.g., within the network environment 100) may similarly be reduced. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, and cooling capacity.

Software Architecture

FIG. 8 is a block diagram 800 illustrating an architecture of software 802, which can be installed on any one or more of the devices described above. FIG. 8 is merely a non-limiting example of a software architecture, and it will be appreciated that many other architectures can be implemented to facilitate the functionality described herein. The software 802 can be implemented by hardware such as the machine 900 of FIG. 9, which includes processors 910, memory 930, and I/O components 950. In this example architecture, the software 802 can be conceptualized as a stack of layers, where each layer may provide a particular functionality. For example, the software 802 includes layers such as an operating system 804, libraries 806, frameworks 808, and applications 810. Operationally, the applications 810 invoke application programming interface (API) calls 812 through the software stack and receive messages 814 in response to the API calls 812, according to some embodiments.

In various implementations, the operating system 804 manages hardware resources and provides common services. The operating system 804 includes, for example, a kernel 820, services 822, and drivers 824. In some implementations, the kernel 820 acts as an abstraction layer between the hardware and the other software layers. For example, the kernel 820 provides memory management, processor management (e.g., scheduling), component management, networking, and security settings, among other functionality. The services 822 can provide other common services for the other software layers. The drivers 824 can be responsible for controlling or interfacing with the underlying hardware. For instance, the drivers 824 can include display drivers, camera drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), audio drivers, power management drivers, and so forth.

In some implementations, the libraries 806 provide a low-level common infrastructure that can be utilized by the applications 810. The libraries 806 can include system libraries 830 (e.g., the C standard library) that can provide functions such as memory allocation functions, string manipulation functions, mathematic functions, and the like. In addition, the libraries 806 can include API libraries 832 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as Moving Picture Experts Group-4 (MPEG4), Advanced Video Coding (H.264 or AVC), Moving Picture Experts Group Layer-3 (MP3), Advanced Audio Coding (AAC), Adaptive Multi-Rate (AMR) audio codec, Joint Photographic Experts Group (JPEG or JPG), or Portable Network Graphics (PNG)), graphics libraries (e.g., an OpenGL framework used for two-dimensional (2D) and three-dimensional (3D) rendering of graphic content on a display), database libraries (e.g., SQLite to provide various relational database functions), web libraries (e.g., WebKit to provide web browsing functionality), and the like. The libraries 806 can also include a wide variety of other libraries 834 to provide many other APIs to the applications 810.

According to some implementations, the frameworks 808 provide a high-level common infrastructure that can be utilized by the applications 810. For example, the frameworks 808 provide various graphical user interface (GUI) functions, high-level resource management, high-level location services, and so forth. The frameworks 808 can provide a broad spectrum of other APIs that can be utilized by the applications 810, some of which may be specific to a particular operating system or platform.

In an example embodiment, the applications 810 include a home application 850, a contacts application 852, a browser application 854, a book reader application 856, a location application 858, a media application 860, a messaging application 862, a game application 864, and a broad assortment of other applications such as a third-party application 866. According to some embodiments, the applications 810 are programs that execute functions defined in the programs. Various programming languages can be employed to create one or more of the applications 810, structured in a variety of manners, such as object-oriented programming languages (e.g., Objective-C, Java, or C++) or procedural programming languages (e.g., C or assembly language). In a specific example, the third-party application 866 (e.g., an application developed using the ANDROID™ or IOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as IOS™, ANDROID™, or another mobile operating system. In this example, the third-party application 866 can invoke the API calls 812 provided by the mobile operating system 804 to facilitate the functionality described herein.

Example machine architecture and machine-readable medium

FIG. 9 is a block diagram illustrating components of a machine 900, according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium) and perform any one or more of the methodologies discussed herein. Specifically, FIG. 9 shows a diagrammatic representation of the machine 900 in the example form of a computer system within which instructions 916 (e.g., software, a program, an application, an applet, an app, or other executable code) can be executed to cause the machine 900 to perform any one or more of the methodologies discussed herein. In alternative embodiments, the machine 900 operates as a standalone device or can be coupled (e.g., networked) to other machines. In a networked deployment, the machine 900 can operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 900 can comprise, but is not limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), an entertainment media system, a cellular telephone, a smart phone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), another smart device, a network device, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 916, sequentially or otherwise, that specify actions to be taken by the machine 900. Further, while only a single machine 900 is illustrated, the term "machine" shall also be taken to include a collection of machines 900 that individually or jointly execute the instructions 916 to perform any one or more of the methodologies discussed herein.

The machine 900 can include processors 910, memory 930, and I/O components 950, which can be configured to communicate with each other via a bus 902. In an example embodiment, the processors 910 (e.g., central processing units (CPUs), reduced instruction set computing (RISC) processors, complex instruction set computing (CISC) processors, graphics processing units (GPUs), digital signal processors (DSPs), application specific integrated circuits (ASICs), radio-frequency integrated circuits (RFICs), other processors, or any suitable combination thereof) can include, for example, a processor 912 and a processor 914 that may execute the instructions 916. The term "processor" is intended to include multi-core processors that can comprise two or more independent processors (also referred to as "cores") that can execute instructions concurrently. Although FIG. 9 shows multiple processors, the machine 900 can include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.

The memory 930 can include a main memory 932, a static memory 934, and a storage unit 936 accessible to the processors 910 via the bus 902. The storage unit 936 can include a machine-readable medium 938 on which are stored the instructions 916 embodying any one or more of the methodologies or functions described herein. The instructions 916 can also reside, completely or at least partially, within the main memory 932, within the static memory 934, within at least one of the processors 910 (e.g., within a processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 900. Accordingly, in various implementations, the main memory 932, the static memory 934, and the processors 910 are considered machine-readable media 938.

As used herein, the term "memory" refers to a machine-readable medium 938 able to store data temporarily or permanently, and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 938 is shown in an example embodiment to be a single medium, the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store the instructions 916. The term "machine-readable medium" shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., the instructions 916) for execution by a machine (e.g., the machine 900), such that the instructions, when executed by one or more processors of the machine (e.g., the processors 910), cause the machine 900 to perform any one or more of the methodologies described herein. Accordingly, a "machine-readable medium" refers to a single storage apparatus or device, as well as "cloud-based" storage systems or storage networks that include multiple storage apparatus or devices. The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, one or more data repositories in the form of solid-state memory (e.g., flash memory), an optical medium, a magnetic medium, other non-volatile memory (e.g., erasable programmable read-only memory (EPROM)), or any suitable combination thereof. The term "machine-readable medium" specifically excludes non-statutory signals per se.

The I/O components 950 include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. In general, it will be appreciated that the I/O components 950 can include many other components that are not shown in FIG. 9. The I/O components 950 are grouped according to functionality merely to simplify the following discussion, and the grouping is in no way limiting. In various example embodiments, the I/O components 950 include output components 952 and input components 954. The output components 952 include visual components (e.g., a display such as a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)), acoustic components (e.g., speakers), haptic components (e.g., a vibratory motor), other signal generators, and so forth. The input components 954 include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instruments), tactile input components (e.g., a physical button, a touch screen that provides location and force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.

In some further example embodiments, the I/O components 950 include biometric components 956, motion components 958, environmental components 960, or position components 962, among a wide array of other components. For example, the biometric components 956 include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram-based identification), and the like. The motion components 958 include acceleration sensor components (e.g., an accelerometer), gravitation sensor components, rotation sensor components (e.g., a gyroscope), and so forth. The environmental components 960 include, for example, illumination sensor components (e.g., a photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., a barometer), acoustic sensor components (e.g., one or more microphones that detect background noise), proximity sensor components (e.g., infrared sensors that detect nearby objects), gas sensors (e.g., machine olfaction detection sensors, gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that can provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 962 include location sensor components (e.g., a Global Positioning System (GPS) receiver component), altitude sensor components (e.g., an altimeter or a barometer that detects air pressure from which altitude can be derived), orientation sensor components (e.g., a magnetometer), and the like.

Communication can be implemented using a wide variety of technologies. The I/O components 950 can include communication components 964 operable to couple the machine 900 to a network 980 or to devices 970 via a coupling 982 and a coupling 972, respectively. For example, the communication components 964 include a network interface component or another suitable device to interface with the network 980. In further examples, the communication components 964 include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 970 can be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a Universal Serial Bus (USB)).

Moreover, in some implementations, the communication components 964 detect identifiers or include components operable to detect identifiers. For example, the communication components 964 include radio frequency identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar codes, and multi-dimensional bar codes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, Uniform Commercial Code Reduced Space Symbology (UCC RSS)-2D bar codes, and other optical codes), acoustic detection components (e.g., microphones to identify tagged audio signals), or any suitable combination thereof. In addition, a variety of information can be derived via the communication components 964, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.

Transmission medium

In various example embodiments, one or more portions of the network 980 can be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the public switched telephone network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, the network 980 or a portion of the network 980 can include a wireless or cellular network, and the coupling 982 can be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 982 can implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), the Long Term Evolution (LTE) standard, other standards defined by various standard-setting organizations, other long range protocols, or other data transfer technology.

In example embodiments, the instructions 916 are transmitted or received over the network 980 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 964) and utilizing any one of a number of well-known transfer protocols (e.g., Hypertext Transfer Protocol (HTTP)). Similarly, in other example embodiments, the instructions 916 are transmitted or received to or from the devices 970 using a transmission medium via the coupling 972 (e.g., a peer-to-peer coupling). The term "transmission medium" shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 916 for execution by the machine 900, and includes digital or analog communications signals or other intangible media to facilitate communication of such software.

Furthermore, a transmission medium or signal carrying machine-readable instructions comprises one embodiment of a machine-readable medium 938.

Language

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or inventive concept if more than one is, in fact, disclosed.

The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

As used herein, the term "or" may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

The following enumerated examples define various example embodiments of the methods, machine-readable media, and systems (e.g., apparatus) discussed herein:

Example 1. A system comprising:

a coarse category identification module configured to:

access a data set comprising categorized data, the categorized data having a plurality of fine categories;

identify a plurality of coarse categories, the number of coarse categories being smaller than the number of fine categories; and

determine, for each fine category, an associated coarse category;

a pre-training module configured to:

train a base convolutional neural network (CNN) operable to distinguish between the coarse categories; and

train a fine CNN for each coarse category, the fine CNN for a coarse category operable to distinguish between the fine categories associated with that coarse category; and

a classification module configured to:

receive a request to categorize data;

determine, using the base CNN, a coarse category of the data;

determine, using the fine CNN for the determined coarse category, a fine category of the data; and

send, responsive to the request, the fine category of the data.
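To make the two-stage flow of Example 1 concrete, the sketch below routes an input through a base CNN that predicts a coarse category and then through the fine CNN associated with that coarse category. It is a minimal PyTorch illustration, not the implementation claimed in this application; the `SmallCNN` module, the network sizes, and the category counts are assumptions chosen only to keep the example self-contained.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny stand-in for a convolutional classifier (assumed architecture)."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

NUM_COARSE, FINE_PER_COARSE = 3, 4  # assumed category layout

base_cnn = SmallCNN(NUM_COARSE)                                       # distinguishes coarse categories
fine_cnns = [SmallCNN(FINE_PER_COARSE) for _ in range(NUM_COARSE)]    # one fine CNN per coarse category

def classify(image):
    """Return (coarse_id, fine_id) for one image tensor of shape (3, H, W)."""
    with torch.no_grad():
        coarse_id = int(base_cnn(image.unsqueeze(0)).argmax(dim=1))
        fine_id = int(fine_cnns[coarse_id](image.unsqueeze(0)).argmax(dim=1))
    return coarse_id, fine_id

print(classify(torch.randn(3, 32, 32)))
```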

Example 2. The system of Example 1, wherein the coarse category identification module is further configured to:

divide the data set into a training set and a validation set;

train a first CNN model using the training set; and

generate a confusion matrix for the first CNN model using the validation set; wherein

determining the associated coarse category for each fine category comprises applying an affinity propagation algorithm to the confusion matrix.
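Example 2 builds the coarse categories by clustering a confusion matrix. One plausible way to sketch this, assuming scikit-learn is an acceptable stand-in for the clustering step, is to symmetrize the confusion matrix computed on a held-out validation set and feed it to affinity propagation as a precomputed similarity; the numbers below are synthetic.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Synthetic confusion matrix C for 6 fine categories: C[i, j] is the fraction of
# validation samples of fine category i that the first CNN predicted as category j.
C = np.array([
    [0.70, 0.20, 0.05, 0.02, 0.02, 0.01],
    [0.25, 0.65, 0.04, 0.03, 0.02, 0.01],
    [0.03, 0.04, 0.72, 0.15, 0.03, 0.03],
    [0.02, 0.03, 0.18, 0.70, 0.04, 0.03],
    [0.02, 0.02, 0.03, 0.03, 0.68, 0.22],
    [0.01, 0.02, 0.02, 0.04, 0.21, 0.70],
])

# Fine categories that are often confused with each other are treated as similar,
# so use the symmetrized confusion as a precomputed similarity matrix.
similarity = 0.5 * (C + C.T)

clustering = AffinityPropagation(affinity="precomputed", random_state=0).fit(similarity)
fine_to_coarse = clustering.labels_   # coarse category index assigned to each fine category
print(fine_to_coarse)                 # e.g. groups the mutually confused categories together
```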

Example 3. The system of Example 1 or Example 2, wherein the coarse category identification module is further configured to:

obtain a low-dimensional feature representation of the fine categories using Laplacian eigenmaps.
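Example 3 mentions a low-dimensional representation of the fine categories obtained with Laplacian eigenmaps. As a rough sketch, and assuming a confusion-derived similarity matrix like the one above, scikit-learn's SpectralEmbedding (its implementation of Laplacian eigenmaps) can produce such an embedding, which a standard clustering step can then consume; nothing here is specific to the patented method.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
similarity = rng.random((10, 10))
similarity = 0.5 * (similarity + similarity.T)   # symmetric similarity between 10 fine categories
np.fill_diagonal(similarity, 1.0)

# Laplacian eigenmaps: embed each fine category into a 3-dimensional space.
embedding = SpectralEmbedding(n_components=3, affinity="precomputed").fit_transform(similarity)

# Group the embedded fine categories into 4 coarse categories (illustrative choice).
coarse_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
print(embedding.shape, coarse_labels)
```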

Example 4. The system of any suitable one of Examples 1 to 3, wherein the training module is configured to train the fine CNN for each coarse category using operations comprising:

training a second CNN model using the training set;

generating the fine CNN for each coarse category from the second CNN; and

training the fine CNN for each coarse category using a subset of the training set, the subset excluding data having fine categories not associated with that coarse category.
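Example 4 trains each fine CNN only on the slice of the training set whose fine categories map to that coarse category. The snippet below sketches just the data-partitioning step with NumPy; `fine_to_coarse` is a mapping like the one produced above, and the per-category training call is left as a placeholder rather than presented as the claimed procedure.

```python
import numpy as np

def split_by_coarse(images, fine_labels, fine_to_coarse):
    """Group training examples by the coarse category of their fine label."""
    coarse_labels = fine_to_coarse[fine_labels]
    subsets = {}
    for c in np.unique(coarse_labels):
        mask = coarse_labels == c
        subsets[c] = (images[mask], fine_labels[mask])
    return subsets

# Toy data: 100 "images" with fine labels in 0..5, grouped into 3 coarse categories.
fine_to_coarse = np.array([0, 0, 1, 1, 2, 2])
images = np.random.rand(100, 3, 32, 32).astype(np.float32)
fine_labels = np.random.randint(0, 6, size=100)

for coarse_id, (x, y) in split_by_coarse(images, fine_labels, fine_to_coarse).items():
    # fine_cnns[coarse_id].fit(x, y)  # placeholder: copy the shared CNN and fine-tune it on this subset
    print(coarse_id, x.shape, np.unique(y))
```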

Example 5. The system of any suitable one of Examples 1 to 4, wherein:

the pre-training module is further configured to combine the CNN operable to distinguish between the coarse categories with each CNN operable to distinguish between fine categories, to form a hierarchical deep CNN (HD-CNN); and

the system further comprises a fine-tuning module configured to fine-tune the HD-CNN.
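Example 5 combines the base CNN and the fine CNNs into a single HD-CNN. One way to read that combination, consistent with the hierarchical scheme described earlier in this application, is to weight each fine CNN's class probabilities by the base CNN's probability for the corresponding coarse category; the sketch below assumes that reading (it is not the only possible one) and redefines the toy `SmallCNN` so it runs on its own.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Same toy classifier as in the earlier sketch, repeated here for self-containment."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4))
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)
    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

fine_to_coarse = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])  # 12 fine, 3 coarse (assumed layout)
base_cnn = SmallCNN(3)
fine_cnns = [SmallCNN(4) for _ in range(3)]  # each fine CNN covers the 4 fine categories of one coarse category

def hd_cnn_predict(image):
    """Weight each fine CNN's probabilities by the base CNN's coarse probability, then pick a fine class."""
    with torch.no_grad():
        coarse_probs = torch.softmax(base_cnn(image.unsqueeze(0)), dim=1).squeeze(0)
        combined = torch.zeros(len(fine_to_coarse))
        for coarse_id, fine_cnn in enumerate(fine_cnns):
            fine_probs = torch.softmax(fine_cnn(image.unsqueeze(0)), dim=1).squeeze(0)
            slots = (fine_to_coarse == coarse_id).nonzero(as_tuple=True)[0]
            combined[slots] += coarse_probs[coarse_id] * fine_probs
    return int(combined.argmax())

print(hd_cnn_predict(torch.randn(3, 32, 32)))
```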

Example 6. The system of Example 5, wherein the fine-tuning module is configured to fine-tune the HD-CNN using operations comprising:

beginning the fine-tuning using a learning factor;

training the HD-CNN by iterating over a series of training batches using the learning factor;

after each iteration, comparing a training error of the training batch with a threshold;

determining, based on the comparison, that the training error of the training batch is below the threshold; and

responsive to determining that the training error of the training batch is below the threshold, modifying the learning factor.
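Example 6 describes a fine-tuning loop in which a learning factor is modified once the training error of a batch drops below a threshold. A minimal sketch of that control flow follows; the error values are simulated and the specific decay rule (halving the factor) is an assumption, since the example only states that the factor is modified.

```python
import random

def fine_tune(num_batches=20, learning_factor=0.01, threshold=0.1, decay=0.5):
    """Iterate over training batches, shrinking the learning factor when the batch error is low."""
    random.seed(0)
    for step in range(num_batches):
        # Placeholder for training the HD-CNN on one batch with the current learning factor.
        training_error = max(0.0, 0.5 - 0.03 * step + random.uniform(-0.02, 0.02))
        if training_error < threshold:          # compare the batch's training error to the threshold
            learning_factor *= decay            # modify the learning factor (assumed: halve it)
        print(f"batch {step:02d}  error={training_error:.3f}  learning_factor={learning_factor:.5f}")
    return learning_factor

fine_tune()
```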

Example 7. The system of Example 5 or Example 6, wherein the fine-tuning module is configured to fine-tune the HD-CNN using operations comprising:

applying a temporal sparsity element in the evaluation of each of the CNNs operable to distinguish between fine categories.
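Example 7 adds a temporal sparsity element when evaluating the fine CNNs. The text here does not spell out the exact formula, so the sketch below only illustrates one common reading: the coarse-category weights used when invoking the fine CNNs are averaged over a window of recent inputs and penalized for drifting from a sparsity target. Treat the penalty form as an assumption, not the claimed method.

```python
import numpy as np

def temporal_sparsity_penalty(coarse_weight_history, target=0.25):
    """Penalize coarse-category weights whose running average drifts from a sparsity target.

    coarse_weight_history: array of shape (T, K), one row of coarse weights per recent input.
    """
    mean_activation = coarse_weight_history.mean(axis=0)      # average use of each fine CNN over time
    return float(np.sum((mean_activation - target) ** 2))     # assumed quadratic penalty

rng = np.random.default_rng(0)
history = rng.dirichlet(alpha=np.ones(4), size=32)            # 32 recent inputs, 4 coarse weights each
print(temporal_sparsity_penalty(history))
```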

Example 8. The system of any suitable one of Examples 1 to 7, wherein the data set comprising categorized data comprises categorized images.

Example 9. A method comprising:

accessing a data set comprising categorized data, the categorized data having a plurality of fine categories;

identifying a plurality of coarse categories, the number of coarse categories being smaller than the number of fine categories;

determining, for each fine category, an associated coarse category;

training a base convolutional neural network (CNN) operable to distinguish between the coarse categories, the base CNN implemented by a processor of a machine; and

training a fine CNN for each coarse category, the fine CNN for a coarse category operable to distinguish between the fine categories associated with that coarse category;

receiving a request to categorize data;

determining, using the base CNN, a coarse category of the data;

determining, using the fine CNN for the determined coarse category, a fine category of the data; and

sending, responsive to the request, the fine category of the data.

Example 10. The method of Example 9, further comprising:

dividing the data set into a training set and a validation set;

training a first CNN model using the training set; and

generating a confusion matrix for the first CNN model using the validation set; wherein

determining the associated coarse category for each fine category comprises applying an affinity propagation algorithm to the confusion matrix.

Example 11. The method of Example 9 or Example 10, further comprising obtaining a low-dimensional feature representation of the fine categories using Laplacian eigenmaps.

Example 12. The method of any one of Examples 9 to 11, wherein training the fine CNN for each coarse category comprises:

training a second CNN model using the training set;

generating the fine CNN for each coarse category from the second CNN; and

training the fine CNN for each coarse category using a subset of the training set, the subset excluding data having fine categories not associated with that coarse category.

Example 13. The method of any one of Examples 9 to 12, further comprising:

combining the base CNN with each fine CNN to form a hierarchical deep CNN (HD-CNN); and

fine-tuning the HD-CNN.

Example 14. The method of any one of Examples 9 to 13, wherein fine-tuning the HD-CNN comprises:

beginning the fine-tuning using a learning factor;

training the HD-CNN by iterating over a series of training batches using the learning factor;

after each iteration, comparing a training error of the training batch with a threshold;

determining, based on the comparison, that the training error of the training batch is below the threshold; and

responsive to determining that the training error of the training batch is below the threshold, modifying the learning factor.

Example 15. The method of any one of Examples 9 to 14, wherein fine-tuning the HD-CNN comprises:

applying a temporal sparsity element in the evaluation of each of the CNNs operable to distinguish between fine categories.

Example 16. The method of Example 9, wherein the data set comprising categorized data comprises categorized images.

Example 17. A machine-readable medium carrying instructions executable by a processor of a machine to carry out the method of any one of Examples 9 to 16.

Claims (18)

1. A system comprising:
a coarse category identification module configured to:
access a data set comprising categorized data, the categorized data having a plurality of fine categories;
identify a plurality of coarse categories, the number of coarse categories being smaller than the number of fine categories; and
determine, for each fine category, an associated coarse category;
a pre-training module configured to:
train a base convolutional neural network (CNN) operable to distinguish between the coarse categories; and
train a fine CNN for each coarse category, the fine CNN for a coarse category operable to distinguish between the fine categories associated with that coarse category; and
a classification module configured to:
receive a request to categorize data;
determine, using the base CNN, a coarse category of the data;
determine, using the fine CNN for the determined coarse category, a fine category of the data; and
send, responsive to the request, the fine category of the data.
2. The system of claim 1, wherein the coarse category identification module is further configured to:
divide the data set into a training set and a validation set;
train a first CNN model using the training set; and
generate a confusion matrix for the first CNN model using the validation set; wherein
determining the associated coarse category for each fine category comprises applying an affinity propagation algorithm to the confusion matrix.
3. The system of claim 2, wherein the coarse category identification module is further configured to:
obtain a low-dimensional feature representation of the fine categories using Laplacian eigenmaps.
4. The system of claim 2, wherein the training module is configured to train the fine CNN for each coarse category using operations comprising:
training a second CNN model using the training set;
generating the fine CNN for each coarse category from the second CNN; and
training the fine CNN for each coarse category using a subset of the training set, the subset excluding data having fine categories not associated with that coarse category.
5. The system of claim 1, wherein:
the pre-training module is further configured to combine the CNN operable to distinguish between the coarse categories with each CNN operable to distinguish between fine categories, to form a hierarchical deep CNN (HD-CNN); and
the system further comprises a fine-tuning module configured to fine-tune the HD-CNN.
6. The system of claim 5, wherein the fine-tuning module is configured to fine-tune the HD-CNN using operations comprising:
beginning the fine-tuning using a learning factor;
training the HD-CNN by iterating over a series of training batches using the learning factor;
after each iteration, comparing a training error of the training batch with a threshold;
determining, based on the comparison, that the training error of the training batch is below the threshold; and
responsive to determining that the training error of the training batch is below the threshold, modifying the learning factor.
7. The system of claim 5, wherein the fine-tuning module is configured to fine-tune the HD-CNN using operations comprising:
applying a temporal sparsity element in the evaluation of each of the CNNs operable to distinguish between fine categories.
8. The system of claim 1, wherein the data set comprising categorized data comprises categorized images.
9. A computer-implemented method comprising:
accessing a data set comprising categorized data, the categorized data having a plurality of fine categories;
identifying a plurality of coarse categories, the number of coarse categories being smaller than the number of fine categories;
determining, for each fine category, an associated coarse category;
training a base convolutional neural network (CNN) operable to distinguish between the coarse categories;
training a fine CNN for each coarse category, the fine CNN for a coarse category operable to distinguish between the fine categories associated with that coarse category;
receiving a request to categorize data;
determining, using the base CNN, a coarse category of the data;
determining, using the fine CNN for the determined coarse category, a fine category of the data; and
sending, responsive to the request, the fine category of the data.
10. The method of claim 9, further comprising:
dividing the data set into a training set and a validation set;
training a first CNN model using the training set; and
generating a confusion matrix for the first CNN model using the validation set; wherein
determining the associated coarse category for each fine category comprises applying an affinity propagation algorithm to the confusion matrix.
11. The method of claim 10, further comprising obtaining a low-dimensional feature representation of the fine categories using Laplacian eigenmaps.
12. The method of claim 10, wherein training the fine CNN for each coarse category comprises:
training a second CNN model using the training set;
generating the fine CNN for each coarse category from the second CNN; and
training the fine CNN for each coarse category using a subset of the training set, the subset excluding data having fine categories not associated with that coarse category.
13. The method of claim 9, further comprising:
combining the base CNN with each fine CNN to form a hierarchical deep CNN (HD-CNN); and
fine-tuning the HD-CNN.
14. The method of claim 13, wherein fine-tuning the HD-CNN comprises:
beginning the fine-tuning using a learning factor;
training the HD-CNN by iterating over a series of training batches using the learning factor;
after each iteration, comparing a training error of the training batch with a threshold;
determining, based on the comparison, that the training error of the training batch is below the threshold; and
responsive to determining that the training error of the training batch is below the threshold, modifying the learning factor.
15. The method of claim 13, wherein fine-tuning the HD-CNN comprises:
applying a temporal sparsity element in the evaluation of each of the CNNs operable to distinguish between fine categories.
16. The method of claim 9, wherein the data set comprising categorized data comprises categorized images.
17. A non-transitory machine-readable medium having instructions embodied thereon, the instructions executable by a processor of a machine to perform operations comprising:
accessing a data set comprising categorized data, the categorized data having a plurality of fine categories;
identifying a plurality of coarse categories, the number of coarse categories being smaller than the number of fine categories;
determining, for each fine category, an associated coarse category;
training a base convolutional neural network (CNN) operable to distinguish between the coarse categories, the CNN implemented by a processor of the machine;
training a fine CNN for each coarse category, the fine CNN for a coarse category operable to distinguish between the fine categories associated with that coarse category;
receiving a request to categorize data;
determining, using the base CNN, a coarse category of the data;
determining, using the fine CNN for the determined coarse category, a fine category of the data; and
sending, responsive to the request, the fine category of the data.
18. A machine-readable medium carrying instructions executable by a processor of a machine to carry out the method of any one of claims 9 to 16.
CN201580058248.6A 2014-10-27 2015-10-27 Hierarchical Deep Convolutional Neural Networks Pending CN107077625A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201462068883P 2014-10-27 2014-10-27
US62/068,883 2014-10-27
US14/582,059 2014-12-23
US14/582,059 US10387773B2 (en) 2014-10-27 2014-12-23 Hierarchical deep convolutional neural network for image classification
PCT/US2015/057557 WO2016069581A1 (en) 2014-10-27 2015-10-27 Hierarchical deep convolutional neural network

Publications (1)

Publication Number Publication Date
CN107077625A true CN107077625A (en) 2017-08-18

Family

ID=55792253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580058248.6A Pending CN107077625A (en) 2014-10-27 2015-10-27 Hierarchical Deep Convolutional Neural Networks

Country Status (6)

Country Link
US (1) US10387773B2 (en)
EP (1) EP3213261A4 (en)
JP (1) JP2017538195A (en)
KR (1) KR20170077183A (en)
CN (1) CN107077625A (en)
WO (1) WO2016069581A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053836A (en) * 2018-01-18 2018-05-18 成都嗨翻屋文化传播有限公司 A kind of audio automation mask method based on deep learning
CN108764051A (en) * 2018-04-28 2018-11-06 Oppo广东移动通信有限公司 Image processing method and device and mobile terminal
CN108921190A (en) * 2018-05-24 2018-11-30 北京飞搜科技有限公司 A kind of image classification method, device and electronic equipment
CN109063824A (en) * 2018-07-25 2018-12-21 深圳市中悦科技有限公司 Creation method, device, storage medium and the processor of deep layer Three dimensional convolution neural network
CN109934293A (en) * 2019-03-15 2019-06-25 苏州大学 Image recognition method, device, medium and confusion-aware convolutional neural network
US10387773B2 (en) 2014-10-27 2019-08-20 Ebay Inc. Hierarchical deep convolutional neural network for image classification
CN110852288A (en) * 2019-11-15 2020-02-28 苏州大学 Cell image classification method based on two-stage convolutional neural network
CN110929623A (en) * 2019-11-15 2020-03-27 北京达佳互联信息技术有限公司 Multimedia file identification method, device, server and storage medium
CN110968073A (en) * 2019-11-22 2020-04-07 四川大学 A double-layer traceability identification method for the cause of commutation failure in HVDC system
CN111279363A (en) * 2017-11-20 2020-06-12 谷歌有限责任公司 Generate object embeddings from images
CN113705527A (en) * 2021-09-08 2021-11-26 西南石油大学 Expression recognition method based on loss function integration and coarse and fine hierarchical convolutional neural network
CN113950706A (en) * 2019-06-13 2022-01-18 艾克斯佩迪亚公司 Image classification system

Families Citing this family (100)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8335723B2 (en) * 2005-08-09 2012-12-18 Walker Digital, Llc Apparatus, systems and methods for facilitating commerce
US10346726B2 (en) * 2014-12-15 2019-07-09 Samsung Electronics Co., Ltd. Image recognition method and apparatus, image verification method and apparatus, learning method and apparatus to recognize image, and learning method and apparatus to verify image
KR102486699B1 (en) 2014-12-15 2023-01-11 삼성전자주식회사 Method and apparatus for recognizing and verifying image, and method and apparatus for learning image recognizing and verifying
US9818048B2 (en) * 2015-01-19 2017-11-14 Ebay Inc. Fine-grained categorization
JP2016146174A (en) * 2015-02-06 2016-08-12 パナソニックIpマネジメント株式会社 Determination method and program
EP3065086A1 (en) * 2015-03-02 2016-09-07 Medizinische Universität Wien Computerized device and method for processing image data
US11275747B2 (en) * 2015-03-12 2022-03-15 Yahoo Assets Llc System and method for improved server performance for a deep feature based coarse-to-fine fast search
WO2016177722A1 (en) 2015-05-05 2016-11-10 Medizinische Universität Wien Computerized device and method for processing image data
US10529318B2 (en) * 2015-07-31 2020-01-07 International Business Machines Corporation Implementing a classification model for recognition processing
US20180220589A1 (en) * 2015-11-03 2018-08-09 Keith Charles Burden Automated pruning or harvesting system for complex morphology foliage
CN111860812B (en) * 2016-04-29 2024-03-01 中科寒武纪科技股份有限公司 Apparatus and method for performing convolutional neural network training
US9971958B2 (en) * 2016-06-01 2018-05-15 Mitsubishi Electric Research Laboratories, Inc. Method and system for generating multimodal digital images
US10614798B2 (en) 2016-07-29 2020-04-07 Arizona Board Of Regents On Behalf Of Arizona State University Memory compression in a deep neural network
US20180042176A1 (en) * 2016-08-15 2018-02-15 Raptor Maps, Inc. Systems, devices, and methods for monitoring and assessing characteristics of harvested specialty crops
US12020174B2 (en) 2016-08-16 2024-06-25 Ebay Inc. Selecting next user prompt types in an intelligent online personal assistant multi-turn dialog
US9646243B1 (en) 2016-09-12 2017-05-09 International Business Machines Corporation Convolutional neural networks using resistive processing unit array
US9715656B1 (en) 2016-09-12 2017-07-25 International Business Machines Corporation Killing asymmetric resistive processing units for neural network training
US11514642B2 (en) * 2016-10-08 2022-11-29 Purdue Research Foundation Method and apparatus for generating two-dimensional image data describing a three-dimensional image
US11200273B2 (en) 2016-10-16 2021-12-14 Ebay Inc. Parallel prediction of multiple image aspects
US10860898B2 (en) 2016-10-16 2020-12-08 Ebay Inc. Image analysis and prediction based visual search
US11004131B2 (en) 2016-10-16 2021-05-11 Ebay Inc. Intelligent online personal assistant with multi-turn dialog based on visual search
US11748978B2 (en) 2016-10-16 2023-09-05 Ebay Inc. Intelligent online personal assistant with offline visual search database
US10970768B2 (en) 2016-11-11 2021-04-06 Ebay Inc. Method, medium, and system for image text localization and comparison
WO2018093935A1 (en) * 2016-11-15 2018-05-24 Google Llc Training neural networks using a clustering loss
FR3059806B1 (en) * 2016-12-06 2019-10-25 Commissariat A L'energie Atomique Et Aux Energies Alternatives METHOD FOR OBTAINING AN IMAGE LABELING SYSTEM, CORRESPONDING COMPUTER PROGRAM AND DEVICE, IMAGE LABELING SYSTEM
EP3349152A1 (en) * 2017-01-17 2018-07-18 Catchoom Technologies S.L. Classifying data
US10660576B2 (en) 2017-01-30 2020-05-26 Cognizant Technology Solutions India Pvt. Ltd. System and method for detecting retinopathy
US10430978B2 (en) * 2017-03-02 2019-10-01 Adobe Inc. Editing digital images utilizing a neural network with an in-network rendering layer
WO2018217829A1 (en) 2017-05-23 2018-11-29 Intel Corporation Methods and apparatus for enhancing a neural network using binary tensor and scale factor pairs
US11704569B2 (en) 2017-05-23 2023-07-18 Intel Corporation Methods and apparatus for enhancing a binary weight neural network using a dependency tree
US11647903B2 (en) 2017-06-01 2023-05-16 University Of Washington Smartphone-based digital pupillometer
US11126894B2 (en) * 2017-06-05 2021-09-21 Siemens Aktiengesellschaft Method and apparatus for analysing an image
WO2018228399A1 (en) * 2017-06-13 2018-12-20 上海寒武纪信息科技有限公司 Computing device and method
US11517768B2 (en) * 2017-07-25 2022-12-06 Elekta, Inc. Systems and methods for determining radiation therapy machine parameter settings
CN107610091A (en) * 2017-07-31 2018-01-19 阿里巴巴集团控股有限公司 Vehicle insurance image processing method, device, server and system
US10803105B1 (en) 2017-08-03 2020-10-13 Tamr, Inc. Computer-implemented method for performing hierarchical classification
EP3451293A1 (en) * 2017-08-28 2019-03-06 Thomson Licensing Method and apparatus for filtering with multi-branch deep learning
KR102532748B1 (en) 2017-09-08 2023-05-16 삼성전자주식회사 Method and device for learning neural network
KR102060176B1 (en) * 2017-09-12 2019-12-27 네이버 주식회사 Deep learning method deep learning system for categorizing documents
CN109543139B (en) * 2017-09-22 2021-09-17 杭州海康威视数字技术股份有限公司 Convolution operation method and device, computer equipment and computer readable storage medium
US10599978B2 (en) 2017-11-03 2020-03-24 International Business Machines Corporation Weighted cascading convolutional neural networks
US11164078B2 (en) 2017-11-08 2021-11-02 International Business Machines Corporation Model matching and learning rate selection for fine tuning
US10762125B2 (en) 2017-11-14 2020-09-01 International Business Machines Corporation Sorting images based on learned actions
KR102095335B1 (en) * 2017-11-15 2020-03-31 에스케이텔레콤 주식회사 Apparatus and method for generating and using neural network model applying accelerated computation
KR102607208B1 (en) * 2017-11-16 2023-11-28 삼성전자주식회사 Neural network learning methods and devices
US10535138B2 (en) 2017-11-21 2020-01-14 Zoox, Inc. Sensor data segmentation
CN108229363A (en) * 2017-12-27 2018-06-29 北京市商汤科技开发有限公司 Key frame dispatching method and device, electronic equipment, program and medium
CN108304920B (en) * 2018-02-02 2020-03-10 湖北工业大学 Method for optimizing multi-scale learning network based on MobileNet
WO2019204700A1 (en) * 2018-04-19 2019-10-24 University Of South Florida Neonatal pain identification from neonatal facial expressions
US11068939B1 (en) 2018-04-27 2021-07-20 Gbt Travel Services Uk Limited Neural network for optimizing display of hotels on a user interface
CN110717929B (en) * 2018-07-11 2024-12-03 腾讯科技(深圳)有限公司 Image object detection method, device and storage medium
WO2020019102A1 (en) * 2018-07-23 2020-01-30 Intel Corporation Methods, systems, articles of manufacture and apparatus to train a neural network
JP7257756B2 (en) * 2018-08-20 2023-04-14 キヤノン株式会社 Image identification device, image identification method, learning device, and neural network
CN110879950A (en) * 2018-09-06 2020-03-13 北京市商汤科技开发有限公司 Multi-stage target classification and traffic sign detection method and device, equipment and medium
KR20200030806A (en) 2018-09-13 2020-03-23 삼성전자주식회사 Non-transitory computer-readable medium comprising image conversion model based on artificial neural network and method of converting image of semiconductor wafer for monitoring semiconductor fabrication process
US12159228B2 (en) * 2018-09-25 2024-12-03 Nokia Technologies Oy End-to-end learning in communication systems
KR102712777B1 (en) 2018-10-29 2024-10-04 삼성전자주식회사 Electronic device and controlling method for electronic device
US11816971B2 (en) * 2018-11-13 2023-11-14 3M Innovative Properties Company System and method for risk classification and warning of flashover events
US11366874B2 (en) 2018-11-23 2022-06-21 International Business Machines Corporation Analog circuit for softmax function
CN109671026B (en) * 2018-11-28 2020-09-29 浙江大学 Gray level image noise reduction method based on void convolution and automatic coding and decoding neural network
JP7114737B2 (en) 2018-11-30 2022-08-08 富士フイルム株式会社 Image processing device, image processing method, and program
CN109596326B (en) * 2018-11-30 2020-06-12 电子科技大学 Rotary machine fault diagnosis method based on convolution neural network with optimized structure
CN109753999B (en) * 2018-12-21 2022-06-07 西北工业大学 Fine-grained vehicle type identification method for automobile pictures with any visual angles
US10867210B2 (en) * 2018-12-21 2020-12-15 Waymo Llc Neural networks for coarse- and fine-object classifications
JP6991960B2 (en) * 2018-12-28 2022-01-13 Kddi株式会社 Image recognition device, image recognition method and program
US11557107B2 (en) 2019-01-02 2023-01-17 Bank Of America Corporation Intelligent recognition and extraction of numerical data from non-numerical graphical representations
US10325179B1 (en) * 2019-01-23 2019-06-18 StradVision, Inc. Learning method and learning device for pooling ROI by using masking parameters to be used for mobile devices or compact networks via hardware optimization, and testing method and testing device using the same
US10311578B1 (en) * 2019-01-23 2019-06-04 StradVision, Inc. Learning method and learning device for segmenting an image having one or more lanes by using embedding loss to support collaboration with HD maps required to satisfy level 4 of autonomous vehicles and softmax loss, and testing method and testing device using the same
CN109919177B (en) * 2019-01-23 2022-03-29 西北工业大学 Feature selection method based on hierarchical deep network
US10915809B2 (en) 2019-02-04 2021-02-09 Bank Of America Corporation Neural network image recognition with watermark protection
CN109951357A (en) * 2019-03-18 2019-06-28 西安电子科技大学 Network Application Recognition Method Based on Multilayer Neural Network
CN109871835B (en) * 2019-03-27 2021-10-01 南开大学 A face recognition method based on mutual exclusion regularization technology
EP3745311A1 (en) * 2019-05-29 2020-12-02 i2x GmbH A classification apparatus and method for optimizing throughput of classification models
CN110322050B (en) * 2019-06-04 2023-04-07 西安邮电大学 Wind energy resource data compensation method
AU2019452405B2 (en) 2019-06-20 2022-12-08 Elekta, Inc. Predicting radiotherapy control points using projection images
CN110414358B (en) * 2019-06-28 2022-11-25 平安科技(深圳)有限公司 Information output method and device based on intelligent face recognition and storage medium
US11132577B2 (en) 2019-07-17 2021-09-28 Cognizant Technology Solutions India Pvt. Ltd System and a method for efficient image recognition
EP3822867A1 (en) 2019-11-14 2021-05-19 Koninklijke Philips N.V. Constrained training of artificial neural networks using labelled medical data of mixed quality
CN110929629A (en) * 2019-11-19 2020-03-27 中国科学院遥感与数字地球研究所 Remote sensing classification method for group building damage based on improved CNN
US11341370B2 (en) 2019-11-22 2022-05-24 International Business Machines Corporation Classifying images in overlapping groups of images using convolutional neural networks
CN110910067A (en) * 2019-11-25 2020-03-24 南京师范大学 A method and system for intelligent regulation of water quality in live fish transportation combining deep learning and Q-learning
CN110991374B (en) * 2019-12-10 2023-04-04 电子科技大学 Fingerprint singular point detection method based on RCNN
US11514292B2 (en) 2019-12-30 2022-11-29 International Business Machines Corporation Grad neural networks for unstructured data
US11687778B2 (en) 2020-01-06 2023-06-27 The Research Foundation For The State University Of New York Fakecatcher: detection of synthetic portrait videos using biological signals
US10769198B1 (en) 2020-02-06 2020-09-08 Caastle, Inc. Systems and methods for product identification using image analysis from image mask and trained neural network
US11077320B1 (en) 2020-02-07 2021-08-03 Elekta, Inc. Adversarial prediction of radiotherapy treatment plans
US11436450B2 (en) 2020-03-31 2022-09-06 The Boeing Company Systems and methods for model-based image analysis
CN111506728B (en) * 2020-04-16 2023-06-06 太原科技大学 Hierarchical structure text automatic classification method based on HD-MSCNN
CN111782356B (en) * 2020-06-03 2022-04-08 上海交通大学 Data flow method and system of weight sparse neural network chip
US11379978B2 (en) 2020-07-14 2022-07-05 Canon Medical Systems Corporation Model training apparatus and method
KR20220013231A (en) 2020-07-24 2022-02-04 삼성전자주식회사 Electronic device and method for inferring objects within a video
US20220058449A1 (en) * 2020-08-20 2022-02-24 Capital One Services, Llc Systems and methods for classifying data using hierarchical classification model
US11948059B2 (en) * 2020-11-19 2024-04-02 International Business Machines Corporation Media capture device with power saving and encryption features for partitioned neural network
US20220201295A1 (en) * 2020-12-21 2022-06-23 Electronics And Telecommunications Research Institute Method, apparatus and storage medium for image encoding/decoding using prediction
CN112729834B (en) * 2021-01-20 2022-05-10 北京理工大学 Bearing fault diagnosis method, device and system
CN113077441B (en) * 2021-03-31 2024-09-27 上海联影智能医疗科技有限公司 Coronary calcified plaque segmentation method and method for calculating coronary calcification score
US12112200B2 (en) 2021-09-13 2024-10-08 International Business Machines Corporation Pipeline parallel computing using extended memory
CN115277154A (en) * 2022-07-22 2022-11-01 辽宁工程技术大学 A Whale-Optimized BiGRU Network Intrusion Detection Method
CN116310826B (en) * 2023-03-20 2023-09-22 中国科学技术大学 A two-level classification method for forestland in high-scoring remote sensing images based on graph neural network
CN117788843B (en) * 2024-02-27 2024-04-30 青岛超瑞纳米新材料科技有限公司 Carbon nanotube image processing method based on neural network algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008153196A1 (en) * 2007-06-13 2008-12-18 Canon Kabushiki Kaisha Calculation processing apparatus and control method thereof
US20110239032A1 (en) * 2008-12-04 2011-09-29 Canon Kabushiki Kaisha Convolution operation circuit and object recognition apparatus
CN103544506A (en) * 2013-10-12 2014-01-29 Tcl集团股份有限公司 Method and device for classifying images on basis of convolutional neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6324532B1 (en) 1997-02-07 2001-11-27 Sarnoff Corporation Method and apparatus for training a neural network to detect objects in an image
US7082394B2 (en) 2002-06-25 2006-07-25 Microsoft Corporation Noise-robust feature extraction using multi-layer principal component analysis
US20140307076A1 (en) 2013-10-03 2014-10-16 Richard Deutsch Systems and methods for monitoring personal protection equipment and promoting worker safety
CN105849747B (en) * 2013-11-30 2018-08-17 北京市商汤科技开发有限公司 Method and system for facial image identification
US9530071B2 (en) * 2014-10-10 2016-12-27 Beijing Kuangshi Technology Co., Ltd. Hierarchical interlinked multi-scale convolutional network for image parsing
US10387773B2 (en) 2014-10-27 2019-08-20 Ebay Inc. Hierarchical deep convolutional neural network for image classification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008153196A1 (en) * 2007-06-13 2008-12-18 Canon Kabushiki Kaisha Calculation processing apparatus and control method thereof
US20110239032A1 (en) * 2008-12-04 2011-09-29 Canon Kabushiki Kaisha Convolution operation circuit and object recognition apparatus
CN103544506A (en) * 2013-10-12 2014-01-29 Tcl集团股份有限公司 Method and device for classifying images on basis of convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHICHENG YAN ET AL: "HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification", 《URL:HTTPS://ARXIV.ORG/PDF/1410.0736V1.PDF》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10387773B2 (en) 2014-10-27 2019-08-20 Ebay Inc. Hierarchical deep convolutional neural network for image classification
US11126820B2 (en) 2017-11-20 2021-09-21 Google LLC Generating object embeddings from images
CN111279363B (en) * 2017-11-20 2021-04-20 Google LLC Generating object embeddings from images
CN111279363A (en) * 2017-11-20 2020-06-12 Google LLC Generating object embeddings from images
CN108053836B (en) * 2018-01-18 2021-03-23 成都嗨翻屋科技有限公司 Automatic audio labeling method based on deep learning
CN108053836A (en) * 2018-01-18 2018-05-18 成都嗨翻屋文化传播有限公司 Automatic audio labeling method based on deep learning
CN108764051A (en) * 2018-04-28 2018-11-06 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Image processing method and device, and mobile terminal
CN108921190A (en) * 2018-05-24 2018-11-30 北京飞搜科技有限公司 Image classification method, device and electronic device
CN109063824A (en) * 2018-07-25 2018-12-21 深圳市中悦科技有限公司 Deep three-dimensional convolutional neural network creation method and device, storage medium and processor
CN109063824B (en) * 2018-07-25 2023-04-07 深圳市中悦科技有限公司 Deep three-dimensional convolutional neural network creation method and device, storage medium and processor
CN109934293A (en) * 2019-03-15 2019-06-25 Soochow University Image recognition method, device, medium and confusion-aware convolutional neural network
CN113950706A (en) * 2019-06-13 2022-01-18 Expedia, Inc. Image classification system
CN110929623A (en) * 2019-11-15 2020-03-27 Beijing Dajia Internet Information Technology Co., Ltd. Multimedia file identification method, device, server and storage medium
CN110852288B (en) * 2019-11-15 2022-07-05 Soochow University Cell image classification method based on a two-stage convolutional neural network
CN110852288A (en) * 2019-11-15 2020-02-28 Soochow University Cell image classification method based on a two-stage convolutional neural network
CN110968073B (en) * 2019-11-22 2021-04-02 Sichuan University Double-layer tracing identification method for commutation failure causes in an HVDC system
CN110968073A (en) * 2019-11-22 2020-04-07 Sichuan University Double-layer tracing identification method for commutation failure causes in an HVDC system
CN113705527A (en) * 2021-09-08 2021-11-26 Southwest Petroleum University Expression recognition method based on loss function integration and a coarse-and-fine hierarchical convolutional neural network
CN113705527B (en) * 2021-09-08 2023-09-22 Southwest Petroleum University Expression recognition method based on loss function integration and a coarse-and-fine hierarchical convolutional neural network

Also Published As

Publication number Publication date
US10387773B2 (en) 2019-08-20
US20160117587A1 (en) 2016-04-28
EP3213261A1 (en) 2017-09-06
KR20170077183A (en) 2017-07-05
EP3213261A4 (en) 2018-05-23
JP2017538195A (en) 2017-12-21
WO2016069581A1 (en) 2016-05-06

Similar Documents

Publication Publication Date Title
CN107077625A (en) Hierarchical Deep Convolutional Neural Networks
US10872276B2 (en) Neural network for object detection in images
US10885394B2 (en) Fine-grained categorization
KR102032038B1 (en) Recognize items depicted by images
US10853739B2 (en) Machine learning models for evaluating entities in a high-volume computer network
CN109844761B (en) Method and system for facial modeling
US9965704B2 (en) Discovering visual concepts from weakly labeled image collections
US11869045B2 (en) Automated valuation model using a Siamese network
US20180204113A1 (en) Interaction analysis and prediction based neural networking
CN114391160A (en) Hand pose estimation from stereo camera
US9836481B2 (en) Image-based retrieval and searching
CN112106094A (en) Utility-Based Price Guidance
US11934926B2 (en) Sensitivity in supervised machine learning with experience data
CN118247632A (en) Method, system, and computer readable medium storing instructions for image search
CN111868709A (en) Automatic batch sorting
CN110175297A (en) Personalized per-member models in a feed
CN115331243A (en) Model-independent confidence value prediction machine learning model
CN116601961A (en) Visual label reveal mode detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2017-08-18