JP6397144B2

JP6397144B2 - Business discovery from images

Info

Publication number: JP6397144B2
Application number: JP2017561856A
Authority: JP
Inventors: Qian Yu; Liron Yatziv; Martin Christian Stumpe; Vinay Damodar Shet; Christian Szegedy; Dumitru Erhan; Sacha Christophe Arnoud
Original assignee: Google LLC
Current assignee: Google LLC
Priority date: 2015-08-07
Filing date: 2016-08-04
Publication date: 2018-09-26
Anticipated expiration: 2036-08-04
Also published as: GB2554238A; DE202016007816U1; GB2554238B; CN107690657A; GB201717115D0; DE112016001830T5; WO2017027321A1; US20170039457A1; KR101856120B1; US9594984B2; EP3332355A1; CN107690657B; JP2018524678A; KR20170122836A

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. Patent Application Ser. No. 14/821,128, filed Aug. 7, 2015, the disclosure of which is incorporated herein by reference.

The large number of geolocated street-level photographs available on the Internet today provides a unique opportunity to detect and monitor man-made structures, which can help in creating accurate maps. Examples of such structures include local businesses such as restaurants, clothing stores, gas stations, pharmacies, and laundromats. Consumers show strong interest in finding such businesses through local queries on popular search engines. Accurately determining, at a worldwide scale, whether such local businesses exist is not a trivial task.

Aspects of the present disclosure provide a method. The method includes training, using one or more computing devices, a deep neural network using a set of training images and data identifying one or more business storefront locations in the training images, the deep neural network outputting a first plurality of bounding boxes on each training image; receiving, using the one or more computing devices, a first image; evaluating, using the one or more computing devices and the deep neural network, the first image; and generating, using the one or more computing devices and the deep neural network, a second plurality of bounding boxes identifying business storefront locations in the first image.

In one example, the method also includes detecting, using the one or more computing devices and the deep neural network, business information at each of the identified business storefront locations; and updating, using the one or more computing devices, a database of business information by adding information from each bounding box in the second plurality of bounding boxes to the business information detected at the business storefront location identified by that bounding box. In this example, the method also includes receiving, using the one or more computing devices, a request from a user for business information; and retrieving, using the one or more computing devices, the requested business information from the updated database.
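The database update and user lookup described in this example can be pictured with a minimal in-memory sketch. The `BusinessDB` and `BusinessRecord` names, the storefront-ID key, and the record layout are hypothetical illustrations, not structures defined in the disclosure; a real system would use a persistent store.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical axis-aligned box in pixel coordinates: (x_min, y_min, x_max, y_max).
Box = tuple[float, float, float, float]


@dataclass
class BusinessRecord:
    # Business information detected at one storefront location (e.g. an OCR'd name).
    info: dict
    # Bounding boxes from evaluated images that identify this storefront.
    boxes: list = field(default_factory=list)


class BusinessDB:
    """Hypothetical in-memory business-information store keyed by storefront ID."""

    def __init__(self) -> None:
        self._records: dict = {}

    def update(self, storefront_id: str, detected_info: dict, box: Box) -> None:
        # Merge newly detected information and attach the box that identified it.
        rec = self._records.setdefault(storefront_id, BusinessRecord(info={}))
        rec.info.update(detected_info)
        rec.boxes.append(box)

    def lookup(self, storefront_id: str) -> Optional[BusinessRecord]:
        # Answer a user request from the updated database; None if unknown.
        return self._records.get(storefront_id)
```

Keying records by a storefront identifier keeps repeated sightings of the same storefront (for example, from overlapping panoramas) in a single record.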

In another example, the second plurality of bounding boxes includes two bounding boxes arranged side by side in the first image that identify two distinct business storefront locations. In one example, training the deep neural network further includes applying a coarse sliding window to a portion of a given training image, and removing one or more bounding boxes based on a location of the portion of the given training image. In another example, generating the second plurality of bounding boxes also includes applying a coarse sliding window to a portion of the first image, and removing one or more bounding boxes based on a location of the portion of the given training image.
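The coarse sliding window can be sketched as a grid of fixed-size crops, with boxes predicted inside each crop filtered according to where that crop sits in the image. This section does not say exactly how the crop's location determines which boxes are removed; the sketch assumes that a box touching an interior edge of its crop (one that is not also an image border) is a truncated storefront and is dropped. The crop size, stride, and `margin` values are illustrative assumptions.

```python
def sliding_crops(width, height, crop_w, crop_h, stride):
    """Return (x0, y0) offsets of a coarse grid of crops covering the image."""
    xs = list(range(0, max(width - crop_w, 0) + 1, stride))
    ys = list(range(0, max(height - crop_h, 0) + 1, stride))
    # Make sure the last crop in each direction reaches the image edge.
    if xs[-1] + crop_w < width:
        xs.append(width - crop_w)
    if ys[-1] + crop_h < height:
        ys.append(height - crop_h)
    return [(x, y) for y in ys for x in xs]


def keep_box(box, crop, image_size, margin=2.0):
    """Assumed rule: drop a crop-local box that touches an interior crop edge.

    box is (x_min, y_min, x_max, y_max) in crop coordinates, crop is
    (x0, y0, crop_w, crop_h), and image_size is (width, height).
    """
    x0, y0, cw, ch = crop
    width, height = image_size
    bx0, by0, bx1, by1 = box
    touches_left = bx0 <= margin and x0 > 0
    touches_top = by0 <= margin and y0 > 0
    touches_right = bx1 >= cw - margin and x0 + cw < width
    touches_bottom = by1 >= ch - margin and y0 + ch < height
    return not (touches_left or touches_top or touches_right or touches_bottom)


def to_image_coords(box, crop):
    """Shift a crop-local box into full-image coordinates."""
    x0, y0 = crop[0], crop[1]
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)
```

Overlapping crops mean that a storefront cut off in one crop usually appears whole in a neighboring crop, which is what makes discarding truncated boxes reasonable under this assumption.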

In yet another example, training the deep neural network also includes determining, for each bounding box, a confidence score representing the likelihood that the bounding box contains an image of a business storefront, and removing bounding boxes whose confidence scores are below a set threshold. In a further example, generating the second plurality of bounding boxes also includes determining, for each bounding box, a confidence score representing the likelihood that the bounding box contains an image of a business storefront, and removing the locations of bounding boxes whose confidence scores are below a set threshold. In another example, training the deep neural network also includes using post-classification, and generating the second plurality of bounding boxes further includes using post-classification.
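Removing low-confidence boxes amounts to a simple filter over (box, score) pairs. A minimal sketch follows; the default threshold of 0.5 is an illustrative assumption, since the text only refers to "a set threshold".

```python
def filter_by_confidence(boxes, scores, threshold=0.5):
    """Keep only (box, score) pairs whose confidence is at least the threshold.

    Each score is read as the likelihood that the box contains a storefront.
    """
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    return [(b, s) for b, s in zip(boxes, scores) if s >= threshold]
```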

In a further example, generating the second plurality of bounding boxes also includes calculating a probability that a given bounding box contains a business storefront, ranking the second plurality of bounding boxes based on the calculated probabilities, and removing one or more bounding boxes based on the ranking. In yet another example, generating the second plurality of bounding boxes also includes removing objects within the second plurality of bounding boxes that obstruct the view of the identified business storefront locations. In another example, the training images and the first image are panoramas.
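Ranking by storefront probability and removing boxes based on that ranking can be sketched as a sort followed by a cutoff. Keeping a fixed number `max_keep` of top-ranked boxes is one assumed removal rule; the text does not fix a particular cutoff.

```python
def rank_and_prune(boxes, probs, max_keep):
    """Rank boxes by storefront probability (highest first) and keep the top max_keep."""
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    return [(boxes[i], probs[i]) for i in order[:max_keep]]
```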

Another aspect of the present disclosure provides a system. The system includes a deep neural network and one or more computing devices. The one or more computing devices are configured to train the deep neural network using a set of training images and data identifying one or more business storefront locations in the training images, the deep neural network outputting a first plurality of bounding boxes on each training image; receive a first image at the deep neural network; evaluate the first image using the deep neural network; and generate, using the deep neural network, a second plurality of bounding boxes identifying business storefront locations in the first image.

In one example, the one or more computing devices are also configured to train the deep neural network by applying a coarse sliding window to a portion of a given training image and removing one or more bounding boxes based on a location of the portion of the given training image. In another example, the one or more computing devices are also configured to generate the second plurality of bounding boxes by applying a coarse sliding window to a portion of the first image and removing one or more bounding boxes based on a location of the portion of the given training image.

In yet another example, the one or more computing devices are also configured to train the deep neural network by determining, for each bounding box, a confidence score representing the likelihood that the bounding box contains an image of a business storefront, and removing bounding boxes whose confidence scores are below a set threshold. In a further example, the one or more computing devices are also configured to generate the second plurality of bounding boxes by determining, for each bounding box, a confidence score representing the likelihood that the bounding box contains an image of a business storefront, and removing the locations of bounding boxes whose confidence scores are below a set threshold. In another example, the one or more computing devices are also configured to train the deep neural network using post-classification and to generate the second plurality of bounding boxes using post-classification.

In a further example, the one or more computing devices are also configured to generate the second plurality of bounding boxes by calculating a probability that a given bounding box contains a business storefront, ranking the second plurality of bounding boxes based on the calculated probabilities, and removing one or more bounding boxes based on the ranking. In yet another example, the one or more computing devices are also configured to generate the second plurality of bounding boxes by removing objects within the second plurality of bounding boxes that obstruct the view of the identified business storefront locations.

A further aspect of the present disclosure provides a non-transitory, tangible computer-readable storage medium on which computer-readable instructions of a program are stored. The instructions, when executed by one or more computing devices, cause the one or more computing devices to perform a method. The method includes training a deep neural network using a set of training images and data identifying one or more business storefront locations in the training images, the deep neural network outputting a first plurality of bounding boxes on each training image; receiving a first image at the deep neural network; evaluating the first image using the deep neural network; and generating, using the deep neural network, a second plurality of bounding boxes identifying business storefront locations in the first image.

FIG. 1 is a functional diagram of an example system in accordance with aspects of the disclosure.
FIG. 2 is a pictorial diagram of the example system of FIG. 1.
FIG. 3 is an example diagram in accordance with aspects of the disclosure.
FIG. 4 is another example diagram in accordance with aspects of the disclosure.
FIG. 5 is a diagram showing example inputs and outputs in accordance with aspects of the disclosure.
FIG. 6 is an example flow diagram in accordance with aspects of the disclosure.

Overview
The present technology relates to automatically generating bounding boxes that identify distinct business storefronts in images. In other words, a single convolutional network evaluation can be used to directly predict multiple bounding boxes together with their confidence scores. By using deep learning in convolutional neural networks together with post-classification, storefronts in panoramic images can be identified with higher accuracy and speed than with other methods. Accurate detection and segmentation of business storefronts provides an opportunity, during post-processing, to extract information about a particular business. For example, text and images may be extracted to provide information about the identified business, and in some cases this information may be used to determine the location of the business more accurately.

Extracting arbitrary business storefronts from street-level photographs is a hard problem. The complexity comes from the high intra-class variability of storefront appearance across business categories and geographies, the inherent ambiguity of a storefront's physical extent, businesses abutting one another in urban areas, and the sheer scale at which storefronts occur worldwide. These factors make this an ambiguous task even for human annotators. Image acquisition factors such as noise, motion blur, occlusions, lighting variations, specular reflections, perspective, and geolocation errors add further to the complexity of the problem. There are probably hundreds of millions of businesses worldwide, and there may be billions of street-level images. Given the scale of the problem and the turnover rate of businesses, manual annotation is prohibitive and not a sustainable solution. For automated approaches, runtime efficiency is highly desirable in order to detect businesses worldwide within a reasonable time frame.

Detecting business storefronts is the first and most critical step in a multi-step process for extracting usable business listings from imagery. Accurately detecting storefronts enables further downstream processing, such as geolocating the storefront, OCR of its text, extracting the business name and other attributes, and category classification.

A convolutional neural network may be used to detect business storefronts in images. Convolutional networks are neural networks that contain sets of nodes with tied parameters. Increases in the size of available training data and in the availability of computing power, combined with algorithmic advances such as piecewise linear units and dropout training, have brought major improvements to many computer vision tasks. With the immense datasets available today for many tasks, overfitting is not a concern, and increasing the size of the network improves test accuracy. Optimal use of computing resources then becomes the limiting factor. To this end, a distributed, scalable implementation of deep neural networks may be used.
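The three ingredients named in this paragraph can be shown in a few lines of pure Python: a convolution whose single kernel is reused at every position (the "tied" parameters), a rectified linear unit as one example of a piecewise linear unit, and dropout. This is an illustrative sketch only, not the network used in the disclosure; the inverted-dropout variant (rescaling at training time) is a choice made here.

```python
import random


def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as in most CNN libraries).

    The same kernel weights are applied at every position: the weights are
    tied, so the layer's parameter count does not grow with input length.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(xs):
    """Rectified linear unit, one common piecewise linear activation."""
    return [max(0.0, x) for x in xs]


def dropout(xs, p, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability p
    and scale survivors by 1/(1-p), so inference needs no rescaling."""
    if not training or p == 0.0:
        return list(xs)
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in xs]
```

Example: `relu(conv1d([1, 2, 3, 4], [1, -1]))` applies one two-weight kernel across the whole input and yields `[0.0, 0.0, 0.0]`.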

Conventionally, object detection is performed by exhaustively searching an image for objects of interest. Such approaches produce a probability map corresponding to the presence of an object at each location. Post-processing this probability map, with either non-maximum suppression or a mean-shift-based approach, then produces discrete detection results. To counter the computational complexity of exhaustive search, selective search, which uses image segmentation techniques to generate a number of proposals, may drastically reduce the number of parameters to search over.
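Non-maximum suppression, one of the two post-processing options mentioned above, can be sketched as the standard greedy procedure over scored boxes. Operating on a list of scored boxes rather than a dense probability map, and the 0.5 overlap threshold, are simplifying assumptions for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring remaining box, discard boxes that
    overlap it by more than iou_threshold, and repeat.

    Returns the indices of kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```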

The technology disclosed herein uses a deep neural network to assign final detection scores and adopts a fully learned approach from pixels to discrete bounding boxes. The end-to-end learned approach has the advantage of integrating proposal generation and post-processing in a single network that predicts a large number of proposals and their confidences at the same time. Relying only on the confidence outputs of this approach may already produce high-quality results, but precision may be pushed further by running an extra, dedicated post-classifier network on the highest-confidence proposals. Even with this extra post-classification stage, the technique can be orders of magnitude faster than its predecessors.
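The two-stage structure (cheap single-pass proposals, then an expensive post-classifier applied only to the most confident ones) can be sketched as below. The `post_classifier` callable stands in for the dedicated post-classifier network and is hypothetical, as are `top_k` and the rule that the post-classifier's probability replaces the first-stage confidence.

```python
from typing import Callable, Sequence

Box = tuple[float, float, float, float]


def rescore_top_proposals(
    proposals: Sequence[tuple[Box, float]],
    post_classifier: Callable[[Box], float],
    top_k: int,
) -> list[tuple[Box, float]]:
    """Run the post-classifier only on the top_k most confident proposals.

    Lower-ranked proposals are discarded without being post-classified.
    Assumption: the post-classifier's storefront probability replaces the
    first-stage confidence.
    """
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)[:top_k]
    rescored = [(box, post_classifier(box)) for box, _ in ranked]
    return sorted(rescored, key=lambda p: p[1], reverse=True)
```

Limiting the post-classifier to `top_k` proposals bounds its cost per image, which is how the extra stage can stay cheap relative to exhaustive search.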

A set of training images and data identifying one or more business storefront locations may be used to train the deep neural network. Using the training images, the deep neural network may output a first plurality of bounding boxes along with their respective confidence scores. The confidence score of each bounding box may represent the likelihood that the bounding box contains an image of a business storefront. Each bounding box may be matched with a business storefront location. During training, the training images may be evaluated using a coarse sliding window, also called multi-crop evaluation. To further train the deep neural network, post-classification may be applied to refine the results of the multi-crop evaluation. Post-classification may include calculating the probability that a given bounding box contains a business storefront.
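Matching predicted boxes to annotated storefront locations is what turns the training data into per-box targets. The text does not specify the matching procedure; the sketch below assumes a greedy, one-to-one match by intersection-over-union (a simple stand-in for optimal bipartite matching), with a hypothetical minimum overlap of 0.5.

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_predictions(pred_boxes, gt_boxes, min_iou=0.5):
    """Greedily match each ground-truth storefront to at most one prediction.

    Returns, for each prediction, the index of its matched ground-truth box
    or None. A matched prediction would get a confidence target of 1 (plus a
    localization target); an unmatched one a confidence target of 0.
    """
    pairs = sorted(
        ((iou(p, g), pi, gi)
         for pi, p in enumerate(pred_boxes)
         for gi, g in enumerate(gt_boxes)),
        reverse=True,
    )
    assignment = [None] * len(pred_boxes)
    used_gt = set()
    for overlap, pi, gi in pairs:
        if overlap < min_iou:
            break  # remaining pairs overlap even less
        if assignment[pi] is None and gi not in used_gt:
            assignment[pi] = gi
            used_gt.add(gi)
    return assignment
```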

The trained deep neural network may receive an image to be evaluated. Features of the image may be identified and evaluated using multi-crop evaluation and post-classification. Based on the evaluation, the deep neural network may generate a second plurality of bounding boxes identifying possible business storefront locations, where each bounding box may contain an image of only one business storefront.

Example System
FIGS. 1 and 2 include an example system 100 in which the features described above may be implemented. This should not be considered as limiting the scope of the disclosure or the usefulness of the features described herein. In this example, system 100 can include one or more computing devices 110, 120, 130, and 140, as well as storage system 150. Each of computing devices 110 can contain one or more processors 112, memory 114, and other components typically present in general-purpose computing devices. Memory 114 of computing device 110 can store information accessible by the one or more processors 112, including instructions 116 that can be executed by the one or more processors 112.

Memory 114 can also include data 118 that can be retrieved, manipulated, or stored by the processor. The data may include images. The images may be panoramic images, or images having a field of view greater than 180 degrees, for example up to 360 degrees. In addition, the panoramic images may be spherical or nearly spherical. The images may depict various business storefronts, associated with information regarding the location of each business storefront within each image. This information may identify the range of pixels that depicts a single business storefront; for example, some images may contain bounding boxes located on the image where each business storefront is. A number of these images may be identified as training images. Other images that are not associated with information regarding the locations of business storefronts may also be stored in the memory. The data may also include the geolocation of each business storefront. The memory can be of any non-transitory type capable of storing information accessible by the processor, such as a hard drive, memory card, ROM, RAM, DVD, CD-ROM, write-capable memory, or read-only memory.
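The stored image data described above (images, per-storefront pixel ranges, a training-image designation, and storefront geolocations) could be organized along the lines of the following sketch. The schema, including the field names and the validation rules, is a hypothetical illustration; the disclosure does not define one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StorefrontAnnotation:
    # Pixel range depicting a single storefront: (x_min, y_min, x_max, y_max).
    box: tuple[int, int, int, int]
    # Optional geolocation of the storefront as (latitude, longitude).
    geolocation: Optional[tuple[float, float]] = None


@dataclass
class StoredImage:
    image_id: str
    width: int
    height: int
    # Horizontal field of view in degrees; may exceed 180, up to 360.
    fov_degrees: float
    annotations: list[StorefrontAnnotation] = field(default_factory=list)
    is_training: bool = False

    def __post_init__(self) -> None:
        # Only images with storefront location data can serve as training images.
        if self.is_training and not self.annotations:
            raise ValueError("training images need storefront annotations")
        if not 0 < self.fov_degrees <= 360:
            raise ValueError("field of view must be in (0, 360] degrees")
```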

Instructions 116 can be any set of instructions to be executed directly, such as machine code, or indirectly, such as scripts, by the one or more processors. In this regard, the terms "instructions," "application," "steps," and "programs" can be used interchangeably herein. The instructions can be stored in object code format for direct processing by a processor, or in any other computing device language, including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. The instructions may include instructions that cause one or more computing devices, such as computing device 110, to behave as a deep neural network. Functions, methods, and routines of the instructions are explained in more detail below.

Data 118 may be retrieved, stored, or modified by the one or more processors 112 in accordance with the instructions 116. For instance, although the subject matter described herein is not limited by any particular data structure, the data can be stored in computer registers, in a relational database as a table having many different fields and records, or as XML documents. The data can also be formatted in any computing-device-readable format such as, but not limited to, binary values, ASCII, or Unicode. Moreover, the data can comprise any information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including at other network locations), or information used by a function to calculate the relevant data.

The one or more processors 112 can be any conventional processors, such as a commercially available CPU. Alternatively, the processors can be dedicated components such as an application-specific integrated circuit ("ASIC") or other hardware-based processor. Although not necessary, one or more of computing devices 110 may include specialized hardware components to perform specific computing processes faster or more efficiently, such as decoding video, matching video frames with images, transforming video, and encoding transformed video.

Although FIG. 1 functionally illustrates the processor, memory, and other elements of computing device 110 as being within the same block, the processor, computer, computing device, or memory can actually comprise multiple processors, computers, computing devices, or memories that may or may not be stored within the same physical housing. For example, the memory can be a hard drive or other storage medium located in a housing different from that of computing device 110. Accordingly, references to a processor, computer, computing device, or memory will be understood to include references to a collection of processors, computers, computing devices, or memories that may or may not operate in parallel. For example, computing device 110 may include server computing devices operating as a load-balanced server farm. Further, although some functions described below are indicated as taking place on a single computing device having a single processor, various aspects of the subject matter described herein can be implemented by multiple computing devices, for example by communicating information over network 160.

Each of computing devices 110 can be at a different node of network 160 and can communicate directly and indirectly with other nodes of network 160. Although only a few computing devices are depicted in FIGS. 1-2, it should be appreciated that a typical system can include a large number of connected computing devices, with each different computing device at a different node of network 160. Network 160 may be a deep neural network that employs multiple layers of models, where the outputs of lower layers are used to construct the outputs of higher-level layers. Network 160 and the intervening nodes described herein can be interconnected using various protocols and systems, such that the network can be part of the Internet, the World Wide Web, specific intranets, wide area networks, or local networks. The network can use standard communication protocols such as Ethernet, WiFi, and HTTP, protocols that are proprietary to one or more companies, and various combinations of the foregoing. Although certain advantages are obtained when information is transmitted or received as noted above, other aspects of the subject matter described herein are not limited to any particular manner of transmitting information.

As an example, each of computing devices 110 may include a web server capable of communicating with storage system 150 and with computing devices 120, 130, and 140 via the network. For example, one or more of server computing devices 110 may use network 160 to transmit information to, and present it for, a user such as user 220, 230, or 240 on a display such as display 122, 132, or 142 of computing device 120, 130, or 140. In this regard, computing devices 120, 130, and 140 may be considered client computing devices and may perform all or some of the features described herein.

Each of the client computing devices may be configured similarly to server computing devices 110, with one or more processors, memory, and instructions as described above. Each client computing device 120, 130, or 140 may be a personal computing device intended for use by a user 220, 230, or 240, and may have all of the components normally used in connection with a personal computing device, such as a central processing unit (CPU), memory storing data and instructions (e.g., RAM and an internal hard drive), a display such as display 122, 132, or 142 (e.g., a monitor having a screen, a touch screen, a projector, a television, or another device operable to display information), and user input device 124 (e.g., a mouse, keyboard, touch screen, or microphone). The client computing device may also include a camera 126 for capturing still images or recording video streams, speakers, a network interface device, and all of the components used to connect these elements to one another.

Although client computing devices 120, 130, and 140 may each comprise a full-sized personal computing device, they may alternatively comprise mobile computing devices capable of wirelessly exchanging data with a server over a network such as the Internet. By way of example only, client computing device 120 may be a mobile phone or a device such as a wireless-enabled PDA, a tablet PC, or a netbook that is capable of obtaining information via the Internet. In another example, client computing device 130 may be a head-mounted computing system. As an example, the user may input information using a small keyboard, a keypad, a microphone, visual signals captured by a camera, or a touch screen.

As with memory 114, storage system 150 can be any type of computerized storage capable of storing information accessible by the server computing devices 110, such as a hard drive, memory card, ROM, RAM, DVD, CD-ROM, write-capable memory, or read-only memory. In addition, storage system 150 may include a distributed storage system in which data is stored on a plurality of different storage devices that may be physically located at the same or different geographic locations. Storage system 150 may be connected to the computing devices via network 160 as shown in FIG. 1, and/or may be directly connected to or incorporated into the memory of any of computing devices 110-140 (not shown).

Storage system 150 may also store images. These images may include various types of images, such as panoramic images depicting, among other things, one or more business stores, or images with a field of view greater than 180 degrees, e.g., up to 360 degrees. In some examples, a given image may be associated with store information identifying the location of each business store within that image. For example, the store information for a given image may include one or more ranges of pixels in the image that correspond to one or more stores, and/or image coordinates corresponding to the shape of one or more business stores in the image. As an example, the store information may be represented by a bounding box corresponding to each business store location in the image. As described below, at least some of the images may be identified as training images. Storage system 150 may also include geolocation information, or information about geographic location, for a number of business stores.

Exemplary Method
As shown in FIG. 3, a deep neural network 310 may be trained using a set of training images 320. These training images 320 may include images from storage system 150 that are associated with store information 330 identifying one or more business store locations in the set of training images. As noted above, the store information may be one or more ranges of pixels of an image that depict one or more business stores located in the associated image.

The deep neural network 310 may be used to evaluate the training images, using the business store location data for the locations of business stores within those images. A coarse sliding window, also referred to as multi-crop evaluation, may be applied to an image using the deep neural network. Each window position may be considered a "crop" of the image. Compared with a dense sliding window approach, the coarse sliding window approach reduces the number of sliding windows by several orders of magnitude. For example, the coarse sliding window may evaluate 100 windows for a full 360-degree panoramic image, instead of the 300,000 windows that a dense sliding window would likely use. Single-crop evaluation may also work, but for high-resolution panoramic images, smaller stores cannot be reliably detected from a low-resolution version of a single panoramic image. Using a coarse sliding window can therefore actually improve the quality of store detection.
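The coarse sliding window can be pictured as a grid of overlapping crops laid over the image. The following is a minimal sketch, not the patented implementation. The crop size and stride are hypothetical parameters, and the last window on each axis is snapped to the image edge so that the whole image is covered. Shrinking the stride to one pixel turns the same function into a dense sliding window, which shows where the orders-of-magnitude difference in window count comes from.

```python
from typing import List, Tuple

Crop = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; x1/y1 exclusive


def window_starts(length: int, window: int, stride: int) -> List[int]:
    """Start offsets along one axis; the final window is snapped to the far edge."""
    if window >= length:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # cover the trailing strip
    return starts


def coarse_crops(width: int, height: int, crop_w: int, crop_h: int,
                 stride_x: int, stride_y: int) -> List[Crop]:
    """All crop rectangles produced by a sliding window with the given strides."""
    return [
        (x, y, x + min(crop_w, width), y + min(crop_h, height))
        for y in window_starts(height, crop_h, stride_y)
        for x in window_starts(width, crop_w, stride_x)
    ]


# Hypothetical sizes: a 1000x500 image evaluated with 400x400 crops.
coarse = coarse_crops(1000, 500, 400, 400, stride_x=300, stride_y=300)
dense = coarse_crops(1000, 500, 400, 400, stride_x=1, stride_y=1)
print(len(coarse), len(dense))  # 6 60701
```

A real 360-degree panorama would probably also need crops that wrap around the horizontal seam. That case is not handled here.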

During training of the deep neural network 310, a first plurality of bounding boxes 340 superimposed on the images may be identified. A bounding box may be a rectangle on an image that identifies a portion of that image. A bounding box may also be any other polygon or shape. The shape and size of each bounding box may depend on the shape of the corresponding business store location.

Each bounding box 340 may be matched with a business store location based on the data associated with each image. A single bounding box may enclose only one business store location, so that business store locations directly adjacent to one another in the image are defined by separate bounding boxes. The matching may involve maximum weight matching, in which the edge weight between a business store location and a given bounding box is related to the amount of overlap between the boxes. For example, the edge weight may be the Jaccard similarity coefficient, defined as the size of the intersection divided by the size of the union of the given bounding box and the business store location.
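The matching step can be sketched as follows. The Jaccard coefficient is the edge weight. The one-to-one assignment is chosen to maximize total weight, so each store is claimed by at most one box. The matcher below is a deliberately brute-force illustration over all assignments and is only practical for a handful of boxes. The function names are hypothetical, and the document does not say which solver is used.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def jaccard(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def max_weight_matching(preds: Sequence[Box], stores: Sequence[Box]) -> List[Tuple[int, int]]:
    """One-to-one (pred_index, store_index) pairs maximizing total Jaccard weight."""
    n, m = len(preds), len(stores)
    if n >= m:  # assign each store a distinct predicted box
        candidates = ([(p, s) for s, p in enumerate(perm)] for perm in permutations(range(n), m))
    else:       # assign each predicted box a distinct store
        candidates = (list(enumerate(perm)) for perm in permutations(range(m), n))
    best_score, best_pairs = -1.0, []
    for pairs in candidates:
        score = sum(jaccard(preds[p], stores[s]) for p, s in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    # Pairs with zero overlap carry no weight, so treat them as unmatched.
    return sorted((p, s) for p, s in best_pairs if jaccard(preds[p], stores[s]) > 0)


stores = [(0, 0, 10, 10), (10, 0, 20, 10)]  # two directly adjacent stores
preds = [(11, 0, 21, 10), (0, 0, 9, 10)]
print(max_weight_matching(preds, stores))  # [(0, 1), (1, 0)]
```

For realistic numbers of boxes, SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True` solves the same assignment problem in polynomial time.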

For the set of training images evaluated by the deep neural network, the deep neural network may be used to determine the coordinates of each bounding box in the first plurality of bounding boxes. The coordinates may be image coordinates, such as image coordinates corresponding to a business store location. Image coordinates may define the location of a bounding box using a coordinate system relative to the image itself. Alternatively, the coordinates may be latitude/longitude coordinates or any other geolocation coordinates.

A confidence score 350 may be calculated for each bounding box 340. The confidence score 350 of each bounding box in the first plurality of bounding boxes may represent the likelihood that the bounding box contains an image of a business store.

When the training images are evaluated, bounding boxes may be removed under certain circumstances. For example, bounding boxes having a confidence score below a set threshold may be removed. In addition, a bounding box that abuts one of the edges of a crop of the image may be removed, unless that crop edge is also an edge of the image. This removes bounding boxes that do not fully contain an object, which makes business store detection more accurate. Furthermore, any bounding box that is farther than a given distance from any given crop may also be removed.
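The first two removal rules above can be sketched as a per-crop predicate. A box is treated as "abutting" a crop edge when it lies within a small pixel tolerance of that edge. The `threshold` and `tol` values are hypothetical, and the distance-from-crop rule is not modeled.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in full-image pixels


def keep_box(box: Box, score: float, crop: Box, image_w: float, image_h: float,
             threshold: float = 0.5, tol: float = 2.0) -> bool:
    """Return False for a box that should be discarded after evaluating one crop."""
    if score < threshold:
        return False  # low-confidence detection
    bx0, by0, bx1, by1 = box
    cx0, cy0, cx1, cy1 = crop
    # (gap between the box and a crop edge, whether that crop edge is also an image edge)
    edges = [
        (bx0 - cx0, cx0 <= 0),
        (by0 - cy0, cy0 <= 0),
        (cx1 - bx1, cx1 >= image_w),
        (cy1 - by1, cy1 >= image_h),
    ]
    for gap, is_image_edge in edges:
        if gap <= tol and not is_image_edge:
            return False  # touches an interior crop edge, so the store is probably cut off
    return True


crop = (300, 0, 700, 500)  # an interior crop of a 1000x500 image
print(keep_box((350, 100, 450, 200), 0.9, crop, 1000, 500))  # True
print(keep_box((300, 100, 400, 200), 0.9, crop, 1000, 500))  # False: on the crop's left edge
```

A box touching a crop edge that is also the image border is kept, because no neighboring crop could contain a more complete view of it.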

The deep neural network 310 may also be trained with post-classification. In preparation for post-classification, an affine transformation may be applied to the receptive field of the deep neural network. In post-classification, the first plurality of bounding boxes identified by the multi-crop evaluation are classified again to refine the results. In other words, a second classifier is applied to the results to increase the confidence that each bounding box contains a business store location. For example, a second confidence score may be calculated for each bounding box. The probability that a given bounding box contains a business store may then be calculated from the confidence scores. This probability may be calculated by summing the product of the confidence score of each bounding box from the deep neural network and the confidence score of each bounding box from the post-classification. Alternatively, the probability may be calculated by multiplying the confidence score from the deep neural network by the confidence score from the post-classification for a given bounding box.

The probability may be used to filter the first plurality of bounding boxes by removing bounding boxes associated with a probability below a set threshold. The probability may also be used to rank the bounding boxes, and the first plurality of bounding boxes may then be filtered based on that ranking. For example, bounding boxes ranked below a set number may be removed.
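The probability-based filtering and ranking can be sketched together. This sketch uses the multiplicative variant of the probability, p = dnn_score * post_score. The `Detection` type, the 0.5 threshold, and the `top_k` cutoff are hypothetical choices, not values taken from this document.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    box: Tuple[int, int, int, int]
    dnn_score: float   # confidence from the deep neural network
    post_score: float  # confidence from the post-classifier


def rank_and_filter(dets: List[Detection], prob_threshold: float = 0.5,
                    top_k: int = 10) -> List[Tuple[float, Detection]]:
    """Score each detection, drop low-probability ones, rank, and keep the top_k."""
    scored = [(d.dnn_score * d.post_score, d) for d in dets]  # product of the two scores
    kept = [(p, d) for p, d in scored if p >= prob_threshold]
    kept.sort(key=lambda pd: pd[0], reverse=True)  # highest probability first
    return kept[:top_k]


dets = [
    Detection((0, 0, 10, 10), 0.9, 0.9),    # p = 0.81
    Detection((10, 0, 20, 10), 0.8, 0.5),   # p = 0.40, below the threshold
    Detection((20, 0, 30, 10), 0.95, 0.7),  # p is about 0.665
]
print([d.box for _, d in rank_and_filter(dets)])  # [(0, 0, 10, 10), (20, 0, 30, 10)]
```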

While being trained, the deep neural network may output the first plurality of bounding boxes 340 on the training images, each with its respective confidence score 350. The coordinates determined for each bounding box and the probabilities calculated through post-classification may also be included in the network's output. This information may be stored in storage system 150 for later use.

After being trained, the deep neural network 310 may evaluate one or more images 420, as shown in FIG. 4. Like the training images 320, the images 420 may also be stored in storage system 150. Unlike the training images, however, the images 420 need not be associated with data identifying business store locations in the images. The images 420 may be panoramic images or images with a field of view greater than 180 degrees, e.g., up to 360 degrees. The panoramic images may also be spherical or nearly spherical. Detecting stores in panoramas avoids the loss of recall that could result from using images with a smaller field of view. The deep neural network 310 may evaluate the images 420 to identify features of the images. As described above for training, the evaluation may include multi-crop evaluation and post-classification.

Based on the evaluation, the deep neural network 310 may generate a second plurality of bounding boxes 440 identifying possible business store locations in the image, as shown in FIG. 4. Each bounding box 440 may contain an image of only one business store. A row of adjacent business stores may therefore be segmented by a plurality of bounding boxes, each enclosing one business store in the row. In addition, each bounding box 440 may be associated with a confidence score 450 representing the likelihood that the bounding box contains an image of a business store.

As shown in FIG. 5, an image 510 may be evaluated by the deep neural network 310. As a result, the deep neural network 310 may identify a plurality of bounding boxes, including bounding boxes 522, 524, 526, and 528, each containing an image of an individual business store. Bounding boxes 524, 526, and 528 identify adjacent business stores as individual stores.

In some examples, the second plurality of bounding boxes 440 may be filtered by removing bounding boxes having a confidence score below a set threshold. Additionally or alternatively, bounding boxes from multiple images associated with the same or a similar geolocation (in other words, the same business store) may be merged. Merging these bounding boxes may include removing objects that are false positives. One example of a false positive is a vehicle temporarily parked in front of a business store. The filtered bounding boxes may then be associated with the evaluated image and stored in storage system 150 for later use.
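The document does not specify how boxes from different images are merged. One hypothetical heuristic groups detections by a coarse latitude/longitude grid cell. It keeps only cells that are observed in at least `min_views` distinct images, on the assumption that a transient object such as a parked vehicle tends to appear in fewer captures than the storefront behind it. The cell size, `min_views`, and the tuple layout are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_by_geolocation(detections: Iterable[Tuple[str, float, float]],
                         cell_deg: float = 1e-4,
                         min_views: int = 2) -> Dict[Tuple[int, int], List[str]]:
    """Group (image_id, lat, lng) detections into grid cells and drop one-off sightings."""
    cells = defaultdict(set)
    for image_id, lat, lng in detections:
        key = (round(lat / cell_deg), round(lng / cell_deg))  # coarse grid cell
        cells[key].add(image_id)
    # Keep only locations confirmed by several different images.
    return {key: sorted(ids) for key, ids in cells.items() if len(ids) >= min_views}


dets = [
    ("img1", 37.42200, -122.08400),
    ("img2", 37.42201, -122.08401),  # same store, seen from a second capture
    ("img3", 37.43000, -122.09000),  # seen only once
]
print(list(merge_by_geolocation(dets).values()))  # [['img1', 'img2']]
```

A fixed grid can split two nearby detections across a cell boundary. A production system would more likely cluster by distance.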

Business information within each bounding box 440 may be detected using known information-extraction methods, such as optical character recognition. The detected business information may include names, words, logos, merchandise, or other items visible within a particular bounding box. The business information may then be added to a database of business information, which may be stored in storage system 150 for later use.

Users 220, 230, and 240 may request business information using computing devices 120, 130, and 140. In response to a user request, computing devices 110 may retrieve the business information from the database in storage system 150 and send it to computing devices 120, 130, and 140.
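The store-then-query flow described in the two paragraphs above can be sketched with Python's built-in `sqlite3` module. An in-memory database stands in for storage system 150. The table layout and function names are hypothetical, and the extracted text is assumed to come from an OCR step that is not shown.

```python
import sqlite3
from typing import List, Tuple


def open_business_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for storage system 150
    conn.execute(
        "CREATE TABLE business_info ("
        " image_id TEXT, x0 INTEGER, y0 INTEGER, x1 INTEGER, y1 INTEGER,"
        " detected_text TEXT)"
    )
    return conn


def add_business_info(conn: sqlite3.Connection, image_id: str,
                      box: Tuple[int, int, int, int], text: str) -> None:
    """Store the text extracted from one bounding box."""
    conn.execute("INSERT INTO business_info VALUES (?, ?, ?, ?, ?, ?)",
                 (image_id, *box, text))
    conn.commit()


def find_business(conn: sqlite3.Connection, query: str) -> List[Tuple[str, str]]:
    """Answer a user request with a substring match (case-insensitive for ASCII)."""
    rows = conn.execute(
        "SELECT image_id, detected_text FROM business_info"
        " WHERE detected_text LIKE ? ORDER BY image_id, x0",
        (f"%{query}%",),
    )
    return rows.fetchall()


conn = open_business_db()
add_business_info(conn, "pano_001", (10, 20, 110, 220), "Joe's Pizza")
add_business_info(conn, "pano_001", (120, 20, 220, 220), "Corner Laundry")
print(find_business(conn, "pizza"))  # [('pano_001', "Joe's Pizza")]
```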

FIG. 6 is an example flow diagram 600, in accordance with some of the aspects described above, that may be performed in the deep neural network 310. However, the described features may be implemented by any of a variety of systems having different configurations. In addition, the operations involved in the method need not be performed in the precise order described. Rather, various operations may be handled in a different order or simultaneously, and operations may be added or omitted.

At block 610, a deep neural network may be trained using a set of training images and data identifying one or more business store locations in the training images. At block 620, a first image may be received at the deep neural network. At block 630, the first image may be evaluated by the deep neural network. At block 640, a set of two or more bounding boxes identifying business store locations in the first image may be generated.

The features described above make it possible to identify stores in a large database of images with a speed and accuracy that other methods cannot achieve. Specifically, the features allow bounding boxes to be the direct output of the analysis, without intermediate outputs such as heat maps or probability maps that require further analysis and/or processing. The features also allow adjacent business stores to be properly segmented instead of being identified as a single store. Compared with using selective search to generate bounding boxes around images of stores, the method described above has a much lower computational cost and runs much faster. The method described above may also be more efficient than a trained heat-map approach, which requires extensive post-processing to convert a heat map into meaningful bounding boxes and is susceptible to label noise. The described method of training and using a deep convolutional neural network automates work that would otherwise require considerable effort, while still producing accurate results. After the bounding boxes are generated, the images may be used to automatically extract available business listings, more precise store geolocations, and other information available in the images.

Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments, and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.

100 System
110 Computing device, server computing device
112 Processor
114 Memory
116 Instructions
118 Data
120 Computing device, client computing device
122 Display
124 User input device
126 Camera
130 Computing device, client computing device
132 Display
140 Computing device
142 Display
150 Storage system
160 Network
220 User
230 User
240 User
310 Deep neural network
320 Training images
330 Business store locations, data
340 Bounding boxes
350 Confidence scores
420 Images
440 Bounding boxes
450 Confidence scores
510 Image
522 Bounding box
524 Bounding box
526 Bounding box
528 Bounding box
600 Flow diagram

Claims

1. A method comprising:
training, using one or more computing devices, a deep neural network using a set of training images and data identifying one or more business store locations within the training images, wherein the deep neural network outputs a first plurality of bounding boxes on each training image;
receiving, using the one or more computing devices, a first image;
evaluating, using the one or more computing devices and the deep neural network, the first image; and
generating, using the one or more computing devices and the deep neural network, a second plurality of bounding boxes identifying two or more business store locations in the first image.

2. The method of claim 1, further comprising:
detecting, using the one or more computing devices and the deep neural network, business information at each of the identified business store locations;
updating, using the one or more computing devices, a database of business information by adding, from each bounding box in the second plurality of bounding boxes, the business information detected at the business store location identified by that bounding box;
receiving, using the one or more computing devices, a request from a user for business information; and
retrieving, using the one or more computing devices, the requested business information from the updated database.

3. The method of claim 1, wherein the second plurality of bounding boxes includes two bounding boxes, arranged side by side in the first image, that identify two distinct business store locations.

4. The method of claim 1, wherein training the deep neural network further comprises:
applying a coarse sliding window on a portion of a given training image; and
removing one or more bounding boxes based on a location of the portion of the given training image.

5. The method of claim 1, wherein generating the second plurality of bounding boxes further comprises:
applying a coarse sliding window on a portion of the first image; and
removing one or more bounding boxes based on a location of the portion of the first image.

6. The method of claim 1, wherein training the deep neural network further comprises:
determining a confidence score for each bounding box that represents a likelihood that the bounding box contains an image of a business store; and
removing bounding boxes corresponding to bounding boxes having a confidence score less than a set threshold.

7. The method of claim 1, wherein generating the second plurality of bounding boxes further comprises:
determining a confidence score for each bounding box that represents a likelihood that the bounding box contains an image of a business store; and
removing bounding box locations corresponding to bounding boxes having a confidence score less than a set threshold.

8. The method of claim 1, wherein:
training the deep neural network further comprises using post-classification; and
generating the second plurality of bounding boxes further comprises using post-classification.

9. The method of claim 1, wherein generating the second plurality of bounding boxes further comprises:
calculating a probability that a given bounding box contains a business store;
ranking the second plurality of bounding boxes based on the calculated probabilities; and
removing one or more bounding boxes based on the ranking.

10. The method of claim 1, wherein generating the second plurality of bounding boxes further comprises removing objects in the second plurality of bounding boxes that obstruct the view of the identified business store locations.

11. The method of claim 1, wherein the training images and the first image are panoramic.

12. A system comprising:
a deep neural network; and
one or more computing devices configured to:
train the deep neural network using a set of training images and data identifying one or more business store locations within the training images, wherein the deep neural network outputs a first plurality of bounding boxes on each training image;
receive a first image at the deep neural network;
evaluate the first image using the deep neural network; and
generate, using the deep neural network, a second plurality of bounding boxes identifying business store locations in the first image.

13. The system of claim 12, wherein the one or more computing devices are further configured to train the deep neural network by:
applying a coarse sliding window on a portion of a given training image; and
removing one or more bounding boxes based on a location of the portion of the given training image.

14. The system of claim 12, wherein the one or more computing devices are further configured to generate the second plurality of bounding boxes by:
applying a coarse sliding window on a portion of the first image; and
removing one or more bounding boxes based on a location of the portion of the first image.

15. The system of claim 12, wherein the one or more computing devices are further configured to train the deep neural network by:
determining a confidence score for each bounding box that represents a likelihood that the bounding box contains an image of a business store; and
removing bounding boxes corresponding to bounding boxes having a confidence score less than a set threshold.

16. The system of claim 12, wherein the one or more computing devices are further configured to generate the second plurality of bounding boxes by:
determining a confidence score for each bounding box that represents a likelihood that the bounding box contains an image of a business store; and
removing bounding box locations corresponding to bounding boxes having a confidence score less than a set threshold.

17. The system of claim 12, wherein the one or more computing devices are further configured to:
train the deep neural network by using post-classification; and
generate the second plurality of bounding boxes by using post-classification.

18. The system of claim 12, wherein the one or more computing devices are further configured to generate the second plurality of bounding boxes by:
calculating a probability that a given bounding box contains a business store;
ranking the second plurality of bounding boxes based on the calculated probabilities; and
removing one or more bounding boxes based on the ranking.

19. The system of claim 12, wherein the one or more computing devices are further configured to generate the second plurality of bounding boxes by removing objects in the second plurality of bounding boxes that obstruct the view of the identified business store locations.

20. A non-transitory, tangible computer-readable storage medium on which computer-readable instructions of a program are stored, the instructions, when executed by one or more computing devices, causing the one or more computing devices to perform a method, the method comprising:
training a deep neural network using a set of training images and data identifying one or more business store locations within the training images, wherein the deep neural network outputs a first plurality of bounding boxes on each training image;
receiving a first image at the deep neural network;
evaluating the first image using the deep neural network; and
generating, using the deep neural network, a second plurality of bounding boxes identifying business store locations in the first image.