JP7542165B1

JP7542165B1 - Information processing device, information processing method, and information processing program

Info

Publication number: JP7542165B1
Application number: JP2024048036A
Authority: JP
Inventors: 聡哉石井田; 佑磨松岡; 誠一郎森; 昇平大村
Original assignee: SoftBank Corp
Current assignee: SoftBank Corp
Priority date: 2024-03-25
Filing date: 2024-03-25
Publication date: 2024-08-29
Anticipated expiration: 2044-03-25

Abstract

【課題】アノテーションを付す対象となる目的対象物の領域を推定することができる情報処理装置、情報処理方法及び情報処理プログラムを提供する。【解決手段】情報処理装置は、画像を取得する取得部と、目的対象物についての指示内容を受け付ける受付部と、受付部によって受け付けた指示内容に基づいて、取得部によって取得した画像に記録される目的対象物の領域を推定する推定部と、推定部によって推定した目的対象物の領域を出力するよう制御する出力制御部と、を備える。【選択図】図１[Problem] To provide an information processing device, information processing method, and information processing program capable of estimating the area of a target object to be annotated. [Solution] The information processing device includes an acquisition unit that acquires an image, a reception unit that receives instructions about the target object, an estimation unit that estimates the area of the target object recorded in the image acquired by the acquisition unit based on the instructions received by the reception unit, and an output control unit that controls the area of the target object estimated by the estimation unit to be output. [Selected Figure] Figure 1

Description

特許法第３０条第２項適用公開者ソフトバンク株式会社公開場所ｈｔｔｐｓ：／／ｗｗｗ．ｄｏｃｓ．ｔｓｋ－ｐｆ．ｃｏｍ／ａｎｎｏｔａｔｉｏｎ－ｓｃｒｅｅｎ／ａｉ－ａｎｎｏｔａｔｉｏｎ／ｐｒｏｍｐｔ－ａｎｎｏｔａｔｉｏｎ公開日令和５年７月５日 Applicable under Article 30, Paragraph 2 of the Patent Act Publisher: SoftBank Corp. Published at: https://www.docs.tsk-pf.com/annotation-screen/ai-annotation/prompt-annotation Published on: July 5, 2023

特許法第３０条第２項適用公開者ソフトバンク株式会社公開場所東京ビックサイト公開日令和５年７月１１日第１回ＡＩＷｏｒｌｄ夏 Applicable under Article 30, Paragraph 2 of the Patent Act. Publisher: SoftBank Corp. Published at: Tokyo Big Sight (1st AI World Summer). Published on: July 11, 2023

特許法第３０条第２項適用公開者ソフトバンク株式会社公開場所ｈｔｔｐ：／／ａｎｓ．ｂｂ．ｌｏｃａｌ／＃／ｄｅｔａｉｌ／ｄ２１ｃ９２５ｂ８０ｆｆｃｄｄｃｃｆ９８１８９５３ｅ７２９８８ｄ？ｋｅｙｗｏｒｄ＝ｔａｓｕｋｉ％２０ａｎｎｏｔａｔｉｏｎ公開日令和５年９月５日 Applicable under Article 30, Paragraph 2 of the Patent Act. Publisher: SoftBank Corp. Published at: http://ans.bb.local/#/detail/d21c925b80ffcddccf9818953e72988d?keyword=tasuki%20annotation Published on: September 5, 2023

特許法第３０条第２項適用公開者ソフトバンク株式会社公開場所ｈｔｔｐｓ：／／ｔｓｋ－ｐｆ．ｃｏｍ／ａｎｎｏｔａｔｉｏｎ－ｔｏｏｌ公開日令和５年１１月２日 Applicable under Article 30, Paragraph 2 of the Patent Act. Publisher: SoftBank Corp. Published at: https://tsk-pf.com/annotation-tool Published on: November 2, 2023

特許法第３０条第２項適用公開者株式会社アイスマイリー公開場所ｈｔｔｐｓ：／／ａｉｓｍｉｌｅｙ．ｃｏ．ｊｐ／ｐｒｏｄｕｃｔ／ｓｂｉ＿ｔａｓｕｋｉ－ａｎｎｏｔａｔｉｏｎ－ｔｏｏｌ／公開日令和６年１月２６日 Applicable under Article 30, Paragraph 2 of the Patent Act. Publisher: AIsmiley Co., Ltd. Published at: https://aismiley.co.jp/product/sbi_tasuki-annotation-tool/ Published on: January 26, 2024

特許法第３０条第２項適用公開者ソフトバンク株式会社公開場所ｈｔｔｐｓ：／／ｂｉｚ．ｔｍ．ｓｏｆｔｂａｎｋ．ｊｐ／ｐｇ１２４５０－ｗｅｂ－ｄｏｃ－ｅｎｔｒｙ－ｔａｓｕｋｉ．ｈｔｍｌ公開日令和６年１月３０日Applicable under Article 30, Paragraph 2 of the Patent Act. Disclosed by SoftBank Corp. Published at https://biz.tm.softbank.jp/pg12450-web-doc-entry-tasuki.html Published on January 30, 2024

特許法第３０条第２項適用公開者ソフトバンク株式会社公開場所ｈｔｔｐｓ：／／ｗｗｗ．ｓｏｆｔｂａｎｋ．ｊｐ／ｂｉｚ／ｎｅｗｓ／ｃｌｏｕｄ／２０２４０１３０／公開日令和６年１月３０日Applicable under Article 30, Paragraph 2 of the Patent Act. Disclosed by SoftBank Corp. Disclosed at https://www.softbank.jp/biz/news/cloud/20240130/ Disclosed on January 30, 2024

本開示は、情報処理装置、情報処理方法及び情報処理プログラムに関する。 The present disclosure relates to an information processing device, an information processing method, and an information processing program.

従来、画像のアノテーションを支援する装置がある。その装置は、アノテーションの付与候補となる対象画像の複数の対象領域を、その対象画像に表れる特徴に基づいて分類して分類情報を生成する。その装置は、分類情報を可視化して、対象画像と対比可能に表示する。 Conventionally, there is a device that supports image annotation. The device classifies multiple target areas of a target image that are candidates for annotation based on the features that appear in the target image, and generates classification information. The device visualizes the classification information and displays it so that it can be compared with the target image.

特開２０２２－１３１９３７号公報JP 2022-131937 A

ところで、アノテーションを付与する作業は手間がかかるものであり、アノテーションを付与する対象を自動で特定することが求められている。 However, the task of annotating is time-consuming, and there is a need to automatically identify targets to which annotations should be added.

本開示は、アノテーションを付す対象となる目的対象物の領域を推定することができる情報処理装置、情報処理方法及び情報処理プログラムを提供する。 The present disclosure provides an information processing device, an information processing method, and an information processing program that can estimate the area of a target object to be annotated.

一態様の情報処理装置は、画像を取得する取得部と、目的対象物についての指示内容を受け付ける受付部と、受付部によって受け付けた指示内容に基づいて、取得部によって取得した画像に記録される目的対象物の領域を推定する推定部と、推定部によって推定した目的対象物の領域を出力するよう制御する出力制御部と、を備える。 An information processing device of one embodiment includes an acquisition unit that acquires an image, a reception unit that receives instructions about a target object, an estimation unit that estimates the area of the target object recorded in the image acquired by the acquisition unit based on the instructions received by the reception unit, and an output control unit that controls the area of the target object estimated by the estimation unit to be output.

本開示の情報処理装置、情報処理方法及び情報処理プログラムは、アノテーションを付す対象となる目的対象物の領域を推定することができる。 The information processing device, information processing method, and information processing program disclosed herein can estimate the area of a target object to which annotation is to be added.

一実施形態に係る情報処理装置の制御に基づいて表示する画面について説明するための図である。 FIG. 1 is a diagram for explaining a screen displayed under the control of an information processing device according to an embodiment.
一実施形態に係る情報処理装置について説明するためのブロック図である。 FIG. 2 is a block diagram for explaining an information processing device according to an embodiment.
画像（画像情報）の一例について説明するための図である。 FIG. 3 is a diagram for explaining an example of an image (image information).
図３に例示する画像（画像情報）についての指示内容（プロンプト）の一例について説明するための図である。 FIG. 4 is a diagram for explaining an example of an instruction content (prompt) regarding the image (image information) illustrated in FIG. 3.
一実施形態に係る情報処理方法について説明するためのフローチャートである。 FIG. 5 is a flowchart for explaining an information processing method according to an embodiment.

以下、一実施形態について説明する。 One embodiment is described below.

［情報処理装置１００の概要］
まず、一実施形態に係る情報処理装置１００の概要について説明する。
図１は、一実施形態に係る情報処理装置１００の制御に基づいて表示する画面について説明するための図である。 [Overview of information processing device 100]
First, an overview of an information processing device 100 according to an embodiment will be described.
FIG. 1 is a diagram for explaining a screen displayed under the control of an information processing device 100 according to an embodiment.

情報処理装置１００は、例えば、受け付けた文字列による指示内容１４１（プロンプト）に基づいて、画像２００に記録され、その文字列に対応する物体（目的対象物）の領域を推定（特定）する推定装置（特定装置）等として構成されてもよい。また、情報処理装置１００は、推定した物体（目的対象物）の領域に対してアノテーションを付与する付与装置等として構成されてもよい。情報処理装置１００は、上述した一例の装置に限らず、種々の装置等を構成してもよい。
情報処理装置１００は、例えば、サーバ、デスクトップ、ラップトップ、タブレット及びスマートフォン等のコンピュータであってもよい。 The information processing device 100 may be configured as an estimation device (identification device) that estimates (identifies) an area of an object (target object) that is recorded in the image 200 and corresponds to a character string based on, for example, an instruction content 141 (prompt) that is a received character string. The information processing device 100 may also be configured as an annotation device that adds an annotation to the estimated area of the object (target object). The information processing device 100 is not limited to the device of the above example, and may constitute various devices.
The information processing device 100 may be a computer such as a server, a desktop, a laptop, a tablet, or a smartphone.

情報処理装置１００は、画像２００を取得する。画像２００には、１又は複数の物体が記録される。
図１に例示する場合、情報処理装置１００は、道路、及び、その道路を走行する車両（例えば、普通車２０１及びトラック２０２等の物体）を記録した画像２００を取得して表示部１３３に表示する。また、情報処理装置１００は、その情報処理装置１００に対する指示内容（プロンプト）を入力するインターフェース（例えば、チャットを用いた入力インターフェース１４０等）を表示部１３３に表示する。 The information processing device 100 acquires an image 200. In the image 200, one or more objects are recorded.
In the example shown in FIG. 1, the information processing device 100 acquires an image 200 recording a road and vehicles traveling on the road (e.g., objects such as a passenger car 201 and a truck 202), and displays it on the display unit 133. The information processing device 100 also displays, on the display unit 133, an interface for inputting instructions (prompts) to the information processing device 100 (e.g., an input interface 140 using chat).

情報処理装置１００は、例えば、入力インターフェース１４０を介して、文字列（テキスト）等による指示内容１４１（プロンプト）を受け付ける。文字列は、種々の内容であってよいが、領域を推定する対象となる物体（目的対象物）の内容等であってもよい。すなわち一例として、画像２００に記録される物体として普通車２０１及びトラック２０２があり、画像２００中の普通車２０１の領域を推定する場合、情報処理装置１００は、文字列として「普通車」（指示内容）１４１を受け付ける。 The information processing device 100 receives, for example, a prompt 141 in the form of a character string (text) or the like via the input interface 140. The character string may have various contents, and may also be the contents of an object (target object) whose area is to be estimated. That is, as an example, when a passenger car 201 and a truck 202 are recorded as objects in an image 200, and the area of the passenger car 201 in the image 200 is to be estimated, the information processing device 100 receives "passenger car" (prompt) 141 as the character string.

情報処理装置１００は、上述したように受け付けた指示内容１４１に基づいて、画像２００に記録される目的対象物の領域を推定する。情報処理装置１００は、例えば、公知の物体認識処理等を始めとする種々の処理を用いることにより、画像２００に記録される、指示内容１４１に対応する物体（目的対象物）の領域を推定する。一例として、画像２００に記録される物体として普通車２０１及びトラック２０２があり、画像２００中の普通車２０１の領域を推定する場合、情報処理装置１００は、文字列として「普通車」（指示内容）１４１を受け付けると、画像２００中の普通車２０１（目的対象物）の領域を推定（特定）する。
この場合、情報処理装置１００は、推定（特定）した目的対象物の領域の大きさと、入力インターフェース１４０を介して入力された閾値とに基づいて、閾値以上となる大きさを有する目的対象物の領域を特定してもよい。 The information processing device 100 estimates the area of the target object recorded in the image 200 based on the instruction content 141 received as described above. The information processing device 100 estimates the area of the object (target object) that is recorded in the image 200 and corresponds to the instruction content 141 by using various processes including, for example, a known object recognition process. As an example, when a passenger car 201 and a truck 202 are recorded as objects in the image 200 and the area of the passenger car 201 in the image 200 is to be estimated, the information processing device 100 estimates (identifies) the area of the passenger car 201 (target object) in the image 200 upon receiving the character string "passenger car" (instruction content) 141.
In this case, the information processing device 100 may identify a region of the target object having a size equal to or larger than a threshold value based on the size of the estimated (identified) region of the target object and a threshold value input via the input interface 140.

情報処理装置１００は、上述したように推定（特定）した目的対象物の領域を出力する。情報処理装置１００は、例えば、画像２００中の目的対象物の領域に、他の領域とは異なる色を付して表示する。図１に例示する場合では、画像２００中の目的対象物（普通車２０１）の領域に斜線を付し、他の領域（例えば、トラック２０２等）には斜線を付さず、目的対象物（普通車２０１）の領域を他の領域（例えば、トラック２０２等）とは異なる態様で示すようになっている。 The information processing device 100 outputs the area of the target object estimated (identified) as described above. For example, the information processing device 100 displays the area of the target object in the image 200 in a color different from other areas. In the example shown in FIG. 1, the area of the target object (a passenger car 201) in the image 200 is shaded and other areas (e.g., truck 202, etc.) are not shaded, so that the area of the target object (a passenger car 201) is shown in a different manner from other areas (e.g., truck 202, etc.).

また、情報処理装置１００は、上述したように受け付けた文字列の指示内容１４１（プロンプト）に基づいて、画像２００中において特定した目的対象物の領域にアノテーションを付す。すなわち、図１に例示する場合には、情報処理装置１００は、文字列（指示内容１４１（プロンプト））として入力された「普通車」に基づいて、斜線を付した目的対象物（普通車２０１）の領域を特定すると、その目的対象物（普通車２０１）の領域に「普通車」のアノテーションを付してもよい。
なお、アノテーションを付与する対象は、普通車２０１及びトラック２０２に限定されることはなく、種々の物体であってもよい。 Furthermore, the information processing device 100 annotates the area of the target object identified in the image 200 based on the instruction content 141 (prompt) of the character string received as described above. That is, in the example shown in FIG. 1, once the information processing device 100 has identified the hatched area of the target object (passenger car 201) based on "passenger car" input as the character string (instruction content 141 (prompt)), it may annotate that area of the target object (passenger car 201) with "passenger car."
Note that the objects to which annotations are added are not limited to the passenger car 201 and the truck 202, but may be various other objects.

［情報処理装置１００の詳細］
次に、一実施形態に係る情報処理装置１００について詳細に説明する。
図２は、一実施形態に係る情報処理装置１００について説明するためのブロック図である。
図３は、画像２００（画像情報）の一例について説明するための図である。
図４は、図３に例示する画像２００（画像情報）についての指示内容（プロンプト）の一例について説明するための図である。 [Details of information processing device 100]
Next, the information processing device 100 according to an embodiment will be described in detail.
FIG. 2 is a block diagram for explaining the information processing device 100 according to an embodiment.
FIG. 3 is a diagram for explaining an example of an image 200 (image information).
FIG. 4 is a diagram for explaining an example of an instruction content (prompt) regarding the image 200 (image information) illustrated in FIG. 3.

情報処理装置１００は、例えば、入力部１２１、通信部１３１、記憶部１３２、表示部１３３及び制御部１１０等を備える。通信部１３１、記憶部１３２及び表示部１３３は、出力部の一実施形態であってもよい。制御部１１０は、例えば、取得部１１１、受付部１１２、推定部１１３、出力制御部１１４、学習部１１５及びＡＩ部１１６等を備える。制御部１１０は、例えば、情報処理装置１００の演算処理装置等によって構成されてもよい。制御部１１０（例えば、演算処理装置等）は、例えば、記憶部１３２等に記憶される各種プログラム等を適宜読み出して実行することにより、各部（例えば、取得部１１１、受付部１１２、推定部１１３、出力制御部１１４、学習部１１５及びＡＩ部１１６等）の機能を実現してもよい。すなわち、コンピュータ実装により、各部の機能を実現してもよい。 The information processing device 100 includes, for example, an input unit 121, a communication unit 131, a storage unit 132, a display unit 133, and a control unit 110. The communication unit 131, the storage unit 132, and the display unit 133 may be an embodiment of the output unit. The control unit 110 includes, for example, an acquisition unit 111, a reception unit 112, an estimation unit 113, an output control unit 114, a learning unit 115, and an AI unit 116. The control unit 110 may be configured, for example, by a calculation processing unit of the information processing device 100. The control unit 110 (for example, a calculation processing unit, etc.) may realize the functions of each unit (for example, the acquisition unit 111, the reception unit 112, the estimation unit 113, the output control unit 114, the learning unit 115, and the AI unit 116, etc.) by, for example, appropriately reading and executing various programs stored in the storage unit 132, etc. That is, the functions of each unit may be realized by computer implementation.
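As a non-limiting illustration, the division of the control unit 110 into an acquisition unit, a reception unit, an estimation unit, and an output control unit might be sketched in Python as follows. The class `ControlUnit`, the `Region` record, and the stub `fake_estimator` are hypothetical names introduced only for this sketch; the disclosure does not specify an API, and the stub stands in for whatever object recognition process is actually used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Region:
    label: str            # prompt text the region was estimated for
    polygon: List[Point]  # contour vertices in image coordinates


# Hypothetical estimator signature: (image, prompt) -> list of polygons.
Estimator = Callable[[object, str], List[List[Point]]]


class ControlUnit:
    """Toy stand-in for control unit 110; all names are illustrative."""

    def __init__(self, estimator: Estimator) -> None:
        self._estimator = estimator
        self._image = None
        self._prompt = ""

    def acquire(self, image) -> None:        # acquisition unit 111
        self._image = image

    def receive(self, prompt: str) -> None:  # reception unit 112
        self._prompt = prompt.strip()

    def estimate(self) -> List[Region]:      # estimation unit 113
        if self._image is None or not self._prompt:
            raise ValueError("an image and a prompt are both required")
        polygons = self._estimator(self._image, self._prompt)
        return [Region(self._prompt, p) for p in polygons]

    def output(self, regions: List[Region]) -> List[str]:  # output control unit 114
        return [f"{r.label}: {len(r.polygon)} vertices" for r in regions]


def fake_estimator(image, prompt):
    # Stub that pretends the image holds one passenger car and one truck.
    known = {
        "passenger car": [[(0, 0), (4, 0), (4, 2), (0, 2)]],
        "truck": [[(10, 0), (18, 0), (18, 4), (10, 4)]],
    }
    return known.get(prompt, [])


unit = ControlUnit(fake_estimator)
unit.acquire("road.png")
unit.receive("passenger car")
regions = unit.estimate()
print(unit.output(regions))  # ['passenger car: 4 vertices']
```

Passing the estimator in as a callable keeps the sketch independent of any particular recognition model, in line with the statement that various processes may be used.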

入力部１２１は、例えば、グラフィカルユーザインターフェース（ＧＵＩ）等であってもよい。また、入力部１２１は、例えば、キーボード及びマウス等であってもよい。 The input unit 121 may be, for example, a graphical user interface (GUI) or the like. The input unit 121 may also be, for example, a keyboard and a mouse or the like.

通信部１３１は、例えば、情報処理装置１００の外部にある装置（外部装置）（図示せず）等との間で種々の情報の送受信が可能な通信インターフェースである。 The communication unit 131 is, for example, a communication interface capable of transmitting and receiving various information to and from a device (external device) (not shown) external to the information processing device 100.

記憶部１３２は、例えば、種々の情報及びプログラムを記憶してもよい。記憶部１３２の一例は、メモリ、ソリッドステートドライブ及びハードディスクドライブ等であってもよい。なお、記憶部１３２は、例えば、クラウド上にある記憶領域及びサーバ等であってもよい。 The storage unit 132 may store, for example, various information and programs. Examples of the storage unit 132 may be a memory, a solid state drive, a hard disk drive, etc. The storage unit 132 may be, for example, a storage area and a server on the cloud, etc.

表示部１３３は、例えば、種々の文字、記号及び画像等を表示することが可能なディスプレイである。 The display unit 133 is, for example, a display capable of displaying various characters, symbols, images, etc.

取得部１１１は、画像２００（画像情報）を取得する。
取得部１１１は、例えば、記憶部１３２に記憶される画像２００（画像情報）を取得する。
また、取得部１１１は、例えば、通信部１３１を介して、画像２００（画像情報）を外部装置（図示せず）から取得する。外部装置は、例えば、サーバ及びユーザ端末等であってもよい。ユーザ端末は、情報処理装置１００のユーザが使用する端末であり、デスクトップ、ラップトップ、タブレット及びスマートフォン等であってもよい。
また、取得部１１１は、例えば、画像２００（画像情報）が記憶される外部メモリ（図示せず）が情報処理装置１００のインターフェース（図示せず）に接続された場合、その外部メモリから画像２００（画像情報）を取得してもよい。
画像２００は、静止画又は動画であってもよい。画像２００には、１又は複数の物体が記録される。
ここで一例として図３に示すように、取得部１１１は、道路、及び、その道路を走行する車両（例えば、普通車２０１及びトラック２０２等の物体）を記録した画像２００（画像情報）を取得してもよい。 The acquisition unit 111 acquires an image 200 (image information).
The acquisition unit 111 acquires, for example, an image 200 (image information) stored in the storage unit 132.
Furthermore, the acquisition unit 111 acquires the image 200 (image information) from an external device (not shown), for example, via the communication unit 131. The external device may be, for example, a server or a user terminal. The user terminal is a terminal used by a user of the information processing device 100, and may be a desktop, a laptop, a tablet, a smartphone, or the like.
In addition, the acquisition unit 111 may acquire the image 200 (image information) from an external memory (not shown) in which the image 200 (image information) is stored, for example, when the external memory is connected to an interface (not shown) of the information processing device 100.
Image 200 may be a still image or a video image. Image 200 captures one or more objects.
As an example, as shown in FIG. 3, the acquisition unit 111 may acquire an image 200 (image information) recording a road and a vehicle traveling on the road (e.g., objects such as a passenger car 201 and a truck 202).

受付部１１２は、例えば、入力インターフェース１４０を介して、目的対象物についての指示内容１４１（プロンプト）を受け付ける。受付部１１２は、例えば、目的対象物についての文字列（テキスト）による指示内容１４１（プロンプト）を受け付けてもよい。文字列（テキスト）は、種々の内容であってよいが、一例として、領域を推定する対象となる物体（目的対象物）の内容等であってもよい。入力インターフェース１４０は、例えば、チャットを利用して指示内容１４１（プロンプト）を受け付けるインターフェース等であってもよい。 The reception unit 112 receives, for example, instruction contents 141 (prompt) regarding the target object via the input interface 140. The reception unit 112 may receive, for example, instruction contents 141 (prompt) in the form of a character string (text) regarding the target object. The character string (text) may be of various kinds, and may be, for example, the content of an object (target object) whose area is to be estimated. The input interface 140 may be, for example, an interface that receives instruction contents 141 (prompt) using chat.

すなわち一例として図４に示すように、画像２００に記録される物体として普通車２０１及びトラック２０２があり、画像２００中の普通車２０１の領域を推定する場合、情報処理装置１００は、文字列として「普通車」（指示内容）１４１を受け付ける。情報処理装置１００は、入力インターフェース１４０に「普通車」が入力されるのに応じて、普通車２０１の領域を推定するためのチャットの返答として「推論します」等の内容１４２を表示部１３３（入力インターフェース１４０）に表示してもよい。 That is, as an example, as shown in FIG. 4, when a passenger car 201 and a truck 202 are recorded as objects in the image 200 and the area of the passenger car 201 in the image 200 is to be estimated, the information processing device 100 accepts the character string "passenger car" (instruction content) 141. In response to "passenger car" being input to the input interface 140, the information processing device 100 may display content 142 such as "I will infer" on the display unit 133 (input interface 140) as a chat reply indicating that the area of the passenger car 201 will be estimated.

同様に一例として、画像２００に記録される物体として普通車２０１及びトラック２０２があり、画像２００中のトラック２０２の領域を推定する場合、情報処理装置１００は、文字列として「トラック」（指示内容）を受け付ける。
同様に一例として、画像２００に記録される物体として普通車２０１及びトラック２０２があり、画像２００中の普通車２０１の領域及びトラック２０２の領域の両方を推定する場合、情報処理装置１００は、文字列として「普通車」及び「トラック」（複数の指示内容）を受け付けてもよい。 Similarly, as an example, objects recorded in image 200 include a passenger car 201 and a truck 202, and when estimating the area of truck 202 in image 200, information processing device 100 accepts "truck" (instruction content) as the character string.
Similarly, as an example, when a passenger car 201 and a truck 202 are recorded as objects in the image 200 and the areas of both the passenger car 201 and the truck 202 are to be estimated, the information processing device 100 may accept the character strings "passenger car" and "truck" (multiple instruction contents).

推定部１１３は、受付部１１２によって受け付けた指示内容１４１に基づいて、取得部１１１によって取得した画像２００に記録される目的対象物の領域を推定する。推定部１１３は、例えば、公知の物体認識処理等を始めとする種々の処理を用いることにより、画像２００に記録される、指示内容１４１に対応する物体（目的対象物）の領域を推定する。
上述した一例を用いる場合、すなわち画像２００に記録される物体として普通車２０１及びトラック２０２があり、画像２００中の普通車２０１の領域を推定する場合、推定部１１３は、文字列として「普通車」（指示内容）１４１を受け付けると（図４参照）、画像２００中の普通車２０１（目的対象物）の領域を推定（特定）する。
同様に一例として、画像２００に記録される物体として普通車２０１及びトラック２０２があり、画像２００中のトラック２０２の領域を推定する場合、推定部１１３は、文字列として「トラック」（指示内容）を受け付けると、画像２００中のトラック２０２（目的対象物）の領域を推定（特定）する。
同様に一例として、画像２００に記録される物体として普通車２０１及びトラック２０２があり、画像２００中の普通車２０１の領域及びトラック２０２の領域の両方を推定する場合、情報処理装置１００は、文字列として「普通車」及び「トラック」（複数の指示内容）を受け付けると、画像２００中の普通車２０１及びトラック２０２（複数の目的対象物）の領域を推定（特定）してもよい。
推定部１１３は、推定が成功すると、入力インターフェース１４０に「成功しました」等の内容１４４のチャットの返答を表示してもよい。推定部１１３は、例えば、推定が成功しなかった場合は、「指示内容（プロンプト）を再入力して下さい」等の内容のチャットの返答を表示してもよい。 The estimation unit 113 estimates the area of the target object recorded in the image 200 acquired by the acquisition unit 111, based on the instruction content 141 accepted by the acceptance unit 112. The estimation unit 113 estimates the area of the object (target object) corresponding to the instruction content 141, recorded in the image 200, by using various processes including, for example, a known object recognition process.
In the example described above, that is, where a passenger car 201 and a truck 202 are recorded as objects in the image 200 and the area of the passenger car 201 is to be estimated, the estimation unit 113 estimates (identifies) the area of the passenger car 201 (target object) in the image 200 upon receiving the character string "passenger car" (instruction content) 141 (see FIG. 4).
Similarly, as an example, where the area of the truck 202 in the image 200 is to be estimated, the estimation unit 113 estimates (identifies) the area of the truck 202 (target object) in the image 200 upon receiving the character string "truck" (instruction content).
Similarly, as an example, where the areas of both the passenger car 201 and the truck 202 in the image 200 are to be estimated, the information processing device 100 may estimate (identify) the areas of the passenger car 201 and the truck 202 (multiple target objects) in the image 200 upon receiving the character strings "passenger car" and "truck" (multiple instruction contents).
When the estimation is successful, the estimation unit 113 may display a chat reply with content 144 such as "Successful" on the input interface 140. When the estimation is unsuccessful, the estimation unit 113 may display a chat reply with content such as "Please re-enter the prompt," for example.

推定部１１３は、目的対象物の領域として、画像２００における目的対象物の画素領域及び目的対象物の輪郭境界のうち少なくとも一方を推定してもよい。すなわち、推定部１１３は、目的対象物における、画素領域及び輪郭境界のグループのうち少なくとも一方を推定してもよい。目的対象物の画素領域は、目的対象物の輪郭の内側となる面領域であってもよい。目的対象物の輪郭境界は、目的対象物の輪郭と言い換えることができる。 The estimation unit 113 may estimate at least one of the pixel region of the target object in the image 200 and the contour boundary of the target object as the region of the target object. That is, the estimation unit 113 may estimate at least one of a group of pixel regions and contour boundaries in the target object. The pixel region of the target object may be a surface region that is inside the contour of the target object. The contour boundary of the target object can be rephrased as the contour of the target object.
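As a non-limiting illustration of the relationship between the contour boundary and the pixel region, the following Python sketch derives the set of pixels lying inside a contour given as polygon vertices. The even-odd ray-casting test and the choice to sample each pixel at its center are assumptions of this sketch, not something the disclosure prescribes; the function names are hypothetical.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixel_region(polygon: List[Point], width: int, height: int) -> Set[Tuple[int, int]]:
    """Pixels whose centers fall inside the contour (assumed sampling rule)."""
    return {
        (px, py)
        for py in range(height)
        for px in range(width)
        if point_in_polygon(px + 0.5, py + 0.5, polygon)
    }


contour = [(1, 1), (4, 1), (4, 3), (1, 3)]  # contour boundary of one object
mask = pixel_region(contour, width=6, height=5)
print(len(mask))  # 6 pixels: x in 1..3, y in 1..2
```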

推定部１１３は、目的対象物の領域の大きさに応じた値を付してもよい。推定部１１３は、例えば、上述したように推定した複数の目的対象物の領域のうち、最も大きい領域に１００の値を付し、最も小さい領域に０の値を付し、最も小さい領域から最も大きい領域までの各領域に、その領域の大きさに応じて０より大きく１００よりも小さい値を付してもよい。なお、値（値の範囲）は、上述した０から１００の範囲に限定されず、種々の値（種々の値の範囲）を付してもよい。 The estimation unit 113 may assign a value according to the size of the area of the target object. For example, the estimation unit 113 may assign a value of 100 to the largest area of the areas of the multiple target objects estimated as described above, assign a value of 0 to the smallest area, and assign a value greater than 0 and less than 100 to each area from the smallest area to the largest area according to the size of the area. Note that the values (range of values) are not limited to the range from 0 to 100 described above, and various values (ranges of values) may be assigned.
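As a non-limiting illustration, one way to realize the 0-to-100 assignment is min-max scaling of the region sizes. The linear form below is an assumption of this sketch; the description only requires that intermediate regions receive values according to their size. Here $a_i$ denotes the size (for example, the area in pixels) of the $i$-th estimated region, and $a_{\min}$ and $a_{\max}$ denote the smallest and largest sizes among the estimated regions.

```latex
s_i = 100 \cdot \frac{a_i - a_{\min}}{a_{\max} - a_{\min}},
\qquad a_{\min} < a_{\max}
```

For example, regions of size 200, 500, and 800 would receive the values 0, 50, and 100. The case $a_{\min} = a_{\max}$ (a single region, or regions of identical size) is left undefined by this formula and would need separate handling.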

なお、上述した推定部１１３の具体的な処理内容の一例として、推定部１１３は、まず指示内容１４１（プロンプト）に対応した、画像２００中の各物体（各目的対象物）の矩形領域を検出して矩形領域画像を生成し、その後に各矩形領域画像の物体について輪郭近似を行い輪郭の座標を取得してもよい。この場合、推定部１１３は、例えば、ＡＩ等の処理によって矩形領域画像を生成してもよい。推定部１１３は、ＡＩ等の処理の一例として、物体（目的対象物）の信頼度を規定する「信頼度スコア」が入力されると、その信頼度スコアに応じて推定される物体（目的対象物）を特定し、その特定した物体（目的対象物）の周囲に矩形領域の画像を生成してもよい。なお一例として、信頼度スコアは、０から１の範囲の値等であってもよい。さらに、推定部１１３は、取得したその座標に基づいて、物体（物体の輪郭）を多角形（ポリゴン）で囲み、そのポリゴンの座標点を取得してもよい。推定部１１３は、複数の物体それぞれに対応して、上述したようにポリゴンの座標点を取得してもよい。すなわち、推定部１１３は、ポリゴンの座標点を用いて、物体の輪郭（輪郭領域）、及び、その輪郭（輪郭領域）の内側となる物体内部（物体の画素領域）のうち少なくとも一方を推定してもよい。
次に、推定部１１３は、複数の物体それぞれのポリゴン（ポリゴンの座標点で囲う領域）のうち最も大きいポリゴンの大きさと最も小さいポリゴンの大きさとに基づいて、複数のポリゴンそれぞれの大きさをスコアリングする。すなわち一例として、推定部１１３は、例えば、最も大きいポリゴンに１００の値を付し、最も小さいポリゴンに０の値を付し、複数のポリゴンそれぞれの大きさに応じて０～１００の間で値を付す。 As an example of the specific processing contents of the estimation unit 113 described above, the estimation unit 113 may first detect a rectangular area of each object (each target object) in the image 200 corresponding to the instruction contents 141 (prompt) to generate a rectangular area image, and then perform contour approximation for the object in each rectangular area image to obtain the coordinates of the contour. In this case, the estimation unit 113 may generate a rectangular area image by, for example, processing such as AI. As an example of processing such as AI, when a "reliability score" that specifies the reliability of the object (target object) is input, the estimation unit 113 may specify an object (target object) that is estimated according to the reliability score, and generate an image of a rectangular area around the specified object (target object). As an example, the reliability score may be a value in the range from 0 to 1. Furthermore, the estimation unit 113 may surround the object (object contour) with a polygon (polygon) based on the acquired coordinates, and obtain the coordinate points of the polygon. The estimation unit 113 may acquire the coordinate points of the polygon as described above for each of the multiple objects. That is, the estimation unit 113 may estimate at least one of the contour (contour region) of the object and the inside of the object (pixel region of the object) that is inside the contour (contour region) by using the coordinate points of the polygon.
Next, the estimation unit 113 scores the size of each of the polygons (areas enclosed by the coordinate points of the polygons) of the multiple objects based on the size of the largest polygon and the size of the smallest polygon. That is, as an example, the estimation unit 113 assigns a value between 0 and 100 depending on the size of each of the multiple polygons, for example, assigning a value of 100 to the largest polygon and a value of 0 to the smallest polygon.
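As a non-limiting illustration, the polygon scoring described above might be sketched in Python as follows. The use of the shoelace formula for polygon size, the linear scaling between the smallest and largest polygons, and the handling of equally sized polygons are assumptions of this sketch; `polygon_area` and `size_scores` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def size_scores(polygons: Sequence[Sequence[Point]]) -> List[float]:
    """Scale polygon areas so the smallest maps to 0 and the largest to 100."""
    areas = [polygon_area(p) for p in polygons]
    lo, hi = min(areas), max(areas)
    if hi == lo:
        # Assumption: equally sized polygons all receive the top score.
        return [100.0 for _ in areas]
    return [100.0 * (a - lo) / (hi - lo) for a in areas]


polygons = [
    [(0, 0), (2, 0), (2, 1), (0, 1)],      # area 2
    [(0, 0), (4, 0), (4, 3), (0, 3)],      # area 12
    [(0, 0), (4, 0), (4, 5.5), (0, 5.5)],  # area 22
]
scores = size_scores(polygons)
print(scores)  # [0.0, 50.0, 100.0]
```

An empty list of polygons raises `ValueError` from `min`, which a caller would handle as the case in which no target object was estimated.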

なお、推定部１１３は、受付部１１２によって指示内容１４１を受け付けると、指示内容１４１毎（目的対象物の種類毎）に異なるスレッド（すなわち、ページ等）を生成し、各スレッド（各ページ）で１つの目的対象物の領域を生成してもよい。 When the receiving unit 112 receives the instruction content 141, the estimation unit 113 may generate a different thread (i.e., a page, etc.) for each instruction content 141 (for each type of target object) and generate an area for one target object in each thread (each page).

出力制御部１１４は、推定部１１３によって推定した目的対象物の領域を出力するよう出力部を制御する。ここで、出力部は、例えば、通信部１３１、記憶部１３２及び表示部１３３等であってもよい。 The output control unit 114 controls the output unit to output the area of the target object estimated by the estimation unit 113. Here, the output unit may be, for example, the communication unit 131, the memory unit 132, the display unit 133, etc.

一例として、出力制御部１１４は、例えば、推定部１１３によって画像２００中の複数の目的対象物（例えば、普通車２０１（又は、トラック２０２）の領域を推定した場合、画像２００中において推定した普通車２０１（又は、トラック２０２）（目的対象物）の領域の特定する態様で出力するよう出力部を制御してもよい。 As an example, when the estimation unit 113 estimates the areas of multiple target objects (e.g., passenger car 201 (or truck 202)) in the image 200, the output control unit 114 may control the output unit to output in a manner that identifies the area of the estimated passenger car 201 (or truck 202) (target object) in the image 200.

また一例として、出力制御部１１４は、例えば、推定部１１３によって画像２００中の複数の目的対象物（例えば、普通車２０１及びトラック２０２）それぞれの領域を推定した場合、入力インターフェース１４０を介して文字列（「普通車」及び「トラック」のうちの一方）が入力されると、入力された文字列に応じた目的対象物の領域、すなわち目的対象物を特定するための文字列として「普通車」（又は、「トラック」）が入力されると画像２００中の普通車２０１（又は、トラック２０２）（目的対象物）の領域を出力するよう出力部を制御してもよい。 As another example, the output control unit 114 may control the output unit to, for example, when the estimation unit 113 estimates the areas of each of a plurality of target objects (e.g., a passenger car 201 and a truck 202) in the image 200, output the area of the target object corresponding to the input character string (either "a passenger car" or "a truck") via the input interface 140, i.e., output the area of the passenger car 201 (or the truck 202) (target object) in the image 200 when "a passenger car" (or "a truck") is input as a character string for identifying the target object.

また一例として、出力制御部１１４は、例えば、推定部１１３によって画像２００中の目的対象物（例えば、普通車２０１）の領域を推定し、その後、推定部１１３によって同一画像２００中の目的対象物（例えば、トラック２０２）の領域を推定した場合、推定した目的対象物毎（普通車２０１及びトラック２０２それぞれ毎）にスレッド（例えば、普通車２０１の画面及びトラック２０２の画面）を作成して、複数のスレッドの一方（１つ）又は両方（複数）を出力するよう出力部を制御してもよい。この場合、出力制御部１１４は、例えば、入力インターフェース１４０を介して文字列（例えば、「普通車」及び「トラック」のうちの一方）が入力されると、又は、入力インターフェース１４０を介してスレッドの１つが選択されると、入力された文字列又は選択されたスレッドに応じた目的対象物（例えば、普通車２０１又はトラック２０２）の領域を出力するよう出力部を制御してもよい。すなわち、出力制御部１１４は、入力インターフェース１４０を介して、複数のスレッドについて表示又は非表示の切り替えを行ってもよい。 As another example, the output control unit 114 may, for example, when the estimation unit 113 estimates the area of a target object (e.g., a passenger car 201) in the image 200, and then the estimation unit 113 estimates the area of a target object (e.g., a truck 202) in the same image 200, create a thread (e.g., a screen of the passenger car 201 and a screen of the truck 202) for each estimated target object (each of the passenger car 201 and the truck 202), and control the output unit to output one (one) or both (multiple) of the multiple threads. In this case, the output control unit 114 may, for example, control the output unit to output the area of the target object (e.g., a passenger car 201 or a truck 202) corresponding to the input character string or the selected thread when a character string (e.g., one of "a passenger car" and "a truck") is input via the input interface 140, or when one of the threads is selected via the input interface 140. That is, the output control unit 114 may switch between displaying and hiding multiple threads via the input interface 140.
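As a non-limiting illustration, the per-prompt threads and the switching between displaying and hiding them might be sketched in Python as follows. The `Thread` and `ThreadBoard` classes, and the region identifiers such as `"car-1"`, are hypothetical and introduced only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Thread:
    prompt: str                                       # instruction content
    regions: List[str] = field(default_factory=list)  # ids of estimated regions
    visible: bool = True


class ThreadBoard:
    """Keeps one thread per prompt and toggles which threads are shown."""

    def __init__(self) -> None:
        self._threads: Dict[str, Thread] = {}

    def add_result(self, prompt: str, regions: List[str]) -> Thread:
        thread = self._threads.setdefault(prompt, Thread(prompt))
        thread.regions = list(regions)
        return thread

    def show_only(self, prompt: str) -> None:
        if prompt not in self._threads:
            raise KeyError(prompt)
        for key, thread in self._threads.items():
            thread.visible = key == prompt

    def show_all(self) -> None:
        for thread in self._threads.values():
            thread.visible = True

    def visible_regions(self) -> List[str]:
        return [r for t in self._threads.values() if t.visible for r in t.regions]


board = ThreadBoard()
board.add_result("passenger car", ["car-1", "car-2"])
board.add_result("truck", ["truck-1"])
board.show_only("truck")
print(board.visible_regions())  # ['truck-1']
```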

出力制御部１１４は、入力された値（閾値）以上の大きさの目的対象物の領域を出力するよう制御してもよい。出力制御部１１４は、推定部１１３によって付した値（目的対象物の領域の大きさ）と、入力インターフェース１４０を介して入力された閾値とに基づいて、閾値以上となる値（領域の大きさ）を有する目的対象物の領域を特定、特定した目的対象物の領域を出力するよう制御してもよい。
具体的な一例として、出力制御部１１４は、入力インターフェース１４０を介して入力された閾値と、各物体（各目的対象物）のポリゴンの大きさとを比較し、閾値未満の大きさとなるポリゴンに対応する各物体（各目的対象物）を非表示とするようフィルタ処理を行ってもよい。 The output control unit 114 may perform control to output a region of the target object having a size equal to or larger than an input value (threshold value). The output control unit 114 may perform control to identify a region of the target object having a value (region size) equal to or larger than a threshold value based on the value (region size) assigned by the estimation unit 113 and the threshold value input via the input interface 140, and to output the identified region of the target object.
As a specific example, the output control unit 114 may compare the size of the polygons of each object (each target object) with a threshold value input via the input interface 140, and perform filtering to hide each object (each target object) corresponding to a polygon whose size is less than the threshold value.
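As a non-limiting illustration, the filtering against the input threshold might be sketched in Python as follows. The sketch assumes that each region already carries a size value in the 0-to-100 range; `visible_regions` and the region identifiers are hypothetical names.

```python
from typing import Dict, List


def visible_regions(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return ids of regions whose size value is at or above the threshold."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold is expected in the 0-100 range")
    return [region_id for region_id, s in scores.items() if s >= threshold]


scores = {"car-1": 100.0, "car-2": 35.0, "car-3": 0.0}
shown = visible_regions(scores, threshold=35)
hidden = [region_id for region_id in scores if region_id not in shown]
print(shown)   # ['car-1', 'car-2']
print(hidden)  # ['car-3']
```

The comparison uses "greater than or equal to", matching the statement that regions at or above the threshold are output while regions below the threshold are hidden.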

一例として図４に示すように、出力制御部１１４は、閾値を０から１００の範囲で指定するスライダ１４３を入力インターフェース１４０に表示し、入力部１２１の操作に応じてスライダ１４３が動かされると、そのスライダ１４３の位置に応じて閾値を変更可能としてもよい。出力制御部１１４は、その閾値と、物体（目的対象物）（図４に例示する場合は普通車２０１）の領域の大きさとに基づいて、閾値以上となる物体（目的対象物）（図４に例示する場合は普通車２０１）の領域を、他の領域とは異なる態様で表示部１３３に表示してもよい。図４に例示する場合では、出力制御部１１４は、画像２００中の目的対象物（普通車２０１）の領域に斜線を付し、他の領域（例えば、トラック２０２、及び、図４には図示しない閾値未満の大きさの普通車等）には斜線を付さず、目的対象物（普通車２０１）の領域を他の領域（例えば、トラック２０２等）とは異なる態様で示すようになっている。 As an example, as shown in FIG. 4, the output control unit 114 may display, on the input interface 140, a slider 143 for specifying the threshold in the range of 0 to 100, and when the slider 143 is moved in response to an operation of the input unit 121, the threshold may be changed according to the position of the slider 143. Based on the threshold and the size of the area of the object (target object) (the passenger car 201 in the example shown in FIG. 4), the output control unit 114 may display, on the display unit 133, the area of the object (target object) (the passenger car 201 in the example shown in FIG. 4) whose size is equal to or greater than the threshold in a manner different from other areas. In the example shown in FIG. 4, the output control unit 114 hatches the area of the target object (passenger car 201) in the image 200 and does not hatch other areas (e.g., the truck 202, and passenger cars smaller than the threshold, which are not shown in FIG. 4), so that the area of the target object (passenger car 201) is shown in a manner different from the other areas (e.g., the truck 202, etc.).

The output control unit 114 may perform control to add an annotation based on the instruction content 141 to the region of the target object and output the result. As described above, the output control unit 114 may also add an annotation to the region of the target object identified in the image 200 based on the character string (instruction content 141 (prompt)) received by the reception unit 112. The output control unit 114 may perform control to add an annotation based on the instruction content 141 (prompt) to each region of the target object whose size is equal to or larger than the value (threshold) input via the input interface 140, and output the result.
As an example, in the case shown in FIG. 4, once the output control unit 114 has identified the region of the target object (regular car 201), shown hatched, based on the character string "regular car" input as the instruction content 141 (prompt), it may add the annotation "regular car" to that region.
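The annotation step can be illustrated with a short sketch in which the prompt text itself becomes the label of every region that passes the size threshold. The record layout and the names `Annotation` and `annotate_regions` are assumptions made for this example; the source does not specify a data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Polygon = Tuple[Tuple[float, float], ...]


@dataclass(frozen=True)
class Annotation:
    """Label attached to one region (hypothetical record layout)."""
    label: str
    polygon: Polygon
    size: float


def annotate_regions(
    prompt: str,
    regions: List[Tuple[Polygon, float]],
    threshold: float = 0.0,
) -> List[Annotation]:
    """Label each (polygon, size) pair with the prompt if size >= threshold."""
    label = prompt.strip()
    return [
        Annotation(label=label, polygon=polygon, size=size)
        for polygon, size in regions
        if size >= threshold
    ]


regions = [
    (((0, 0), (10, 0), (10, 5), (0, 5)), 50.0),
    (((0, 0), (2, 0), (0, 2)), 2.0),
]
annotations = annotate_regions("regular car", regions, threshold=10.0)
```

Only the first region receives the "regular car" annotation here, because the second falls below the threshold.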

As an example of the output described above, the output control unit 114 may control the communication unit 131 to transmit at least one of the region of the target object and the annotated image 200 to an external device (not shown). The external device may be, for example, a server or a user terminal.
As another example of the output, the output control unit 114 may control the storage unit 132 to store at least one of the region of the target object and the annotated image 200.
As yet another example of the output, the output control unit 114 may control the display unit 133 to display at least one of the region of the target object and the annotated image 200. In this case, the output control unit 114 may control the display unit 133 to display the region of the target object in the image 200 in a manner different from the rest of the image (e.g., in a different color, with a different outline thickness, or by blinking or brightening the region of the target object).

The learning unit 115 may generate a trained model by, for example, learning images or the like to which the output control unit 114 has added annotations corresponding to the character string (instruction content 141 (prompt)). That is, when an annotation has been added to the region of a target object, the learning unit 115 may learn that region to generate a trained model.
In the example described above, that is, when the objects recorded in the image include a region annotated as the regular car 201 and a region annotated as the truck 202, the learning unit 115 may learn the annotated images or the like to generate a trained model.

The AI unit 116 may input a target to the trained model generated by the learning unit 115 and estimate a target object in that target. The target may be, for example, an image (image information) such as a still image or a video. That is, by inputting the image (target) to be analyzed into the trained model, the AI unit 116 can estimate the target object (object to be estimated) recorded in that image.
As a specific example, when the learning unit 115 has generated a trained model that has learned the region of the regular car 201 annotated as the regular car 201 and the region of the truck 202 annotated as the truck 202, the AI unit 116 may estimate the regular car 201 or the truck 202 (target object) in an image (target) based on that image and the trained model.

[Information processing method]
Next, an information processing method according to an embodiment will be described.
FIG. 5 is a flowchart illustrating the information processing method according to the embodiment.

In step ST101, the acquisition unit 111 acquires the image 200 (image information).

In step ST102, the reception unit 112 receives the instruction content 141 (prompt) regarding the target object. The reception unit 112 may receive the instruction content 141 (prompt) in the form of a character string (text).

In step ST103, the estimation unit 113 estimates the region of the target object recorded in the image 200 acquired in step ST101, based on the instruction content 141 received in step ST102. The estimation unit 113 may estimate, as the region of the target object, at least one of the pixel region of the target object in the image 200 and the contour boundary of the target object. The estimation unit 113 may assign a value according to the size of the region of the target object.

In step ST104, the output control unit 114 controls the output unit to output the region of the target object estimated in step ST103.
Based on the value assigned in step ST103 (the value according to the size of the region of the target object) and the threshold (value) input via the input interface 140, the output control unit 114 may perform control to output each region of the target object whose size is equal to or larger than the input threshold.
The output control unit 114 may add an annotation to the region of the target object based on the instruction content 141 (a prompt such as a character string) received in step ST102.
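Steps ST101 to ST104 can be read as a simple pipeline. The sketch below wires them together with the estimator passed in as a function, because the source does not name a specific segmentation model; `run_annotation_flow`, `fake_estimator`, and the dictionary layout are hypothetical, and the stub estimator merely returns fixed regions so that the example runs.

```python
from typing import Callable, Dict, List, Tuple

# An estimated region: (region id, size value). Hypothetical layout.
Region = Tuple[str, float]
Estimator = Callable[[bytes, str], List[Region]]


def run_annotation_flow(
    image: bytes,          # ST101: acquired image data
    prompt: str,           # ST102: received instruction content
    estimator: Estimator,  # used in ST103
    threshold: float = 0.0,
) -> List[Dict[str, object]]:
    """Run ST103 (estimate) and ST104 (filter, annotate, output)."""
    # ST103: estimate regions of the target object named by the prompt.
    regions = estimator(image, prompt)
    # ST104: keep regions at or above the threshold and annotate them.
    return [
        {"region": region_id, "size": size, "annotation": prompt}
        for region_id, size in regions
        if size >= threshold
    ]


def fake_estimator(image: bytes, prompt: str) -> List[Region]:
    """Stand-in for a real model: returns fixed regions for the demo."""
    return [("r1", 42.0), ("r2", 3.0)] if prompt == "regular car" else []


result = run_annotation_flow(b"\x00", "regular car", fake_estimator, threshold=10)
```

Injecting the estimator keeps the flow independent of any particular model, which matches the source's description of the estimation unit only in terms of its inputs and outputs.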

When an annotation has been added to the region of the target object as a result of the processing in step ST104, the learning unit 115 may learn that region to generate a trained model.
The AI unit 116 may input a target to the trained model generated by the learning unit 115 and estimate a target object in that target.

[Functions and circuits]
Next, the functions and circuits of the information processing device 100 described above will be described.
Each unit of the information processing device 100 may be realized as a function of an arithmetic processing device or the like of a computer. That is, the acquisition unit 111, the reception unit 112, the estimation unit 113, the output control unit 114, the learning unit 115, and the AI unit 116 (control unit 110) of the information processing device 100 may be realized as an acquisition function, a reception function, an estimation function, an output control function, a learning function, and an AI function (control function), respectively, of the arithmetic processing device or the like of the computer.
The information processing program can cause a computer to realize each of the functions described above. The information processing program may be recorded on a non-transitory computer-readable storage medium such as a memory, a solid-state drive, a hard disk drive, or an optical disc. The storage medium may also be described as a non-transitory computer-readable medium that stores the information processing program. The information processing program may also be transmitted online.
As described above, each unit of the information processing device 100 may be realized by an arithmetic processing device or the like of a computer. The arithmetic processing device or the like is composed of, for example, an integrated circuit. Each unit of the information processing device 100 may therefore be realized as a circuit constituting the arithmetic processing device or the like. That is, the acquisition unit 111, the reception unit 112, the estimation unit 113, the output control unit 114, the learning unit 115, and the AI unit 116 (control unit 110) of the information processing device 100 may be realized as an acquisition circuit, a reception circuit, an estimation circuit, an output control circuit, a learning circuit, and an AI circuit (control circuit) constituting the arithmetic processing device or the like of the computer.
The input unit 121, and the communication unit 131, the storage unit 132, and the display unit 133 (output unit), of the information processing device 100 may be realized as, for example, an input function, and a communication function, a storage function, and a display function (output function), each including functions of the arithmetic processing device or the like. They may also be realized as an input circuit, and a communication circuit, a storage circuit, and a display circuit (output circuit), by being composed of, for example, integrated circuits. They may also be configured as an input device, and a communication device, a storage device, and a display device (output device), by being composed of, for example, a plurality of devices.

The information processing device 100 may combine any one or more of the units described above.
Although this disclosure uses the term "information", "information" can be read as "data", and "data" can be read as "information".

[Aspects and effects of the present embodiment]
Next, aspects of the present embodiment and the effects of each aspect will be described. The aspects described below are examples as of the filing date, and the present embodiment is not limited to them; it may be realized by combining the units described above as appropriate. A subordinate aspect may, in some cases, cite any of the aspects that precede it.
The effects of the present embodiment described below are also examples, and the effects of each aspect are not limited to those described below. Each aspect may achieve, for example, at least one of the effects described below.

(Aspect 1)
An information processing device according to one aspect includes an acquisition unit that acquires an image, a reception unit that receives instruction content regarding a target object, an estimation unit that estimates the region of the target object recorded in the image acquired by the acquisition unit based on the instruction content received by the reception unit, and an output control unit that performs control to output the region of the target object estimated by the estimation unit.
As a result, when the information processing device receives instruction content (a prompt) in the form of a character string (text), it can output (e.g., display) the region of the target object in the image that corresponds to that instruction content.
For example, when an image contains regions of multiple types of target objects, the information processing device can output (e.g., display) the region of one type of target object by switching the output content (display content).
The information processing device can add an annotation corresponding to the character string (text) to the region of the target object to be output.

(Aspect 2)
In the information processing device according to one aspect, the estimation unit may estimate, as the region of the target object, at least one of the pixel region of the target object in the image and the contour boundary of the target object.
This allows the information processing device to add an annotation based on at least one of the pixel region and the contour boundary of the target object in the image.

(Aspect 3)
In the information processing device according to one aspect, the estimation unit may assign a value according to the size of the region of the target object, and the output control unit may perform control to output each region of the target object whose size is equal to or larger than an input value.
This allows the information processing device to treat a region smaller than the input value (threshold) as noise, and thus to avoid annotating an erroneous region (noise) as the target object.

(Aspect 4)
In the information processing device according to one aspect, the reception unit may receive the instruction content regarding the target object as text.
This allows the information processing device to annotate, based on the instruction content (prompt) given as text (a character string), the region of the target object that corresponds to the content of that text. In other words, the information processing device can add an annotation corresponding to the content of the text to the region of the target object.

(Aspect 5)
In the information processing device according to one aspect, the output control unit may perform control to add an annotation based on the instruction content to the region of the target object and output the result.
This allows the information processing device to generate a trained model by learning, for example, the annotated region of the target object in the image. That is, the information processing device can automatically annotate the target object in the image and generate ground truth data.
In other words, when the information processing device receives an uploaded image and then receives instruction content (a prompt) regarding the target object via the input unit, it can automatically annotate the target object in the image. Because annotation is automatic, the working time and man-hours (cost) can be greatly reduced compared with manually identifying and annotating the target object in the image.

(Aspect 6)
The information processing device according to one aspect may include a learning unit that, when an annotation has been added to the region of the target object, learns that region and generates a trained model.
This allows the information processing device to learn automatically from the ground truth data and generate a trained model.

(Aspect 7)
The information processing device according to one aspect may include an AI unit that inputs a target to the trained model generated by the learning unit and estimates a target object in that target.
This allows the information processing device to estimate the various target objects that were learned when the trained model was generated.

(Aspect 8)
In an information processing method according to one aspect, a computer executes an acquisition step of acquiring an image, a reception step of receiving instruction content regarding a target object, an estimation step of estimating the region of the target object recorded in the image acquired in the acquisition step based on the instruction content received in the reception step, and an output control step of performing control to output the region of the target object estimated in the estimation step.
As a result, the information processing method can achieve the same effects as the information processing device according to the aspect described above.

(Aspect 9)
An information processing program according to one aspect causes a computer to realize an acquisition function of acquiring an image, a reception function of receiving instruction content regarding a target object, an estimation function of estimating the region of the target object recorded in the image acquired by the acquisition function based on the instruction content received by the reception function, and an output control function of performing control to output the region of the target object estimated by the estimation function.
As a result, the information processing program can achieve the same effects as the information processing device according to the aspect described above.

100 Information processing device
110 Control unit
111 Acquisition unit
112 Reception unit
113 Estimation unit
114 Output control unit
115 Learning unit
116 AI unit
121 Input unit
131 Communication unit
132 Storage unit
133 Display unit
140 Input interface
141 Instruction content (prompt)
142 Chat reply content
143 Slider
144 Chat reply content
200 Image
201 Regular car
202 Truck

Claims

An information processing device comprising:
an acquisition unit that acquires an image;
a reception unit that receives instruction content regarding a target object in the form of a character string input through a chat in which input and replies are exchanged;
an estimation unit that estimates a region of the target object recorded in the image acquired by the acquisition unit, based on the instruction content received by the reception unit; and
an output control unit that performs control to output the region of the target object estimated by the estimation unit.

The information processing device according to claim 1, wherein the estimation unit estimates, as the region of the target object, at least one of a pixel region of the target object in the image and a contour boundary of the target object.

The information processing device according to claim 1, wherein the estimation unit assigns a value according to the size of the region of the target object, and
the output control unit performs control to output a region of the target object whose size is equal to or larger than an input value.

The information processing device according to claim 1, wherein the output control unit performs control to add an annotation based on the instruction content to the region of the target object and output the result.

The information processing device according to any one of claims 1 to 4, further comprising a learning unit that, when an annotation has been added to the region of the target object, learns the region of the target object and generates a trained model.

The information processing device according to claim 5, further comprising an AI unit that inputs a target to the trained model generated by the learning unit and estimates a target object in the target.

An information processing device comprising:
an acquisition unit that acquires an image;
a reception unit that receives instruction content regarding a target object;
an estimation unit that estimates a region of the target object recorded in the image acquired by the acquisition unit, based on the instruction content received by the reception unit, and assigns a value according to the size of the region of the target object; and
an output control unit that performs control to output a region of the target object that was estimated by the estimation unit and whose size is equal to or larger than an input value.

An information processing method in which a computer executes:
an acquisition step of acquiring an image;
a reception step of receiving instruction content regarding a target object in the form of a character string input through a chat in which input and replies are exchanged;
an estimation step of estimating a region of the target object recorded in the image acquired in the acquisition step, based on the instruction content received in the reception step; and
an output control step of performing control to output the region of the target object estimated in the estimation step.

An information processing program that causes a computer to realize:
an acquisition function of acquiring an image;
a reception function of receiving instruction content regarding a target object in the form of a character string input through a chat in which input and replies are exchanged;
an estimation function of estimating a region of the target object recorded in the image acquired by the acquisition function, based on the instruction content received by the reception function; and
an output control function of performing control to output the region of the target object estimated by the estimation function.