WO2019216263A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2019216263A1
WO2019216263A1 (PCT/JP2019/017879)
Authority
WO
WIPO (PCT)
Prior art keywords
information
determination
unit
frame
information processing
Prior art date
Application number
PCT/JP2019/017879
Other languages
French (fr)
Japanese (ja)
Inventor
亮 中橋
Original Assignee
Sony Corporation (ソニー株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corporation
Publication of WO2019216263A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • The present technology relates to an information processing device, an information processing method, and a program, and more particularly to an information processing device, an information processing method, and a program for confirming whether learning has been performed correctly.
  • Conventionally, reinforcement learning and behavior learning through human teaching are known; for example, Patent Literature 1 proposes a learning system using reinforcement learning.
  • With such behavior learning techniques, confirming that a learned behavior operates correctly requires a person to observe the behavior, and observing all behaviors takes a very long time.
  • The purpose of the present technology is to make it more efficient to check whether learning has been performed correctly.
  • The concept of the present technology is an information processing apparatus including a detection unit that detects that a plurality of different classes are determined because of the same region of input information, and an information holding unit that, based on the detection information, extracts and holds the information portion including that region from the input information.
  • In this apparatus, the detection unit detects that a plurality of different classes are determined because of the same region of the input information; the information holding unit then extracts the information portion including that region from the input information and holds it.
  • Determination of a class includes various determinations, such as determination of behavior and determination of classification.
  • For example, the determination of a plurality of different classes may be a determination of a plurality of different behaviors, and the detection unit may detect that a plurality of different behaviors are determined because of the same region by comparing the regions focused on in each behavior determination based on the input information.
  • Each determination of a plurality of different behaviors based on the input information may be performed based on a policy obtained by reinforcement learning.
  • The determinations of a plurality of different behaviors based on the input information may be determinations of behaviors related to automatic driving.
  • The input information may be moving image data, and the information holding unit may extract and hold, from the moving image data, the image data of each frame including the same region.
  • A reproduction control unit that controls reproduction of the image data of the series of frames held in the information holding unit may further be provided.
  • In this way, the information portion including the same region is extracted from the input information and held. Points that are likely to require correction can therefore be checked selectively through the retained information portion, making it possible to confirm efficiently whether learning has been performed correctly.
  • FIG. 1 shows a configuration example of an information processing apparatus 100 as an embodiment.
  • This information processing apparatus 100 is used for action determination in an automatic driving system.
  • To simplify the description, it is assumed that only two actions are determined: “straight ahead” and “right turn”.
  • The information processing apparatus 100 includes an image sensor 101, a learning unit 102, an action selection unit 103, a frame extraction processing unit 104, a recording/reproducing unit 105, a digest recording/reproducing unit 106, a control unit 107, a user operation unit 108, and a display unit 109.
  • The image sensor 101 constitutes, for example, a camera arranged at the front of the vehicle and images the area ahead of the vehicle.
  • The moving image data obtained by the image sensor 101 is supplied to the learning unit 102, the action selection unit 103, the recording/reproducing unit 105, and the digest recording/reproducing unit 106.
  • The learning unit 102 performs reinforcement learning based on the moving image data obtained by the image sensor 101, the action output of the action selection unit 103, and a reward set by the user, and creates a policy indicating how to act in which environment.
  • The action selection unit 103 determines and outputs an action based on the moving image data obtained by the image sensor 101 and the policy obtained by the learning unit 102. When the policy created by the learning unit 102 is incomplete, the action selection unit 103 may select a plurality of contradictory actions at the same time.
  • The frame extraction processing unit 104 extracts frames whose action determination is ambiguous.
  • Based on the contribution of each pixel to each action determination, output by the action selection unit 103 for every frame, it extracts frames in which a large pixel range contributes to a plurality of action determinations simultaneously.
  • For example, suppose the action selection unit 103 determines the actions “straight ahead” and “right turn” and outputs the contribution of each pixel to each of these determinations. For each determination, the frame extraction processing unit 104 regards a pixel as contributing if its contribution is greater than or equal to a threshold T. It then extracts frames in which a large pixel range contributes to both the “straight ahead” and “right turn” determinations as frames whose action determination is ambiguous.
  • FIG. 2 shows example images for frames t1, t2, t3, and t4.
  • In frame t1, the “straight ahead” determination is made, and ellipse P1 indicates the range of pixels contributing to it. Since no “right turn” determination is made, this frame is not extracted as ambiguous.
  • In frame t2, both the “straight ahead” and “right turn” determinations are made (ellipses P2 and Q2 indicate the respective contributing pixel ranges), and the two ranges overlap substantially, so this frame is extracted as one whose action determination is ambiguous.
  • In frames t3 and t4, the “straight ahead” determination is made, and ellipses P3 and P4 indicate the contributing pixel ranges. Since no “right turn” determination is made, these frames are not extracted as ambiguous.
  • The flowchart of FIG. 3 shows an example of the processing procedure of the frame extraction processing unit 104, again assuming the action selection unit 103 determines “straight ahead” or “right turn”.
  • In step ST1, the frame extraction processing unit 104 starts processing, and in step ST2 it targets the first frame.
  • In step ST3, it determines whether the current frame should become an extracted frame.
  • In step ST4, it determines whether the current frame is the last frame. If not, it targets the next frame in step ST5 and returns to step ST3. If it is the last frame, processing ends in step ST6.
  • The flowchart of FIG. 4 shows an example of the procedure for step ST3. In step ST11, the frame extraction processing unit 104 starts processing, and in step ST12 it targets the first action determination.
  • In step ST13, the frame extraction processing unit 104 obtains a contribution matrix whose elements are each pixel’s contribution (0 to 1) to the current action determination.
  • In step ST14, it binarizes this matrix, setting the contribution of pixels at or above the threshold T to “1” and that of all other pixels to “0”, to obtain a high-contribution matrix.
  • The threshold T is the high-contribution threshold and takes a value between 0 and 1.
  • In step ST15, the frame extraction processing unit 104 determines whether this was the last action determination. If not, it targets the next action determination in step ST16 and returns to step ST13. If it was the last, it proceeds to step ST17.
  • In step ST17, the frame extraction processing unit 104 multiplies the high-contribution matrices of the action determinations element-wise (per pixel), sums the products, and divides the sum by the total number of pixels to obtain the contribution overlap rate r.
  • The contribution overlap rate r takes a value between 0 and 1.
  • In step ST18, the frame extraction processing unit 104 determines whether r is greater than or equal to the threshold R.
  • The threshold R is the high-contribution overlap threshold and takes a value between 0 and 1.
  • If r is greater than or equal to R, the frame extraction processing unit 104 marks the frame as an extracted frame in step ST19 and ends processing there. Otherwise, it ends processing immediately in step ST20.
  • For example, when the action selection unit 103 performs the “straight ahead” and “right turn” determinations, steps ST12 to ST16 yield a high-contribution matrix M for the “straight ahead” determination and a high-contribution matrix N for the “right turn” determination.
  • In step ST17, the elements of M and N are multiplied per pixel and summed, and the sum is divided by the total number of pixels to obtain the contribution overlap rate r. With Mij and Nij denoting the elements of M and N, r = (Σi,j Mij × Nij) / (i × j).
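  • As a concrete illustration of steps ST11 to ST20, the following is a minimal sketch in Python/NumPy. It assumes each action determination supplies one per-pixel contribution map with values between 0 and 1; the function name and the default values of T and R are hypothetical, not taken from the patent.

```python
import numpy as np

def is_extracted_frame(contrib_maps, T=0.5, R=0.1):
    """Decide whether a frame's action determination is ambiguous
    (steps ST11 to ST20 of FIG. 4).

    contrib_maps: list of 2-D arrays, one per action determination
        (e.g. "straight ahead" and "right turn"), each holding the
        contribution (0..1) of every pixel to that determination.
    T: high-contribution threshold (steps ST13-ST14).
    R: high-contribution overlap threshold (step ST18).
    """
    # ST13-ST16: binarize each contribution matrix into a
    # high-contribution matrix ("1" where contribution >= T).
    high = [(m >= T).astype(np.float64) for m in contrib_maps]

    # ST17: multiply the high-contribution matrices element-wise,
    # sum the products, and divide by the total number of pixels
    # to obtain the contribution overlap rate r (0..1).
    product = high[0].copy()
    for h in high[1:]:
        product *= h
    r = product.sum() / product.size

    # ST18-ST20: the frame becomes an extracted frame when r >= R.
    return r >= R

# ST1-ST6 of FIG. 3: apply the decision to every frame of the video.
# extracted = [i for i, maps in enumerate(per_frame_maps)
#              if is_extracted_frame(maps)]
```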
  • The frame extraction algorithm of the frame extraction processing unit 104 shown in the flowchart of FIG. 4 is merely an example, and the present technology is not limited to it.
  • For example, the contribution matrices themselves could be added or multiplied, without first binarizing them into high-contribution matrices, with overlap declared when the resulting sum exceeds a threshold.
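  • A sketch of that variant, skipping the binarization; combining the raw maps by element-wise multiplication and the threshold value are assumptions.

```python
def is_extracted_frame_soft(contrib_maps, threshold=0.05):
    """Variant: combine the raw contribution matrices (0..1) directly,
    without a high-contribution matrix, and declare overlap when the
    normalized sum of the element-wise products exceeds a threshold."""
    product = contrib_maps[0].copy()
    for m in contrib_maps[1:]:
        product *= m
    return product.sum() / product.size >= threshold
```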
  • The recording/reproducing unit 105 records the moving image data obtained by the image sensor 101 and reproduces it.
  • The digest recording/reproducing unit 106 records the image data of the frames extracted by the frame extraction processing unit 104 as digest image data and reproduces it.
  • FIG. 5 shows an example of the relationship between the moving image data recorded by the recording/reproducing unit 105 and the digest image data recorded by the digest recording/reproducing unit 106.
  • FIG. 5(a) shows the entire moving image data recorded by the recording/reproducing unit 105; the frame ranges Fa to Fb, Fc to Fd, and Fe to Ff are frames whose action determination is ambiguous.
  • FIG. 5(b) shows the digest image data recorded by the digest recording/reproducing unit 106, consisting of the image data of frames Fa to Fb, Fc to Fd, and Fe to Ff.
  • The control unit 107 controls the operation of the recording/reproducing unit 105 and the digest recording/reproducing unit 106 based on user operations from the user operation unit 108.
  • The digest image data reproduced by the digest recording/reproducing unit 106 is supplied to the display unit 109, and the digest image is displayed on the display unit 109.
  • FIGS. 6(a) and 6(b) show display examples of digest images on the display unit 109, for the case where the action selection unit 103 performs the “straight ahead” and “right turn” determinations.
  • In the display example of FIG. 6(a), there is a single digest image display area in the screen. On the digest image, the pixel range contributing to the “straight ahead” determination (corresponding to the “1” elements of the high-contribution matrix M described above) and the pixel range contributing to the “right turn” determination (corresponding to the “1” elements of the high-contribution matrix N described above) are superimposed.
  • In the display example of FIG. 6(b), the screen contains three display areas showing the same digest image: a large one on the left and two small ones on the right. The image at the upper right superimposes the pixel range contributing to the “straight ahead” determination; the image at the lower right superimposes the pixel range contributing to the “right turn” determination.
  • An operation button 110 for switching from digest image reproduction to original image reproduction is displayed in the screen.
  • When the user operates this button, the control unit 107 controls the recording/reproducing unit 105 so that the frame range of the original image corresponding to the frame range of the digest image currently being reproduced is played back.
  • For example, when the operation button 110 is operated while the digest image for the frame range Fa to Fb is being reproduced, the original image is reproduced in one of the following ways:
  • (1) reproduction of the original image over the frame range Fa to Fb;
  • (2) reproduction over the frame range Fa-δ to Fb+δ, where δ is fixed;
  • (3) reproduction over the frame range Fa-Δa to Fb+Δb, where Δa and Δb are variable.
  • For example, Δa is the number of frames to go back from frame Fa until reaching a frame in which only pixels contributing to one of the “straight ahead” and “right turn” determinations are present, and Δb is the number of frames to go forward from frame Fb until reaching such a frame.
  • As the digest image data stored in the digest recording/reproducing unit 106, it is also possible to record not the image data of frames Fa to Fb but the image data of the frame range Fa-δ to Fb+δ or Fa-Δa to Fb+Δb.
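  • A sketch of how the control unit 107 might map a digest segment Fa to Fb back to an original-image playback range under options (1) to (3) above; the function name, the default delta, and the per-frame single_action flag are hypothetical.

```python
def playback_range(fa, fb, mode, num_frames, delta=30, single_action=None):
    """Return the (start, end) frame range of the original video to play
    for a digest segment covering frames fa..fb.

    mode 1: the segment itself.
    mode 2: a fixed margin of delta frames on both sides.
    mode 3: widen to the nearest frame on each side in which only pixels
        contributing to a single action determination are present
        (single_action[i] is True for such frames).
    """
    if mode == 1:
        return fa, fb
    if mode == 2:
        return max(0, fa - delta), min(num_frames - 1, fb + delta)
    start, end = fa, fb
    while start > 0 and not single_action[start]:
        start -= 1  # go back delta_a frames from Fa
    while end < num_frames - 1 and not single_action[end]:
        end += 1    # go forward delta_b frames from Fb
    return start, end
```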
  • The operation of the information processing apparatus 100 shown in FIG. 1 will be briefly described.
  • The moving image data obtained by the image sensor 101 is supplied to the recording/reproducing unit 105 and recorded.
  • The moving image data obtained by the image sensor 101 is also supplied to the action selection unit 103.
  • The action selection unit 103 determines and outputs an action based on the moving image data obtained by the image sensor 101 and the policy obtained by the learning unit 102.
  • The frame extraction processing unit 104, based on the contribution of each pixel to each action determination output by the action selection unit 103 for every frame, extracts frames in which a large pixel range contributes to a plurality of action determinations simultaneously as frames whose action determination is ambiguous (extracted frames).
  • The digest recording/reproducing unit 106 records, as digest image data, the image data of the frames extracted by the frame extraction processing unit 104 from the moving image data obtained from the image sensor 101 (see FIG. 5).
  • Based on user operations from the user operation unit 108, the operation of the digest recording/reproducing unit 106 is controlled. The digest image data it reproduces is supplied to the display unit 109, which displays the digest image, that is, images of frames whose action determination is ambiguous.
  • On this digest image, the pixel ranges contributing to the respective action determinations are superimposed.
  • In the example above, the image data of the frames extracted by the frame extraction processing unit 104 is recorded as digest image data by the digest recording/reproducing unit 106, which also reproduces it at playback time.
  • Alternatively, only the information identifying the extracted frames could be stored as digest frame information, with the digest image data reproduced from the recording/reproducing unit 105 at playback time based on that frame information; a sketch of this alternative follows.
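  • The frame-information-only alternative could be sketched as follows; grouping consecutive extracted frame indices into ranges such as Fa to Fb, and the helper name, are assumptions about how the frame information would be kept.

```python
def to_segments(extracted_indices):
    """Group sorted extracted frame indices into (start, end) ranges,
    e.g. [3, 4, 5, 9, 10] -> [(3, 5), (9, 10)], so that only this frame
    information needs to be stored with the digest."""
    segments = []
    for i in extracted_indices:
        if segments and i == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], i)  # extend current range
        else:
            segments.append((i, i))              # start a new range
    return segments

# At playback time, each (start, end) range is read back from the main
# recording/reproducing unit instead of from separately stored digest data.
```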
  • The embodiment above shows a use case that acts according to image information, but a use case that acts according to audio information can be considered in the same way.
  • For example, suppose the action selection unit determines the actions “advance toward a specific sound” and “avoid when the noise of another moving object approaches”.
  • If the specific sound and the noise of another moving object come from the same direction, the audio information at that time is an information portion whose action determination is ambiguous; this portion is extracted and held as digest audio information.
  • The above shows use cases concerning behavior, but the present technology can also be applied to use cases other than behavior.
  • For example, there is the use case of classification.
  • In an exclusive classification, a region to which a plurality of classes are assigned is an ambiguous information portion.
  • FIG. 7(a) shows an example of object recognition.
  • In this example, the classes “flying object” and “animal” are recognized as mutually exclusive, but a “bird” is an object that can be taken as both; the “bird” portion is therefore cut out and retained as an ambiguous information portion.
  • FIG. 7(b) shows an example of text analysis. Here, text is analyzed to recognize “positive expressions” and “negative expressions”, but the Japanese word “yabai” can be taken as both; the “yabai” portion is therefore cut out and retained as an ambiguous information portion. A minimal sketch for this kind of case follows.
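  • For the classification use case, a minimal sketch of cutting out ambiguous portions of text: two mutually exclusive classifiers score each token, and tokens accepted by both are retained as ambiguous. The scoring functions and the threshold are hypothetical stand-ins for real models.

```python
def ambiguous_tokens(tokens, score_positive, score_negative, T=0.5):
    """Return the tokens that both exclusive classifiers claim, i.e. the
    ambiguous information portions (e.g. "yabai" scored as both a
    positive and a negative expression)."""
    return [tok for tok in tokens
            if score_positive(tok) >= T and score_negative(tok) >= T]
```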
  • The processing of each unit in the information processing apparatus 100 can be executed by hardware or by software.
  • When the series of processes is executed by software, the programs constituting the software are installed on a computer.
  • Such a computer may be a computer incorporated in dedicated hardware, or a general-purpose personal computer capable of executing various functions by installing various programs.
  • FIG. 8 shows a configuration example of a personal computer 700.
  • A CPU (Central Processing Unit) 701 executes various processes according to programs stored in a ROM (Read Only Memory) 702 or loaded from a storage unit 713 into a RAM (Random Access Memory) 703.
  • The RAM 703 also stores, as appropriate, data needed by the CPU 701 to execute these processes.
  • The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704.
  • An input/output interface 710 is also connected to the bus 704.
  • Connected to the input/output interface 710 are an input unit 711 (keyboard, mouse, etc.), an output unit 712 (a display such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), and speakers), a storage unit 713 (hard disk, etc.), and a communication unit 714 (modem, etc.). The communication unit 714 performs communication processing via networks including the Internet.
  • A drive 715 is also connected to the input/output interface 710 as necessary; a removable medium 721 such as a magnetic disk, optical disc, magneto-optical disc, or semiconductor memory is mounted as appropriate, and computer programs read from it are installed in the storage unit 713 as necessary.
  • The present technology can also take the following configurations.
  • (1) An information processing apparatus including: a detection unit that detects that a plurality of different classes are determined because of the same region of input information; and an information holding unit that extracts and holds, based on the detection information, the information portion including the same region from the input information.
  • (2) The information processing apparatus according to (1), in which the determination of the plurality of different classes is a determination of a plurality of different behaviors, and the detection unit detects that a plurality of different behaviors are determined because of the same region by comparing the regions focused on in the determinations of the plurality of different behaviors based on the input information.
  • (3) The information processing apparatus according to (2), in which each determination of a plurality of different behaviors based on the input information is performed based on a policy obtained by reinforcement learning.
  • (4) The information processing apparatus according to (2) or (3), in which the determinations of a plurality of different behaviors based on the input information are determinations of a plurality of different behaviors related to automatic driving.
  • (5) The information processing apparatus according to any one of (1) to (4), in which the input information is moving image data and the information holding unit extracts and holds, from the moving image data, the image data of each frame including the same region.
  • (6) The information processing apparatus according to (5), further including a reproduction control unit that controls reproduction of the image data of the series of frames held in the information holding unit.
  • (7) An information processing method including: a procedure of detecting that a plurality of different classes are determined because of the same region of input information; and a procedure of extracting and holding, based on the detection information, the information portion including the same region from the input information.
  • (8) A program that causes a computer to function as: detection means for detecting that a plurality of different classes are determined because of the same region of input information; and information holding means for extracting and holding, based on the detection information, the information portion including the same region from the input information.
  • DESCRIPTION OF REFERENCE SIGNS: 100 … information processing apparatus; 101 … image sensor; 102 … learning unit; 103 … action selection unit; 104 … frame extraction processing unit; 105 … recording/reproducing unit; 106 … digest recording/reproducing unit; 107 … control unit; 108 … user operation unit; 109 … display unit; 110 … operation button

Abstract

The present invention increases the efficiency of confirming whether learning has been performed correctly. It is detected that determinations of a plurality of different classes are made because of the same region of input information, and information portions including that region are extracted from the input information and retained on the basis of the detection information. Class determinations include various determinations, such as action determination and classification determination. Because points that are very likely to require correction can be checked selectively through the retained information portions, it is possible to confirm efficiently whether learning has been performed correctly.

Description

Information processing apparatus, information processing method, and program
The present technology relates to an information processing device, an information processing method, and a program, and more particularly to an information processing device, an information processing method, and a program for confirming whether learning has been performed correctly.
Conventionally, reinforcement learning and behavior learning through human teaching are known. For example, Patent Literature 1 proposes a learning system using reinforcement learning. With behavior learning techniques, confirming that a learned behavior operates correctly requires a person to observe the behavior, and observing all behaviors takes a very long time.
[Patent Literature 1] JP 2010-073200 A
The purpose of the present technology is to make it more efficient to confirm whether learning has been performed correctly.
The concept of the present technology is an information processing apparatus including: a detection unit that detects that a plurality of different classes are determined because of the same region of input information; and an information holding unit that extracts and holds, based on the detection information, the information portion including the same region from the input information.
In the present technology, the detection unit detects that a plurality of different classes are determined because of the same region of the input information. The information holding unit then extracts the information portion including that region from the input information, based on the detection information, and holds it. Here, class determination includes various determinations such as behavior determination and classification determination.
For example, the determination of a plurality of different classes may be a determination of a plurality of different behaviors, with the detection unit detecting that a plurality of different behaviors are determined because of the same region by comparing the regions focused on in each behavior determination based on the input information. In this case, each behavior determination based on the input information may be performed according to a policy obtained by reinforcement learning, and the determinations may be of behaviors related to automatic driving.
Further, for example, the input information may be moving image data, and the information holding unit may extract and hold, from the moving image data, the image data of each frame including the same region. A reproduction control unit that controls reproduction of the image data of the series of frames held in the information holding unit may also be provided.
As described above, in the present technology, based on the detection that a plurality of different classes are determined because of the same region of the input information, the information portion including that region is extracted from the input information and held. Points that are likely to require correction can therefore be checked selectively through the retained information portion, so it is possible to confirm efficiently whether learning has been performed correctly.
According to the present technology, confirming whether learning has been performed correctly becomes more efficient. The effects described here are not necessarily limited and may be any of the effects described in the present disclosure.
FIG. 1 is a block diagram showing a configuration example of an information processing apparatus as an embodiment.
FIG. 2 is a diagram for explaining the processing of the frame extraction processing unit.
FIG. 3 is a flowchart showing an example of the processing procedure of the frame extraction processing unit.
FIG. 4 is a flowchart showing an example of the processing procedure of extracted-frame determination.
FIG. 5 is a diagram showing an example of the relationship between the moving image data recorded in the recording/reproducing unit and the digest image data stored in the digest recording/reproducing unit.
FIG. 6 is a diagram showing display examples of digest images on the display unit.
FIG. 7 is a diagram showing use cases other than behavior.
FIG. 8 is a diagram showing a configuration example of a personal computer.
Hereinafter, modes for carrying out the invention (hereinafter referred to as “embodiments”) will be described in the following order.
1. Embodiment
2. Modifications
<1. Embodiment>
[Information processing apparatus]
FIG. 1 shows a configuration example of an information processing apparatus 100 as an embodiment. This information processing apparatus 100 is used for action determination in an automatic driving system. In this embodiment, to simplify the description, it is assumed that only two actions are determined: “straight ahead” and “right turn”.
The information processing apparatus 100 includes an image sensor 101, a learning unit 102, an action selection unit 103, a frame extraction processing unit 104, a recording/reproducing unit 105, a digest recording/reproducing unit 106, a control unit 107, a user operation unit 108, and a display unit 109.
The image sensor 101 constitutes, for example, a camera arranged at the front of the vehicle and images the area ahead of the vehicle. The moving image data obtained by the image sensor 101 is supplied to the learning unit 102, the action selection unit 103, the recording/reproducing unit 105, and the digest recording/reproducing unit 106.
The learning unit 102 performs reinforcement learning based on the moving image data obtained by the image sensor 101, the action output of the action selection unit 103, and a reward set by the user, and creates a policy indicating how to act in which environment. The action selection unit 103 determines and outputs an action based on the moving image data obtained by the image sensor 101 and the policy obtained by the learning unit 102. When the policy created by the learning unit 102 is incomplete, the action selection unit 103 may select a plurality of contradictory actions at the same time.
The frame extraction processing unit 104 extracts frames whose action determination is ambiguous. Based on the contribution of each pixel to each action determination, output by the action selection unit 103 for every frame, it extracts frames in which a large pixel range contributes to a plurality of action determinations simultaneously.
Here, for example, suppose the action selection unit 103 determines the actions “straight ahead” and “right turn” and outputs the contribution of each pixel to each of these determinations. For each determination, the frame extraction processing unit 104 regards a pixel as contributing if its contribution is greater than or equal to a threshold T. It then extracts frames in which a large pixel range contributes to both determinations as frames whose action determination is ambiguous.
FIG. 2 shows example images for frames t1, t2, t3, and t4. In frame t1, the “straight ahead” determination is made, and ellipse P1 indicates the range of pixels contributing to it. Since no “right turn” determination is made, this frame is not extracted as ambiguous.
In frame t2, the “straight ahead” determination is made (ellipse P2 indicates the contributing pixel range), and the “right turn” determination is also made (ellipse Q2 indicates its contributing pixel range). Since both determinations are made and their contributing pixel ranges overlap substantially, this frame is extracted as one whose action determination is ambiguous.
In frames t3 and t4, the “straight ahead” determination is made, and ellipses P3 and P4 indicate the contributing pixel ranges. Since no “right turn” determination is made, these frames are not extracted as ambiguous.
The flowchart of FIG. 3 shows an example of the processing procedure of the frame extraction processing unit 104, again assuming the action selection unit 103 determines “straight ahead” or “right turn”. In step ST1, the frame extraction processing unit 104 starts processing. In step ST2, it targets the first frame. In step ST3, it determines whether the current frame should become an extracted frame.
In step ST4, it determines whether the current frame is the last frame. If not, it targets the next frame in step ST5 and returns to step ST3. If it is the last frame, processing ends in step ST6.
The flowchart of FIG. 4 shows an example of the procedure for step ST3 of FIG. 3. In step ST11, the frame extraction processing unit 104 starts processing, and in step ST12 it targets the first action determination.
In step ST13, the frame extraction processing unit 104 obtains a contribution matrix whose elements are each pixel’s contribution (0 to 1) to the current action determination. In step ST14, it binarizes this matrix, setting the contribution of pixels at or above the threshold T to “1” and that of all other pixels to “0”, to obtain a high-contribution matrix. The threshold T is the high-contribution threshold and takes a value between 0 and 1.
In step ST15, it determines whether this was the last action determination. If not, it targets the next action determination in step ST16 and returns to step ST13. If it was the last, it proceeds to step ST17.
In step ST17, the frame extraction processing unit 104 multiplies the high-contribution matrices of the action determinations element-wise (per pixel), sums the products, and divides the sum by the total number of pixels to obtain the contribution overlap rate r, which takes a value between 0 and 1. In step ST18, it determines whether r is greater than or equal to the threshold R. The threshold R is the high-contribution overlap threshold and takes a value between 0 and 1.
If r is greater than or equal to R, the frame extraction processing unit 104 marks the current frame as an extracted frame in step ST19 and ends processing there. Otherwise, it ends processing immediately in step ST20.
Here, for example, consider again the case where the action selection unit 103 performs the “straight ahead” and “right turn” determinations. Steps ST12 to ST16 yield a high-contribution matrix M for the “straight ahead” determination and a high-contribution matrix N for the “right turn” determination.
By the processing of step ST17, the elements of the “straight ahead” and “right turn” high-contribution matrices M and N are multiplied per pixel and summed, and the sum is divided by the total number of pixels to obtain the contribution overlap rate r. With Mij denoting the elements of M and Nij the elements of N, r is given by Equation (1):
  r = (Σi,j Mij × Nij) / (i × j)   (1)
As noted above, the frame extraction algorithm of the frame extraction processing unit 104 shown in the flowchart of FIG. 4 is merely an example, and the present technology is not limited to it. For example, the contribution matrices themselves could be added or multiplied, without first binarizing them into high-contribution matrices, with overlap declared when the resulting sum exceeds a threshold.
Returning to FIG. 1, the recording/reproducing unit 105 records the moving image data obtained by the image sensor 101 and reproduces it. The digest recording/reproducing unit 106 records the image data of the frames extracted by the frame extraction processing unit 104 as digest image data and reproduces it.
FIG. 5 shows an example of the relationship between the moving image data recorded by the recording/reproducing unit 105 and the digest image data recorded by the digest recording/reproducing unit 106. FIG. 5(a) shows the entire moving image data recorded by the recording/reproducing unit 105; the frame ranges Fa to Fb, Fc to Fd, and Fe to Ff are frames whose action determination is ambiguous. FIG. 5(b) shows the digest image data recorded by the digest recording/reproducing unit 106, consisting of the image data of frames Fa to Fb, Fc to Fd, and Fe to Ff.
The control unit 107 controls the operation of the recording/reproducing unit 105 and the digest recording/reproducing unit 106 based on user operations from the user operation unit 108. The digest image data reproduced by the digest recording/reproducing unit 106 is supplied to the display unit 109, and the digest image is displayed on the display unit 109.
FIGS. 6(a) and 6(b) show display examples of digest images on the display unit 109, for the case where the action selection unit 103 performs the “straight ahead” and “right turn” determinations. In the display example of FIG. 6(a), there is a single digest image display area in the screen, and the digest image is displayed there. On the digest image, the pixel range contributing to the “straight ahead” determination (corresponding to the “1” elements of the high-contribution matrix M) and the pixel range contributing to the “right turn” determination (corresponding to the “1” elements of the high-contribution matrix N) are superimposed.
In the display example of FIG. 6(b), the screen contains three display areas showing the same digest image: a large one on the left and two small ones on the right. The digest image at the upper right superimposes the pixel range contributing to the “straight ahead” determination; the digest image at the lower right superimposes the pixel range contributing to the “right turn” determination.
As shown in FIGS. 6(a) and 6(b), an operation button 110 for switching from digest image reproduction to original image reproduction is displayed in the screen. When the user operates this button, the control unit 107 controls the recording/reproducing unit 105 so that the frame range of the original image corresponding to the frame range of the digest image currently being reproduced is played back.
For example, when the operation button 110 is operated while the digest image for the frame range Fa to Fb is being reproduced, the original image is reproduced in one of the following ways:
(1) reproduction of the original image over the frame range Fa to Fb;
(2) reproduction over the frame range Fa-δ to Fb+δ, where δ is fixed;
(3) reproduction over the frame range Fa-Δa to Fb+Δb, where Δa and Δb are variable.
For example, Δa is the number of frames to go back from frame Fa until reaching a frame in which only pixels contributing to one of the “straight ahead” and “right turn” determinations are present, and Δb is the number of frames to go forward from frame Fb until reaching such a frame.
As the digest image data stored in the digest recording/reproducing unit 106, it is also possible to record not the image data of frames Fa to Fb but the image data of the frame range Fa-δ to Fb+δ or Fa-Δa to Fb+Δb.
The operation of the information processing apparatus 100 shown in FIG. 1 will be briefly described. The moving image data obtained by the image sensor 101 is supplied to the recording/reproducing unit 105 and recorded. It is also supplied to the action selection unit 103, which determines and outputs an action based on this data and the policy obtained by the learning unit 102.
The frame extraction processing unit 104, based on the contribution of each pixel to each action determination output by the action selection unit 103 for every frame, extracts frames in which a large pixel range contributes to a plurality of action determinations simultaneously as frames whose action determination is ambiguous (extracted frames). The digest recording/reproducing unit 106 records, as digest image data, the image data of the frames extracted by the frame extraction processing unit 104 from the moving image data obtained from the image sensor 101 (see FIG. 5).
The operation of the digest recording/reproducing unit 106 is controlled based on user operations from the user operation unit 108. The digest image data it reproduces is supplied to the display unit 109, which displays the digest image, that is, images of frames whose action determination is ambiguous, with the pixel ranges contributing to the respective action determinations superimposed.
In the description above, the image data of the frames extracted by the frame extraction processing unit 104 is recorded as digest image data by the digest recording/reproducing unit 106, which also reproduces it at playback time. Alternatively, only the information identifying the extracted frames could be stored as digest frame information, with the digest image data reproduced from the recording/reproducing unit 105 at playback time based on that frame information.
As described above, the information processing apparatus 100 shown in FIG. 1 extracts frames whose action determination is ambiguous and records their image data as digest image data. Points where the reward setting is likely to need correction can therefore be checked selectively from the digest images, which accelerates the trial-and-error cycle of behavior learning algorithm development.
<2. Modifications>
In the embodiment above, a use case that acts according to image information was shown, but a use case that acts according to audio information can be considered in the same way. For example, suppose the action selection unit determines the actions “advance toward a specific sound” and “avoid when the noise of another moving object approaches”. If the specific sound and the noise of another moving object come from the same direction, the audio information at that time is an information portion whose action determination is ambiguous; this portion is extracted and held as digest audio information.
In the embodiment above, use cases concerning behavior were shown, but the present technology can also be applied to use cases other than behavior, for example classification. In an exclusive classification, a region to which a plurality of classes are assigned is an ambiguous information portion.
FIG. 7(a) shows an example of object recognition. In this example, “flying object” and “animal” are recognized as mutually exclusive classes, but a “bird” is an object that can be taken as both; the “bird” portion is therefore cut out and retained as an ambiguous information portion. FIG. 7(b) shows an example of text analysis. Here, text is analyzed to recognize “positive expressions” and “negative expressions”, but the Japanese word “yabai” can be taken as both; the “yabai” portion is therefore cut out and retained as an ambiguous information portion.
In the embodiment above, the processing of each unit in the information processing apparatus 100 can be executed by hardware or by software. When the series of processes is executed by software, the programs constituting the software are installed on a computer. Such a computer may be a computer incorporated in dedicated hardware, or a general-purpose personal computer capable of executing various functions by installing various programs.
FIG. 8 shows a configuration example of a personal computer 700. A CPU (Central Processing Unit) 701 executes various processes according to programs stored in a ROM (Read Only Memory) 702 or loaded from a storage unit 713 into a RAM (Random Access Memory) 703. The RAM 703 also stores, as appropriate, data needed by the CPU 701 to execute these processes.
The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output interface 710 is also connected to the bus 704.
Connected to the input/output interface 710 are an input unit 711 (keyboard, mouse, etc.), an output unit 712 (a display such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), and speakers), a storage unit 713 (hard disk, etc.), and a communication unit 714 (modem, etc.). The communication unit 714 performs communication processing via networks including the Internet.
A drive 715 is also connected to the input/output interface 710 as necessary; a removable medium 721 such as a magnetic disk, optical disc, magneto-optical disc, or semiconductor memory is mounted as appropriate, and computer programs read from it are installed in the storage unit 713 as necessary.
While preferred embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the technical scope of the present disclosure is not limited to these examples. It is clear that a person with ordinary knowledge in the technical field of the present disclosure can conceive of various changes and modifications within the scope of the technical ideas described in the claims, and these naturally belong to the technical scope of the present disclosure.
The present technology may also take the following configurations.
(1) An information processing apparatus including:
a detection unit that detects that a plurality of different classes are determined because of the same region of input information; and
an information holding unit that, on the basis of the detection information, extracts an information portion including the same region from the input information and holds it.
(2) The information processing apparatus according to (1), in which
the determination of the plurality of different classes is a determination of a plurality of different actions, and
the detection unit detects that the plurality of different actions are determined because of the same region by comparing the regions focused on in the determinations of the plurality of different actions based on the input information.
(3) The information processing apparatus according to (2), in which each determination of the plurality of different actions based on the input information is performed on the basis of a policy obtained through learning by reinforcement learning.
(4) The information processing apparatus according to (2) or (3), in which the determinations of the plurality of different actions based on the input information are determinations of a plurality of different actions related to automatic driving.
(5) The information processing apparatus according to any one of (1) to (4), in which
the input information is moving image data, and
the information holding unit extracts the image data of each frame including the same region from the moving image data and holds it.
(6) The information processing apparatus according to (5), further including a reproduction control unit that controls reproduction of the image data of a series of frames held in the information holding unit.
(7) An information processing method including:
a procedure of detecting that a plurality of different classes are determined because of the same region of input information; and
a procedure of, on the basis of the detection information, extracting an information portion including the same region from the input information and holding it.
(8) A program that causes a computer to function as:
detection means for detecting that a plurality of different classes are determined because of the same region of input information; and
information holding means for, on the basis of the detection information, extracting an information portion including the same region from the input information and holding it.
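As a concrete illustration of configurations (1), (2), and (5), the sketch below compares, frame by frame, the regions focused on by two or more action determinations and holds any frame in which different actions were determined for substantially the same region. This is a minimal, non-authoritative sketch: the bounding-box representation of attention, the IoU comparison, the 0.5 threshold, and all identifiers are assumptions, and how each policy's attention region is obtained is outside its scope.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); assumed format

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two attention regions."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

@dataclass
class AmbiguityDetector:
    """Flags frames where determinations of different actions focused
    on (roughly) the same region, playing the roles of the detection
    unit and the information holding unit."""
    overlap_threshold: float = 0.5          # assumed, not from the spec
    held_frames: List[int] = field(default_factory=list)

    def process_frame(self, frame_index: int, determinations) -> bool:
        # `determinations` is a list of (action, attention_box) pairs,
        # one per policy; extracting the boxes themselves is out of scope.
        for i, (act_i, box_i) in enumerate(determinations):
            for act_j, box_j in determinations[i + 1:]:
                if act_i != act_j and iou(box_i, box_j) >= self.overlap_threshold:
                    self.held_frames.append(frame_index)  # extract and hold
                    return True
        return False

# Example: 'accelerate' and 'brake' both attend to the same region,
# so frame 42 is held for later review.
det = AmbiguityDetector()
det.process_frame(42, [("accelerate", (100, 80, 40, 30)),
                       ("brake",      (105, 82, 38, 28))])
print(det.held_frames)  # [42]
```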
DESCRIPTION OF SYMBOLS
100 ... Information processing apparatus
101 ... Image sensor
102 ... Learning unit
103 ... Action selection unit
104 ... Frame extraction processing unit
105 ... Recording/reproducing unit
106 ... Digest recording/reproducing unit
107 ... Control unit
108 ... User operation unit
109 ... Display unit
110 ... Operation buttons

Claims (8)

1. An information processing apparatus comprising:
a detection unit that detects that a plurality of different classes are determined because of the same region of input information; and
an information holding unit that, on the basis of the detection information, extracts an information portion including the same region from the input information and holds it.
2. The information processing apparatus according to claim 1, wherein
the determination of the plurality of different classes is a determination of a plurality of different actions, and
the detection unit detects that the plurality of different actions are determined because of the same region by comparing the regions focused on in the determinations of the plurality of different actions based on the input information.
3. The information processing apparatus according to claim 2, wherein each determination of the plurality of different actions based on the input information is performed on the basis of a policy obtained through learning by reinforcement learning.
4. The information processing apparatus according to claim 2, wherein the determinations of the plurality of different actions based on the input information are determinations of a plurality of different actions related to automatic driving.
5. The information processing apparatus according to claim 1, wherein
the input information is moving image data, and
the information holding unit extracts the image data of each frame including the same region from the moving image data and holds it.
6. The information processing apparatus according to claim 5, further comprising a reproduction control unit that controls reproduction of the image data of a series of frames held in the information holding unit.
7. An information processing method comprising:
a procedure of detecting that a plurality of different classes are determined because of the same region of input information; and
a procedure of, on the basis of the detection information, extracting an information portion including the same region from the input information and holding it.
8. A program that causes a computer to function as:
detection means for detecting that a plurality of different classes are determined because of the same region of input information; and
information holding means for, on the basis of the detection information, extracting an information portion including the same region from the input information and holding it.
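The reproduction control unit of claim 6 can be sketched in the same spirit: it replays only the frames held by the information holding unit, so that a reviewer can check the ambiguous scenes as a digest. Everything here is an assumed illustration; `renderer` is a placeholder display callable and the fixed frame rate is not part of the claimed apparatus.

```python
import time

def play_digest(held_frames, renderer, fps=10):
    """Replay only the held frames as a digest so a reviewer can
    confirm whether the learned behavior handled each ambiguous
    scene correctly."""
    for frame in held_frames:
        renderer(frame)        # display the extracted frame
        time.sleep(1.0 / fps)  # pace playback at the assumed rate
```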
PCT/JP2019/017879 2018-05-10 2019-04-26 Information processing device, information processing method, and program WO2019216263A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018091573 2018-05-10
JP2018-091573 2018-05-10

Publications (1)

Publication Number Publication Date
WO2019216263A1 (en)

Family

ID=68468197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/017879 WO2019216263A1 (en) 2018-05-10 2019-04-26 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2019216263A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005176339A (en) * 2003-11-20 2005-06-30 Nippon Telegr & Teleph Corp <Ntt> Moving image processing method, moving image processing apparatus, moving image processing program and recording medium with the program recorded thereon
JP2014142871A (en) * 2013-01-25 2014-08-07 Dainippon Screen Mfg Co Ltd Instructor data creation support device, instructor data creation device, image classification device, instructor data creation support method, instructor data creation method, and image classification method
JP2015232847A (en) * 2014-06-10 2015-12-24 株式会社東芝 Detector, correction system, detection method and program
JP2017151813A (en) * 2016-02-25 2017-08-31 ファナック株式会社 Image processing device for displaying object detected from input image

Legal Events

Code 121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19800388; Country of ref document: EP; Kind code of ref document: A1)
Code NENP: Non-entry into the national phase (Ref country code: DE)
Code 122: Ep: pct application non-entry in european phase (Ref document number: 19800388; Country of ref document: EP; Kind code of ref document: A1)
Code NENP: Non-entry into the national phase (Ref country code: JP)