WO2019216263A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2019216263A1
WO2019216263A1 (PCT/JP2019/017879)
Authority
WO
WIPO (PCT)
Prior art keywords
information
determination
unit
frame
information processing
Prior art date
Application number
PCT/JP2019/017879
Other languages
French (fr)
Japanese (ja)
Inventor
亮 中橋
Original Assignee
Sony Corporation (ソニー株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corporation
Publication of WO2019216263A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • The present technology relates to an information processing device, an information processing method, and a program, and more particularly to an information processing device, an information processing method, and a program for confirming whether learning has been performed correctly.
  • Conventionally, reinforcement learning and behavior learning through human teaching are known; for example, Patent Literature 1 proposes a learning system using reinforcement learning.
  • With such behavior learning techniques, confirming that a learned behavior operates correctly requires a person to observe the behavior, and observing all behaviors takes a very long time.
  • The purpose of the present technology is to make it more efficient to check whether learning has been performed correctly.
  • The concept of the present technology is an information processing apparatus including a detection unit that detects that a plurality of different classes are determined because of the same region of input information, and an information holding unit that, based on the detection information, extracts and holds the information portion including that region from the input information.
  • In this apparatus, the detection unit detects that a plurality of different classes are determined because of the same region of the input information; the information holding unit then extracts the information portion including that region from the input information and holds it.
  • Determination of a class includes various determinations, such as determination of behavior and determination of classification.
  • For example, the determination of a plurality of different classes may be a determination of a plurality of different behaviors, and the detection unit may detect that a plurality of different behaviors are determined because of the same region by comparing the regions focused on in each behavior determination based on the input information.
  • Each determination of a plurality of different behaviors based on the input information may be performed based on a policy obtained by reinforcement learning.
  • The determinations of a plurality of different behaviors based on the input information may be determinations of behaviors related to automatic driving.
  • The input information may be moving image data, and the information holding unit may extract and hold, from the moving image data, the image data of each frame including the same region.
  • A reproduction control unit that controls reproduction of the image data of the series of frames held in the information holding unit may further be provided.
  • In this way, the information portion including the same region is extracted from the input information and held. Points that are likely to require correction can therefore be checked selectively through the retained information portion, making it possible to confirm efficiently whether learning has been performed correctly.
  • FIG. 1 shows a configuration example of an information processing apparatus 100 as an embodiment.
  • This information processing apparatus 100 is used for action determination in an automatic driving system.
  • To simplify the description, it is assumed that only two actions are determined: “straight ahead” and “right turn”.
  • The information processing apparatus 100 includes an image sensor 101, a learning unit 102, an action selection unit 103, a frame extraction processing unit 104, a recording/reproducing unit 105, a digest recording/reproducing unit 106, a control unit 107, a user operation unit 108, and a display unit 109.
  • The image sensor 101 constitutes, for example, a camera arranged at the front of the vehicle and images the area ahead of the vehicle.
  • The moving image data obtained by the image sensor 101 is supplied to the learning unit 102, the action selection unit 103, the recording/reproducing unit 105, and the digest recording/reproducing unit 106.
  • The learning unit 102 performs reinforcement learning based on the moving image data obtained by the image sensor 101, the action output of the action selection unit 103, and a reward set by the user, and creates a policy indicating how to act in which environment.
  • The action selection unit 103 determines and outputs an action based on the moving image data obtained by the image sensor 101 and the policy obtained by the learning unit 102. When the policy created by the learning unit 102 is incomplete, the action selection unit 103 may select a plurality of contradictory actions at the same time.
  • The frame extraction processing unit 104 extracts frames whose action determination is ambiguous.
  • Based on the contribution of each pixel to each action determination, output by the action selection unit 103 for every frame, it extracts frames in which a large pixel range contributes to a plurality of action determinations simultaneously.
  • For example, suppose the action selection unit 103 determines the actions “straight ahead” and “right turn” and outputs the contribution of each pixel to each of these determinations. For each determination, the frame extraction processing unit 104 regards a pixel as contributing if its contribution is greater than or equal to a threshold T. It then extracts frames in which a large pixel range contributes to both the “straight ahead” and “right turn” determinations as frames whose action determination is ambiguous.
  • FIG. 2 shows example images for frames t1, t2, t3, and t4.
  • In frame t1, the “straight ahead” determination is made, and ellipse P1 indicates the range of pixels contributing to it. Since no “right turn” determination is made, this frame is not extracted as ambiguous.
  • In frame t2, both the “straight ahead” and “right turn” determinations are made (ellipses P2 and Q2 indicate the respective contributing pixel ranges), and the two ranges overlap substantially, so this frame is extracted as one whose action determination is ambiguous.
  • In frames t3 and t4, the “straight ahead” determination is made, and ellipses P3 and P4 indicate the contributing pixel ranges. Since no “right turn” determination is made, these frames are not extracted as ambiguous.
  • The flowchart of FIG. 3 shows an example of the processing procedure of the frame extraction processing unit 104, again assuming the action selection unit 103 determines “straight ahead” or “right turn”.
  • In step ST1, the frame extraction processing unit 104 starts processing, and in step ST2 it targets the first frame.
  • In step ST3, it determines whether the current frame should become an extracted frame.
  • In step ST4, it determines whether the current frame is the last frame. If not, it targets the next frame in step ST5 and returns to step ST3. If it is the last frame, processing ends in step ST6.
  • The flowchart of FIG. 4 shows an example of the procedure for step ST3. In step ST11, the frame extraction processing unit 104 starts processing, and in step ST12 it targets the first action determination.
  • In step ST13, the frame extraction processing unit 104 obtains a contribution matrix whose elements are each pixel’s contribution (0 to 1) to the current action determination.
  • In step ST14, it binarizes this matrix, setting the contribution of pixels at or above the threshold T to “1” and that of all other pixels to “0”, to obtain a high-contribution matrix.
  • The threshold T is the high-contribution threshold and takes a value between 0 and 1.
  • In step ST15, the frame extraction processing unit 104 determines whether this was the last action determination. If not, it targets the next action determination in step ST16 and returns to step ST13. If it was the last, it proceeds to step ST17.
  • In step ST17, the frame extraction processing unit 104 multiplies the high-contribution matrices of the action determinations element-wise (per pixel), sums the products, and divides the sum by the total number of pixels to obtain the contribution overlap rate r.
  • The contribution overlap rate r takes a value between 0 and 1.
  • In step ST18, the frame extraction processing unit 104 determines whether r is greater than or equal to the threshold R.
  • The threshold R is the high-contribution overlap threshold and takes a value between 0 and 1.
  • If r is greater than or equal to R, the frame extraction processing unit 104 marks the frame as an extracted frame in step ST19 and ends processing there. Otherwise, it ends processing immediately in step ST20.
  • For example, when the action selection unit 103 performs the “straight ahead” and “right turn” determinations, steps ST12 to ST16 yield a high-contribution matrix M for the “straight ahead” determination and a high-contribution matrix N for the “right turn” determination.
  • In step ST17, the elements of M and N are multiplied per pixel and summed, and the sum is divided by the total number of pixels to obtain the contribution overlap rate r. With Mij and Nij denoting the elements of M and N, r = (Σi,j Mij × Nij) / (i × j).
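  • As a concrete illustration of steps ST11 to ST20, the following is a minimal sketch in Python/NumPy. It assumes each action determination supplies one per-pixel contribution map with values between 0 and 1; the function name and the default values of T and R are hypothetical, not taken from the patent.

```python
import numpy as np

def is_extracted_frame(contrib_maps, T=0.5, R=0.1):
    """Decide whether a frame's action determination is ambiguous
    (steps ST11 to ST20 of FIG. 4).

    contrib_maps: list of 2-D arrays, one per action determination
        (e.g. "straight ahead" and "right turn"), each holding the
        contribution (0..1) of every pixel to that determination.
    T: high-contribution threshold (steps ST13-ST14).
    R: high-contribution overlap threshold (step ST18).
    """
    # ST13-ST16: binarize each contribution matrix into a
    # high-contribution matrix ("1" where contribution >= T).
    high = [(m >= T).astype(np.float64) for m in contrib_maps]

    # ST17: multiply the high-contribution matrices element-wise,
    # sum the products, and divide by the total number of pixels
    # to obtain the contribution overlap rate r (0..1).
    product = high[0].copy()
    for h in high[1:]:
        product *= h
    r = product.sum() / product.size

    # ST18-ST20: the frame becomes an extracted frame when r >= R.
    return r >= R

# ST1-ST6 of FIG. 3: apply the decision to every frame of the video.
# extracted = [i for i, maps in enumerate(per_frame_maps)
#              if is_extracted_frame(maps)]
```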
  • The frame extraction algorithm of the frame extraction processing unit 104 shown in the flowchart of FIG. 4 is merely an example, and the present technology is not limited to it.
  • For example, the contribution matrices themselves could be added or multiplied, without first binarizing them into high-contribution matrices, with overlap declared when the resulting sum exceeds a threshold.
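  • A sketch of that variant, skipping the binarization; combining the raw maps by element-wise multiplication and the threshold value are assumptions.

```python
def is_extracted_frame_soft(contrib_maps, threshold=0.05):
    """Variant: combine the raw contribution matrices (0..1) directly,
    without a high-contribution matrix, and declare overlap when the
    normalized sum of the element-wise products exceeds a threshold."""
    product = contrib_maps[0].copy()
    for m in contrib_maps[1:]:
        product *= m
    return product.sum() / product.size >= threshold
```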
  • The recording/reproducing unit 105 records the moving image data obtained by the image sensor 101 and reproduces it.
  • The digest recording/reproducing unit 106 records the image data of the frames extracted by the frame extraction processing unit 104 as digest image data and reproduces it.
  • FIG. 5 shows an example of the relationship between the moving image data recorded by the recording/reproducing unit 105 and the digest image data recorded by the digest recording/reproducing unit 106.
  • FIG. 5(a) shows the entire moving image data recorded by the recording/reproducing unit 105; the frame ranges Fa to Fb, Fc to Fd, and Fe to Ff are frames whose action determination is ambiguous.
  • FIG. 5(b) shows the digest image data recorded by the digest recording/reproducing unit 106, consisting of the image data of frames Fa to Fb, Fc to Fd, and Fe to Ff.
  • The control unit 107 controls the operation of the recording/reproducing unit 105 and the digest recording/reproducing unit 106 based on user operations from the user operation unit 108.
  • The digest image data reproduced by the digest recording/reproducing unit 106 is supplied to the display unit 109, and the digest image is displayed on the display unit 109.
  • FIGS. 6(a) and 6(b) show display examples of digest images on the display unit 109, for the case where the action selection unit 103 performs the “straight ahead” and “right turn” determinations.
  • In the display example of FIG. 6(a), there is a single digest image display area in the screen. On the digest image, the pixel range contributing to the “straight ahead” determination (corresponding to the “1” elements of the high-contribution matrix M described above) and the pixel range contributing to the “right turn” determination (corresponding to the “1” elements of the high-contribution matrix N described above) are superimposed.
  • In the display example of FIG. 6(b), the screen contains three display areas showing the same digest image: a large one on the left and two small ones on the right. The image at the upper right superimposes the pixel range contributing to the “straight ahead” determination; the image at the lower right superimposes the pixel range contributing to the “right turn” determination.
  • An operation button 110 for switching from digest image reproduction to original image reproduction is displayed in the screen.
  • When the user operates this button, the control unit 107 controls the recording/reproducing unit 105 so that the frame range of the original image corresponding to the frame range of the digest image currently being reproduced is played back.
  • For example, when the operation button 110 is operated while the digest image for the frame range Fa to Fb is being reproduced, the original image is reproduced in one of the following ways:
  • (1) reproduction of the original image over the frame range Fa to Fb;
  • (2) reproduction over the frame range Fa-δ to Fb+δ, where δ is fixed;
  • (3) reproduction over the frame range Fa-Δa to Fb+Δb, where Δa and Δb are variable.
  • For example, Δa is the number of frames to go back from frame Fa until reaching a frame in which only pixels contributing to one of the “straight ahead” and “right turn” determinations are present, and Δb is the number of frames to go forward from frame Fb until reaching such a frame.
  • As the digest image data stored in the digest recording/reproducing unit 106, it is also possible to record not the image data of frames Fa to Fb but the image data of the frame range Fa-δ to Fb+δ or Fa-Δa to Fb+Δb.
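  • A sketch of how the control unit 107 might map a digest segment Fa to Fb back to an original-image playback range under options (1) to (3) above; the function name, the default delta, and the per-frame single_action flag are hypothetical.

```python
def playback_range(fa, fb, mode, num_frames, delta=30, single_action=None):
    """Return the (start, end) frame range of the original video to play
    for a digest segment covering frames fa..fb.

    mode 1: the segment itself.
    mode 2: a fixed margin of delta frames on both sides.
    mode 3: widen to the nearest frame on each side in which only pixels
        contributing to a single action determination are present
        (single_action[i] is True for such frames).
    """
    if mode == 1:
        return fa, fb
    if mode == 2:
        return max(0, fa - delta), min(num_frames - 1, fb + delta)
    start, end = fa, fb
    while start > 0 and not single_action[start]:
        start -= 1  # go back delta_a frames from Fa
    while end < num_frames - 1 and not single_action[end]:
        end += 1    # go forward delta_b frames from Fb
    return start, end
```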
  • The operation of the information processing apparatus 100 shown in FIG. 1 will be briefly described.
  • The moving image data obtained by the image sensor 101 is supplied to the recording/reproducing unit 105 and recorded.
  • The moving image data obtained by the image sensor 101 is also supplied to the action selection unit 103.
  • The action selection unit 103 determines and outputs an action based on the moving image data obtained by the image sensor 101 and the policy obtained by the learning unit 102.
  • The frame extraction processing unit 104, based on the contribution of each pixel to each action determination output by the action selection unit 103 for every frame, extracts frames in which a large pixel range contributes to a plurality of action determinations simultaneously as frames whose action determination is ambiguous (extracted frames).
  • The digest recording/reproducing unit 106 records, as digest image data, the image data of the frames extracted by the frame extraction processing unit 104 from the moving image data obtained from the image sensor 101 (see FIG. 5).
  • Based on user operations from the user operation unit 108, the operation of the digest recording/reproducing unit 106 is controlled. The digest image data it reproduces is supplied to the display unit 109, which displays the digest image, that is, images of frames whose action determination is ambiguous.
  • On this digest image, the pixel ranges contributing to the respective action determinations are superimposed.
  • In the example above, the image data of the frames extracted by the frame extraction processing unit 104 is recorded as digest image data by the digest recording/reproducing unit 106, which also reproduces it at playback time.
  • Alternatively, only the information identifying the extracted frames could be stored as digest frame information, with the digest image data reproduced from the recording/reproducing unit 105 at playback time based on that frame information; a sketch of this alternative follows.
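  • The frame-information-only alternative could be sketched as follows; grouping consecutive extracted frame indices into ranges such as Fa to Fb, and the helper name, are assumptions about how the frame information would be kept.

```python
def to_segments(extracted_indices):
    """Group sorted extracted frame indices into (start, end) ranges,
    e.g. [3, 4, 5, 9, 10] -> [(3, 5), (9, 10)], so that only this frame
    information needs to be stored with the digest."""
    segments = []
    for i in extracted_indices:
        if segments and i == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], i)  # extend current range
        else:
            segments.append((i, i))              # start a new range
    return segments

# At playback time, each (start, end) range is read back from the main
# recording/reproducing unit instead of from separately stored digest data.
```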
  • The embodiment above shows a use case that acts according to image information, but a use case that acts according to audio information can be considered in the same way.
  • For example, suppose the action selection unit determines the actions “advance toward a specific sound” and “avoid when the noise of another moving object approaches”.
  • If the specific sound and the noise of another moving object come from the same direction, the audio information at that time is an information portion whose action determination is ambiguous; this portion is extracted and held as digest audio information.
  • The above shows use cases concerning behavior, but the present technology can also be applied to use cases other than behavior.
  • For example, there is the use case of classification.
  • In an exclusive classification, a region to which a plurality of classes are assigned is an ambiguous information portion.
  • FIG. 7(a) shows an example of object recognition.
  • In this example, the classes “flying object” and “animal” are recognized as mutually exclusive, but a “bird” is an object that can be taken as both; the “bird” portion is therefore cut out and retained as an ambiguous information portion.
  • FIG. 7(b) shows an example of text analysis. Here, text is analyzed to recognize “positive expressions” and “negative expressions”, but the Japanese word “yabai” can be taken as both; the “yabai” portion is therefore cut out and retained as an ambiguous information portion. A minimal sketch for this kind of case follows.
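  • For the classification use case, a minimal sketch of cutting out ambiguous portions of text: two mutually exclusive classifiers score each token, and tokens accepted by both are retained as ambiguous. The scoring functions and the threshold are hypothetical stand-ins for real models.

```python
def ambiguous_tokens(tokens, score_positive, score_negative, T=0.5):
    """Return the tokens that both exclusive classifiers claim, i.e. the
    ambiguous information portions (e.g. "yabai" scored as both a
    positive and a negative expression)."""
    return [tok for tok in tokens
            if score_positive(tok) >= T and score_negative(tok) >= T]
```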
  • The processing of each unit in the information processing apparatus 100 can be executed by hardware or by software.
  • When the series of processes is executed by software, the programs constituting the software are installed on a computer.
  • Such a computer may be a computer incorporated in dedicated hardware, or a general-purpose personal computer capable of executing various functions by installing various programs.
  • FIG. 8 shows a configuration example of a personal computer 700.
  • A CPU (Central Processing Unit) 701 executes various processes according to programs stored in a ROM (Read Only Memory) 702 or loaded from a storage unit 713 into a RAM (Random Access Memory) 703.
  • The RAM 703 also stores, as appropriate, data needed by the CPU 701 to execute these processes.
  • The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704.
  • An input/output interface 710 is also connected to the bus 704.
  • Connected to the input/output interface 710 are an input unit 711 (keyboard, mouse, etc.), an output unit 712 (a display such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), and speakers), a storage unit 713 (hard disk, etc.), and a communication unit 714 (modem, etc.). The communication unit 714 performs communication processing via networks including the Internet.
  • A drive 715 is also connected to the input/output interface 710 as necessary; a removable medium 721 such as a magnetic disk, optical disc, magneto-optical disc, or semiconductor memory is mounted as appropriate, and computer programs read from it are installed in the storage unit 713 as necessary.
  • The present technology can also take the following configurations.
  • (1) An information processing apparatus including: a detection unit that detects that a plurality of different classes are determined because of the same region of input information; and an information holding unit that extracts and holds, based on the detection information, the information portion including the same region from the input information.
  • (2) The information processing apparatus according to (1), in which the determination of the plurality of different classes is a determination of a plurality of different behaviors, and the detection unit detects that a plurality of different behaviors are determined because of the same region by comparing the regions focused on in the determinations of the plurality of different behaviors based on the input information.
  • (3) The information processing apparatus according to (2), in which each determination of a plurality of different behaviors based on the input information is performed based on a policy obtained by reinforcement learning.
  • (4) The information processing apparatus according to (2) or (3), in which the determinations of a plurality of different behaviors based on the input information are determinations of a plurality of different behaviors related to automatic driving.
  • (5) The information processing apparatus according to any one of (1) to (4), in which the input information is moving image data and the information holding unit extracts and holds, from the moving image data, the image data of each frame including the same region.
  • (6) The information processing apparatus according to (5), further including a reproduction control unit that controls reproduction of the image data of the series of frames held in the information holding unit.
  • (7) An information processing method including: a procedure of detecting that a plurality of different classes are determined because of the same region of input information; and a procedure of extracting and holding, based on the detection information, the information portion including the same region from the input information.
  • (8) A program that causes a computer to function as: detection means for detecting that a plurality of different classes are determined because of the same region of input information; and information holding means for extracting and holding, based on the detection information, the information portion including the same region from the input information.
  • DESCRIPTION OF REFERENCE SIGNS: 100 … information processing apparatus; 101 … image sensor; 102 … learning unit; 103 … action selection unit; 104 … frame extraction processing unit; 105 … recording/reproducing unit; 106 … digest recording/reproducing unit; 107 … control unit; 108 … user operation unit; 109 … display unit; 110 … operation button

Abstract

The present invention increases the efficiency of confirming whether learning has been performed correctly. It is detected that determinations of a plurality of different classes are made because of the same region of input information, and information portions including that region are extracted from the input information and retained on the basis of the detection information. Class determinations include various determinations, such as action determination and classification determination. Because points that are very likely to require correction can be checked selectively through the retained information portions, it is possible to confirm efficiently whether learning has been performed correctly.

Description

Information processing apparatus, information processing method, and program
The present technology relates to an information processing device, an information processing method, and a program, and more particularly to an information processing device, an information processing method, and a program for confirming whether learning has been performed correctly.
Conventionally, reinforcement learning and behavior learning through human teaching are known. For example, Patent Literature 1 proposes a learning system using reinforcement learning. With behavior learning techniques, confirming that a learned behavior operates correctly requires a person to observe the behavior, and observing all behaviors takes a very long time.
[Patent Literature 1] JP 2010-073200 A
The purpose of the present technology is to make it more efficient to confirm whether learning has been performed correctly.
The concept of the present technology is an information processing apparatus including: a detection unit that detects that a plurality of different classes are determined because of the same region of input information; and an information holding unit that extracts and holds, based on the detection information, the information portion including the same region from the input information.
In the present technology, the detection unit detects that a plurality of different classes are determined because of the same region of the input information. The information holding unit then extracts the information portion including that region from the input information, based on the detection information, and holds it. Here, class determination includes various determinations such as behavior determination and classification determination.
For example, the determination of a plurality of different classes may be a determination of a plurality of different behaviors, with the detection unit detecting that a plurality of different behaviors are determined because of the same region by comparing the regions focused on in each behavior determination based on the input information. In this case, each behavior determination based on the input information may be performed according to a policy obtained by reinforcement learning, and the determinations may be of behaviors related to automatic driving.
Further, for example, the input information may be moving image data, and the information holding unit may extract and hold, from the moving image data, the image data of each frame including the same region. A reproduction control unit that controls reproduction of the image data of the series of frames held in the information holding unit may also be provided.
As described above, in the present technology, based on the detection that a plurality of different classes are determined because of the same region of the input information, the information portion including that region is extracted from the input information and held. Points that are likely to require correction can therefore be checked selectively through the retained information portion, so it is possible to confirm efficiently whether learning has been performed correctly.
According to the present technology, confirming whether learning has been performed correctly becomes more efficient. The effects described here are not necessarily limited and may be any of the effects described in the present disclosure.
FIG. 1 is a block diagram showing a configuration example of an information processing apparatus as an embodiment.
FIG. 2 is a diagram for explaining the processing of the frame extraction processing unit.
FIG. 3 is a flowchart showing an example of the processing procedure of the frame extraction processing unit.
FIG. 4 is a flowchart showing an example of the processing procedure of extracted-frame determination.
FIG. 5 is a diagram showing an example of the relationship between the moving image data recorded in the recording/reproducing unit and the digest image data stored in the digest recording/reproducing unit.
FIG. 6 is a diagram showing display examples of digest images on the display unit.
FIG. 7 is a diagram showing use cases other than behavior.
FIG. 8 is a diagram showing a configuration example of a personal computer.
Hereinafter, modes for carrying out the invention (hereinafter referred to as “embodiments”) will be described in the following order.
1. Embodiment
2. Modifications
<1. Embodiment>
[Information processing apparatus]
FIG. 1 shows a configuration example of an information processing apparatus 100 as an embodiment. This information processing apparatus 100 is used for action determination in an automatic driving system. In this embodiment, to simplify the description, it is assumed that only two actions are determined: “straight ahead” and “right turn”.
The information processing apparatus 100 includes an image sensor 101, a learning unit 102, an action selection unit 103, a frame extraction processing unit 104, a recording/reproducing unit 105, a digest recording/reproducing unit 106, a control unit 107, a user operation unit 108, and a display unit 109.
The image sensor 101 constitutes, for example, a camera arranged at the front of the vehicle and images the area ahead of the vehicle. The moving image data obtained by the image sensor 101 is supplied to the learning unit 102, the action selection unit 103, the recording/reproducing unit 105, and the digest recording/reproducing unit 106.
The learning unit 102 performs reinforcement learning based on the moving image data obtained by the image sensor 101, the action output of the action selection unit 103, and a reward set by the user, and creates a policy indicating how to act in which environment. The action selection unit 103 determines and outputs an action based on the moving image data obtained by the image sensor 101 and the policy obtained by the learning unit 102. When the policy created by the learning unit 102 is incomplete, the action selection unit 103 may select a plurality of contradictory actions at the same time.
The frame extraction processing unit 104 extracts frames whose action determination is ambiguous. Based on the contribution of each pixel to each action determination, output by the action selection unit 103 for every frame, it extracts frames in which a large pixel range contributes to a plurality of action determinations simultaneously.
Here, for example, suppose the action selection unit 103 determines the actions “straight ahead” and “right turn” and outputs the contribution of each pixel to each of these determinations. For each determination, the frame extraction processing unit 104 regards a pixel as contributing if its contribution is greater than or equal to a threshold T. It then extracts frames in which a large pixel range contributes to both determinations as frames whose action determination is ambiguous.
FIG. 2 shows example images for frames t1, t2, t3, and t4. In frame t1, the “straight ahead” determination is made, and ellipse P1 indicates the range of pixels contributing to it. Since no “right turn” determination is made, this frame is not extracted as ambiguous.
In frame t2, the “straight ahead” determination is made (ellipse P2 indicates the contributing pixel range), and the “right turn” determination is also made (ellipse Q2 indicates its contributing pixel range). Since both determinations are made and their contributing pixel ranges overlap substantially, this frame is extracted as one whose action determination is ambiguous.
In frames t3 and t4, the “straight ahead” determination is made, and ellipses P3 and P4 indicate the contributing pixel ranges. Since no “right turn” determination is made, these frames are not extracted as ambiguous.
The flowchart of FIG. 3 shows an example of the processing procedure of the frame extraction processing unit 104, again assuming the action selection unit 103 determines “straight ahead” or “right turn”. In step ST1, the frame extraction processing unit 104 starts processing. In step ST2, it targets the first frame. In step ST3, it determines whether the current frame should become an extracted frame.
In step ST4, it determines whether the current frame is the last frame. If not, it targets the next frame in step ST5 and returns to step ST3. If it is the last frame, processing ends in step ST6.
The flowchart of FIG. 4 shows an example of the procedure for step ST3 of FIG. 3. In step ST11, the frame extraction processing unit 104 starts processing, and in step ST12 it targets the first action determination.
In step ST13, the frame extraction processing unit 104 obtains a contribution matrix whose elements are each pixel’s contribution (0 to 1) to the current action determination. In step ST14, it binarizes this matrix, setting the contribution of pixels at or above the threshold T to “1” and that of all other pixels to “0”, to obtain a high-contribution matrix. The threshold T is the high-contribution threshold and takes a value between 0 and 1.
In step ST15, it determines whether this was the last action determination. If not, it targets the next action determination in step ST16 and returns to step ST13. If it was the last, it proceeds to step ST17.
In step ST17, the frame extraction processing unit 104 multiplies the high-contribution matrices of the action determinations element-wise (per pixel), sums the products, and divides the sum by the total number of pixels to obtain the contribution overlap rate r, which takes a value between 0 and 1. In step ST18, it determines whether r is greater than or equal to the threshold R. The threshold R is the high-contribution overlap threshold and takes a value between 0 and 1.
If r is greater than or equal to R, the frame extraction processing unit 104 marks the current frame as an extracted frame in step ST19 and ends processing there. Otherwise, it ends processing immediately in step ST20.
Here, for example, consider again the case where the action selection unit 103 performs the “straight ahead” and “right turn” determinations. Steps ST12 to ST16 yield a high-contribution matrix M for the “straight ahead” determination and a high-contribution matrix N for the “right turn” determination.
By the processing of step ST17, the elements of the “straight ahead” and “right turn” high-contribution matrices M and N are multiplied per pixel and summed, and the sum is divided by the total number of pixels to obtain the contribution overlap rate r. With Mij denoting the elements of M and Nij the elements of N, r is given by Equation (1):
  r = (Σi,j Mij × Nij) / (i × j)   (1)
As noted above, the frame extraction algorithm of the frame extraction processing unit 104 shown in the flowchart of FIG. 4 is merely an example, and the present technology is not limited to it. For example, the contribution matrices themselves could be added or multiplied, without first binarizing them into high-contribution matrices, with overlap declared when the resulting sum exceeds a threshold.
Returning to FIG. 1, the recording/reproducing unit 105 records the moving image data obtained by the image sensor 101 and reproduces it. The digest recording/reproducing unit 106 records the image data of the frames extracted by the frame extraction processing unit 104 as digest image data and reproduces it.
FIG. 5 shows an example of the relationship between the moving image data recorded by the recording/reproducing unit 105 and the digest image data recorded by the digest recording/reproducing unit 106. FIG. 5(a) shows the entire moving image data recorded by the recording/reproducing unit 105; the frame ranges Fa to Fb, Fc to Fd, and Fe to Ff are frames whose action determination is ambiguous. FIG. 5(b) shows the digest image data recorded by the digest recording/reproducing unit 106, consisting of the image data of frames Fa to Fb, Fc to Fd, and Fe to Ff.
The control unit 107 controls the operation of the recording/reproducing unit 105 and the digest recording/reproducing unit 106 based on user operations from the user operation unit 108. The digest image data reproduced by the digest recording/reproducing unit 106 is supplied to the display unit 109, and the digest image is displayed on the display unit 109.
FIGS. 6(a) and 6(b) show display examples of digest images on the display unit 109, for the case where the action selection unit 103 performs the “straight ahead” and “right turn” determinations. In the display example of FIG. 6(a), there is a single digest image display area in the screen, and the digest image is displayed there. On the digest image, the pixel range contributing to the “straight ahead” determination (corresponding to the “1” elements of the high-contribution matrix M) and the pixel range contributing to the “right turn” determination (corresponding to the “1” elements of the high-contribution matrix N) are superimposed.
In the display example of FIG. 6(b), the screen contains three display areas showing the same digest image: a large one on the left and two small ones on the right. The digest image at the upper right superimposes the pixel range contributing to the “straight ahead” determination; the digest image at the lower right superimposes the pixel range contributing to the “right turn” determination.
As shown in FIGS. 6(a) and 6(b), an operation button 110 for switching from digest image reproduction to original image reproduction is displayed in the screen. When the user operates this button, the control unit 107 controls the recording/reproducing unit 105 so that the frame range of the original image corresponding to the frame range of the digest image currently being reproduced is played back.
For example, when the operation button 110 is operated while the digest image for the frame range Fa to Fb is being reproduced, the original image is reproduced in one of the following ways:
(1) reproduction of the original image over the frame range Fa to Fb;
(2) reproduction over the frame range Fa-δ to Fb+δ, where δ is fixed;
(3) reproduction over the frame range Fa-Δa to Fb+Δb, where Δa and Δb are variable.
For example, Δa is the number of frames to go back from frame Fa until reaching a frame in which only pixels contributing to one of the “straight ahead” and “right turn” determinations are present, and Δb is the number of frames to go forward from frame Fb until reaching such a frame.
As the digest image data stored in the digest recording/reproducing unit 106, it is also possible to record not the image data of frames Fa to Fb but the image data of the frame range Fa-δ to Fb+δ or Fa-Δa to Fb+Δb.
The operation of the information processing apparatus 100 shown in FIG. 1 will be briefly described. The moving image data obtained by the image sensor 101 is supplied to the recording/reproducing unit 105 and recorded. It is also supplied to the action selection unit 103, which determines and outputs an action based on this data and the policy obtained by the learning unit 102.
The frame extraction processing unit 104, based on the contribution of each pixel to each action determination output by the action selection unit 103 for every frame, extracts frames in which a large pixel range contributes to a plurality of action determinations simultaneously as frames whose action determination is ambiguous (extracted frames). The digest recording/reproducing unit 106 records, as digest image data, the image data of the frames extracted by the frame extraction processing unit 104 from the moving image data obtained from the image sensor 101 (see FIG. 5).
The operation of the digest recording/reproducing unit 106 is controlled based on user operations from the user operation unit 108. The digest image data it reproduces is supplied to the display unit 109, which displays the digest image, that is, images of frames whose action determination is ambiguous, with the pixel ranges contributing to the respective action determinations superimposed.
In the description above, the image data of the frames extracted by the frame extraction processing unit 104 is recorded as digest image data by the digest recording/reproducing unit 106, which also reproduces it at playback time. Alternatively, only the information identifying the extracted frames could be stored as digest frame information, with the digest image data reproduced from the recording/reproducing unit 105 at playback time based on that frame information.
As described above, the information processing apparatus 100 shown in FIG. 1 extracts frames whose action determination is ambiguous and records their image data as digest image data. Points where the reward setting is likely to need correction can therefore be checked selectively from the digest images, which accelerates the trial-and-error cycle of behavior learning algorithm development.
<2. Modifications>
In the embodiment above, a use case that acts according to image information was shown, but a use case that acts according to audio information can be considered in the same way. For example, suppose the action selection unit determines the actions “advance toward a specific sound” and “avoid when the noise of another moving object approaches”. If the specific sound and the noise of another moving object come from the same direction, the audio information at that time is an information portion whose action determination is ambiguous; this portion is extracted and held as digest audio information.
In the embodiment above, use cases concerning behavior were shown, but the present technology can also be applied to use cases other than behavior, for example classification. In an exclusive classification, a region to which a plurality of classes are assigned is an ambiguous information portion.
FIG. 7(a) shows an example of object recognition. In this example, “flying object” and “animal” are recognized as mutually exclusive classes, but a “bird” is an object that can be taken as both; the “bird” portion is therefore cut out and retained as an ambiguous information portion. FIG. 7(b) shows an example of text analysis. Here, text is analyzed to recognize “positive expressions” and “negative expressions”, but the Japanese word “yabai” can be taken as both; the “yabai” portion is therefore cut out and retained as an ambiguous information portion.
In the embodiment above, the processing of each unit in the information processing apparatus 100 can be executed by hardware or by software. When the series of processes is executed by software, the programs constituting the software are installed on a computer. Such a computer may be a computer incorporated in dedicated hardware, or a general-purpose personal computer capable of executing various functions by installing various programs.
FIG. 8 shows a configuration example of a personal computer 700. A CPU (Central Processing Unit) 701 executes various processes according to programs stored in a ROM (Read Only Memory) 702 or loaded from a storage unit 713 into a RAM (Random Access Memory) 703. The RAM 703 also stores, as appropriate, data needed by the CPU 701 to execute these processes.
The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output interface 710 is also connected to the bus 704.
Connected to the input/output interface 710 are an input unit 711 (keyboard, mouse, etc.), an output unit 712 (a display such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), and speakers), a storage unit 713 (hard disk, etc.), and a communication unit 714 (modem, etc.). The communication unit 714 performs communication processing via networks including the Internet.
A drive 715 is also connected to the input/output interface 710 as necessary; a removable medium 721 such as a magnetic disk, optical disc, magneto-optical disc, or semiconductor memory is mounted as appropriate, and computer programs read from it are installed in the storage unit 713 as necessary.
While preferred embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the technical scope of the present disclosure is not limited to these examples. It is clear that a person with ordinary knowledge in the technical field of the present disclosure can conceive of various changes and modifications within the scope of the technical ideas described in the claims, and these naturally belong to the technical scope of the present disclosure.
The present technology may also take the following configurations.
(1) An information processing apparatus including:
a detection unit that detects that a plurality of different classes are determined because of the same region of input information; and
an information holding unit that, on the basis of the detection information, extracts an information portion including the same region from the input information and holds it.
(2) The information processing apparatus according to (1), in which
the determination of the plurality of different classes is a determination of a plurality of different actions, and
the detection unit detects that the plurality of different actions are determined because of the same region by comparing the regions focused on in the determinations of the plurality of different actions based on the input information.
(3) The information processing apparatus according to (2), in which each determination of the plurality of different actions based on the input information is performed on the basis of a policy obtained through learning by reinforcement learning.
(4) The information processing apparatus according to (2) or (3), in which the determinations of the plurality of different actions based on the input information are determinations of a plurality of different actions related to automatic driving.
(5) The information processing apparatus according to any one of (1) to (4), in which
the input information is moving image data, and
the information holding unit extracts the image data of each frame including the same region from the moving image data and holds it.
(6) The information processing apparatus according to (5), further including a reproduction control unit that controls reproduction of the image data of a series of frames held in the information holding unit.
(7) An information processing method including:
a procedure of detecting that a plurality of different classes are determined because of the same region of input information; and
a procedure of, on the basis of the detection information, extracting an information portion including the same region from the input information and holding it.
(8) A program that causes a computer to function as:
detection means for detecting that a plurality of different classes are determined because of the same region of input information; and
information holding means for, on the basis of the detection information, extracting an information portion including the same region from the input information and holding it.
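As a concrete illustration of configurations (1), (2), and (5), the sketch below compares, frame by frame, the regions focused on by two or more action determinations and holds any frame in which different actions were determined for substantially the same region. This is a minimal, non-authoritative sketch: the bounding-box representation of attention, the IoU comparison, the 0.5 threshold, and all identifiers are assumptions, and how each policy's attention region is obtained is outside its scope.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); assumed format

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two attention regions."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

@dataclass
class AmbiguityDetector:
    """Flags frames where determinations of different actions focused
    on (roughly) the same region, playing the roles of the detection
    unit and the information holding unit."""
    overlap_threshold: float = 0.5          # assumed, not from the spec
    held_frames: List[int] = field(default_factory=list)

    def process_frame(self, frame_index: int, determinations) -> bool:
        # `determinations` is a list of (action, attention_box) pairs,
        # one per policy; extracting the boxes themselves is out of scope.
        for i, (act_i, box_i) in enumerate(determinations):
            for act_j, box_j in determinations[i + 1:]:
                if act_i != act_j and iou(box_i, box_j) >= self.overlap_threshold:
                    self.held_frames.append(frame_index)  # extract and hold
                    return True
        return False

# Example: 'accelerate' and 'brake' both attend to the same region,
# so frame 42 is held for later review.
det = AmbiguityDetector()
det.process_frame(42, [("accelerate", (100, 80, 40, 30)),
                       ("brake",      (105, 82, 38, 28))])
print(det.held_frames)  # [42]
```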
DESCRIPTION OF SYMBOLS
100 ... Information processing apparatus
101 ... Image sensor
102 ... Learning unit
103 ... Action selection unit
104 ... Frame extraction processing unit
105 ... Recording/reproducing unit
106 ... Digest recording/reproducing unit
107 ... Control unit
108 ... User operation unit
109 ... Display unit
110 ... Operation buttons

Claims (8)

1. An information processing apparatus comprising:
a detection unit that detects that a plurality of different classes are determined because of the same region of input information; and
an information holding unit that, on the basis of the detection information, extracts an information portion including the same region from the input information and holds it.
2. The information processing apparatus according to claim 1, wherein
the determination of the plurality of different classes is a determination of a plurality of different actions, and
the detection unit detects that the plurality of different actions are determined because of the same region by comparing the regions focused on in the determinations of the plurality of different actions based on the input information.
3. The information processing apparatus according to claim 2, wherein each determination of the plurality of different actions based on the input information is performed on the basis of a policy obtained through learning by reinforcement learning.
4. The information processing apparatus according to claim 2, wherein the determinations of the plurality of different actions based on the input information are determinations of a plurality of different actions related to automatic driving.
5. The information processing apparatus according to claim 1, wherein
the input information is moving image data, and
the information holding unit extracts the image data of each frame including the same region from the moving image data and holds it.
6. The information processing apparatus according to claim 5, further comprising a reproduction control unit that controls reproduction of the image data of a series of frames held in the information holding unit.
7. An information processing method comprising:
a procedure of detecting that a plurality of different classes are determined because of the same region of input information; and
a procedure of, on the basis of the detection information, extracting an information portion including the same region from the input information and holding it.
8. A program that causes a computer to function as:
detection means for detecting that a plurality of different classes are determined because of the same region of input information; and
information holding means for, on the basis of the detection information, extracting an information portion including the same region from the input information and holding it.
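The reproduction control unit of claim 6 can be sketched in the same spirit: it replays only the frames held by the information holding unit, so that a reviewer can check the ambiguous scenes as a digest. Everything here is an assumed illustration; `renderer` is a placeholder display callable and the fixed frame rate is not part of the claimed apparatus.

```python
import time

def play_digest(held_frames, renderer, fps=10):
    """Replay only the held frames as a digest so a reviewer can
    confirm whether the learned behavior handled each ambiguous
    scene correctly."""
    for frame in held_frames:
        renderer(frame)        # display the extracted frame
        time.sleep(1.0 / fps)  # pace playback at the assumed rate
```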
PCT/JP2019/017879 2018-05-10 2019-04-26 Information processing device, information processing method, and program WO2019216263A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018091573 2018-05-10
JP2018-091573 2018-05-10

Publications (1)

Publication Number Publication Date
WO2019216263A1 (en)

Family

ID=68468197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/017879 WO2019216263A1 (en) 2018-05-10 2019-04-26 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2019216263A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005176339A (en) * 2003-11-20 2005-06-30 Nippon Telegr & Teleph Corp <Ntt> Moving image processing method, moving image processing apparatus, moving image processing program and recording medium with the program recorded thereon
JP2014142871A (en) * 2013-01-25 2014-08-07 Dainippon Screen Mfg Co Ltd Instructor data creation support device, instructor data creation device, image classification device, instructor data creation support method, instructor data creation method, and image classification method
JP2015232847A (en) * 2014-06-10 2015-12-24 株式会社東芝 Detector, correction system, detection method and program
JP2017151813A (en) * 2016-02-25 2017-08-31 ファナック株式会社 Image processing device for displaying object detected from input image

Legal Events

Code 121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19800388; Country of ref document: EP; Kind code of ref document: A1)
Code NENP: Non-entry into the national phase (Ref country code: DE)
Code 122: Ep: pct application non-entry in european phase (Ref document number: 19800388; Country of ref document: EP; Kind code of ref document: A1)
Code NENP: Non-entry into the national phase (Ref country code: JP)