WO2022123654A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method

Info

Publication number
WO2022123654A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
warning
condition
unit
information processing
Application number
PCT/JP2020/045684
Other languages
French (fr)
Japanese (ja)
Inventor
政人 土屋
良枝 今井
Original Assignee
三菱電機株式会社 (Mitsubishi Electric Corporation)
Application filed by 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority to PCT/JP2020/045684
Priority to JP2021524283A
Publication of WO2022123654A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • This disclosure relates to an information processing device and an information processing method.
  • Patent Document 1 discloses a technique for visualizing a judgment basis in a case where a disease in the body is judged from an endoscopic image.
  • in a warning system using object detection technology, it is necessary to select which information to present when notifying the user of recognition results. For example, if the judgment was unreliable at the time of detection, or if the detected object is not a threat at that moment, informing the user may instead hinder the user's operation or concentration.
  • visualization techniques so far have been applied to single tasks such as image classification, and conventional visualization techniques cannot be applied as they are to multitask problems such as object detection, which combines image classification and position detection.
  • accordingly, one or more aspects of the present disclosure are intended to enable visualization of the judgment basis for object detection.
  • the information processing apparatus includes an object detection unit that detects an object from a target image using a feature map, and a judgment basis visualization unit that generates a judgment basis visualization image: an image, in the same pixel arrangement as the target image, in which the importance of the image regions that served as the judgment basis for detecting the object can be identified according to the feature map.
  • in the information processing method, an object is detected from a target image using a feature map, and a judgment basis visualization image is generated: an image, in the same pixel arrangement as the target image, in which the importance of the image regions that served as the judgment basis for detecting the object can be identified according to the feature map.
  • FIG. 1 is a block diagram schematically showing a configuration of a warning system 100 as an information processing system (for example, a visualization system) according to an embodiment.
  • the warning system 100 includes a camera 101, a distance measuring sensor 102, a warning device 110 as an information processing device, a monitor 103, and a speaker 104.
  • the warning system 100 uses a warning device 110 mounted as an in-vehicle device to give a warning notification to a user such as a driver or other occupants together with visualization of the judgment basis.
  • however, this warning system 100 is not limited to being mounted on an automobile and can be widely applied to mobile vehicles such as railroad vehicles and two-wheeled vehicles; the description of the first embodiment therefore does not narrow the scope of the claims.
  • in the warning system 100, the warning device 110 is connected to the camera 101 as an imaging device that acquires an image in the traveling direction, the distance measuring sensor 102 as a ranging device that measures the distance to an object in the traveling direction, the monitor 103 as a display device that visually conveys the warning content to the user, and the speaker 104 as an acoustic device that conveys the warning content to the user by sound.
  • the monitor 103 and the speaker 104 function as a warning output device that outputs a warning.
  • the camera 101 is installed facing the front of the vehicle, converts visible light into a digital signal, and can acquire an image of the area in front of the vehicle as RGB values.
  • the camera 101 outputs an image output signal indicating the captured image to the warning device 110.
  • the camera 101 does not have to be a visible light camera, and may be an infrared camera, a depth camera, or the like.
  • the distance measuring sensor 102 measures the distance to the object in front of the camera 101 and transmits it to the warning device 110.
  • the distance measuring sensor 102 outputs a depth image output signal indicating a depth image representing the measured distance for each pixel to the warning device 110.
  • a conventional radar or a sensor such as LiDAR is assumed as the distance measuring sensor 102.
  • the monitor 103 is a device used to visually convey a warning from the warning device 110 to the user by an image.
  • an image obtained by superimposing a warning display on the image acquired from the camera 101 is used.
  • the speaker 104 is a device used to convey a warning from the warning device 110 to the user by voice.
  • FIG. 2 is a block diagram schematically showing the configuration of the warning device 110.
  • the warning device 110 includes an input unit 111, an image signal acquisition unit 112, a preprocessing unit 113, an object detection unit 114, a judgment basis visualization unit 115, a condition judgment unit 116, a warning execution unit 117, and an output unit 118.
  • the input unit 111 receives the input of the image output signal from the camera 101 and the depth image output signal from the distance measuring sensor 102.
  • the input unit 111 gives the image output signal and the depth image output signal to the image signal acquisition unit 112.
  • the image signal acquisition unit 112 converts an image output signal, which is an analog signal from the camera 101, into a digitized image signal. Further, the image signal acquisition unit 112 converts the depth image output signal, which is an analog signal from the distance measuring sensor 102, into a digitized depth image signal. Then, the image signal acquisition unit 112 supplies the image signal and the depth image signal to the preprocessing unit 113.
  • the preprocessing unit 113 converts the image signal into a processed image signal in a format convenient for the object detection unit 114 and the judgment basis visualization unit 115 in terms of calculation efficiency. The preprocessing unit 113 likewise converts the depth image signal into a processed depth image signal.
  • the preprocessing can include various processes such as standardization or whitening, but here the preprocessing is simply a resizing process that converts the resolution of the two-dimensional image signal into a square. The preprocessing unit 113 then gives the processed image signal and the processed depth image signal to the object detection unit 114, the judgment basis visualization unit 115, and the warning execution unit 117.
  • the image signal acquisition unit 112 and the preprocessing unit 113 constitute an image acquisition unit that acquires a target image for object detection.
  • the target image is the image represented by the processed image signal.
  • the object detection unit 114 detects objects from the target image using a feature map. For example, the object detection unit 114 detects object positions based on the processed image and the processed depth image, which are the images indicated by the processed image signal and the processed depth image signal received from the preprocessing unit 113, and generates object detection result information indicating the object detection result. The object detection result information is given to the judgment basis visualization unit 115, the condition judgment unit 116, and the warning execution unit 117.
  • FIG. 3 is a schematic diagram showing a first example of object detection result information.
  • the object detection result information 120 shown in FIG. 3 is table-format information including a detection candidate ID column 120a, a detection position column 120b, an estimated distance column 120c, and a reliability column 120d.
  • the detection candidate ID column 120a stores the detection candidate ID, which is object candidate identification information for identifying the detected object.
  • the detection position column 120b stores the detection position, which is the position at which the object candidate was detected.
  • the detection position indicates the position of the detected object in the image space.
  • here, the positions are those obtained when the position information of the object detection model used by the object detection unit 114 is formulated as a Dirac delta distribution.
  • the estimated distance column 120c stores the distance of the detected object.
  • the reliability column 120d stores the reliability, which indicates the probability that the detected object is the target object.
  • FIG. 4 is a schematic diagram showing a second example of the object detection result information. Like the object detection result information 120 shown in FIG. 3, the object detection result information 121 shown in FIG. 4 is table-format information including a detection candidate ID column 121a, a detection position column 121b, an estimated distance column 121c, and a reliability column 121d. In the example shown in FIG. 4, the detection position column 121b shows the positions obtained when the position information of the object detection model used by the object detection unit 114 is formulated as a normal distribution.
  • when the object detection unit 114 uses a deep learning model as the object detection model, a feature map is computed in the course of performing object detection. "Feature map" is a general term for the multidimensional tensor obtained by applying spatial convolution operations to an image signal.
  • the object detection unit 114 generates feature map information indicating this feature map, and gives the feature map information to the judgment basis visualization unit 115.
  • the judgment basis visualization unit 115 generates a judgment basis visualization image: an image, in the same pixel arrangement as the target image, in which the importance of the image regions that served as the judgment basis for detecting an object can be identified according to the feature map. For example, the judgment basis visualization unit 115 generates the judgment basis visualization image from the processed image signal and the processed depth image signal received from the preprocessing unit 113 and from the object detection result information and the feature map information received from the object detection unit 114.
  • the judgment basis visualization image is a grayscale image with the same resolution as the resized processed image signal and processed depth image signal, in which higher values are set for image regions that served as more important judgment bases for the object detection performed by the object detection unit 114.
  • typically, the judgment basis visualization image is converted with a color map that assigns colors such as blue, green, yellow, and red from low to high values, and is often displayed superimposed on the original image using alpha blending or the like.
  • the judgment basis visualization image is created based on the information of the detection position included in the object detection result, and the distribution of the detection position is used as it is for visualization.
  • when the detection position of the object detection model used by the object detection unit 114 is formulated as a Dirac delta distribution, a simple visualization can be calculated as in equation (1).
  • in equation (1), k is the detection candidate ID, x is the position in image space of the object identified by k, conf is the reliability value of the object identified by k, and Z is a normalization constant for making the maximum value of E at each position x, over all object classes at the time of object detection, equal to 1.
  • when the position information of the object detection model is formulated as a normal distribution, or when the feature map F computed by the object detection unit 114 in the course of calculating the object detection result is also used, a calculation such as equation (2) can be used. Here, c is a channel of the feature map; the circle symbol in equation (2) denotes the Hadamard product, and the outer double lines denote the L2 norm.
  • the judgment basis visualization unit 115 gives the judgment basis visualization signal indicating the judgment basis visualization image to the warning execution unit 117.
  • the condition determination unit 116 determines the warning setting level based on the object detection result information.
  • the reliability and estimated distance included in the object detection result are used to determine the warning setting level.
  • FIG. 5 is a table for determining the warning setting level L. As shown in FIG. 5, the warning setting level L is determined according to the combination of the reliability and the estimated distance. Since the warning setting level L is used to determine the operation of the warning execution unit 117, warning setting level information indicating the warning setting level L is given to the warning execution unit 117.
  • the warning setting level 1 is a level when the first condition that no object is detected from the target image is satisfied.
  • the warning setting level 1 is a level when the reliability is less than 50% and the second condition that the distance is 10 m or more is satisfied.
  • the warning setting level 2 is a level when the third condition that the reliability is less than 50% and the distance is less than 10 m, or the reliability is 50% or more and the distance is 10 m or more is satisfied.
  • the warning setting level 3 is a level when the reliability is 50% or more and the fourth condition that the distance is less than 10 m is satisfied.
  • here, the second condition is a condition under which a warning about the detected object is more necessary than under the first condition, the third condition is a condition under which such a warning is more necessary than under the second condition, and the fourth condition is a condition under which such a warning is more necessary than under the third condition.
  • the warning execution unit 117 is an output execution unit that determines the content of the warning to the user based on the warning setting level indicated by the warning setting level information received from the condition determination unit 116 and executes the output.
  • FIG. 6 is a table for determining the warning content from the warning setting level L. As shown in FIG. 6, the warning execution unit 117 determines the warning content according to the warning setting level L. Then, the warning execution unit 117 generates warning information according to the determined warning content, and gives a warning signal indicating the warning information to the output unit 118.
  • if the warning setting level L is 0, the warning execution unit 117 does not generate warning information. In this case, the warning execution unit 117 outputs the processed image signal indicating the target image to the output unit 118.
  • if the warning setting level L is 1, the warning execution unit 117 generates an output image using the judgment basis visualization image as the warning image, and generates warning information indicating the warning image. Specifically, the warning execution unit 117 generates the output image by alpha blending the judgment basis visualization image received from the judgment basis visualization unit 115 onto the processed image indicated by the processed image signal.
  • if the warning setting level L is 2, the warning execution unit 117 generates, as the warning image, a processed image in which the object detected in the target image is surrounded by a rectangle, and generates warning information indicating the warning image. Specifically, the warning execution unit 117 superimposes the detection position included in the object detection result received from the object detection unit 114 as a rectangle on the processed image indicated by the processed image signal to generate the warning image, and generates warning information indicating the warning image.
  • if the warning setting level L is 3, the warning execution unit 117 generates the above processed image as the warning image, additionally generates a warning sound, which is audio for warning, and generates warning information indicating the warning image and the warning sound. Specifically, the warning execution unit 117 superimposes the detection position included in the object detection result received from the object detection unit 114 as a rectangle on the processed image indicated by the processed image signal to generate the warning image, and at the same time generates an audible warning sound.
  • the output unit 118 outputs the warning information given by the warning execution unit 117 and thereby conveys the warning to the user through at least one of the monitor 103 and the speaker 104.
  • FIG. 7 is a hardware configuration diagram of the warning device 110. As shown in FIG. 7, the warning device 110 can be configured by a computer 130.
  • the memory 131 stores a program that functions as an image signal acquisition unit 112, a preprocessing unit 113, a judgment basis visualization unit 115, an object detection unit 114, a condition judgment unit 116, and a warning execution unit 117, and data used by these.
  • the memory 131 is, for example, a semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read-Only Memory), or a magnetic disk, an optical disk, a magneto-optical disk, or the like.
  • the processor 132 reads out a program that functions as an image signal acquisition unit 112, a preprocessing unit 113, a judgment basis visualization unit 115, an object detection unit 114, a condition judgment unit 116, and a warning execution unit 117, and executes processing.
  • the processor 132 uses, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a microprocessor, a microcontroller, a DSP (Digital Signal Processor), or the like.
  • the acoustic interface 133 is used when the warning execution unit 117 outputs an audible warning from the speaker 104 to warn the user. The acoustic interface 133 functions as the output unit 118.
  • the image interface 134 transmits the analog signals obtained from the camera 101 and the distance measuring sensor 102 to the image signal acquisition unit 112; in this case, the image interface 134 functions as the input unit 111.
  • the image interface 134 is also used when the warning execution unit 117 outputs the final screen output signal to the monitor 103 to warn the user; in this case, the image interface 134 functions as the output unit 118.
  • the network interface 135 is used when the image signal acquisition unit 112 receives an image input signal from an external network environment; in this case, the network interface 135 functions as the input unit 111. The network interface 135 is not required if the device is not configured to communicate with an external network environment.
  • although the memory 131 is arranged inside the warning device 110 in FIG. 2, it may be configured as an external memory, such as a USB memory, from which the programs and data are read. The internal memory and an external memory may also be used together as the memory 131.
  • FIG. 8 is a flowchart showing a process performed by the warning device 110.
  • the input unit 111 receives the input of the image output signal from the camera 101 and the depth image output signal from the distance measuring sensor 102 (S10).
  • the input image output signal and depth image output signal are given to the image signal acquisition unit 112, converted from analog signals to digital signals by the image signal acquisition unit 112, and given to the preprocessing unit 113 as an image signal and a depth image signal.
  • the preprocessing unit 113 performs preprocessing on the image signal and the depth image signal to convert them into a processed image signal and a processed depth image signal, respectively (S11).
  • the processed image signal and the processed depth image signal are given to the object detection unit 114, the judgment basis visualization unit 115, and the warning execution unit 117.
  • the object detection unit 114 performs object detection from the processed image signal and the processed depth image signal, gives object detection result information indicating the object detection result to the judgment basis visualization unit 115, the condition judgment unit 116, and the warning execution unit 117, and further gives feature map information indicating the feature map obtained in the course of object detection to the judgment basis visualization unit 115 (S12).
  • the condition judgment unit 116 determines the warning setting level according to the combination of the estimated distance and the reliability included in the object detection result information, and gives warning setting level information indicating the determined warning setting level to the warning execution unit 117 (S13).
  • the warning execution unit 117 checks the warning setting level indicated by the given warning setting level information (S14); if the warning setting level is 0, the processing ends, if the warning setting level corresponds to a weak warning, the processing proceeds to step S15, and if the warning setting level corresponds to a medium to strong warning, the processing proceeds to step S16.
  • in step S15, the warning execution unit 117 generates a warning image by alpha blending the judgment basis visualization image received from the judgment basis visualization unit 115 onto the processed image indicated by the processed image signal, and generates warning information indicating the warning image. The generated warning information is given to the output unit 118, and the processing proceeds to step S17.
  • in step S16, if the warning setting level L is 2, the warning execution unit 117 superimposes the detection position included in the object detection result received from the object detection unit 114 as a rectangle on the processed image indicated by the processed image signal to generate a warning image, and generates warning information indicating the warning image. If the warning setting level L is 3, the warning execution unit 117 likewise superimposes the detection position as a rectangle on the processed image to generate a warning image, additionally generates an audible warning sound, and generates warning information indicating the warning image and the warning sound. The processing then proceeds to step S17.
  • in step S17, the output unit 118 issues a warning to the user using at least one of sound and image, based on the warning information generated by the warning execution unit 117.
  • FIGS. 9(A) and 9(B) are schematic views showing examples of screen images output to the monitor 103. Both the screen image 140 and the screen image 150 assume a situation in which vehicles 141 and 151 cross to the right from the corner of the road ahead.
  • the screen image 140 is a warning screen image generated by the warning execution unit 117 when it is determined that the warning setting level is medium to strong.
  • the detection position included in the object detection result is used, and a warning by the rectangle 142 is displayed at that position.
  • the screen image 150 is a warning screen image generated by the warning execution unit 117 when it is determined that the warning setting level is equivalent to a weak warning.
  • the attention area 152 in which the information indicated by the judgment basis visualization signal is superimposed on the original image is displayed as the warning content.
  • in the above description, since the image is resized by the preprocessing unit 113, the warning execution unit 117 generates the warning image using the processed image produced by the preprocessing unit 113. However, the present embodiment is not limited to such an example; for example, the warning execution unit 117 can generate the warning image using the image represented by the image signal digitized by the image signal acquisition unit 112.
  • Reference signs: 100 warning system, 101 camera, 102 distance measuring sensor, 103 monitor, 104 speaker, 110 warning device, 111 input unit, 112 image signal acquisition unit, 113 preprocessing unit, 114 object detection unit, 115 judgment basis visualization unit, 116 condition judgment unit, 117 warning execution unit, 118 output unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

A warning device (110) includes an object detection unit (114) that detects an object from a target image using a feature map, and a determination basis visualization unit (115) that generates a determination basis visualization image: an image, in a pixel arrangement identical to that of the target image, that makes it possible to identify, according to the feature map, the importance of the image regions that served as the basis for determination when detecting the object.

Description

Information processing device and information processing method
The present disclosure relates to an information processing device and an information processing method.
Machine learning technology is being applied in an expanding range of cases and has come to be used in settings familiar to general users. In such a situation, clarifying which parts of an observed signal contribute to the inference result of a trained model is important for giving users a sense of conviction. In particular, the end-to-end learning methods of recent deep learning become black boxes that do not clearly capture the generation process of the observed signal as a physical phenomenon. For this reason, techniques that visualize the judgment basis in order to improve explainability are attracting a high degree of attention.
As a conventional technique, for example, Patent Document 1 discloses a technique for visualizing the judgment basis in a case where a disease in the body is judged from an endoscopic image.
Patent Document 1: Japanese Unexamined Patent Application Publication No. 2020-89712
Techniques that visualize the judgment basis in order to improve explainability, such as the conventional technique, have mostly been used in image classification and have not been applied to more complicated tasks such as object detection.
In a warning system using object detection technology, it is necessary to select which information to present when notifying the user of recognition results. For example, if the judgment was unreliable at the time of detection, or if the detected object is not a threat at that moment, informing the user may instead hinder the user's operation or concentration.
In addition, visualization techniques so far have been applied to single tasks such as image classification, and conventional visualization methods cannot be applied as they are to multitask problems such as object detection, which combines image classification and position detection.
Therefore, one or more aspects of the present disclosure are intended to enable visualization of the judgment basis for object detection.
An information processing apparatus according to one aspect of the present disclosure includes an object detection unit that detects an object from a target image using a feature map, and a judgment basis visualization unit that generates a judgment basis visualization image: an image, in the same pixel arrangement as the target image, in which the importance of the image regions that served as the judgment basis for detecting the object can be identified according to the feature map.
In an information processing method according to one aspect of the present disclosure, an object is detected from a target image using a feature map, and a judgment basis visualization image is generated: an image, in the same pixel arrangement as the target image, in which the importance of the image regions that served as the judgment basis for detecting the object can be identified according to the feature map.
According to one or more aspects of the present disclosure, the judgment basis for object detection can be visualized.
Brief description of the drawings:
FIG. 1 is a block diagram schematically showing the configuration of a warning system.
FIG. 2 is a block diagram schematically showing the configuration of a warning device.
FIG. 3 is a schematic diagram showing a first example of object detection result information.
FIG. 4 is a schematic diagram showing a second example of object detection result information.
FIG. 5 is a table for determining the warning setting level.
FIG. 6 is a table for determining the warning content from the warning setting level.
FIG. 7 is a hardware configuration diagram of the warning device.
FIG. 8 is a flowchart showing the processing performed by the warning device.
FIGS. 9(A) and 9(B) are schematic views showing examples of screen images output to a monitor.
FIG. 1 is a block diagram schematically showing the configuration of a warning system 100 as an information processing system (for example, a visualization system) according to an embodiment.
The warning system 100 includes a camera 101, a distance measuring sensor 102, a warning device 110 as an information processing device, a monitor 103, and a speaker 104.
The warning system 100 uses the warning device 110, mounted as an in-vehicle device, to give warning notifications combined with visualization of the judgment basis to a user such as the driver or another occupant.
However, the warning system 100 is not limited to being mounted on an automobile and can be widely applied to mobile vehicles such as railroad vehicles and two-wheeled vehicles; the description of the first embodiment therefore does not narrow the scope of the claims.
In the warning system 100, the warning device 110 is connected to the camera 101 as an imaging device that acquires an image in the traveling direction, the distance measuring sensor 102 as a ranging device that measures the distance to an object in the traveling direction, the monitor 103 as a display device that visually conveys the warning content to the user, and the speaker 104 as an acoustic device that conveys the warning content to the user by sound. The monitor 103 and the speaker 104 function as warning output devices that output warnings.
The camera 101 is installed facing the front of the vehicle, converts visible light into a digital signal, and can acquire an image of the area in front of the vehicle as RGB values. The camera 101 outputs an image output signal indicating the captured image to the warning device 110. The camera 101 need not be a visible-light camera and may be an infrared camera, a depth camera, or the like.
The distance measuring sensor 102 measures the distance to objects in front of the camera 101 and conveys it to the warning device 110. The distance measuring sensor 102 outputs, to the warning device 110, a depth image output signal indicating a depth image that represents the measured distance for each pixel. A conventional radar or a sensor such as LiDAR is assumed as the distance measuring sensor 102.
The monitor 103 is a device used to visually convey a warning from the warning device 110 to the user with an image. The displayed image is obtained by superimposing a warning display on the video acquired from the camera 101.
The speaker 104 is a device used to convey a warning from the warning device 110 to the user by sound.
FIG. 2 is a block diagram schematically showing the configuration of the warning device 110.
The warning device 110 includes an input unit 111, an image signal acquisition unit 112, a preprocessing unit 113, an object detection unit 114, a judgment basis visualization unit 115, a condition judgment unit 116, a warning execution unit 117, and an output unit 118.
The input unit 111 receives the input of the image output signal from the camera 101 and the depth image output signal from the distance measuring sensor 102, and gives the image output signal and the depth image output signal to the image signal acquisition unit 112.
The image signal acquisition unit 112 converts the image output signal, which is an analog signal from the camera 101, into a digitized image signal. The image signal acquisition unit 112 also converts the depth image output signal, which is an analog signal from the distance measuring sensor 102, into a digitized depth image signal. The image signal acquisition unit 112 then gives the image signal and the depth image signal to the preprocessing unit 113.
The preprocessing unit 113 converts the image signal into a processed image signal in a format convenient for the object detection unit 114 and the judgment basis visualization unit 115 in terms of calculation efficiency. The preprocessing unit 113 likewise converts the depth image signal into a processed depth image signal.
Preprocessing can include various processes such as standardization or whitening, but here the preprocessing is simply a resizing process that converts the resolution of the two-dimensional image signal into a square.
The preprocessing unit 113 then gives the processed image signal and the processed depth image signal to the object detection unit 114, the judgment basis visualization unit 115, and the warning execution unit 117.
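As an illustration only (not part of the original disclosure), the following minimal Python sketch shows the kind of square-resize preprocessing described above; the function name, the target size, and the use of OpenCV are assumptions.

    import cv2
    import numpy as np

    def preprocess_to_square(image: np.ndarray, size: int = 512) -> np.ndarray:
        # Resize a 2-D image (camera image or depth image) to a square resolution,
        # mirroring the simple resize preprocessing described above.
        return cv2.resize(image, (size, size), interpolation=cv2.INTER_LINEAR)

    # Hypothetical usage: the same resize is applied to the camera image and the
    # depth image so that both keep the same pixel arrangement.
    # processed_image = preprocess_to_square(camera_image)
    # processed_depth = preprocess_to_square(depth_image)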
Here, the image signal acquisition unit 112 and the preprocessing unit 113 constitute an image acquisition unit that acquires a target image for object detection. The target image is the image represented by the processed image signal.
The object detection unit 114 detects objects from the target image using a feature map.
For example, the object detection unit 114 detects object positions based on the processed image and the processed depth image, which are the images indicated by the processed image signal and the processed depth image signal received from the preprocessing unit 113, and generates object detection result information indicating the object detection result. The object detection result information is given to the judgment basis visualization unit 115, the condition judgment unit 116, and the warning execution unit 117.
FIG. 3 is a schematic diagram showing a first example of the object detection result information.
The object detection result information 120 shown in FIG. 3 is table-format information including a detection candidate ID column 120a, a detection position column 120b, an estimated distance column 120c, and a reliability column 120d.
The detection candidate ID column 120a stores the detection candidate ID, which is object candidate identification information for identifying the detected object.
The detection position column 120b stores the detection position, which is the position at which the object candidate was detected. The detection position indicates the position of the detected object in the image space. Here, the positions are those obtained when the position information of the object detection model used by the object detection unit 114 is formulated as a Dirac delta distribution.
The estimated distance column 120c stores the distance of the detected object.
The reliability column 120d stores the reliability, which indicates the probability that the detected object is the target object.
FIG. 4 is a schematic diagram showing a second example of the object detection result information.
Like the object detection result information 120 shown in FIG. 3, the object detection result information 121 shown in FIG. 4 is table-format information including a detection candidate ID column 121a, a detection position column 121b, an estimated distance column 121c, and a reliability column 121d.
In the example shown in FIG. 4, the detection position column 121b shows the positions obtained when the position information of the object detection model used by the object detection unit 114 is formulated as a normal distribution.
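A minimal sketch (not part of the original disclosure) of a data structure for one row of such object detection result information; the field names and the box representation of the detection position are assumptions based on the columns described for FIG. 3 and FIG. 4.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class DetectionResult:
        # One detection candidate, loosely mirroring the columns of FIG. 3 / FIG. 4.
        candidate_id: int                              # detection candidate ID (column 120a / 121a)
        position: Tuple[float, float, float, float]    # detection position, here assumed to be a box (x1, y1, x2, y2)
        estimated_distance_m: float                    # estimated distance to the object, in meters
        reliability: float                             # probability (0.0-1.0) that the detection is the target object

    # With the normal-distribution formulation of FIG. 4, the position could instead
    # carry a mean and a variance per coordinate; that variant is omitted here.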
When the object detection unit 114 uses a deep learning model as the object detection model, a feature map is computed in the course of performing object detection. "Feature map" is a general term for the multidimensional tensor obtained by applying spatial convolution operations to an image signal. The object detection unit 114 generates feature map information indicating this feature map and gives the feature map information to the judgment basis visualization unit 115.
Returning to FIG. 2, the judgment basis visualization unit 115 generates a judgment basis visualization image: an image, in the same pixel arrangement as the target image, in which the importance of the image regions that served as the judgment basis for detecting an object can be identified according to the feature map.
For example, the judgment basis visualization unit 115 generates the judgment basis visualization image from the processed image signal and the processed depth image signal received from the preprocessing unit 113 and from the object detection result information and the feature map information received from the object detection unit 114.
The judgment basis visualization image is a grayscale image with the same resolution as the resized processed image signal and processed depth image signal, in which higher values are set for image regions that served as more important judgment bases for the object detection performed by the object detection unit 114.
Typically, the judgment basis visualization image is converted with a color map that assigns colors such as blue, green, yellow, and red from low to high values, and is often displayed superimposed on the original image using alpha blending or the like.
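As an illustration only (not part of the original disclosure), the sketch below shows one common way to apply such a color map and alpha blending with OpenCV; the JET colormap and the alpha value of 0.4 are assumptions, not choices made by the patent.

    import cv2
    import numpy as np

    def overlay_heatmap(image_bgr: np.ndarray, heatmap: np.ndarray, alpha: float = 0.4) -> np.ndarray:
        # Color-map an importance map in [0, 1] (same height and width as the image)
        # and alpha-blend it onto the image, as described above.
        heat_u8 = np.uint8(np.clip(heatmap, 0.0, 1.0) * 255)
        colored = cv2.applyColorMap(heat_u8, cv2.COLORMAP_JET)   # low values -> blue, high values -> red
        return cv2.addWeighted(colored, alpha, image_bgr, 1.0 - alpha, 0.0)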
The judgment basis visualization image is created based on the detection position information included in the object detection result, and the distribution of the detection positions is used as it is for visualization. Specifically, when the detection position of the object detection model used by the object detection unit 114 is formulated as a Dirac delta distribution, a simple visualization can be calculated as in equation (1) below.
(Equation (1) is reproduced only as an image in the original publication.)
In equation (1), k is the detection candidate ID, x is the position in image space of the object identified by k, conf is the reliability value of the object identified by k, and Z is a normalization constant for making the maximum value of E at each position x, over all object classes at the time of object detection, equal to 1.
On the other hand, when the position information of the object detection model is formulated as a normal distribution, or when the feature map F computed by the object detection unit 114 in the course of calculating the object detection result is also used, a calculation such as equation (2) below can be used.
(Equation (2) is reproduced only as an image in the original publication.)
Here, c is a channel of the feature map. The circle symbol in equation (2) denotes the Hadamard product, and the outer double lines denote the L2 norm.
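Because equations (1) and (2) appear only as images, the Python sketch below is a loose illustration, under stated assumptions, of the kind of computation the surrounding text describes for the Dirac delta case: accumulating detection confidences at the detection positions and normalizing so that the maximum value becomes 1 (the role of Z). It is not the patent's formula, and the Gaussian-plus-feature-map variant of equation (2) is only hinted at in a comment.

    import numpy as np

    def judgment_basis_map_dirac(detections, height, width):
        # detections: iterable of (x, y, conf) with integer pixel positions and
        # reliability values. Each detection contributes its confidence at its
        # position (a Dirac-delta-like term); the map is then normalized so its
        # maximum is 1, which is the role the text assigns to the constant Z.
        E = np.zeros((height, width), dtype=np.float32)
        for x, y, conf in detections:
            E[y, x] += conf
        z = E.max()
        return E / z if z > 0 else E

    # For the variant around equation (2), each detection would instead contribute
    # a 2-D Gaussian centered at its position, combined channel-wise with the
    # feature map F (Hadamard product) and reduced with an L2 norm over the
    # channels c; the exact form is not reproduced here.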
The judgment basis visualization unit 115 gives a judgment basis visualization signal indicating the judgment basis visualization image to the warning execution unit 117.
The condition judgment unit 116 determines the warning setting level based on the object detection result information. In the first embodiment, the reliability and the estimated distance included in the object detection result are used to determine the warning setting level.
FIG. 5 is a table for determining the warning setting level L.
As shown in FIG. 5, the warning setting level L is determined according to the combination of the reliability and the estimated distance. Since the warning setting level L is used to determine the operation of the warning execution unit 117, warning setting level information indicating the warning setting level L is given to the warning execution unit 117.
Specifically, the warning setting level 1 is the level when the first condition, that no object is detected from the target image, is satisfied.
The warning setting level 1 is the level when the second condition, that the reliability is less than 50% and the distance is 10 m or more, is satisfied.
The warning setting level 2 is the level when the third condition, that the reliability is less than 50% and the distance is less than 10 m, or that the reliability is 50% or more and the distance is 10 m or more, is satisfied.
The warning setting level 3 is the level when the fourth condition, that the reliability is 50% or more and the distance is less than 10 m, is satisfied.
Here, the second condition is a condition under which a warning about the detected object is more necessary than under the first condition, the third condition is a condition under which such a warning is more necessary than under the second condition, and the fourth condition is a condition under which such a warning is more necessary than under the third condition.
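A minimal sketch (not part of the original disclosure) of the level decision described for FIG. 5. The 50% and 10 m thresholds come from the text above; mapping the "no object detected" case to level 0 is an assumption based on the level-0 behavior described below, since the text above labels that case level 1.

    from typing import Optional

    def warning_setting_level(reliability: Optional[float], distance_m: Optional[float]) -> int:
        # reliability: probability (0.0-1.0) that the detection is the target object,
        #              or None if no object was detected.
        # distance_m:  estimated distance in meters, or None if no object was detected.
        if reliability is None or distance_m is None:
            return 0          # assumed: "no object detected" maps to the no-warning level
        if reliability < 0.5 and distance_m >= 10.0:
            return 1          # second condition: weak warning
        if reliability >= 0.5 and distance_m < 10.0:
            return 3          # fourth condition: strong warning
        return 2              # third condition: medium warning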
The warning execution unit 117 is an output execution unit that determines the content of the warning to the user based on the warning setting level indicated by the warning setting level information received from the condition judgment unit 116, and executes its output.
FIG. 6 is a table for determining the warning content from the warning setting level L.
As shown in FIG. 6, the warning execution unit 117 determines the warning content according to the warning setting level L. The warning execution unit 117 then generates warning information according to the determined warning content and gives a warning signal indicating the warning information to the output unit 118.
For example, if the warning setting level L is 0, the warning execution unit 117 does not generate warning information. In this case, the warning execution unit 117 outputs the processed image signal indicating the target image to the output unit 118.
If the warning setting level L is 1, the warning execution unit 117 generates an output image using the judgment basis visualization image as the warning image, and generates warning information indicating the warning image. Specifically, the warning execution unit 117 generates the output image by alpha blending the judgment basis visualization image received from the judgment basis visualization unit 115 onto the processed image indicated by the processed image signal.
If the warning setting level L is 2, the warning execution unit 117 generates, as the warning image, a processed image in which the object detected in the target image is surrounded by a rectangle, and generates warning information indicating the warning image. Specifically, the warning execution unit 117 superimposes the detection position included in the object detection result received from the object detection unit 114 as a rectangle on the processed image indicated by the processed image signal to generate the warning image, and generates warning information indicating the warning image.
If the warning setting level L is 3, the warning execution unit 117 generates the above processed image as the warning image, additionally generates a warning sound, which is audio for warning, and generates warning information indicating the warning image and the warning sound. Specifically, the warning execution unit 117 superimposes the detection position included in the object detection result received from the object detection unit 114 as a rectangle on the processed image indicated by the processed image signal to generate the warning image, and at the same time generates an audible warning sound.
The output unit 118 outputs the warning information given by the warning execution unit 117 and thereby conveys the warning to the user through at least one of the monitor 103 and the speaker 104.
FIG. 7 is a hardware configuration diagram of the warning device 110.
As shown in FIG. 7, the warning device 110 can be configured by a computer 130.
The memory 131 stores programs that function as the image signal acquisition unit 112, the preprocessing unit 113, the judgment basis visualization unit 115, the object detection unit 114, the condition judgment unit 116, and the warning execution unit 117, as well as the data they use. The memory 131 is, for example, a semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read-Only Memory), or a magnetic disk, an optical disk, a magneto-optical disk, or the like.
The processor 132 reads out the programs that function as the image signal acquisition unit 112, the preprocessing unit 113, the judgment basis visualization unit 115, the object detection unit 114, the condition judgment unit 116, and the warning execution unit 117, and executes the processing. The processor 132 is, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a microprocessor, a microcontroller, or a DSP (Digital Signal Processor).
The acoustic interface 133 is used when the warning execution unit 117 outputs an audible warning from the speaker 104 to warn the user. The acoustic interface 133 functions as the output unit 118.
The image interface 134 transmits the analog signals obtained from the camera 101 and the distance measuring sensor 102 to the image signal acquisition unit 112; in this case, the image interface 134 functions as the input unit 111.
The image interface 134 is also used when the warning execution unit 117 outputs the final screen output signal to the monitor 103 to warn the user; in this case, the image interface 134 functions as the output unit 118.
The network interface 135 is used when the image signal acquisition unit 112 receives an image input signal from an external network environment; in this case, the network interface 135 functions as the input unit 111. The network interface 135 is not required if the device is not configured to communicate with an external network environment.
Although the memory 131 is arranged inside the warning device 110 in FIG. 2, it may be configured as an external memory, such as a USB memory, from which the programs and data are read. The internal memory and an external memory may also be used together as the memory 131.
 図8は、警告装置110が行う処理を示すフローチャートである。
 まず、入力部111は、カメラ101からの画像出力信号、及び、測距センサー102からの深度画像出力信号の入力を受け付ける(S10)。入力された画像出力信号及び深度画像出力信号は、画像信号取得部112に与えられ、画像信号取得部112により、アナログ信号からデジタル信号に変換されて、画像信号及び深度画像信号として、前処理部113に与えられる。
FIG. 8 is a flowchart showing a process performed by the warning device 110.
First, the input unit 111 receives the input of the image output signal from the camera 101 and the depth image output signal from the distance measuring sensor 102 (S10). The input image output signal and depth image output signal are given to the image signal acquisition unit 112, converted from an analog signal to a digital signal by the image signal acquisition unit 112, and used as an image signal and a depth image signal in the preprocessing unit. Given to 113.
 次に、前処理部113は、画像信号及び深度画像信号に前処理を施すことで、処理済画像信号及び処理済深度画像信号へとそれぞれ変換する(S11)。処理済画像信号及び処理済深度画像信号は、物体検出部114、判断根拠可視化部115及び警告実行部117に与えられる。 Next, the preprocessing unit 113 performs preprocessing on the image signal and the depth image signal to convert them into a processed image signal and a processed depth image signal, respectively (S11). The processed image signal and the processed depth image signal are given to the object detection unit 114, the judgment basis visualization unit 115, and the warning execution unit 117.
 Next, the object detection unit 114 performs object detection on the processed image signal and the processed depth image signal, supplies object detection result information indicating the resulting object detection result to the judgment basis visualization unit 115, the condition judgment unit 116, and the warning execution unit 117, and further supplies feature map information indicating the feature map obtained in the course of object detection to the judgment basis visualization unit 115 (S12).
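 The point of this step is that the detector exposes its intermediate feature map together with the detection result, so that the judgment basis visualization unit 115 can work from the same activations the detector used. The toy detector below is only a schematic stand-in for this interface, not an actual trained model; its single random filter, its thresholding rule, and its confidence value are assumptions made for illustration.

    import numpy as np

    class TinyDetector:
        """Illustrative stand-in for the object detection unit: returns detections and the feature map."""

        def __init__(self, seed: int = 0):
            # A single random 3x3 filter plays the role of a learned backbone.
            self.kernel = np.random.default_rng(seed).normal(size=(3, 3))

        def __call__(self, image: np.ndarray):
            gray = image.mean(axis=2) if image.ndim == 3 else image
            h, w = gray.shape
            fmap = np.zeros((h - 2, w - 2))
            for dy in range(3):                               # naive 3x3 convolution
                for dx in range(3):
                    fmap += self.kernel[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
            ys, xs = np.where(fmap > fmap.mean() + 2 * fmap.std())
            detections = []
            if ys.size:                                       # one box around the strongest activations
                detections.append({"box": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
                                   "confidence": float(fmap[ys, xs].mean())})
            return detections, fmap                           # detection result and feature map (S12)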
 Next, the condition judgment unit 116 determines the warning setting level according to the combination of the estimated distance and the reliability included in the object detection result information, and supplies warning setting level information indicating the determined warning setting level to the warning execution unit 117 (S13).
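 One way such a mapping could look is sketched below; the specific distance and reliability thresholds, and the function name, are hypothetical values chosen only to illustrate how a (distance, reliability) pair might be turned into the four warning setting levels used in steps S14 to S16.

    def warning_level(distance_m: float, reliability: float,
                      near_m: float = 10.0, far_m: float = 30.0,
                      high_conf: float = 0.7, low_conf: float = 0.3) -> int:
        """Map an estimated distance and a detection reliability to a warning setting level 0-3."""
        if reliability < low_conf:
            return 0                                  # too unreliable: no warning
        if distance_m <= near_m and reliability >= high_conf:
            return 3                                  # close and reliable: strong warning with sound
        if distance_m <= far_m and reliability >= high_conf:
            return 2                                  # moderately close and reliable: medium warning
        return 1                                      # otherwise: weak warning (judgment basis only)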
 The warning execution unit 117 checks the warning setting level indicated by the supplied warning setting level information (S14). If the warning setting level is 0, the processing ends; if the warning setting level corresponds to a weak warning, the processing proceeds to step S15; and if the warning setting level corresponds to a medium to strong warning, the processing proceeds to step S16.
 In step S15, the warning execution unit 117 generates a warning image by alpha-blending the judgment basis visualization image received from the judgment basis visualization unit 115 onto the processed image indicated by the processed image signal, and generates warning information indicating the warning image. The generated warning information is supplied to the output unit 118. The processing then proceeds to step S17.
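 The alpha blending in step S15 can be sketched as follows, assuming the judgment basis visualization image is supplied as an importance map with values in [0, 1] of the same height and width as an H x W x 3 processed image; the red highlight color and the blending weight are illustrative choices, not part of the disclosure.

    import numpy as np

    def blend_visualization(image: np.ndarray, importance: np.ndarray, alpha: float = 0.4) -> np.ndarray:
        """Alpha-blend an importance map onto an H x W x 3 image, highlighting important regions in red."""
        overlay = np.zeros_like(image, dtype=np.float32)
        overlay[..., 0] = importance * 255.0                   # importance shown in the red channel
        weight = alpha * importance[..., None]                 # per-pixel blending weight
        blended = (1.0 - weight) * image.astype(np.float32) + weight * overlay
        return blended.astype(image.dtype)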
 On the other hand, in step S16, if the warning setting level L is 2, the warning execution unit 117 generates a warning image by superimposing a rectangle at the detection position included in the object detection result received from the object detection unit 114 on the processed image indicated by the processed image signal, and generates warning information indicating the warning image. If the warning setting level L is 3, the warning execution unit 117 superimposes the rectangle at the detection position on the processed image in the same way to generate the warning image, additionally generates an audible warning sound, and generates warning information indicating the warning image and the warning sound. The processing then proceeds to step S17.
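 A minimal sketch of this rectangle overlay is given below, assuming an H x W x 3 image; the box format (x1, y1, x2, y2), the red border color, and the use of a boolean flag to stand in for the audible warning at level 3 are all assumptions made for illustration.

    import numpy as np

    def draw_box(image: np.ndarray, box, color=(255, 0, 0), thickness: int = 2) -> np.ndarray:
        """Superimpose a rectangle at the detected position (x1, y1, x2, y2) on a copy of the image."""
        out = image.copy()
        x1, y1, x2, y2 = box
        out[y1:y1 + thickness, x1:x2 + 1] = color              # top edge
        out[y2 - thickness + 1:y2 + 1, x1:x2 + 1] = color      # bottom edge
        out[y1:y2 + 1, x1:x1 + thickness] = color              # left edge
        out[y1:y2 + 1, x2 - thickness + 1:x2 + 1] = color      # right edge
        return out

    def warning_outputs(level: int, image: np.ndarray, box):
        """Return the warning image for levels 2-3 and a flag indicating whether a warning sound is also required."""
        return draw_box(image, box), (level == 3)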
 In step S17, the output unit 118 issues a warning to the user using at least one of sound and image, based on the warning information generated by the warning execution unit 117.
 FIGS. 9(A) and 9(B) are schematic diagrams showing examples of screen images output to the monitor 103.
 Both the screen image 140 and the screen image 150 assume a situation in which a vehicle 141 or 151 crosses to the right from a corner of the road ahead.
 The screen image 140 is the warning screen image generated by the warning execution unit 117 when the warning setting level is judged to correspond to a medium to strong warning. In the screen image 140, the detection position included in the object detection result is used, and a warning is displayed as a rectangle 142 at that position.
 On the other hand, the screen image 150 is the warning screen image generated by the warning execution unit 117 when the warning setting level is judged to correspond to a weak warning. In the screen image 150, an attention area 152, in which the information indicated by the judgment basis visualization signal is superimposed on the original image, is displayed as the warning content.
 In the embodiment described above, the preprocessing unit 113 resizes the image, and the warning execution unit 117 therefore generates the warning image using the processed image produced by the preprocessing unit 113; however, the present embodiment is not limited to such an example. For example, when the preprocessing unit 113 does not resize the image, the warning execution unit 117 may generate the warning image using the image represented by the image signal digitized by the image signal acquisition unit 112.
 100 warning system, 101 camera, 102 distance measuring sensor, 103 monitor, 104 speaker, 110 warning device, 111 input unit, 112 image signal acquisition unit, 113 preprocessing unit, 114 object detection unit, 115 judgment basis visualization unit, 116 condition judgment unit, 117 warning execution unit, 118 output unit.

Claims (8)

  1.  An information processing device comprising:
      an object detection unit that detects an object from a target image using a feature map; and
      a judgment basis visualization unit that generates a judgment basis visualization image, which is an image in which, in the same pixel arrangement as the target image, the importance of an image region serving as the judgment basis when detecting the object can be identified according to the feature map.
  2.  The information processing device according to claim 1, further comprising:
      a condition judgment unit that judges whether the detection result of the object satisfies a first condition and whether it satisfies a second condition; and
      an output execution unit that outputs the target image when the detection result satisfies the first condition, and outputs an output image using the judgment basis visualization image when the detection result satisfies the second condition.
  3.  The information processing device according to claim 2, wherein the second condition is a condition under which the need to warn about the object is higher than under the first condition.
  4.  The information processing device according to claim 2 or 3, wherein the condition judgment unit further judges whether the detection result satisfies a third condition, and
      the output execution unit outputs, when the detection result satisfies the third condition, a processed image in which the object is enclosed by a rectangle in the target image.
  5.  The information processing device according to claim 4, wherein the third condition is a condition under which the need to warn about the object is higher than under the second condition.
  6.  The information processing device according to claim 4 or 5, wherein the condition judgment unit further judges whether the detection result satisfies a fourth condition, and
      the output execution unit outputs, when the detection result satisfies the fourth condition, the processed image and a sound for giving a warning.
  7.  The information processing device according to claim 6, wherein the fourth condition is a condition under which the need to warn about the object is higher than under the third condition.
  8.  An information processing method comprising:
      detecting an object from a target image using a feature map; and
      generating a judgment basis visualization image, which is an image in which, in the same pixel arrangement as the target image, an image region serving as the judgment basis when detecting the object can be identified according to the feature map.
PCT/JP2020/045684 2020-12-08 2020-12-08 Information processing device and information processing method WO2022123654A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/045684 WO2022123654A1 (en) 2020-12-08 2020-12-08 Information processing device and information processing method
JP2021524283A JPWO2022123654A1 (en) 2020-12-08 2020-12-08

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/045684 WO2022123654A1 (en) 2020-12-08 2020-12-08 Information processing device and information processing method

Publications (1)

Publication Number Publication Date
WO2022123654A1 true WO2022123654A1 (en) 2022-06-16

Family

ID=81973386

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/045684 WO2022123654A1 (en) 2020-12-08 2020-12-08 Information processing device and information processing method

Country Status (2)

Country Link
JP (1) JPWO2022123654A1 (en)
WO (1) WO2022123654A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016114592A (en) * 2014-12-12 2016-06-23 キヤノン株式会社 Information processing device, information processing method, and program
JP2019156641A (en) * 2018-03-08 2019-09-19 コニカミノルタ株式会社 Image processing device for fork lift and control program

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024161535A1 (en) * 2023-02-01 2024-08-08 三菱電機株式会社 Information processing device, program, information processing system, and information processing method

Also Published As

Publication number Publication date
JPWO2022123654A1 (en) 2022-06-16

Similar Documents

Publication Publication Date Title
CN109017570B (en) Vehicle surrounding scene presenting method and device and vehicle
JP4359710B2 (en) Vehicle periphery monitoring device, vehicle, vehicle periphery monitoring program, and vehicle periphery monitoring method
JP5121389B2 (en) Ultrasonic diagnostic apparatus and method for measuring the size of an object
JP4171501B2 (en) Vehicle periphery monitoring device
EP3499456B1 (en) Circuit device, electronic instrument, and error detection method
EP2150053A1 (en) Vehicle periphery monitoring system, vehicle periphery monitoring program and vehicle periphery monitoring method
EP3534331A1 (en) Circuit device and electronic apparatus
JP5718920B2 (en) Vehicle periphery monitoring device
US9826166B2 (en) Vehicular surrounding-monitoring control apparatus
WO2012004938A1 (en) Device for monitoring vicinity of vehicle
JP2009037622A (en) Method and device for evaluating image
US9197860B2 (en) Color detector for vehicle
JP2008268190A (en) Determination of surface characteristics
WO2022123654A1 (en) Information processing device and information processing method
JP2003028635A (en) Image range finder
JP4813304B2 (en) Vehicle periphery monitoring device
JP5906696B2 (en) Vehicle periphery photographing apparatus and vehicle periphery image processing method
WO2010115020A2 (en) Color and pattern detection system
JP7505596B2 (en) IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND IMAGE PROCESSING PROGRAM
CN110556024B (en) Anti-collision auxiliary driving method and system and computer readable storage medium
JP2011081614A (en) Recognition system, recognition method, and program
KR101194152B1 (en) METHOD AND SySTEM FOR AVOIDING PEDESTRIAN COLLISION
JP4869835B2 (en) Vehicle perimeter monitoring system
JP2018072884A (en) Information processing device, information processing method and program
JP2966683B2 (en) Obstacle detection device for vehicles

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021524283

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20965038

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20965038

Country of ref document: EP

Kind code of ref document: A1