WO2024180759A1 - Image processing device, image processing method, and program
- Publication number
- WO2024180759A1 (PCT/JP2023/007793)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- person
- face
- converted
- vehicle
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
Definitions
- The present invention relates to an image processing device, an image processing method, and a program.
- Patent Document 1 describes a technique for preventing a pattern on a wall in front of an image input means from being erroneously detected as a face, even if the pattern resembles a person's face.
- The technique of Patent Document 1 detects the arrival of a person using a human presence sensor and then turns on a light that illuminates the person's face, thereby reliably detecting the person's face.
- However, with this conventional technique, it may not be possible to detect the facial area of an image to be anonymized easily and reliably.
- The present invention has been made in consideration of these circumstances, and one of its objectives is to provide an image processing device, an image processing method, and a program that can more easily and reliably detect the facial area of an image that is to be anonymized.
- An image processing device includes an extraction unit that extracts a person from a converted image obtained by applying an image conversion process to an input image, and a processing unit that determines whether the extracted person satisfies a requirement regarding the suitability of the person's presence and applies predetermined processing to the converted image based on the result of the determination.
- The extraction unit extracts the face of the person as the person, and the requirement regarding the suitability of the person's presence is that parts of the person other than the face are recognized around the face.
- The requirement regarding the suitability of the person's presence is that the area of the converted image in which the person is present is recognized as an area through which pedestrians can pass.
- The requirement regarding the suitability of the person's presence is that, among the converted images in a time series, the person is also present in the converted images at the previous and next time points.
- The processing unit deletes the converted image, or performs the image conversion process again on the input image, as the predetermined processing.
- The processing unit stores the converted image as learning data as the predetermined processing.
- The image conversion process changes the face of the person into the face of another person while keeping the orientation of the face aligned before and after the image conversion process.
- In an image processing method, a computer extracts a person from a converted image obtained by performing an image conversion process on an input image, determines whether the extracted person satisfies a requirement regarding the suitability of the person's presence, and performs predetermined processing on the converted image based on the result of the determination.
- A program causes a computer to extract a person from a converted image obtained by performing an image conversion process on an input image, determine whether the extracted person satisfies a requirement regarding the suitability of the person's presence, and perform predetermined processing on the converted image based on the result of the determination.
- FIG. 1 is a diagram showing an overview of a system 1 including an image processing device 100 according to an embodiment of the present invention.
- FIG. 2 is a diagram illustrating an example of a functional configuration of the image processing device 100 according to the embodiment.
- FIG. 3 is a diagram showing an example of an interior image and an exterior image acquired from a vehicle M1.
- FIG. 4 is a diagram for explaining a process executed by an image processing unit 130.
- FIG. 5 is a diagram for explaining a process executed by an image conversion unit 140.
- FIG. 6 is a diagram for explaining a failure of processing executed by a conventional image processing unit 130 and image conversion unit 140.
- FIG. 7 is a diagram for explaining processing executed by an image determination unit 150 according to the present embodiment.
- FIG. 8 is a diagram illustrating an example of annotation work performed by an annotator.
- A diagram showing an example of driving assistance using a trained model 180.
- A diagram showing an example of a flow of processing executed by the image conversion unit 140.
- A diagram showing an example of a flow of processing executed by the image determination unit 150.
- FIG. 1 is a diagram showing an overview of a system 1 including an image processing device 100 according to this embodiment.
- The system 1 includes at least a vehicle M1, a vehicle M2, an image processing device 100, and a terminal device 200.
- The vehicle M1 and the vehicle M2 are illustrated as different vehicles, but they may be the same vehicle.
- Vehicle M1 is, for example, a hybrid vehicle, an electric vehicle, or the like, and includes at least a camera that captures images of the interior of vehicle M1 and a camera that captures images of the exterior of vehicle M1. While traveling, vehicle M1 transmits images of the interior and exterior of the vehicle captured by these cameras to image processing device 100 via a network NW such as a cellular network, a Wi-Fi network, or the Internet.
- The image processing device 100 is a server device that, on receiving captured image data including images of the interior and exterior of the vehicle from the vehicle M1, performs the image conversion described below on the received captured image data. This image conversion is a process for protecting the privacy of people captured in those images.
- The image processing device 100 transmits the obtained converted image data to the terminal device 200 via the network NW.
- The terminal device 200 is a terminal device such as a desktop personal computer or a smartphone.
- When the user of the terminal device 200 acquires the converted image data from the image processing device 100, the user performs an annotation assignment operation, described later, on the acquired converted image data.
- When the annotation assignment operation is completed, the user of the terminal device 200 transmits the annotated image data, in which annotations have been assigned to the converted image data, to the image processing device 100.
- When the image processing device 100 receives annotated image data from the terminal device 200, it uses the received annotated image data as learning data and generates a trained model (described below) using an arbitrary machine learning model.
- This trained model is, for example, a behavior prediction model that, when an outside-of-vehicle image is input, outputs the predicted behavior (trajectory) of a person depicted in the outside-of-vehicle image, or, when an inside-vehicle image and an outside-vehicle image are input, takes into account the line of sight of the driver depicted in the inside-vehicle image and calls attention to a pedestrian depicted in the outside-vehicle image.
- The image data used as the learning data may be annotated image data in which annotations have been added to the converted image data, or annotated image data in which the converted image data has been reconverted into captured image data while leaving the annotations intact (i.e., annotated image data in which annotations have been added to captured image data).
- By using annotated image data in which annotations have been added to captured image data as the learning data, it is possible to train on data that is more realistic and from which the effects of image conversion have been removed.
- When the image processing device 100 generates the trained model, it distributes the generated trained model to the vehicle M2 via the network NW.
- The vehicle M2 is, for example, a hybrid vehicle or an electric vehicle. While the vehicle M2 is traveling, at least one of an interior image and an exterior image captured by a camera is input into the trained model, thereby obtaining behavior prediction data for people present in the vicinity of the vehicle M2.
- The driver of the vehicle M2 can refer to the obtained behavior prediction data when driving the vehicle M2. The contents of each process are explained in more detail below.
- [Functional configuration of the image processing device] FIG. 2 is a diagram showing an example of a functional configuration of the image processing device 100 according to the present embodiment.
- The image processing device 100 includes, for example, a communication unit 110, a transmission/reception control unit 120, an image processing unit 130, an image conversion unit 140, an image determination unit 150, a trained model generation unit 160, and a storage unit 170. These components are realized by, for example, a hardware processor such as a CPU (Central Processing Unit) executing a program (software).
- Some or all of these components may be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or may be realized by cooperation between software and hardware.
- The program may be stored in advance in a storage device (a storage device having a non-transitory storage medium) such as a hard disk drive (HDD) or a flash memory, or may be stored in a removable storage medium (a non-transitory storage medium) such as a DVD or a CD-ROM and installed by mounting the storage medium in a drive device.
- The storage unit 170 is, for example, an HDD, a flash memory, a random access memory (RAM), or the like.
- The storage unit 170 stores, for example, captured image data 172, converted image data 174, annotation image data 176, annotated image data 178, and a trained model 180.
- Note that the image processing device 100 includes the trained model generation unit 160 and the storage unit 170 that stores the trained model 180, but the function of generating a trained model and the generated trained model may instead be held by a server device different from the image processing device 100.
- The communication unit 110 is an interface that communicates with the communication device 10 of the vehicle M via the network NW.
- The communication unit 110 includes a NIC (Network Interface Card) and an antenna for wireless communication.
- The transmission/reception control unit 120 uses the communication unit 110 to exchange data with the vehicles M1 and M2 and the terminal device 200. More specifically, the transmission/reception control unit 120 first acquires from the vehicle M1 a number of interior and exterior images captured in time series by cameras mounted on the vehicle M1.
- The time series in this case refers to images captured at a predetermined interval (e.g., every second) during one driving cycle, from when the vehicle M1 starts to when it stops.
- FIG. 3 is a diagram showing an example of an interior image and an exterior image acquired from vehicle M1.
- The left part of FIG. 3 shows an interior image acquired from vehicle M1, and the right part of FIG. 3 shows an exterior image acquired from vehicle M1.
- The interior image is captured by a camera installed so as to capture at least the facial area of the driver of vehicle M1.
- The exterior image is captured by a camera installed so as to capture at least the area ahead in the traveling direction of vehicle M1.
- The transmission/reception control unit 120 links the interior image and exterior image acquired from vehicle M1 to an image ID and stores them in the storage unit 170 as captured image data 172.
- FIG. 4 is a diagram for explaining the processing executed by the image processing unit 130.
- The image processing unit 130 performs image processing on the captured image data 172 and acquires (extracts) information such as the image attribute, face attributes, and direction information of each image included in the captured image data 172. More specifically, the image processing unit 130 acquires an image attribute indicating whether each image included in the captured image data 172 is an in-vehicle image or an outside-vehicle image, using a trained model that, when an image is input, outputs a classification result indicating which of the two the image is.
- The image processing unit 130 acquires the face attributes of each image included in the captured image data 172 using a trained model that outputs the face area, face size (area of the face area), and distance from the image capture position to the face for all faces included in the image.
- In FIG. 4, as an example, a face area FA1 of person P1 is acquired from the in-vehicle image, and a face area FA2 of person P2, a face area FA3 of person P3, and a face area FA4 of person P4 are acquired from the outside-vehicle image.
- The face areas FA1, FA2, FA3, and FA4 are acquired as rectangular areas, but the present invention is not limited to such a configuration. For example, a trained model that acquires face areas along the contours of the person's face may be used.
- The image processing unit 130 acquires directional information of the faces in each image included in the captured image data 172 using a trained model that outputs at least one of the face direction and the gaze direction for all faces included in the image, for example as a vector. More specifically, for an image of the captured image data 172 having the attribute of an in-vehicle image, the image processing unit 130 acquires directional information using a trained model that, when the image is input, outputs the face direction and gaze direction for all faces included in the image. For an image having the attribute of an outside-vehicle image, the image processing unit 130 acquires directional information using a trained model that, when the image is input, outputs the face direction for all faces included in the image.
- In FIG. 4, as an example, the face direction FD1 and gaze direction ED1 of person P1 are acquired from the in-vehicle image, and the face direction FD2 of person P2, the face direction FD3 of person P3, and the face direction FD4 of person P4 are acquired from the outside-vehicle image.
- When the image processing unit 130 acquires the image attributes, face attributes, and direction information for each image in the captured image data 172, it records them in association with the image. Note that, although the image processing unit 130 acquires the image attributes, face attributes, and direction information using trained models in the example above, the present invention is not limited to such a configuration, and the image processing unit 130 may acquire them using any known method.
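The association between an image and its extracted information can be pictured as a small record type. The following is a minimal sketch, not the implementation described in this publication: the class names `FaceRecord` and `ImageRecord`, the field names, the `(x, y, width, height)` box layout, the three-component direction vectors, and the use of metres for distance are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vector = Tuple[float, float, float]  # direction expressed as a 3D vector (assumed)


@dataclass
class FaceRecord:
    """Face attributes and direction information for one detected face."""
    box: Tuple[int, int, int, int]            # rectangular face area: x, y, width, height
    distance_m: float                         # distance from the camera to the face
    face_direction: Optional[Vector] = None   # FD; None if it could not be obtained
    gaze_direction: Optional[Vector] = None   # ED; only expected for in-vehicle images

    @property
    def size(self) -> int:
        """Face size, defined in the text as the area of the face area."""
        return self.box[2] * self.box[3]


@dataclass
class ImageRecord:
    """Image attribute plus all face records, keyed by the image ID."""
    image_id: str
    attribute: str                            # "in_vehicle" or "outside_vehicle"
    faces: List[FaceRecord] = field(default_factory=list)


# Hypothetical example values, for illustration only.
record = ImageRecord(image_id="img-0001", attribute="in_vehicle")
record.faces.append(
    FaceRecord(box=(120, 80, 64, 64), distance_m=0.8,
               face_direction=(0.0, 0.0, 1.0), gaze_direction=(0.1, 0.0, 0.99))
)
```

Keeping the direction fields optional lets later steps distinguish faces for which no directional information could be obtained.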
- For the captured image data 172 processed by the image processing unit 130, the image conversion unit 140 replaces the face of each person captured in each image with the face of another person without changing the person's directional information, using any software that implements such a function.
- FIG. 5 is a diagram for explaining the process executed by the image conversion unit 140. As shown in FIG. 5, the image conversion unit 140 replaces the faces of persons P1, P2, and P3 shown in FIG. 4 with the faces of other persons without changing the line of sight direction ED1 and facial directions FD1, FD2, and FD3. On the other hand, the face of person P4 is covered with a mosaic MS as a result of the mosaic process performed by the image conversion unit 140.
- The image conversion unit 140 determines, based on the facial attributes of each face shown in each image of the captured image data 172, whether to replace the face with the face of another person or to apply mosaic processing. More specifically, for each face, the image conversion unit 140 determines whether the size of the face is equal to or greater than the first threshold Th1. If so, it decides to replace the face with the face of another person. If the size of the face is less than the first threshold Th1, it decides to apply mosaic processing to the face. Replacing the face of a person shown in a captured image with the face of another person, or applying mosaic processing, is an example of "anonymization processing".
- The image conversion unit 140 also determines whether the distance of each face in each image of the captured image data 172 is equal to or less than the second threshold Th2. If so, it decides to replace the face with the face of another person. If the distance of the face is greater than the second threshold Th2, it decides to apply mosaic processing to the face.
- The image conversion unit 140 repeats these determination processes as many times as there are faces in the image, and either replaces each face with the face of another person or applies mosaic processing according to the determination results.
- The image conversion unit 140 stores the image data obtained by applying such processing to the captured image data 172 in the storage unit 170 as converted image data 174. This allows the selection of data that is useful as learning data for generating a behavior prediction model, and also protects the privacy of the people depicted in each image when an annotator, described later, performs annotation work.
- The image conversion unit 140 may decide to replace the face with the face of another person when the face size is equal to or greater than the first threshold Th1 and the face distance is equal to or less than the second threshold Th2, or when the face size is equal to or greater than the first threshold Th1 or the face distance is equal to or less than the second threshold Th2.
- The image conversion unit 140 may also apply mosaic processing to faces in each image of the captured image data 172 for which directional information could not be obtained, thereby selecting the faces to be used as learning data.
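The threshold rules above can be summarized in a few lines of code. This is a minimal sketch under stated assumptions: the function name `choose_action`, the `mode` switch that selects among the size-only, distance-only, AND, and OR variants, and the `has_direction` flag are hypothetical names introduced here, and the threshold values used in any call are placeholders rather than values from the text.

```python
from enum import Enum


class Action(Enum):
    REPLACE = "replace"   # swap in another person's face
    MOSAIC = "mosaic"     # apply mosaic processing


def choose_action(face_size, face_distance, th1, th2,
                  mode="and", has_direction=True):
    """Pick the anonymization processing for one face.

    mode is a hypothetical switch between the variants in the text:
    "size" (Th1 only), "distance" (Th2 only), "and", or "or".
    """
    if not has_direction:
        # Faces without direction information are mosaicked and so
        # drop out of the learning data.
        return Action.MOSAIC
    size_ok = face_size >= th1          # size at or above Th1
    near_ok = face_distance <= th2      # distance at or below Th2
    if mode == "size":
        replace = size_ok
    elif mode == "distance":
        replace = near_ok
    elif mode == "and":
        replace = size_ok and near_ok
    elif mode == "or":
        replace = size_ok or near_ok
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Action.REPLACE if replace else Action.MOSAIC
```

Calling this function once per face in an image mirrors the repeated determination described above.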
- As described above, the image processing unit 130 acquires information such as the facial area and facial direction of a person contained in the image using a trained model.
- However, the trained model, or other software with similar functions, may output a non-person as a person because of, for example, wrinkles, stains, or lighting conditions on clothing, cracks or dirt on a road or wall, or dirt, stickers, or advertisements on other vehicles.
- FIG. 6 is a diagram for explaining the failure of processing executed by the conventional image processing unit 130 and image conversion unit 140.
- FIG. 6 shows converted images obtained when the conventional image processing unit 130 acquires persons P1 to P6 by inputting the vehicle interior image and the vehicle exterior image into the trained model, and the image conversion unit 140 performs anonymization processing on the face areas FA1 to FA6 of the persons P1 to P6.
- Person P5 may be erroneously acquired and converted due to wrinkles, stains, lighting conditions, etc. on the clothes worn by the person depicted in the vehicle interior image.
- Person P6 may be erroneously acquired and converted due to cracks or dirt on the road depicted in the vehicle exterior image, or dirt, stickers, advertisements, etc. on other vehicles.
- The image determination unit 150 identifies converted images that contain erroneously acquired people, based on whether each person depicted in the converted image satisfies a predetermined requirement regarding the suitability of presence, and maintains the accuracy of the behavior prediction model by deleting such a converted image or by re-performing anonymization processing on the input image that corresponds to it.
- FIG. 7 is a diagram for explaining the processing executed by the image determination unit 150 according to this embodiment.
- The image determination unit 150 extracts people P1 to P6 from the converted images obtained by the image conversion unit 140 converting the inside-vehicle image and the outside-vehicle image.
- People P1 to P6 are identified in advance by the image processing unit 130 as the people corresponding to the face areas FA1 to FA6, before conversion is performed by the image conversion unit 140.
- Person P4, who has been subjected to pixelation, may be excluded from the processing by the image determination unit 150.
- The image determination unit 150 searches for one or more body parts (e.g., a person's shoulders, hands, knees, feet, etc.) around the facial areas FA1 to FA6 of each of the extracted persons P1 to P6 using any key point detection method, and if one or more body parts are detected, it determines that the person and the facial area belong to a real person. For example, in the case of the converted image of the interior of the car shown on the left side of FIG. 7, the image determination unit 150 detects key points KP1 and KP2 representing both shoulders based on the facial area FA1 of person P1 (in other words, taking into account the relative position from the facial area FA1), and as a result, it determines that person P1 is a real person.
- On the other hand, no body part is detected around the facial area FA5 of person P5, so the image determination unit 150 determines that person P5 is not a real person.
- Similarly, the image determination unit 150 can extract key points such as shoulders, hands, knees, and feet for persons P2 to P4 using facial areas FA2 to FA4 as references. For person P6, however, there are no key points that can be extracted using the facial area FA6 as a reference, so the image determination unit 150 can determine that person P6 is not a real person. Detection of one or more body parts using a facial area as a reference is an example of a "predetermined requirement."
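The body part requirement can be sketched as a containment test between detected key points and a search region derived from the facial area. This is a minimal illustration, assuming that key points come from some external detector as `(name, x, y)` tuples and that "around the face" means inside the face box expanded about its center. The function names and the `scale` factor are hypothetical and are not specified in the text.

```python
def search_region(face_box, scale=3.0):
    """Expand a face box (x, y, width, height) about its center.

    Returns (left, top, right, bottom). The scale factor is an assumed
    tuning parameter, not a value from the text.
    """
    x, y, w, h = face_box
    cx, cy = x + w / 2.0, y + h / 2.0
    half_w, half_h = w * scale / 2.0, h * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def is_real_person(face_box, keypoints, scale=3.0):
    """True if at least one body key point lies around the face area.

    keypoints is a list of (name, x, y) tuples produced by some external
    key point detector.
    """
    left, top, right, bottom = search_region(face_box, scale)
    return any(left <= kx <= right and top <= ky <= bottom
               for _name, kx, ky in keypoints)
```

A practical system would likely use a search region biased below the face, since shoulders and hands rarely appear above it. The symmetric region here is only the simplest choice.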
- The image determination unit 150 may determine whether a person in a converted image of an outside-vehicle image is a real person based on whether the area in which the person is present in the converted image is free space (in other words, an area through which pedestrians can pass). For example, in the case of the converted image of an outside-vehicle image shown on the right side of FIG. 7, the image determination unit 150 detects the free space FS by, for example, inputting the converted image into a trained model that has been trained to output the free space in an input image. Next, the image determination unit 150 determines whether each of the persons P2 to P4 and P6 is present in the free space FS.
- The image determination unit 150 determines that the persons P2 to P4 are present in the free space FS, while determining that the person P6 is not present in the free space FS because the person P6 is located on a vehicle body. As a result, the image determination unit 150 determines that the persons P2 to P4 are real people and that the person P6 is not a real person. For a converted image of an outside-vehicle image, the presence of a person in free space is an example of a "predetermined requirement."
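The free space requirement can be illustrated with a binary mask. The sketch below assumes that the free space FS is available as a row-major grid of 0/1 values and that each person is summarized by a single ground contact pixel `(x, y)`. Both the mask representation and the ground point are assumptions made for illustration, and the function names are hypothetical.

```python
def in_free_space(free_space_mask, ground_point):
    """True if the (x, y) pixel lies inside the free space mask."""
    x, y = ground_point
    height = len(free_space_mask)
    width = len(free_space_mask[0]) if height else 0
    if not (0 <= x < width and 0 <= y < height):
        return False                     # outside the image: treat as not free space
    return bool(free_space_mask[y][x])


def real_people_by_free_space(free_space_mask, ground_points):
    """Map each person label to the result of the free space check."""
    return {label: in_free_space(free_space_mask, point)
            for label, point in ground_points.items()}
```

With a mask whose lower-left region is marked as free space, a person standing there passes the check, while a detection located on a vehicle body does not.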
- The image determination unit 150 may determine whether a person is a real person based on whether the same person appears in the converted images of the inside-vehicle images and the outside-vehicle images captured in chronological order. For example, the image determination unit 150 determines that a person detected in a converted image at a certain point in time is a real person when the person is also present in the converted images at the previous and next points in time. In doing so, the image determination unit 150 may perform this determination only for points in time at which the person is expected to be included in the captured image, taking into account the change in position (speed) of the person in the captured images.
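The time series requirement reduces to a membership test on neighboring frames once people have been matched across frames. The sketch below assumes that such matching has already produced stable person IDs per time index. How the matching is done is outside its scope, and the function name and the `None` return for undecidable cases are hypothetical choices.

```python
def present_before_and_after(track_ids_by_time, t, person_id):
    """Check the previous and next time points for the same person.

    track_ids_by_time maps a time index to the set of person IDs seen in
    the converted image at that time. Returns None when a neighboring
    frame is missing, so the caller can fall back to another requirement.
    """
    previous_ids = track_ids_by_time.get(t - 1)
    next_ids = track_ids_by_time.get(t + 1)
    if previous_ids is None or next_ids is None:
        return None
    return person_id in previous_ids and person_id in next_ids
```

Returning `None` rather than `False` at the edges of a driving cycle avoids rejecting a real person merely because no earlier or later frame exists.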
- When a converted image contains a person determined not to be a real person, the image determination unit 150 either deletes the converted image or applies anonymization processing again to the input image corresponding to the converted image, thereby reacquiring the converted image. Deleting the converted image, or applying anonymization processing again to the corresponding input image, is an example of "predetermined processing".
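One way to combine the per-person results into the predetermined processing is shown below. The text only states that the converted image is either deleted or regenerated. The retry budget (`attempts_so_far`, `max_attempts`) that chooses between the two is an assumption introduced for this sketch, as are the names `Outcome` and `decide_outcome`.

```python
from enum import Enum


class Outcome(Enum):
    KEEP = "keep"             # every extracted person satisfied the requirement
    RECONVERT = "reconvert"   # redo anonymization on the input image
    DELETE = "delete"         # discard the converted image


def decide_outcome(person_is_real, attempts_so_far, max_attempts=2):
    """Choose the processing for one converted image.

    person_is_real holds one boolean per extracted person.
    The retry budget is a hypothetical policy, not taken from the text.
    """
    if all(person_is_real):
        return Outcome.KEEP
    if attempts_so_far < max_attempts:
        return Outcome.RECONVERT
    return Outcome.DELETE
```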
- In addition, the image determination unit 150 inputs the converted image again into a trained model that has been trained to output at least one of the face direction and the gaze direction, and obtains the face direction FD or gaze direction ED in the converted image.
- The image determination unit 150 determines whether the face direction FD or gaze direction ED of the face of the person depicted in the converted image is approximately consistent with the face direction FD or gaze direction ED of the face depicted in the captured image before conversion. As described above, both the face direction FD and the gaze direction ED are obtained for the in-vehicle image, and the face direction FD is obtained for the outside-vehicle image.
- Accordingly, for the in-vehicle image, the image determination unit 150 determines whether the face direction FD and the gaze direction ED are approximately consistent between the captured image before conversion and the converted image, and for the outside-vehicle image, it determines whether the face direction FD is approximately consistent between the two. More specifically, for example, the image determination unit 150 calculates the angle difference between a vector representing the face direction FD in the captured image before conversion and a vector representing the face direction FD in the converted image, and determines that the face directions FD are approximately the same if the calculated angle difference is within a threshold value. The same applies to the gaze direction ED.
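The angle comparison can be written directly from the definition of the angle between two vectors, using the arccosine of the normalized dot product. This is a minimal sketch: the function names and the default threshold of 15 degrees are placeholders chosen for illustration, since the text does not give a threshold value.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two direction vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def directions_match(before, after, max_angle_deg=15.0):
    """True if the direction after conversion stays within the threshold."""
    return angle_between_deg(before, after) <= max_angle_deg
```

The same function serves for both the face direction FD and the gaze direction ED, since each is represented as a vector.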
- The image conversion unit 140 performs conversion processing again on the captured image for the faces for which it is determined that the face direction FD or the gaze direction ED does not substantially match.
- The image conversion unit 140 may perform conversion processing again only for the faces determined not to substantially match, or may perform conversion processing again for all faces included in the converted image, including those faces.
- Alternatively, the image conversion unit 140 may apply mosaic processing to the faces determined not to substantially match, without performing conversion processing again, and exclude them from the targets to be used as learning data. This makes it possible to prevent degradation of information due to unintended behavior of the face conversion software.
- The determination processing for converted images executed by the image determination unit 150 described above may be executed only for faces assumed to be of higher importance, rather than for all faces in the converted image.
- For example, the image determination unit 150 may execute these determination processes only for faces in the captured image before conversion whose face size is equal to or greater than a third threshold Th3 that is greater than the first threshold Th1, or only for faces whose face distance is equal to or less than a fourth threshold Th4 that is smaller than the second threshold Th2.
- The image determination unit 150 may also execute the determination processes on the assumption that, in the captured image before conversion, the face of a person present ahead of the vehicle M1 in its traveling direction, or the face of a person facing forward in the traveling direction of the vehicle M1, is of higher importance.
- When the image determination unit 150 completes the determination of the converted image, it stores the converted image data 174 that received a positive determination in the storage unit 170 as annotation image data 176. At this time, the converted image data 174 may be stored in the storage unit 170 as annotation image data 176 together with information indicating the purpose of use, for example, information indicating that it is annotation image data for generating a behavior prediction model that predicts the behavior of a person depicted in the input image.
- The transmission/reception control unit 120 transmits the annotation image data 176 to the terminal device 200.
- The annotator, who is the user of the terminal device 200, generates annotated image data by performing annotation work on the annotation images included in the received annotation image data 176, and transmits it to the image processing device 100.
- The image processing device 100 stores the received annotated image data in the storage unit 170 as annotated image data 178.
- FIG. 8 is a diagram showing an example of annotation work performed by an annotator.
- the left part of FIG. 8 shows annotations to the converted image of the in-vehicle image
- the right part of FIG. 8 shows annotations to the converted image of the outside-vehicle image.
- the annotator assigns information to the converted image of the in-vehicle image indicating whether the driver's gaze direction ED1 shown in the converted image is appropriate or not in the situation shown in the converted image of the outside-vehicle image at the same time (for example, 1 if appropriate, 0 if inappropriate).
- FIG. 8 assigns information to the converted image of the in-vehicle image indicating whether the driver's gaze direction ED1 shown in the converted image is appropriate or not in the situation shown in the converted image of the outside-vehicle image at the same time (for example, 1 if appropriate, 0 if inappropriate).
- the converted image of the outside-vehicle image indicates that there is a pedestrian on the left side of the vehicle's traveling direction, while the converted image of the inside-vehicle image indicates that the driver is looking to the left.
- the annotator assigns information indicating that the driver's gaze direction ED1 is appropriate (i.e., 1).
- the annotator specifies a risk area RA into which a person depicted in the converted image of the outside-vehicle image, excluding people who have been subjected to mosaic processing, is predicted to proceed.
- through the processing of the image conversion unit 140 and the image determination unit 150, the face of the person depicted in the original image has been converted into the face of another person, so the privacy of that person is protected.
- the annotator can accurately specify the risk area RA while referring to the facial direction and gaze direction of the other person depicted in the converted image. This makes it possible to more easily and reliably detect the facial area of the image to be anonymized.
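The two kinds of annotation described for FIG. 8 (a 1/0 label for the driver's gaze and a risk area RA on the outside-vehicle image) could be held in a record such as the following. This is only an illustrative sketch: the class name `FrameAnnotation`, its field names, and the representation of the risk area as a list of polygon vertices are assumptions, not something the description specifies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # assumed: (x, y) in normalized image coordinates


@dataclass
class FrameAnnotation:
    """Hypothetical record of one annotator's work for a single time step."""
    timestamp: float
    # 1 if the driver's gaze direction is appropriate for the outside
    # situation at the same time, 0 if inappropriate, None if not labeled.
    gaze_appropriate: Optional[int] = None
    # Assumed representation of the risk area RA: polygon vertices drawn
    # on the converted outside-vehicle image.
    risk_area: List[Point] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.gaze_appropriate not in (None, 0, 1):
            raise ValueError("gaze_appropriate must be 0, 1, or None")


# Pedestrian on the left, driver looking left: labeled appropriate (1).
example = FrameAnnotation(
    timestamp=12.5,
    gaze_appropriate=1,
    risk_area=[(0.10, 0.60), (0.30, 0.60), (0.30, 0.90), (0.10, 0.90)],
)
```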
- the trained model generation unit 160 uses the annotated image data 178 as training data and an arbitrary machine learning model to generate a trained model.
- this trained model is, for example, a behavior prediction model. When an outside-vehicle image is input, it outputs the predicted behavior (trajectory) of a person depicted in the outside-vehicle image; when an in-vehicle image and an outside-vehicle image are input, it takes into account the line of sight of the driver depicted in the in-vehicle image to call attention to a pedestrian depicted in the outside-vehicle image.
- the trained model generation unit 160 stores the generated trained model in the storage unit 170 as trained model 180.
- the transmission/reception control unit 120 distributes the trained model 180 to the vehicle M2 via the network NW.
- the vehicle M2 uses the trained model 180 (more precisely, an application program that utilizes the trained model 180) to provide driving assistance to the driver of the vehicle M2.
- FIG. 9 is a diagram showing an example of driving assistance using a trained model 180.
- FIG. 9 shows an example of driving assistance in which the vehicle M2, while driving, inputs interior and exterior images captured by its onboard camera to the trained model 180, and the trained model 180 outputs information to an HMI (human machine interface) to alert the driver to a pedestrian captured in the exterior image, taking into account the driver's line of sight captured in the interior image.
- the HMI displays a risk area RA2 corresponding to pedestrian P5 captured in the exterior image, and outputs a warning message ("Be careful not to look away from the road") as text information or audio information when the driver's line of sight captured in the interior image is not directed toward pedestrian P5. This makes it possible to realize driving assistance that takes into account the driver's state.
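The alert condition described for FIG. 9 can be illustrated with a simple rule. In the description the output comes from the trained model 180; the sketch below is a hand-written stand-in that only shows the decision "warn when the gaze is not directed toward the pedestrian". The angle representation, the function names, and the 15-degree tolerance are all assumptions.

```python
from typing import Optional

WARNING_TEXT = "Be careful not to look away from the road"


def angular_gap_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def driver_alert(gaze_yaw_deg: float,
                 pedestrian_bearing_deg: float,
                 tolerance_deg: float = 15.0) -> Optional[str]:
    """Return the warning text if the gaze is not directed at the pedestrian.

    Both angles are assumed to be measured from the vehicle's heading
    (negative = left). The tolerance value is hypothetical.
    """
    if angular_gap_deg(gaze_yaw_deg, pedestrian_bearing_deg) <= tolerance_deg:
        return None  # driver is already looking toward the pedestrian
    return WARNING_TEXT


# Pedestrian at 25 degrees to the left, driver looking 20 degrees right.
message = driver_alert(gaze_yaw_deg=20.0, pedestrian_bearing_deg=-25.0)
```

A rule of this kind ignores everything the learned model could capture, such as pedestrian motion or how long the driver has looked away, so it should be read only as an illustration of the output condition.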
- FIG. 10 is a diagram showing an example of the flow of processing executed by the image conversion unit 140.
- the processing shown in FIG. 10 is executed, for example, when an interior image or an exterior image is captured by a camera mounted on the vehicle M1 and processed by the image processing unit 130.
- the image conversion unit 140 acquires the captured image contained in the captured image data 172 that has been processed by the image processing unit 130 (step S100). Next, the image conversion unit 140 selects one face that appears in the acquired captured image (step S102).
- the image conversion unit 140 determines whether the size of the selected face is equal to or greater than the first threshold Th1 (step S104). If it is determined that the size of the selected face is equal to or greater than the first threshold Th1, the image conversion unit 140 converts the face into the face of another person (step S106). On the other hand, if it is determined that the size of the selected face is less than the first threshold Th1, the image conversion unit 140 then determines whether the distance of the selected face is equal to or less than the second threshold Th2 (step S108).
- If it is determined that the distance of the selected face is equal to or less than the second threshold Th2, the image conversion unit 140 proceeds to step S106 and converts the face into the face of another person. On the other hand, if it is determined that the distance of the selected face is greater than the second threshold Th2, the image conversion unit 140 applies mosaic processing to the face (step S110). Next, the image conversion unit 140 determines whether or not the processing has been performed on all faces captured in the acquired captured image (step S112).
- If it is determined that the processing has been performed on all faces captured in the acquired captured image, the image conversion unit 140 acquires the image obtained by performing the processing on all faces as a converted image, and stores this in the storage unit 170 as converted image data 174 (step S114). On the other hand, if it is determined that the processing has not been performed on all faces captured in the acquired captured image, the image conversion unit 140 returns the processing to step S102. This ends the processing of this flowchart.
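The per-face decision in steps S104 to S110 can be sketched as follows. The `Face` fields, their units (pixels and metres), and the threshold values used in the example are assumptions made for illustration; the description only states that a size is compared with Th1 and a distance with Th2.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FaceAction(Enum):
    SWAP = "convert to the face of another person"  # step S106
    MOSAIC = "apply mosaic processing"              # step S110


@dataclass
class Face:
    size_px: float      # assumed: face size in pixels
    distance_m: float   # assumed: estimated distance to the face in metres


def choose_action(face: Face, th1: float, th2: float) -> FaceAction:
    """Decide how one face is anonymized (steps S104 to S110)."""
    if face.size_px >= th1:       # step S104: large enough to swap
        return FaceAction.SWAP
    if face.distance_m <= th2:    # step S108: small but close enough to swap
        return FaceAction.SWAP
    return FaceAction.MOSAIC      # small and far away


def plan_conversion(faces: List[Face], th1: float, th2: float) -> List[FaceAction]:
    """Loop over every face in one captured image (steps S102 and S112)."""
    return [choose_action(face, th1, th2) for face in faces]


# Hypothetical thresholds: Th1 = 64 px, Th2 = 10 m.
actions = plan_conversion(
    [Face(120.0, 30.0), Face(20.0, 5.0), Face(20.0, 30.0)], th1=64.0, th2=10.0
)
```

Only faces that fail both tests are mosaicked, which matches the flow in which step S108 is reached only after step S104 fails.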
- FIG. 11 is a diagram showing an example of the flow of processing executed by the image determination unit 150.
- the processing shown in FIG. 11 is executed, for example, at the timing when a time-series converted image is obtained by applying the above-mentioned conversion processing to a time-series captured image taken during one driving cycle from the start to the stop of the vehicle M1.
- the image determination unit 150 acquires one converted image (step S200). Next, the image determination unit 150 selects one person from the acquired converted image (step S202). Next, the image determination unit 150 judges whether or not the selected person satisfies a predetermined requirement regarding the suitability of the presence (step S204). If it is judged that the selected person satisfies the predetermined requirement regarding the suitability of the presence, then the image determination unit 150 judges whether or not the acquired converted image is an in-vehicle image (step S206). On the other hand, if it is judged that the selected person does not satisfy the predetermined requirement regarding the suitability of the presence, the image determination unit 150 causes the image conversion unit 140 to perform anonymization processing again on the input image corresponding to the converted image (step S208). After that, the image determination unit 150 executes the processing of step S202 again on the re-converted image.
- If it is determined in step S206 that the converted image is an in-vehicle image, the image determination unit 150 determines whether the gaze direction and facial direction of the selected person's face match those in the image before conversion (step S210). On the other hand, if it is determined that the acquired converted image is not an in-vehicle image, i.e., it is an outside-vehicle image, the image determination unit 150 determines whether the facial direction of the selected person's face matches that in the image before conversion (step S212). If it is determined in the processing of step S210 or step S212 that they do not match, the image determination unit 150 advances the processing to step S208.
- If it is determined in the processing of step S210 or step S212 that they match, the image determination unit 150 determines that the face has been converted normally, and determines whether or not the process has been performed for all faces appearing in the converted images (step S214). If it is determined that the process has been performed for all faces appearing in the time-series converted images, the image determination unit 150 acquires these time-series converted images as images for annotation, and causes the transmission/reception control unit 120 to transmit the acquired images for annotation to the terminal device 200 (step S216). On the other hand, if it is determined that the process has not been performed for all faces appearing in the time-series converted images, the image determination unit 150 returns the process to step S202. This ends the process of this flowchart.
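The checks in steps S204 to S214 can be sketched as a verify-and-retry loop over one converted image. The data classes, the string encoding of directions, and in particular the `max_retries` cap are assumptions; the flowchart as described simply returns to step S202 after re-conversion and does not state a retry limit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PersonObs:
    plausible: bool   # assumed result of the suitability check (step S204)
    face_dir: str     # assumed coarse label, e.g. "left" or "front"
    gaze_dir: str


@dataclass
class Frame:
    in_vehicle: bool
    original: List[PersonObs]    # measured on the image before conversion
    converted: List[PersonObs]   # measured on the converted image


def person_ok(orig: PersonObs, conv: PersonObs, in_vehicle: bool) -> bool:
    if not conv.plausible:                            # step S204
        return False
    if conv.face_dir != orig.face_dir:                # steps S210 and S212
        return False
    if in_vehicle and conv.gaze_dir != orig.gaze_dir: # step S210 only
        return False
    return True


def verify(frame: Frame,
           reconvert: Callable[[Frame], Frame],
           max_retries: int = 3) -> bool:
    """Return True once every person passes; re-convert on failure (S208)."""
    for attempt in range(max_retries + 1):
        pairs = zip(frame.original, frame.converted)
        if all(person_ok(o, c, frame.in_vehicle) for o, c in pairs):
            return True                    # step S214: all faces converted normally
        if attempt < max_retries:
            frame = reconvert(frame)       # step S208: anonymize again
    return False
```

Here `reconvert` stands in for asking the image conversion unit 140 to repeat the anonymization; a frame that still fails after the assumed retry budget would need separate handling that the description does not cover.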
- a person is extracted from a converted image obtained by performing image conversion processing on an input image, and it is determined whether or not the extracted person satisfies the requirements for suitability of the presence of a person, and a predetermined process is performed on the converted image based on the result of the determination.
- an image processing device comprising: a storage medium for storing computer-readable instructions; and
- a processor coupled to the storage medium,
- wherein the processor executes the computer-readable instructions to: extract a person area representing a person from a converted image obtained by performing an image conversion process on an input image; determine whether the extracted person area satisfies a requirement regarding suitability of the presence of the person; and perform a predetermined process on the converted image based on a result of the determination.
- 100 Image processing device
- 110 Communication unit
- 120 Transmission/reception control unit
- 130 Image processing unit
- 140 Image conversion unit
- 150 Image determination unit
- 160 Trained model generation unit
- 170 Storage unit
- 172 Captured image data
- 174 Converted image data
- 176 Annotation image data
- 178 Annotated image data
- 180 Trained model
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202380094662.7A CN120660110A (zh) | 2023-03-02 | 2023-03-02 | Image processing device, image processing method, and program |
| PCT/JP2023/007793 WO2024180759A1 (ja) | 2023-03-02 | 2023-03-02 | Image processing device, image processing method, and program |
| JP2025503542A JPWO2024180759A1 | 2023-03-02 | 2023-03-02 | |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/007793 WO2024180759A1 (ja) | 2023-03-02 | 2023-03-02 | Image processing device, image processing method, and program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024180759A1 (ja) | 2024-09-06 |
Family
ID=92590154
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/007793 Ceased WO2024180759A1 (ja) | 2023-03-02 | 2023-03-02 | Image processing device, image processing method, and program |
Country Status (3)
| Country | Link |
|---|---|
| JP (1) | JPWO2024180759A1 |
| CN (1) | CN120660110A |
| WO (1) | WO2024180759A1 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
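| JP2008040710A (ja) * | 2006-08-04 | 2008-02-21 | Sony Corp | Face detection device, imaging device, and face detection method |
| JP2021105808A (ja) * | 2019-12-26 | 2021-07-26 | 株式会社リコー | Speaker recognition system, speaker recognition method, and speaker recognition program |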
| US20220148243A1 (en) * | 2020-11-10 | 2022-05-12 | Adobe Inc. | Face Anonymization in Digital Images |
2023
- 2023-03-02 JP JP2025503542A patent/JPWO2024180759A1/ja active Pending
- 2023-03-02 WO PCT/JP2023/007793 patent/WO2024180759A1/ja not_active Ceased
- 2023-03-02 CN CN202380094662.7A patent/CN120660110A/zh active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| CN120660110A (zh) | 2025-09-16 |
| JPWO2024180759A1 | 2024-09-06 |
Similar Documents
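| Publication | Publication Date | Title |
|---|---|---|
| CN108388834B (zh) | | Object detection using recurrent neural networks and concatenated feature maps |
| CN106952303B (zh) | | Vehicle distance detection method, device, and system |
| CN111325141B (zh) | | Interaction relationship recognition method, apparatus, device, and storage medium |
| US8880282B2 (en) | | Method and system for risk prediction for a support actuation system |
| US10864906B2 (en) | | Method of switching vehicle drive mode from automatic drive mode to manual drive mode depending on accuracy of detecting object |
| JP7135665B2 (ja) | | Vehicle control system, vehicle control method, and computer program |
| JP2019179372A (ja) | | Learning data creation method, learning method, risk prediction method, learning data creation device, learning device, risk prediction device, and program |
| JP2015057690A (ja) | | Image processing device, recognition target object detection method, recognition target object detection program, and mobile body control system |
| JP2013225295A5 | | |
| CN107392083A (zh) | | Identification device, identification method, identification program, and recording medium |
| JP2014241036A (ja) | | Vehicle driving assistance device |
| CN110268417A (zh) | | Method for recognizing objects in camera images |
| CN107944382A (zh) | | Target tracking method, device, and electronic equipment |
| JP2009070344A (ja) | | Image recognition device, image recognition method, and electronic control device |
| JP5107154B2 (ja) | | Motion estimation device |
| CN111081045A (zh) | | Pose trajectory prediction method and electronic device |
| Lakmal et al. | | Pothole detection with image segmentation for advanced driver assisted systems |
| Shashidhar et al. | | Computer Vision and the IoT‐Based Intelligent Road Lane Detection System |
| WO2024180759A1 (ja) | | Image processing device, image processing method, and program |
| WO2024180709A1 (ja) | | Image processing device, image processing method, and program |
| WO2024005073A1 (ja) | | Image processing device, image processing method, image processing system, and program |
| JP5838948B2 (ja) | | Object identification device |
| JP2006010652A (ja) | | Object detection device |
| JP4772622B2 (ja) | | Periphery monitoring system |
| JP7805260B2 (ja) | | Image processing device, image processing method, image processing system, and program |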
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23925314; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2025503542; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2025503542; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 202380094662.7; Country of ref document: CN |
| | WWP | Wipo information: published in national office | Ref document number: 202380094662.7; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23925314; Country of ref document: EP; Kind code of ref document: A1 |