WO2018105246A1 - Image sensor - Google Patents

Image sensor Download PDF

Info

Publication number
WO2018105246A1
WO2018105246A1 (PCT/JP2017/037867; JP2017037867W)
Authority
WO
WIPO (PCT)
Prior art keywords
unit
control signal
image sensor
detection
frames
Prior art date
Application number
PCT/JP2017/037867
Other languages
French (fr)
Japanese (ja)
Inventor
久美子 馬原
佐伯 隆司
Original Assignee
Sony Semiconductor Solutions Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Semiconductor Solutions Corporation
Publication of WO2018105246A1 publication Critical patent/WO2018105246A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/254 Analysis of motion involving subtraction of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G09B21/04 Devices for conversing with the deaf-blind
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G09B21/06 Devices for teaching lip-reading

Definitions

  • This technology relates to image sensors. Specifically, the present invention relates to an image sensor for detecting an event using captured image data.
  • a data flow in a system that performs such recognition processing is roughly divided into a data flow that outputs image data as a display image and a data flow that extracts necessary information from the image data and performs recognition processing.
  • an image detection processing device in which a processing element is provided for each pixel of the image sensor has been proposed (for example, see Patent Document 1).
  • image data is transferred and processing data necessary for calculating moments and the like is output to an external processor.
  • further processing is required in an external processor in order to calculate the position of the center of gravity and the like.
  • a processing speed of about 30 to 120 fps (frames / second) is generally sufficient, but this is insufficient for performing advanced recognition processing.
  • This technology was created in view of such circumstances, and aims to speed up recognition processing using image data.
  • the present technology has been made to solve the above-described problems.
  • The first aspect of the present technology is an image sensor including: an imaging element that captures an object and generates frames of image data arranged in time series; a binarization processing unit that performs binarization processing on each of the frames to generate binarized frames; a tracking processing unit that generates differences between binarized frames adjacent in the time series and tracks changes in the position of the object included in the binarized frames; a moment generation unit that calculates a moment of the object included in the binarized frames based on the result of the tracking processing unit; a condition setting unit that sets a condition for detecting a predetermined event from the image data; a detection unit that detects the predetermined event by comparing the moment of the object with the condition set in the condition setting unit; and a control signal supply unit that supplies a control signal to an output device according to the detection result. This has the effect that a control signal is supplied to the output device according to a result detected under the condition set in the condition setting unit within the image sensor.
  • In this first aspect, the image sensor may further include a filter processing unit that performs filter processing on each of the frames, and the binarization processing unit may perform the binarization processing on each of the filtered frames. This has the effect that filter processing is applied to each of the frames.
  • In this first aspect, the image sensor may further include a centroid position generation unit that generates the centroid position of the object included in the binarized frame based on the moment generated by the moment generation unit. This has the effect of generating the centroid position of the object included in the binarized frame.
  • In this first aspect, the detection unit may detect a dangerous action as the predetermined event based on the movement amount and shape of the object, and the control signal supply unit may supply the control signal that causes the output device to output a warning when the dangerous action is detected. This has the effect of detecting a dangerous action and causing the output device to output a warning.
  • In this first aspect, the detection unit may detect a sign language pattern as the predetermined event based on the movement and shape of the object and perform sign language analysis, and the control signal supply unit may supply the control signal that causes the output device to output text information based on the sign language analysis. This has the effect that sign language analysis is performed and text information based on the analysis is output to the output device.
  • In this first aspect, the detection unit may detect an utterance content pattern as the predetermined event based on the movement and shape of the lips of the object and perform lip reading, and the control signal supply unit may supply the control signal that causes the output device to output text information based on the result of the lip reading. This has the effect that lip reading is performed and text information based on it is output to the output device.
  • In this first aspect, the detection unit may detect an utterance content pattern as the predetermined event based on the movement and shape of the lips of the object and perform lip reading, and the control signal supply unit may supply the control signal that causes the output device to output sign language information based on the result of the lip reading. This has the effect that lip reading is performed and sign language information based on it is output to the output device.
  • In this first aspect, the condition setting unit may set a plurality of the conditions, and the detection unit may perform the detection independently for each of the plurality of conditions. This has the effect that detection is performed independently for the plurality of conditions set in the condition setting unit.
  • FIG. 8 is a flowchart illustrating an example of a lip reading processing procedure in the second application example of the embodiment of the present technology. FIG. 9 is a diagram illustrating an example of detection based on a plurality of conditions in the third application example of the embodiment of the present technology. FIG. 10 is a diagram illustrating an example of the relationship between a plurality of conditions and frames in the third application example of the embodiment of the present technology.
  • 1. Embodiment (configuration example of the detection system)
  • 2. First application example (dangerous action detection)
  • 3. Second application example (assistance for hearing impairment)
  • 4. Third application example (detection under multiple conditions)
  • FIG. 1 is a diagram illustrating an example of the overall configuration of a detection system according to an embodiment of the present technology.
  • This detection system includes a camera 410, a control unit 420, a behavior learning device 430, a condition holding unit 440, an image sensor 100, an operation input device 310, and an output device 320.
  • the camera 410, the control unit 420, and the behavior learning device 430 constitute a learning phase 401.
  • the image sensor 100, the operation input device 310, and the output device 320 constitute a detection phase 101.
  • the learning result in the learning phase 401 is held in the condition holding unit 440 and is referred to in the detection in the detection phase 101.
  • the camera 410 is an imaging device that is used to capture an image to be learned in the learning phase 401.
  • the camera 410 includes an imaging unit 411.
  • the imaging unit 411 is an imaging element that captures an image of a subject including an object.
  • the object is an object that widely includes non-living objects as well as living things such as people and animals.
  • Image data captured by the camera 410 is output to the control unit 420.
  • In this example, it is assumed that a camera separate from the image sensor 100 is provided.
  • the learning in the learning phase 401 may be performed using the image sensor 100 as the camera 410.
  • the control unit 420 controls the operation of the camera 410 and supplies the captured image data to the behavior learning device 430.
  • the behavior learning device 430 performs behavior learning based on the image data captured by the camera 410.
  • This behavior learning device 430 can learn not only the classifier but also the feature extraction at the same time by deep learning. In addition, learning that achieves better performance than existing recognizers such as boosting is possible simply by preparing a large number of data sets.
  • By this behavior learning device 430, filter coefficients, detection conditions, target determination conditions, and the like are obtained as learning results.
  • the condition holding unit 440 holds various conditions obtained by action learning in the action learning device 430 as learning results.
  • the condition held in the condition holding unit 440 is referred to as a condition for detecting a predetermined event in the detection phase 101.
  • the image sensor 100 captures an image of a subject including an object, and detects a predetermined event according to the conditions held in the condition holding unit 440.
  • the operation input device 310 receives an operation input from the outside.
  • the output device 320 outputs information obtained by the image sensor 100.
  • FIG. 2 is a diagram illustrating a configuration example of the image sensor 100 according to the embodiment of the present technology.
  • the image sensor 100 includes a condition setting unit 104, an imaging unit 110, a filter processing unit 120, a binarization processing unit 130, a tracking processing unit 140, a moment generation unit 150, and a centroid position generation unit 160.
  • the image sensor 100 also includes a totalization processing unit 210, a control unit 220, and an interface 230.
  • the imaging unit 110 is an imaging element that images a subject including a target object.
  • the imaging unit 110 generates frames of image data arranged in time series at a predetermined frame rate.
  • a high frame rate of 1000 frames per second (1000 fps) or more is assumed as the frame rate. It is not necessary for all the frames of image data captured by the imaging unit 110 to be supplied to the outside of the image sensor 100.
  • the image data with a high frame rate is intended for the detection described below, and a lower frame rate is sufficient for display. In other words, by keeping the high frame rate image data for internal reference within the image sensor 100, the bandwidth of the image sensor 100 can be used effectively.
  • the imaging unit 110 is an example of an imaging element described in the claims.
  • the filter processing unit 120 performs a filtering process on each frame of image data captured by the imaging unit 110.
  • the filter processing in the filter processing unit 120 for example, noise removal processing using a moving average filter or median filter, contour detection processing using a Sobel filter, edge detection using a Laplacian filter, or the like is assumed.
  • the number of objects included in the image can be calculated by obtaining the Euler number of the image by the filter processing unit 120.
  • the Euler number is the number of components minus the number of holes.
  • the filter processing unit 120 can extract other feature amounts of the image data.
  • the binarization processing unit 130 performs binarization processing on each of the frames subjected to the filter processing by the filter processing unit 120.
  • the binarization processing unit 130 binarizes the image data based on luminance and color histogram information included in the image data of each frame, and generates a binarized frame including the binarized data.
  • the tracking processing unit 140 detects the object included in the binarized frames by generating differences between frames adjacent in the time series for the binarized frames generated by the binarization processing unit 130, and tracks changes in the position of the object. When detecting the object, a specific region in the image can be designated as the measurement target.
  • the moment generation unit 150 calculates the moment of the two-variable function in the binarized frame based on the result of the tracking processing unit 140.
  • the 0th-order moment represents the amount of change in the area of the object included in the binarized frame, and is a value that is invariant to image rotation and enlargement / reduction.
  • the center-of-gravity position generation unit 160 generates the center-of-gravity position of the target object included in the binarized frame based on the moment generated by the moment generation unit 150.
  • a value obtained by dividing the respective primary moments in the horizontal direction and the vertical direction by the zeroth moment represents the position of the center of gravity.
  • the aggregation processing unit 210 performs aggregation processing based on various data obtained by the image sensor 100 and detects a predetermined event. As shown in the application example below, the totalization processing unit 210 performs necessary processing according to the application that operates.
  • the control unit 220 performs operation control on each unit of the image sensor 100.
  • the interface 230 serves as an interface with the outside. In this example, the interface 230 is connected to the output device 320 and causes the output device 320 to display information obtained by the image sensor 100.
  • the aggregation processing unit 210 is an example of a detection unit described in the claims.
  • the interface 230 is an example of a control signal supply unit described in the claims.
  • in this figure, only the route by which the output of the centroid position generation unit 160 is supplied to the aggregation processing unit 210 and the control unit 220 is shown explicitly; routes for supplying various data from each unit of the image sensor 100 to the aggregation processing unit 210 may be provided as necessary.
  • the condition setting unit 104 sets conditions for detecting a predetermined event from image data in the detection phase 101.
  • a set value for each action is set as a result of action learning in the learning phase 401.
  • conditions set in the condition setting unit 104 for example, filter coefficients, detection conditions, target determination conditions, and the like are assumed.
  • the filter coefficients determine how the image data captured by the imaging unit 110 is processed, and are coefficients of filters that facilitate extraction of desired information from the image data.
  • This filter coefficient is mainly used by the filter processing unit 120, but is also used in the binarization processing unit 130 and the like.
  • the detection condition is the condition to be detected in the detection phase 101. For example, events such as the dangerous actions described later and sign language patterns correspond to this.
  • the target determination condition specifies the target of the condition to be detected in the detection phase 101. For example, if the detection condition is a dangerous action, the person or animal performing the action corresponds to this; if the detection condition is a sign language pattern, the arms or mouth used for sign language correspond to this.
  • FIG. 3 is a diagram illustrating an example of the operation of the detection phase 101 according to the embodiment of the present technology.
  • in the condition setting unit 104, a condition for detecting a predetermined event from the image data in the detection phase 101 is set. The image data is converted by the filter operation in the filter processing unit 120 and the binarization processing in the binarization processing unit 130 so that the shape of the target can be recognized easily. For example, feature points are extracted and the amount of data is reduced.
  • the aggregation processing unit 210 determines the position of the object as a target based on the converted image data, and designates the target to the tracking processing unit 140.
  • the tracking processing unit 140 captures the designated target, updates the target position information, and continues tracking.
  • the aggregation processing unit 210 monitors the movement amount and shape of the target, and confirms whether or not the shape matches the set value of the condition setting unit 104.
  • when the aggregation processing unit 210 detects a match with the set values of the condition setting unit 104, a control signal is supplied to the output device 320 via the control unit 220.
  • FIG. 4 is a diagram illustrating an example of dangerous behavior detection in the first application example of the embodiment of the present technology.
  • to perform dangerous action detection using the detection system of this embodiment, the image sensor 100 captures an image of the object, and the aggregation processing unit 210 detects whether the conditions set in the condition setting unit 104 are met. By predicting a dangerous action and issuing a warning, the dangerous action can be deterred in advance.
  • as the dangerous actions in this example, for example, mischievous actions, dangerous behavior, and harm to people are assumed.
  • as a mischievous action, for example, a pet wrecking a room, such as a cat sharpening its claws on furniture, is assumed.
  • as dangerous behavior, for example, an action that may be life-threatening, such as a child trying to climb over the railing of a veranda, is assumed.
  • as harm to a person, for example, a case where a crow rummaging through garbage causes damage to people is assumed.
  • in this case, when the claw-sharpening action is detected, a warning sound is output from the output device 320.
  • crows are captured as targets, and whether their behavior corresponds to rummaging through garbage is determined from changes in the amount of movement and shape. In this case, when a garbage-rummaging action is detected, a warning sound is output from the output device 320.
  • FIG. 5 is a flowchart showing an example of a dangerous action detection processing procedure in the first application example of the embodiment of the present technology.
  • image data is acquired by imaging an object by the imaging unit 110 (step S811).
  • the acquired image data constitutes a time-series frame.
  • the noise is removed (noise reduction) from the acquired frames by the filter processing unit 120 (step S812). Further, the feature amount is extracted by the filter processing unit 120 (step S813).
  • the binarization processing unit 130 performs binarization based on the color and brightness of the image in the frame (step S814). Thereby, the amount of data to be processed thereafter is reduced.
  • an object such as a cat is determined as a target (step S815). Then, a difference between frames adjacent in time series is generated by the tracking processing unit 140, and target tracking (target tracking) is performed (step S816).
  • moment calculation is performed on the target by the moment generator 150 (step S817).
  • based on the moment generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame.
  • the movement amount and shape of the target are calculated by the totalization processing unit 210 (step S818).
  • the aggregation processing unit 210 detects whether or not the target movement amount and shape calculated in this way match the set values set in the condition setting unit 104. If the condition is met (step S819: YES), the control unit 220 supplies a control signal for outputting a warning sound to the output device 320 (step S821).
  • in this way, in the first application example, the behavior is predicted based on the movement amount and shape of the target object, so that a dangerous action by the target object can be detected.
  • FIG. 6 is a diagram illustrating an example of assisting hearing impairment in the second application example of the embodiment of the present technology.
  • in the second application example, a person is imaged by the image sensor 100, and the aggregation processing unit 210 detects whether the conditions set in the condition setting unit 104 are met. As a result, sign language and lip movements can be read and the corresponding subtitles can be output.
  • the mobile terminal 622 captures an image of the person 621 performing sign language.
  • the portable terminal 622 includes the image sensor 100, and imaging is performed by the imaging unit 110.
  • Target tracking and moment calculation are performed based on the captured image, the finger and arm of the person 621 are detected, and the shape, movement, area, and the like are measured. The motion is then mapped to the corresponding word.
  • a captured person image 623 is displayed on the display unit of the portable terminal 622, and a caption 624 representing the corresponding word is displayed.
  • FIG. 7 is a flowchart showing an example of a sign language analysis processing procedure in the second application example of the embodiment of the present technology.
  • image data is acquired by imaging an object by the imaging unit 110 (step S831).
  • the acquired image data constitutes a time-series frame.
  • the noise is removed from each acquired frame by the filter processing unit 120 (step S832). Further, the feature amount is extracted by the filter processing unit 120 (step S833).
  • binarization is performed by the binarization processing unit 130 based on the color and luminance of the image in the frame (step S834). Thereby, the amount of data to be processed thereafter is reduced.
  • the finger or arm of the person who performs sign language is determined as the target (step S835). Then, a difference between frames adjacent in time series is generated by the tracking processing unit 140, and target tracking (target tracking) is performed (step S836).
  • moment calculation is performed on the target by the moment generator 150 (step S837).
  • based on the moment generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. Further, the aggregation processing unit 210 calculates the target motion (shape, movement, area, and the like) (step S838).
  • the aggregation processing unit 210 detects whether the target motion calculated in this way matches the set values set in the condition setting unit 104. If the condition is met (step S839: Yes), the control unit 220 maps the motion to a word (step S841). Then, a control signal for outputting a subtitle (text information) or the like as the recognition result is supplied to the output device 320 (step S842).
  • FIG. 8 is a flowchart showing an example of a lip reading processing procedure in the second application example of the embodiment of the present technology.
  • Lip reading means reading the utterance content based on the movement and shape of the lips.
  • image data is acquired by imaging an object by the imaging unit 110 (step S851).
  • the acquired image data constitutes a time-series frame.
  • the noise is removed from each acquired frame by the filter processing unit 120 (step S852).
  • the filter processing unit 120 extracts feature amounts (step S853).
  • the binarization processing unit 130 performs binarization based on the color and brightness of the image in the frame (step S854). Thereby, the amount of data to be processed thereafter is reduced.
  • the lips (mouth) of the person are determined as targets (step S855). Then, a difference between frames adjacent in time series is generated by the tracking processing unit 140, and target tracking (target tracking) is performed (step S856).
  • moment calculation is performed on the target by the moment generator 150 (step S857).
  • based on the moment generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. Further, the aggregation processing unit 210 calculates the target motion (shape, movement, area, and the like) (step S858).
  • the aggregation processing unit 210 detects whether or not the target action calculated in this way matches the set value set in the condition setting unit 104.
  • the control unit 220 maps the utterance content corresponding to the motion to a word (step S861).
  • then, a control signal for outputting a subtitle (text information) or the like as the recognition result is supplied to the output device 320 (step S862).
  • the lip reading result is output as text information.
  • sign language information may also be output to the output device 320 based on the lip reading result.
  • in this way, in the second application example, the meaning of a motion is recognized based on the movement of a part of a person, which can assist people with hearing impairment.
  • FIG. 9 is a diagram illustrating an example of detection based on a plurality of conditions in the third application example of the embodiment of the present technology.
  • in the third application example, detection processing for a plurality of conditions is executed in a time-division manner, so that the processing appears to operate in parallel and the speed is increased.
  • the image sensor 100 captures the state of a road, and the aggregation processing unit 210 detects whether the image matches the vehicle type conditions set in the condition setting unit 104, thereby detecting the type of each vehicle. The detected vehicles are then counted by type.
  • in the condition setting unit 104, which type of vehicle the object in the image corresponds to is set as a separate detection condition for each type.
  • These detection conditions are separate and can be detected independently of each other.
  • the detection conditions used for the detection of the face image are separate and can be detected independently of each other.
  • FIG. 10 is a diagram illustrating a relationship example between a plurality of conditions and frames in the third application example of the embodiment of the present technology.
  • four conditions 441 to 444 are set in the condition setting unit 104.
  • a high frame rate of 1000 fps or higher is assumed as the frame rate.
  • the first condition 441 is detected in the first frame, the second condition 442 in the second frame, the third condition 443 in the third frame, and the fourth condition 444 in the fourth frame.
  • These four conditions 441 to 444 are separate and can be detected independently of each other. When a high frame rate is assumed, it is not necessary to check one condition on every frame, and there is little possibility of error even if frames are thinned out. By performing detection in a time-division manner on separate frames as in this example, processing can be performed as if the conditions were operating in parallel; a minimal scheduling sketch of this idea is given after this list.
  • for example, detection for a single condition can be performed at 20 times the speed, or the number of types and objects to be detected can be increased 20-fold.
  • five each of automobiles 631, trucks 632, tuk-tuks 633, and bicycles (samlor) 634 correspond to 20 detections, and these 20 detections can be performed in the same amount of time.
  • in this way, by assuming processing at a high frame rate of 1000 fps or more, detection based on a plurality of conditions can be performed in real time.
  • in this way, according to the embodiment of the present technology, an object is imaged at a high frame rate and detected under the conditions set in the image sensor 100, so that action recognition based on changes in the shape of the object can be performed in real time.
  • the processing procedures described in the above embodiment may be regarded as a method having this series of procedures, or may be regarded as a program for causing a computer to execute this series of procedures or as a recording medium storing the program.
  • a recording medium for example, a CD (Compact Disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), a memory card, a Blu-ray disc (Blu-ray (registered trademark) Disc), or the like can be used.
  • this technique can also take the following structures.
  • an image sensor including: an imaging element that images an object and generates frames of image data arranged in time series;
  • a binarization processing unit that performs binarization processing on each of the frames to generate a binarized frame;
  • a tracking processing unit that generates a difference between the binarized frames adjacent to each other in time series and tracks a change in the position of the object included in the binarized frame;
  • a moment generation unit that calculates a moment of the object included in the binarized frame based on a result of the tracking processing unit;
  • a condition setting unit for setting conditions for detecting a predetermined event from the image data;
  • a detection unit that detects the predetermined event by comparing the moment of the object and the condition set in the condition setting unit;
  • and a control signal supply unit that supplies a control signal to an output device according to the detection result.
  • the detection unit detects a sign language pattern as the predetermined event based on the movement and shape of the object, performs sign language analysis,
  • the detection unit detects the utterance content pattern as the predetermined event based on the movement and shape of the lips on the object, and reads the lips.
  • the image sensor according to any one of (1) to (3), wherein the control signal supply unit supplies the control signal that causes the output device to output text information based on the result of the lip reading.
  • the detection unit detects the utterance content pattern as the predetermined event based on the movement and shape of the lips on the object, and reads the lips.
  • the image sensor according to any one of (1) to (3), wherein the control signal supply unit supplies the control signal that causes the output device to output sign language information based on the result of the lip reading.
  • The image sensor according to any one of (1) to (7), wherein the condition setting unit sets a plurality of the conditions, and the detection unit performs the detection independently for the plurality of conditions.
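The time-division detection referred to above can be pictured with a short sketch. The following Python code is an illustrative assumption, not part of the disclosure: it assigns one detection condition to each incoming frame in round-robin order, so that several independent conditions appear to run in parallel at a high frame rate. The frame source and the per-type predicates are hypothetical stand-ins.

```python
from itertools import cycle
from collections import Counter

def run_time_division_detection(frames, conditions):
    """Round-robin one condition per frame, so several independent detection
    conditions appear to run in parallel at a high frame rate.

    frames     -- iterable of binarized frames (hypothetical source)
    conditions -- list of (name, predicate) pairs; predicate(frame) -> bool
    """
    counts = Counter()
    schedule = cycle(conditions)          # condition 1 on frame 1, condition 2 on frame 2, ...
    for frame in frames:
        name, predicate = next(schedule)  # only one condition is checked per frame
        if predicate(frame):
            counts[name] += 1             # e.g. count detected vehicles by type
    return counts

# Hypothetical usage: four vehicle-type conditions checked on successive frames.
# counts = run_time_division_detection(frame_stream, [
#     ("automobile", is_automobile),
#     ("truck", is_truck),
#     ("tuk-tuk", is_tuk_tuk),
#     ("bicycle", is_bicycle),
# ])
```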

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The purpose of the present invention is to increase the speed of recognition processing using image data. An imaging element images an object and generates frames of image data arranged in time series. A binarization processing unit performs binarization processing on each of the frames to generate binarized frames. A tracking processing unit generates differences between adjacent binarized frames in time series and tracks the change in the position of the object included in the binarized frames. A moment generation unit calculates the moment of the object included in the binarized frames on the basis of the results from the tracking processing unit. A condition setting unit sets conditions for detecting a prescribed event from the image data. A detection unit detects the prescribed event by comparing the moment of the object with the conditions set by the condition setting unit. A control signal supply unit supplies a control signal to an output device in accordance with the detection results.

Description

Image sensor
 The present technology relates to an image sensor. More specifically, it relates to an image sensor for detecting an event using captured image data.
 Conventionally, various recognition processes have been performed using image data captured by an image sensor. The data flow in a system that performs such recognition processing is roughly divided into a data flow that outputs the image data as a display image and a data flow that extracts necessary information from the image data and performs recognition processing. In order to speed up the data flow for recognition processing, for example, an image detection processing device in which a processing element is provided for each pixel of the image sensor has been proposed (see, for example, Patent Document 1).
Patent Document 1: JP 2001-195564 A
 In the above-described conventional technology, image data is transferred and the processing data necessary for calculating moments and the like is output to an external processor. In that case, however, further processing is required in the external processor in order to calculate the centroid position and the like. To display an image, a processing speed of about 30 to 120 fps (frames per second) is generally sufficient, but this is insufficient for advanced recognition processing.
 The present technology was created in view of such circumstances, and aims to speed up recognition processing using image data.
 The present technology has been made to solve the above-described problems. Its first aspect is an image sensor including: an imaging element that captures an object and generates frames of image data arranged in time series; a binarization processing unit that performs binarization processing on each of the frames to generate binarized frames; a tracking processing unit that generates differences between binarized frames adjacent in the time series and tracks changes in the position of the object included in the binarized frames; a moment generation unit that calculates a moment of the object included in the binarized frames based on the result of the tracking processing unit; a condition setting unit that sets a condition for detecting a predetermined event from the image data; a detection unit that detects the predetermined event by comparing the moment of the object with the condition set in the condition setting unit; and a control signal supply unit that supplies a control signal to an output device according to the detection result. This has the effect that a control signal is supplied to the output device according to a result detected under the condition set in the condition setting unit within the image sensor.
 In this first aspect, the image sensor may further include a filter processing unit that performs filter processing on each of the frames, and the binarization processing unit may perform the binarization processing on each of the filtered frames. This has the effect that filter processing is applied to each of the frames.
 In this first aspect, the image sensor may further include a centroid position generation unit that generates the centroid position of the object included in the binarized frame based on the moment generated by the moment generation unit. This has the effect of generating the centroid position of the object included in the binarized frame.
 In this first aspect, the detection unit may detect a dangerous action as the predetermined event based on the movement amount and shape of the object, and the control signal supply unit may supply the control signal that causes the output device to output a warning when the dangerous action is detected. This has the effect of detecting a dangerous action and causing the output device to output a warning.
 In this first aspect, the detection unit may detect a sign language pattern as the predetermined event based on the movement and shape of the object and perform sign language analysis, and the control signal supply unit may supply the control signal that causes the output device to output text information based on the sign language analysis. This has the effect that sign language analysis is performed and text information based on the analysis is output to the output device.
 In this first aspect, the detection unit may detect an utterance content pattern as the predetermined event based on the movement and shape of the lips of the object and perform lip reading, and the control signal supply unit may supply the control signal that causes the output device to output text information based on the result of the lip reading. This has the effect that lip reading is performed and text information based on it is output to the output device.
 In this first aspect, the detection unit may detect an utterance content pattern as the predetermined event based on the movement and shape of the lips of the object and perform lip reading, and the control signal supply unit may supply the control signal that causes the output device to output sign language information based on the result of the lip reading. This has the effect that lip reading is performed and sign language information based on it is output to the output device.
 In this first aspect, the condition setting unit may set a plurality of the conditions, and the detection unit may perform the detection independently for each of the plurality of conditions. This has the effect that detection is performed independently for the plurality of conditions set in the condition setting unit.
 According to the present technology, an excellent effect of speeding up recognition processing using image data can be achieved. Note that the effects described here are not necessarily limited, and may be any of the effects described in the present disclosure.
FIG. 1 is a diagram illustrating an example of the overall configuration of a detection system according to an embodiment of the present technology.
FIG. 2 is a diagram illustrating a configuration example of the image sensor 100 according to the embodiment of the present technology.
FIG. 3 is a diagram illustrating an example of the operation of the detection phase 101 according to the embodiment of the present technology.
FIG. 4 is a diagram illustrating an example of dangerous action detection in the first application example of the embodiment of the present technology.
FIG. 5 is a flowchart illustrating an example of a dangerous action detection processing procedure in the first application example of the embodiment of the present technology.
FIG. 6 is a diagram illustrating an example of assistance for hearing impairment in the second application example of the embodiment of the present technology.
FIG. 7 is a flowchart illustrating an example of a sign language analysis processing procedure in the second application example of the embodiment of the present technology.
FIG. 8 is a flowchart illustrating an example of a lip reading processing procedure in the second application example of the embodiment of the present technology.
FIG. 9 is a diagram illustrating an example of detection based on a plurality of conditions in the third application example of the embodiment of the present technology.
FIG. 10 is a diagram illustrating an example of the relationship between a plurality of conditions and frames in the third application example of the embodiment of the present technology.
 Hereinafter, modes for carrying out the present technology (hereinafter referred to as embodiments) will be described. The description will be given in the following order.
 1. Embodiment (configuration example of the detection system)
 2. First application example (dangerous action detection)
 3. Second application example (assistance for hearing impairment)
 4. Third application example (detection under multiple conditions)
 <1. Embodiment>
 [Detection system]
 FIG. 1 is a diagram illustrating an example of the overall configuration of a detection system according to an embodiment of the present technology. This detection system includes a camera 410, a control unit 420, a behavior learning device 430, a condition holding unit 440, an image sensor 100, an operation input device 310, and an output device 320. The camera 410, the control unit 420, and the behavior learning device 430 constitute the learning phase 401. The image sensor 100, the operation input device 310, and the output device 320 constitute the detection phase 101. The learning result of the learning phase 401 is held in the condition holding unit 440 and is referenced at the time of detection in the detection phase 101.
 The camera 410 is an imaging device used to capture images to be learned in the learning phase 401. The camera 410 includes an imaging unit 411. The imaging unit 411 is an imaging element that captures an image of a subject including an object. The object may be a living thing such as a person or an animal, or a non-living object. Image data captured by the camera 410 is output to the control unit 420. In this example, a camera separate from the image sensor 100 is assumed, but the learning in the learning phase 401 may also be performed using the image sensor 100 as the camera 410.
 The control unit 420 controls the operation of the camera 410 and supplies the captured image data to the behavior learning device 430.
 The behavior learning device 430 performs behavior learning based on the image data captured by the camera 410. By deep learning, this behavior learning device 430 can learn not only the classifier but also the feature extraction at the same time. In addition, learning that achieves better performance than existing recognizers such as boosting is possible simply by preparing a large number of data sets. Filter coefficients, detection conditions, target determination conditions, and the like are obtained from this behavior learning device 430 as learning results.
 The condition holding unit 440 holds, as learning results, the various conditions obtained by behavior learning in the behavior learning device 430. The conditions held in the condition holding unit 440 are referenced as conditions for detecting a predetermined event in the detection phase 101.
 The image sensor 100 captures an image of a subject including an object, and detects a predetermined event according to the conditions held in the condition holding unit 440. The operation input device 310 receives operation inputs from the outside. The output device 320 outputs information obtained by the image sensor 100.
 FIG. 2 is a diagram illustrating a configuration example of the image sensor 100 according to the embodiment of the present technology. The image sensor 100 includes a condition setting unit 104, an imaging unit 110, a filter processing unit 120, a binarization processing unit 130, a tracking processing unit 140, a moment generation unit 150, and a centroid position generation unit 160. The image sensor 100 also includes an aggregation processing unit 210, a control unit 220, and an interface 230.
 The imaging unit 110 is an imaging element that captures an image of a subject including an object. The imaging unit 110 generates frames of image data arranged in time series at a predetermined frame rate. Here, a high frame rate of 1000 frames per second (1000 fps) or more is assumed. Not all of the frames of image data captured by the imaging unit 110 need to be supplied to the outside of the image sensor 100. The high frame rate image data is intended for the detection described below, and a lower frame rate is sufficient for display. In other words, by keeping the high frame rate image data for internal reference within the image sensor 100, the bandwidth of the image sensor 100 can be used effectively. The imaging unit 110 is an example of the imaging element described in the claims.
 The filter processing unit 120 performs filter processing on each frame of image data captured by the imaging unit 110. As the filter processing in the filter processing unit 120, for example, noise removal using a moving average filter or a median filter, contour detection using a Sobel filter, and edge detection using a Laplacian filter are assumed. The number of objects included in an image can also be calculated by obtaining the Euler number of the image in the filter processing unit 120; the Euler number is the number of connected components minus the number of holes. The filter processing unit 120 can also extract other feature amounts of the image data.
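The actual filter coefficients are learned in the learning phase and set through the condition setting unit 104; purely as a minimal sketch of the kind of preprocessing named above, the following NumPy code applies a 3x3 moving-average filter for noise removal and a Sobel operator for contour detection. The kernel sizes and the use of NumPy are assumptions for illustration, not the disclosed implementation.

```python
import numpy as np

def moving_average_3x3(frame):
    """Simple noise removal: average each pixel with its 8 neighbours."""
    h, w = frame.shape
    padded = np.pad(frame.astype(np.float32), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return out / 9.0

def sobel_magnitude(frame):
    """Contour detection: gradient magnitude from horizontal/vertical Sobel kernels."""
    h, w = frame.shape
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    padded = np.pad(frame.astype(np.float32), 1, mode="edge")
    gx = np.zeros((h, w), dtype=np.float32)
    gy = np.zeros((h, w), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            window = padded[dy : dy + h, dx : dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)
```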
 The binarization processing unit 130 performs binarization processing on each of the frames filtered by the filter processing unit 120. The binarization processing unit 130 binarizes the image data based on the luminance and color histogram information contained in the image data of each frame, and generates a binarized frame consisting of the binarized data.
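As an illustrative sketch of histogram-based binarization, the code below derives a luminance threshold from the frame histogram with Otsu's method and produces a binarized frame of 0/1 values. Otsu's method is used here only as a stand-in; the disclosure does not specify the thresholding rule.

```python
import numpy as np

def binarize_by_histogram(gray_frame):
    """Binarize an 8-bit grayscale frame (integer values 0-255) using an
    Otsu-style threshold derived from its luminance histogram."""
    hist = np.bincount(gray_frame.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    bins = np.arange(256)

    weight_bg = np.cumsum(hist)                 # pixels at or below each candidate threshold
    weight_fg = total - weight_bg
    cum_intensity = np.cumsum(hist * bins)
    mean_bg = cum_intensity / np.maximum(weight_bg, 1)
    mean_fg = (cum_intensity[-1] - cum_intensity) / np.maximum(weight_fg, 1)

    between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
    threshold = int(np.argmax(between_var))     # threshold maximizing between-class variance
    return (gray_frame > threshold).astype(np.uint8)  # 1 = object, 0 = background
```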
 The tracking processing unit 140 detects the object included in the binarized frames by generating differences between frames adjacent in the time series for the binarized frames generated by the binarization processing unit 130, and tracks changes in the position of the object. When detecting the object, a specific region in the image can be designated as the measurement target.
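A minimal sketch of the inter-frame difference used for tracking, assuming binarized frames are available as NumPy arrays of 0/1 values; the optional region argument is a simplified stand-in for the measurement-target designation mentioned above.

```python
import numpy as np

def frame_difference(prev_bin, curr_bin, roi=None):
    """Difference between two adjacent binarized frames.

    roi -- optional (y0, y1, x0, x1) region restricting the measurement target.
    Returns the difference mask and the number of changed pixels.
    """
    diff = np.bitwise_xor(prev_bin, curr_bin)
    if roi is not None:
        y0, y1, x0, x1 = roi
        mask = np.zeros_like(diff)
        mask[y0:y1, x0:x1] = diff[y0:y1, x0:x1]
        diff = mask
    return diff, int(diff.sum())

def changed_region(diff):
    """Bounding box of changed pixels, usable to update the tracked target position."""
    ys, xs = np.nonzero(diff)
    if ys.size == 0:
        return None                      # no motion between the two frames
    return int(ys.min()), int(ys.max()) + 1, int(xs.min()), int(xs.max()) + 1
```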
 The moment generation unit 150 calculates the moments of the two-variable function in the binarized frame based on the result of the tracking processing unit 140. The zeroth-order moment represents the amount of change in the area of the object included in the binarized frame, and is a value that is invariant to rotation and scaling of the image.
 The centroid position generation unit 160 generates the centroid position of the object included in the binarized frame based on the moments generated by the moment generation unit 150. The values obtained by dividing the first-order moments in the horizontal and vertical directions by the zeroth-order moment represent the centroid position.
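The moment and centroid computation described in the two preceding paragraphs can be written directly: the zeroth-order moment M00 is the sum of the binary pixel values (the object area), the first-order moments M10 and M01 are the coordinate-weighted sums, and the centroid is (M10/M00, M01/M00). A minimal NumPy sketch, assuming a 0/1 binarized frame:

```python
import numpy as np

def moments_and_centroid(bin_frame):
    """Zeroth/first-order moments and centroid of a 0/1 binarized frame.

    M00 = sum f(x, y)                     (area of the object)
    M10 = sum x * f(x, y), M01 = sum y * f(x, y)
    centroid = (M10 / M00, M01 / M00)
    """
    ys, xs = np.nonzero(bin_frame)
    m00 = float(ys.size)
    if m00 == 0:
        return 0.0, None                  # no object pixels in this frame
    m10 = float(xs.sum())
    m01 = float(ys.sum())
    return m00, (m10 / m00, m01 / m00)    # (area, (centroid_x, centroid_y))
```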
 The aggregation processing unit 210 performs aggregation processing based on the various data obtained by the image sensor 100 and detects a predetermined event. As shown in the application examples below, the aggregation processing unit 210 performs the processing required by the running application. The control unit 220 controls the operation of each unit of the image sensor 100. The interface 230 handles the interface with the outside. In this example, the interface 230 is connected to the output device 320 and causes the output device 320 to display information obtained by the image sensor 100. The aggregation processing unit 210 is an example of the detection unit described in the claims, and the interface 230 is an example of the control signal supply unit described in the claims.
 In this figure, only the route by which the output of the centroid position generation unit 160 is supplied to the aggregation processing unit 210 and the control unit 220 is shown explicitly, but routes for supplying various data from each unit of the image sensor 100 to the aggregation processing unit 210 may be provided as necessary.
 The condition setting unit 104 sets conditions for detecting a predetermined event from the image data in the detection phase 101. In the condition setting unit 104, set values for each behavior are set as the result of the behavior learning in the learning phase 401. As the conditions set in the condition setting unit 104, for example, filter coefficients, detection conditions, and target determination conditions are assumed.
 The filter coefficients determine how the image data captured by the imaging unit 110 is processed, and are coefficients of filters that facilitate extraction of desired information from the image data. The filter coefficients are used mainly by the filter processing unit 120, but are also used by the binarization processing unit 130 and the like.
 The detection condition is the condition to be detected in the detection phase 101. For example, events such as the dangerous actions described later and sign language patterns correspond to this.
 The target determination condition specifies the target of the condition to be detected in the detection phase 101. For example, if the detection condition is a dangerous action, the person or animal performing the action corresponds to this; if the detection condition is a sign language pattern, the arms or mouth used for sign language correspond to this.
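To make the three kinds of learned set values concrete, the following hypothetical record groups filter coefficients, a target determination rule, and a detection rule for one condition. The field names and types are illustrative assumptions only; the disclosure does not define such a data structure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class DetectionCondition:
    """Hypothetical record of the learned set values held per condition."""
    name: str                                         # e.g. "claw sharpening", "sign language"
    filter_coefficients: np.ndarray                   # kernels used by the filter processing unit 120
    target_rule: Callable[[dict], bool]               # target determination (person, cat, lips, ...)
    detection_rule: Callable[[Sequence[dict]], bool]  # event test over the tracked measurements
```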
 FIG. 3 is a diagram illustrating an example of the operation of the detection phase 101 according to the embodiment of the present technology.
 In the condition setting unit 104, conditions for detecting a predetermined event from the image data in the detection phase 101 are set. The image data is converted by the filter operation in the filter processing unit 120 and the binarization processing in the binarization processing unit 130 so that the shape of the target can be recognized easily. For example, feature points are extracted and the amount of data is reduced.
The aggregation processing unit 210 determines the position of the object as a target based on the converted image data and designates that target to the tracking processing unit 140. The tracking processing unit 140 captures the designated target, updates the target position information, and continues tracking. The aggregation processing unit 210 monitors the movement amount and shape of the target and checks whether the shape matches the set values in the condition setting unit 104. When the aggregation processing unit 210 detects a match with the set values in the condition setting unit 104, a control signal is supplied to the output device 320 via the control unit 220.
With this configuration, behavior can be recognized in real time from changes in the shape of the object according to the conditions set in the condition setting unit 104 inside the image sensor 100, and control can be performed so that the detection result is output.
<2. First application example>
FIG. 4 is a diagram illustrating an example of dangerous act detection in the first application example of the embodiment of the present technology. To perform dangerous act detection using the detection system of the present embodiment, the image sensor 100 images the object, and the aggregation processing unit 210 detects whether or not the conditions set in the condition setting unit 104 are met. By predicting a dangerous act and issuing a warning in this way, the dangerous act can be deterred before it occurs.
Dangerous acts in this example include, for example, mischief, dangerous movements, and harm to people. Mischief includes, for example, pets damaging a room, such as a cat sharpening its claws on furniture. Dangerous movements include acts that can endanger life, such as a child trying to climb over a balcony railing. Harm to people includes, for example, crows causing damage to people by scavenging through trash.
As shown in part a of the figure, when a cat is about to sharpen its claws on furniture, the cat is captured as a target, and it is judged from its movement amount and change in shape whether this constitutes claw sharpening. In this case, when claw sharpening is detected, a warning sound is output from the output device 320.
On the other hand, as shown in part b of the figure, when the cat is playing with pet toys or with a person, the cat is still captured as a target, but its movement amount and change in shape are judged not to constitute claw sharpening, and no warning is issued.
Likewise, when crows flock to a trash collection site, the crows are captured as targets, and it is judged from their movement amount and change in shape whether this constitutes trash scavenging. In this case, when trash scavenging is detected, a warning sound is output from the output device 320.
On the other hand, when an animal such as a dog, cat, or penguin merely passes the trash collection site during a walk, the animal is captured as a target, but its movement amount and change in shape are judged not to constitute trash scavenging, and no warning is issued.
When a cat jumps onto a balcony railing, whether or not to issue a warning may be made selectable as appropriate. In practice, to prevent such behavior it is too late once the animal is already on the railing; it must be stopped at the stage when it is about to jump.
FIG. 5 is a flowchart showing an example of the dangerous act detection processing procedure in the first application example of the embodiment of the present technology.
First, image data is acquired by imaging the object with the imaging unit 110 (step S811). The acquired image data constitutes time-series frames.
Noise is removed (noise reduction) from each acquired frame by the filter processing unit 120 (step S812). The filter processing unit 120 also extracts feature amounts (step S813).
Then, the binarization processing unit 130 binarizes each frame based on the color and luminance of the image (step S814). This reduces the amount of data to be processed in the subsequent stages.
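As a concrete illustration of this step, a frame can be binarized by thresholding its luminance, optionally biased by a color channel. The snippet below is a minimal NumPy sketch under that assumption; the threshold values and the red bias are arbitrary illustrative choices, not values from the specification.

```python
import numpy as np

def binarize(frame_rgb, luma_threshold=0.5, red_bias=0.0):
    """Binarize an RGB frame (H x W x 3, floats in [0, 1]) by luminance and color.

    Pixels brighter than the threshold (optionally biased toward red content)
    become 1 and everything else 0, which sharply reduces the data volume
    handled by the later stages. Thresholds here are illustrative only.
    """
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    luma = 0.299 * r + 0.587 * g + 0.114 * b          # ITU-R BT.601 luminance
    return ((luma + red_bias * r) > luma_threshold).astype(np.uint8)

# Example: a random 8x8 "frame"
frame = np.random.rand(8, 8, 3)
binary = binarize(frame)
```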
In the binarized frame, an object such as a cat is determined as the target (step S815). Then, the tracking processing unit 140 generates the difference between frames adjacent in the time series and performs target tracking (step S816).
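One simple reading of this step is that the pixels that change between two consecutive binarized frames indicate where the target has moved, and the target position is updated to the center of that changed region. The sketch below assumes that interpretation; it is a stand-in heuristic, not the tracking algorithm of the specification itself.

```python
import numpy as np

def update_target_position(prev_binary, curr_binary, prev_pos):
    """Update the target position from the difference of two binarized frames.

    If nothing changed, the previous position is kept; otherwise the centroid
    of the changed region becomes the new position (illustrative heuristic).
    """
    diff = np.logical_xor(prev_binary, curr_binary)
    ys, xs = np.nonzero(diff)
    if len(xs) == 0:
        return prev_pos
    return (float(xs.mean()), float(ys.mean()))

prev = np.zeros((8, 8), dtype=np.uint8); prev[2:4, 2:4] = 1   # target in upper-left
curr = np.zeros((8, 8), dtype=np.uint8); curr[4:6, 4:6] = 1   # target moved toward center
print(update_target_position(prev, curr, (2.5, 2.5)))
```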
The moment generation unit 150 also performs a moment calculation on the target (step S817). Based on the moments generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. The aggregation processing unit 210 then calculates the movement amount and shape of the target (step S818).
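The moment computation and the centroid derived from it can be written down directly: for a binarized frame, the zeroth-order moment M00 is the target area and the centroid is (M10/M00, M01/M00). The following sketch is a plain NumPy rendering of those textbook formulas, not the circuit-level implementation in the sensor.

```python
import numpy as np

def raw_moments(binary):
    """Return the raw image moments M00, M10, M01 of a binarized frame (H x W of 0/1)."""
    ys, xs = np.nonzero(binary)
    m00 = float(len(xs))          # area of the target
    m10 = float(xs.sum())         # sum of x coordinates
    m01 = float(ys.sum())         # sum of y coordinates
    return m00, m10, m01

def centroid(binary):
    """Centroid (center of gravity) of the target: (M10/M00, M01/M00)."""
    m00, m10, m01 = raw_moments(binary)
    if m00 == 0:
        return None               # no target pixels in this frame
    return (m10 / m00, m01 / m00)

binary = np.zeros((8, 8), dtype=np.uint8)
binary[4:6, 4:6] = 1
print(raw_moments(binary), centroid(binary))   # area 4, centroid (4.5, 4.5)
```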
The aggregation processing unit 210 detects whether the movement amount and shape of the target calculated in this way match the set values in the condition setting unit 104. If the condition is met (step S819: Yes), the control unit 220 supplies the output device 320 with a control signal for outputting a warning sound (step S821).
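The final comparison of steps S819 and S821 amounts to checking the computed movement amount and a shape score against the set values and, on a match, issuing a control signal. The snippet below sketches that comparison; the threshold names, the comparison directions, and the warning payload are all assumptions made for illustration.

```python
def check_and_warn(movement, shape_score, set_values):
    """Compare measured movement/shape against set values and return a control signal.

    `set_values` mirrors the hypothetical thresholds of the condition setting unit;
    the returned dictionary stands in for the control signal sent to the output device.
    """
    if movement <= set_values["max_move"] and shape_score >= set_values["shape_score"]:
        return {"action": "warning_sound"}   # supplied to the output device 320
    return None

print(check_and_warn(movement=12.0, shape_score=0.85,
                     set_values={"max_move": 40.0, "shape_score": 0.8}))
```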
These processes are repeated for each frame of the image data arranged in the time series.
As described above, in this first application example, a high frame rate of 1000 fps or more is assumed, and by predicting behavior based on the movement amount and shape of the object, a dangerous act by the object can be detected.
<3. Second application example>
FIG. 6 is a diagram illustrating an example of hearing-impairment assistance in the second application example of the embodiment of the present technology. To assist a hearing-impaired user using the detection system of the present embodiment, the image sensor 100 images a person, and the aggregation processing unit 210 detects whether or not the conditions set in the condition setting unit 104 are met. Sign language or lip movements can thereby be read, and corresponding captions or the like can be output.
Here, sign language analysis is shown as an example of hearing-impairment assistance. In this example, a person 621 performing sign language is imaged by a portable terminal 622. The portable terminal 622 contains the image sensor 100, and the imaging is performed by its imaging unit 110. Target tracking and moment calculation are performed on the captured images, the fingers and arms of the person 621 are detected, and their shape, movement, area, and the like are measured. The motion is then mapped to the corresponding word.
As a result, the captured image 623 of the person is displayed on the display unit of the portable terminal 622 together with a caption 624 representing the corresponding word.
FIG. 7 is a flowchart showing an example of the sign language analysis processing procedure in the second application example of the embodiment of the present technology.
First, image data is acquired by imaging the object with the imaging unit 110 (step S831). The acquired image data constitutes time-series frames.
Noise is removed from each acquired frame by the filter processing unit 120 (step S832). The filter processing unit 120 also extracts feature amounts (step S833).
Then, the binarization processing unit 130 binarizes each frame based on the color and luminance of the image (step S834). This reduces the amount of data to be processed in the subsequent stages.
In the binarized frame, the fingers and arms of the person performing sign language are determined as the target (step S835). Then, the tracking processing unit 140 generates the difference between frames adjacent in the time series and performs target tracking (step S836).
The moment generation unit 150 also performs a moment calculation on the target (step S837). Based on the moments generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. The aggregation processing unit 210 then calculates the motion of the target (shape, movement, area, and the like) (step S838).
The aggregation processing unit 210 detects whether the motion of the target calculated in this way matches the set values in the condition setting unit 104. If the condition is met (step S839: Yes), the control unit 220 maps the motion to a word (step S841). Then, a control signal for outputting a caption (text information) or the like as the recognition result is supplied to the output device 320 (step S842).
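Step S841 maps a recognized motion to a word. A very small stand-in for that mapping is a lookup from a quantized motion descriptor to a vocabulary entry, as sketched below; the descriptors and the tiny dictionary are invented for illustration and are not part of the specification.

```python
# Hypothetical mapping from quantized motion descriptors to words.
SIGN_VOCABULARY = {
    ("right_hand", "circle", "chest"): "thank you",
    ("both_hands", "apart", "front"):  "big",
}

def motion_to_word(descriptor):
    """Return the word for a recognized motion descriptor, or None if unmatched."""
    return SIGN_VOCABULARY.get(descriptor)

word = motion_to_word(("right_hand", "circle", "chest"))
print(word or "<no match>")   # the matched word becomes the caption (text information)
```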
These processes are repeated for each frame of the image data arranged in the time series.
FIG. 8 is a flowchart showing an example of the lip reading processing procedure in the second application example of the embodiment of the present technology. The preceding flowchart showed an example of sign language analysis; as another example of hearing-impairment assistance, lip reading may be performed. Lip reading means reading the content of an utterance from the movement and shape of the lips.
First, image data is acquired by imaging the object with the imaging unit 110 (step S851). The acquired image data constitutes time-series frames.
Noise is removed from each acquired frame by the filter processing unit 120 (step S852). The filter processing unit 120 also extracts feature amounts (step S853).
Then, the binarization processing unit 130 binarizes each frame based on the color and luminance of the image (step S854). This reduces the amount of data to be processed in the subsequent stages.
In the binarized frame, the lips (mouth) of the person are determined as the target (step S855). Then, the tracking processing unit 140 generates the difference between frames adjacent in the time series and performs target tracking (step S856).
The moment generation unit 150 also performs a moment calculation on the target (step S857). Based on the moments generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. The aggregation processing unit 210 then calculates the motion of the target (shape, movement, area, and the like) (step S858).
The aggregation processing unit 210 detects whether the motion of the target calculated in this way matches the set values in the condition setting unit 104. If the condition is met (step S859: Yes), the control unit 220 maps the utterance content corresponding to the motion to a word (step S861). Then, a control signal for outputting a caption (text information) or the like as the recognition result is supplied to the output device 320 (step S862).
In this example, the lip reading result is output as text information; however, sign language information may instead be output by the output device 320 based on the lip reading result.
As described above, in this second application example, a high frame rate of 1000 fps or more is assumed, and by recognizing the meaning of a person's behavior from the movement of a part of the person, hearing-impairment assistance can be provided.
<4. Third application example>
FIG. 9 is a diagram illustrating an example of detection under a plurality of conditions in the third application example of the embodiment of the present technology.
In a survey of street conditions in a city, for example, cars and pedestrians move freely and do not necessarily face the camera. To discriminate and detect them accurately, it is useful to set a plurality of detection conditions and check them at high speed. In this example, by assuming a high frame rate, the detection processes are executed in a time-division manner, as if they were running in parallel, to increase throughput.
On the downtown road in this example, automobiles 631, trucks 632, tuk-tuks 633, and bicycles (samlors) 634 are all traveling together. In a traffic volume survey, the image sensor 100 images the road, and the aggregation processing unit 210 detects whether the image matches the vehicle-type conditions set in the condition setting unit 104, thereby detecting the type of each vehicle. The detected vehicles are then counted by type.
In this case, the vehicle type of an object in the image is set in the condition setting unit 104 as a separate detection condition for each type. These detection conditions are independent of one another, and detection can be performed for each of them independently.
A wanted person can also be detected by a similar technique. The feature amounts of persons in an image of the downtown area are calculated, and the detected face images are compared with images in a database to detect the wanted person. In this case as well, the detection conditions used for face image detection are separate and can be checked independently of one another.
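The wanted-person example boils down to comparing a feature vector computed from a detected face against feature vectors stored in a database and reporting a hit when the distance falls below a threshold. The sketch below assumes a Euclidean distance on precomputed feature vectors; the feature extraction itself, the metric, and the threshold value are outside the specification and purely illustrative.

```python
import numpy as np

def match_face(query, database, max_distance=0.6):
    """Return the name of the closest database entry within max_distance, else None.

    `database` maps names to feature vectors of the same length as `query`.
    The distance metric and threshold are illustrative assumptions.
    """
    best_name, best_dist = None, float("inf")
    for name, feat in database.items():
        dist = float(np.linalg.norm(query - feat))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

db = {"suspect_A": np.array([0.1, 0.9, 0.3]), "suspect_B": np.array([0.7, 0.2, 0.5])}
print(match_face(np.array([0.12, 0.88, 0.31]), db))   # -> "suspect_A"
```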
FIG. 10 is a diagram illustrating an example of the relationship between a plurality of conditions and frames in the third application example of the embodiment of the present technology. Here, it is assumed that four conditions 441 to 444 are set in the condition setting unit 104 and that a high frame rate of 1000 fps or more is used.
In this example, the first condition 441 is checked in the first frame, the second condition 442 in the second frame, the third condition 443 in the third frame, and the fourth condition 444 in the fourth frame. These four conditions 441 to 444 are independent of one another and can be detected independently. At a high frame rate, a single condition does not need to be checked in every frame, and thinning out frames is unlikely to introduce errors. By performing detection on separate frames in a time-division manner as in this example, the processing can behave as if the conditions were being checked in parallel.
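Interleaving several independent detection conditions over consecutive frames can be expressed as a simple round-robin: frame i is checked against condition i mod N. The sketch below shows that scheduling together with per-type counters and the throughput intuition discussed next; the `detect` argument stands in for one full detection pass and is an assumption, not part of the specification.

```python
def run_time_division(frames, conditions, detect):
    """Check condition i % N on frame i, emulating N detectors running in parallel.

    `detect(frame, condition)` is a placeholder for one complete detection pass
    (filtering, binarization, tracking, moments, comparison) supplied by the caller;
    it returns True on a match.
    """
    counts = {c: 0 for c in conditions}                # e.g. per-vehicle-type counters
    for i, frame in enumerate(frames):
        condition = conditions[i % len(conditions)]    # round-robin over conditions
        if detect(frame, condition):
            counts[condition] += 1
    return counts

# Throughput intuition: at 1000 fps each of 4 conditions is still checked 250 times
# per second, far more often than a single condition at a conventional 50 fps.
frames = range(1000)                                   # stand-in for one second of frames
conditions = ["car", "truck", "tuk-tuk", "samlor"]
print(run_time_division(frames, conditions, detect=lambda f, c: f % 100 == 0))
```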
For example, taking 50 fps operation as a reference, operating at 1000 fps allows detection to be performed 20 times faster, so the number and variety of items detected can be increased by a factor of 20. In the example above, five each of the automobiles 631, trucks 632, tuk-tuks 633, and bicycles (samlors) 634 correspond to 20 detections. If a single detection was handled at 50 fps, operating at 1000 fps allows these 20 detections to be performed in the same amount of time.
As described above, in this third application example, by assuming processing at a high frame rate of 1000 fps or more, detection under a plurality of conditions can be performed in real time.
As explained so far, according to the embodiment of the present technology, by imaging an object at a high frame rate and detecting events under the conditions set inside the image sensor 100, behavior recognition based on changes in the shape of the object can be performed in real time.
The embodiment described above is an example for embodying the present technology, and the matters in the embodiment correspond to the invention-specifying matters in the claims. Likewise, the invention-specifying matters in the claims correspond to the matters given the same names in the embodiment of the present technology. However, the present technology is not limited to the embodiment and can be embodied with various modifications to the embodiment without departing from the gist thereof.
The processing procedures described in the above embodiment may be regarded as a method having this series of steps, as a program for causing a computer to execute the series of steps, or as a recording medium storing that program. As the recording medium, for example, a CD (Compact Disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), a memory card, or a Blu-ray (registered trademark) Disc can be used.
The effects described in this specification are merely examples and are not limiting; other effects may also be obtained.
The present technology can also be configured as follows.
(1) An image sensor including: an imaging element that images an object and generates frames of image data arranged in a time series; a binarization processing unit that performs binarization processing on each of the frames to generate binarized frames; a tracking processing unit that generates a difference between binarized frames adjacent in the time series and tracks a change in the position of the object included in the binarized frames; a moment generation unit that calculates a moment of the object included in the binarized frames based on the result from the tracking processing unit; a condition setting unit that sets a condition for detecting a predetermined event from the image data; a detection unit that detects the predetermined event by comparing the moment of the object with the condition set in the condition setting unit; and a control signal supply unit that supplies a control signal to an output device according to the result of the detection.
(2) The image sensor according to (1), further including a filter processing unit that performs filter processing on each of the frames, in which the binarization processing unit performs the binarization processing on each of the filtered frames.
(3) The image sensor according to (1) or (2), further including a centroid position generation unit that generates the centroid position of the object included in the binarized frames based on the moments generated by the moment generation unit.
(4) The image sensor according to any one of (1) to (3), in which the detection unit detects a dangerous act as the predetermined event based on the movement amount and shape of the object, and the control signal supply unit supplies the control signal that causes the output device to output a warning when the dangerous act is detected.
(5) The image sensor according to any one of (1) to (3), in which the detection unit detects a sign language pattern as the predetermined event based on the movement and shape of the object and performs sign language analysis, and the control signal supply unit supplies the control signal that causes the output device to output text information based on the sign language analysis.
(6) The image sensor according to any one of (1) to (3), in which the detection unit detects an utterance-content pattern as the predetermined event based on the movement and shape of the lips of the object and performs lip reading, and the control signal supply unit supplies the control signal that causes the output device to output text information based on the result of the lip reading.
(7) The image sensor according to any one of (1) to (3), in which the detection unit detects an utterance-content pattern as the predetermined event based on the movement and shape of the lips of the object and performs lip reading, and the control signal supply unit supplies the control signal that causes the output device to output sign language information based on the result of the lip reading.
(8) The image sensor according to any one of (1) to (7), in which the condition setting unit sets a plurality of the conditions, and the detection unit performs the detection independently for the plurality of conditions.
DESCRIPTION OF SYMBOLS
100 Image sensor
101 Detection phase
104 Condition setting unit
110 Imaging unit
120 Filter processing unit
130 Binarization processing unit
140 Tracking processing unit
150 Moment generation unit
160 Centroid position generation unit
210 Aggregation processing unit
220 Control unit
230 Interface
310 Operation input device
320 Output device
401 Learning phase
410 Camera
411 Imaging unit
420 Control unit
430 Behavior learning device
440 Condition holding unit
621 Person
622 Portable terminal
624 Caption
631 Automobile
632 Truck
633 Tuk-tuk
634 Bicycle (samlor)

Claims (8)

1. An image sensor comprising:
an imaging element that images an object and generates frames of image data arranged in a time series;
a binarization processing unit that performs binarization processing on each of the frames to generate binarized frames;
a tracking processing unit that generates a difference between binarized frames adjacent in the time series and tracks a change in the position of the object included in the binarized frames;
a moment generation unit that calculates a moment of the object included in the binarized frames based on the result from the tracking processing unit;
a condition setting unit that sets a condition for detecting a predetermined event from the image data;
a detection unit that detects the predetermined event by comparing the moment of the object with the condition set in the condition setting unit; and
a control signal supply unit that supplies a control signal to an output device according to the result of the detection.
2. The image sensor according to claim 1, further comprising a filter processing unit that performs filter processing on each of the frames, wherein the binarization processing unit performs the binarization processing on each of the filtered frames.
3. The image sensor according to claim 1, further comprising a centroid position generation unit that generates the centroid position of the object included in the binarized frames based on the moments generated by the moment generation unit.
4. The image sensor according to claim 1, wherein the detection unit detects a dangerous act as the predetermined event based on the movement amount and shape of the object, and the control signal supply unit supplies the control signal that causes the output device to output a warning when the dangerous act is detected.
5. The image sensor according to claim 1, wherein the detection unit detects a sign language pattern as the predetermined event based on the movement and shape of the object and performs sign language analysis, and the control signal supply unit supplies the control signal that causes the output device to output text information based on the sign language analysis.
6. The image sensor according to claim 1, wherein the detection unit detects an utterance-content pattern as the predetermined event based on the movement and shape of the lips of the object and performs lip reading, and the control signal supply unit supplies the control signal that causes the output device to output text information based on the result of the lip reading.
7. The image sensor according to claim 1, wherein the detection unit detects an utterance-content pattern as the predetermined event based on the movement and shape of the lips of the object and performs lip reading, and the control signal supply unit supplies the control signal that causes the output device to output sign language information based on the result of the lip reading.
8. The image sensor according to claim 1, wherein the condition setting unit sets a plurality of the conditions, and the detection unit performs the detection independently for the plurality of conditions.
PCT/JP2017/037867 2016-12-07 2017-10-19 Image sensor WO2018105246A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016237175A JP2018092494A (en) 2016-12-07 2016-12-07 Image sensor
JP2016-237175 2016-12-07

Publications (1)

Publication Number Publication Date
WO2018105246A1 true WO2018105246A1 (en) 2018-06-14

Family

ID=62491822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/037867 WO2018105246A1 (en) 2016-12-07 2017-10-19 Image sensor

Country Status (2)

Country Link
JP (1) JP2018092494A (en)
WO (1) WO2018105246A1 (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07129781A (en) * 1993-11-05 1995-05-19 Ken Ishihara Device and method for extracting time series image information
JP2001076156A (en) * 1999-09-03 2001-03-23 Mitsubishi Electric Corp Device for monitoring image
JP2008287594A (en) * 2007-05-18 2008-11-27 Nippon Hoso Kyokai <Nhk> Specific movement determination device, reference data generation device, specific movement determination program and reference data generation program
WO2010084902A1 (en) * 2009-01-22 2010-07-29 株式会社日立国際電気 Intrusion alarm video processing device
JP2012238293A (en) * 2011-04-28 2012-12-06 Nextedge Technology Inc Input device
JP2013037675A (en) * 2011-06-23 2013-02-21 Omek Interactive Ltd System and method for close-range movement tracking

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7437983B2 (en) 2020-03-11 2024-02-26 日本放送協会 Conversion device and program

Also Published As

Publication number Publication date
JP2018092494A (en) 2018-06-14

Similar Documents

Publication Publication Date Title
CN102997900B (en) Vehicle systems, devices, and methods for recognizing external worlds
CN106952303B (en) Vehicle distance detection method, device and system
JP4173901B2 (en) Vehicle periphery monitoring device
KR101116273B1 (en) Apparatus and Method for Traffic Accident Recognition
US10210420B2 (en) Image processing method, image processing apparatus, and recording medium
US8879786B2 (en) Method for detecting and/or tracking objects in motion in a scene under surveillance that has interfering factors; apparatus; and computer program
US20120147188A1 (en) Vehicle vicinity monitoring apparatus
CN112513873A Identification of pedestrian's movement intention from camera images
JP2005354597A (en) Vehicle vicinity monitoring apparatus
JP4852355B2 (en) Abandoned object detection device and abandoned object detection method
CN110730966A (en) System and method for pedestrian detection
JP6436357B2 (en) Pedestrian motion identification device for vehicle
KR20140004291A (en) Forward collision warning system and forward collision warning method
US10282634B2 (en) Image processing method, image processing apparatus, and recording medium for reducing variation in quality of training data items
JP2007310805A (en) Object recognizing device
CN111626170A (en) Image identification method for railway slope rockfall invasion limit detection
Dozza et al. Recognizing Safetycritical Events from Naturalistic Driving Data
KR101256873B1 (en) A method for tracking object, an object tracking apparatus and a traffic watching system
Ramesh et al. An automated vision-based method to detect elephants for mitigation of human-elephant conflicts
KR20170137273A (en) Apparatus and Method for Pedestrian Detection using Deformable Part Model
CN115083199B (en) Parking space information determining method and related equipment thereof
WO2018105246A1 (en) Image sensor
CN111832450B (en) Knife holding detection method based on image recognition
Irhebhude et al. Speed breakers, road marking detection and recognition using image processing techniques
KR20170048108A (en) Method and system for recognizing object and environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17879082

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17879082

Country of ref document: EP

Kind code of ref document: A1