WO2018105246A1 - Image sensor - Google Patents

Image sensor Download PDF

Info

Publication number
WO2018105246A1
WO2018105246A1 (PCT/JP2017/037867; JP2017037867W)
Authority
WO
WIPO (PCT)
Prior art keywords
unit
control signal
image sensor
detection
frames
Prior art date
Application number
PCT/JP2017/037867
Other languages
French (fr)
Japanese (ja)
Inventor
久美子 馬原
佐伯 隆司
Original Assignee
Sony Semiconductor Solutions Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Semiconductor Solutions Corporation
Publication of WO2018105246A1 publication Critical patent/WO2018105246A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/254 Analysis of motion involving subtraction of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G09B21/04 Devices for conversing with the deaf-blind
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
    • G09B21/06 Devices for teaching lip-reading

Definitions

  • This technology relates to image sensors. Specifically, the present invention relates to an image sensor for detecting an event using captured image data.
  • a data flow in a system that performs such recognition processing is roughly divided into a data flow that outputs image data as a display image and a data flow that extracts necessary information from the image data and performs recognition processing.
  • an image detection processing device in which a processing element is provided for each pixel of the image sensor has been proposed (for example, see Patent Document 1).
  • image data is transferred and processing data necessary for calculating moments and the like is output to an external processor.
  • further processing is required in an external processor in order to calculate the position of the center of gravity and the like.
  • a processing speed of about 30 to 120 fps (frames / second) is generally sufficient, but this is insufficient for performing advanced recognition processing.
  • This technology was created in view of such circumstances, and aims to speed up recognition processing using image data.
  • the present technology has been made to solve the above-described problems.
  • The first aspect of the present technology is an image sensor including: an imaging element that captures an object and generates frames of image data arranged in time series; a binarization processing unit that performs binarization processing on each of the frames to generate binarized frames; a tracking processing unit that generates differences between binarized frames adjacent in the time series and tracks changes in the position of the object included in the binarized frames; a moment generation unit that calculates a moment of the object included in the binarized frames based on the result of the tracking processing unit; a condition setting unit that sets a condition for detecting a predetermined event from the image data; a detection unit that detects the predetermined event by comparing the moment of the object with the condition set in the condition setting unit; and a control signal supply unit that supplies a control signal to an output device according to the detection result. This has the effect that a control signal is supplied to the output device according to a result detected under the condition set in the condition setting unit within the image sensor.
  • In this first aspect, the image sensor may further include a filter processing unit that performs filter processing on each of the frames, and the binarization processing unit may perform the binarization processing on each of the filtered frames. This has the effect that filter processing is applied to each of the frames.
  • In this first aspect, the image sensor may further include a centroid position generation unit that generates the centroid position of the object included in the binarized frame based on the moment generated by the moment generation unit. This has the effect of generating the centroid position of the object included in the binarized frame.
  • In this first aspect, the detection unit may detect a dangerous action as the predetermined event based on the movement amount and shape of the object, and the control signal supply unit may supply the control signal that causes the output device to output a warning when the dangerous action is detected. This has the effect of detecting a dangerous action and causing the output device to output a warning.
  • In this first aspect, the detection unit may detect a sign language pattern as the predetermined event based on the movement and shape of the object and perform sign language analysis, and the control signal supply unit may supply the control signal that causes the output device to output text information based on the sign language analysis. This has the effect that sign language analysis is performed and text information based on the analysis is output to the output device.
  • In this first aspect, the detection unit may detect an utterance content pattern as the predetermined event based on the movement and shape of the lips of the object and perform lip reading, and the control signal supply unit may supply the control signal that causes the output device to output text information based on the result of the lip reading. This has the effect that lip reading is performed and text information based on it is output to the output device.
  • In this first aspect, the detection unit may detect an utterance content pattern as the predetermined event based on the movement and shape of the lips of the object and perform lip reading, and the control signal supply unit may supply the control signal that causes the output device to output sign language information based on the result of the lip reading. This has the effect that lip reading is performed and sign language information based on it is output to the output device.
  • In this first aspect, the condition setting unit may set a plurality of the conditions, and the detection unit may perform the detection independently for each of the plurality of conditions. This has the effect that detection is performed independently for the plurality of conditions set in the condition setting unit.
  • FIG. 8 is a flowchart illustrating an example of a lip reading processing procedure in the second application example of the embodiment of the present technology. FIG. 9 is a diagram illustrating an example of detection based on a plurality of conditions in the third application example of the embodiment of the present technology. FIG. 10 is a diagram illustrating an example of the relationship between a plurality of conditions and frames in the third application example of the embodiment of the present technology.
  • 1. Embodiment (configuration example of the detection system)
  • 2. First application example (dangerous action detection)
  • 3. Second application example (assistance for hearing impairment)
  • 4. Third application example (detection under multiple conditions)
  • FIG. 1 is a diagram illustrating an example of the overall configuration of a detection system according to an embodiment of the present technology.
  • This detection system includes a camera 410, a control unit 420, a behavior learning device 430, a condition holding unit 440, an image sensor 100, an operation input device 310, and an output device 320.
  • the camera 410, the control unit 420, and the behavior learning device 430 constitute a learning phase 401.
  • the image sensor 100, the operation input device 310, and the output device 320 constitute a detection phase 101.
  • the learning result in the learning phase 401 is held in the condition holding unit 440 and is referred to in the detection in the detection phase 101.
  • the camera 410 is an imaging device that is used to capture an image to be learned in the learning phase 401.
  • the camera 410 includes an imaging unit 411.
  • the imaging unit 411 is an imaging element that captures an image of a subject including an object.
  • the object is an object that widely includes non-living objects as well as living things such as people and animals.
  • Image data captured by the camera 410 is output to the control unit 420.
  • In this example, it is assumed that a camera separate from the image sensor 100 is provided.
  • the learning in the learning phase 401 may be performed using the image sensor 100 as the camera 410.
  • the control unit 420 controls the operation of the camera 410 and supplies the captured image data to the behavior learning device 430.
  • the behavior learning device 430 performs behavior learning based on the image data captured by the camera 410.
  • This behavior learning device 430 can learn not only the classifier but also the feature extraction at the same time by deep learning. In addition, learning that achieves better performance than existing recognizers such as boosting is possible simply by preparing a large number of data sets.
  • By this behavior learning device 430, filter coefficients, detection conditions, target determination conditions, and the like are obtained as learning results.
  • the condition holding unit 440 holds various conditions obtained by action learning in the action learning device 430 as learning results.
  • the condition held in the condition holding unit 440 is referred to as a condition for detecting a predetermined event in the detection phase 101.
  • the image sensor 100 captures an image of a subject including an object, and detects a predetermined event according to the conditions held in the condition holding unit 440.
  • the operation input device 310 receives an operation input from the outside.
  • the output device 320 outputs information obtained by the image sensor 100.
  • FIG. 2 is a diagram illustrating a configuration example of the image sensor 100 according to the embodiment of the present technology.
  • the image sensor 100 includes a condition setting unit 104, an imaging unit 110, a filter processing unit 120, a binarization processing unit 130, a tracking processing unit 140, a moment generation unit 150, and a centroid position generation unit 160.
  • the image sensor 100 also includes a totalization processing unit 210, a control unit 220, and an interface 230.
  • the imaging unit 110 is an imaging element that images a subject including a target object.
  • the imaging unit 110 generates frames of image data arranged in time series at a predetermined frame rate.
  • a high frame rate of 1000 frames per second (1000 fps) or more is assumed as the frame rate. It is not necessary for all the frames of image data captured by the imaging unit 110 to be supplied to the outside of the image sensor 100.
  • the image data with a high frame rate is intended for the detection described below, and a lower frame rate is sufficient for display. In other words, by keeping the high frame rate image data for internal reference within the image sensor 100, the bandwidth of the image sensor 100 can be used effectively.
  • the imaging unit 110 is an example of an imaging element described in the claims.
  • the filter processing unit 120 performs a filtering process on each frame of image data captured by the imaging unit 110.
  • the filter processing in the filter processing unit 120 for example, noise removal processing using a moving average filter or median filter, contour detection processing using a Sobel filter, edge detection using a Laplacian filter, or the like is assumed.
  • the number of objects included in the image can be calculated by obtaining the Euler number of the image by the filter processing unit 120.
  • the Euler number is the number of components minus the number of holes.
  • the filter processing unit 120 can extract other feature amounts of the image data.
  • the binarization processing unit 130 performs binarization processing on each of the frames subjected to the filter processing by the filter processing unit 120.
  • the binarization processing unit 130 binarizes the image data based on luminance and color histogram information included in the image data of each frame, and generates a binarized frame including the binarized data.
  • the tracking processing unit 140 detects the object included in the binarized frames by generating differences between frames adjacent in the time series for the binarized frames generated by the binarization processing unit 130, and tracks changes in the position of the object. When detecting the object, a specific region in the image can be designated as the measurement target.
  • the moment generation unit 150 calculates the moment of the two-variable function in the binarized frame based on the result of the tracking processing unit 140.
  • the 0th-order moment represents the amount of change in the area of the object included in the binarized frame, and is a value that is invariant to image rotation and enlargement / reduction.
  • the center-of-gravity position generation unit 160 generates the center-of-gravity position of the target object included in the binarized frame based on the moment generated by the moment generation unit 150.
  • a value obtained by dividing the respective primary moments in the horizontal direction and the vertical direction by the zeroth moment represents the position of the center of gravity.
  • the aggregation processing unit 210 performs aggregation processing based on various data obtained by the image sensor 100 and detects a predetermined event. As shown in the application example below, the totalization processing unit 210 performs necessary processing according to the application that operates.
  • the control unit 220 performs operation control on each unit of the image sensor 100.
  • the interface 230 serves as an interface with the outside. In this example, the interface 230 is connected to the output device 320 and causes the output device 320 to display information obtained by the image sensor 100.
  • the aggregation processing unit 210 is an example of a detection unit described in the claims.
  • the interface 230 is an example of a control signal supply unit described in the claims.
  • in this figure, only the route by which the output of the centroid position generation unit 160 is supplied to the aggregation processing unit 210 and the control unit 220 is shown explicitly; routes for supplying various data from each unit of the image sensor 100 to the aggregation processing unit 210 may be provided as necessary.
  • the condition setting unit 104 sets conditions for detecting a predetermined event from image data in the detection phase 101.
  • a set value for each action is set as a result of action learning in the learning phase 401.
  • conditions set in the condition setting unit 104 for example, filter coefficients, detection conditions, target determination conditions, and the like are assumed.
  • the filter coefficients determine how the image data captured by the imaging unit 110 is processed, and are coefficients of filters that facilitate extraction of desired information from the image data.
  • This filter coefficient is mainly used by the filter processing unit 120, but is also used in the binarization processing unit 130 and the like.
  • the detection condition is the condition to be detected in the detection phase 101. For example, events such as the dangerous actions described later and sign language patterns correspond to this.
  • the target determination condition specifies the target of the condition to be detected in the detection phase 101. For example, if the detection condition is a dangerous action, the person or animal performing the action corresponds to this; if the detection condition is a sign language pattern, the arms or mouth used for sign language correspond to this.
  • FIG. 3 is a diagram illustrating an example of the operation of the detection phase 101 according to the embodiment of the present technology.
  • in the condition setting unit 104, a condition for detecting a predetermined event from the image data in the detection phase 101 is set. The image data is converted by the filter operation in the filter processing unit 120 and the binarization processing in the binarization processing unit 130 so that the shape of the target can be recognized easily. For example, feature points are extracted and the amount of data is reduced.
  • the aggregation processing unit 210 determines the position of the object as a target based on the converted image data, and designates the target to the tracking processing unit 140.
  • the tracking processing unit 140 captures the designated target, updates the target position information, and continues tracking.
  • the aggregation processing unit 210 monitors the movement amount and shape of the target, and confirms whether or not the shape matches the set value of the condition setting unit 104.
  • when the aggregation processing unit 210 detects a match with the set values of the condition setting unit 104, a control signal is supplied to the output device 320 via the control unit 220.
  • FIG. 4 is a diagram illustrating an example of dangerous behavior detection in the first application example of the embodiment of the present technology.
  • to perform dangerous action detection using the detection system of this embodiment, the image sensor 100 captures an image of the object, and the aggregation processing unit 210 detects whether the conditions set in the condition setting unit 104 are met. By predicting a dangerous action and issuing a warning, the dangerous action can be deterred in advance.
  • as the dangerous actions in this example, for example, mischievous actions, dangerous behavior, and harm to people are assumed.
  • as a mischievous action, for example, a pet wrecking a room, such as a cat sharpening its claws on furniture, is assumed.
  • as dangerous behavior, for example, an action that may be life-threatening, such as a child trying to climb over the railing of a veranda, is assumed.
  • as harm to a person, for example, a case where a crow rummaging through garbage causes damage to people is assumed.
  • in this case, when the claw-sharpening action is detected, a warning sound is output from the output device 320.
  • crows are captured as targets, and whether their behavior corresponds to rummaging through garbage is determined from changes in the amount of movement and shape. In this case, when a garbage-rummaging action is detected, a warning sound is output from the output device 320.
  • FIG. 5 is a flowchart showing an example of a dangerous action detection processing procedure in the first application example of the embodiment of the present technology.
  • image data is acquired by imaging an object by the imaging unit 110 (step S811).
  • the acquired image data constitutes a time-series frame.
  • the noise is removed (noise reduction) from the acquired frames by the filter processing unit 120 (step S812). Further, the feature amount is extracted by the filter processing unit 120 (step S813).
  • the binarization processing unit 130 performs binarization based on the color and brightness of the image in the frame (step S814). Thereby, the amount of data to be processed thereafter is reduced.
  • an object such as a cat is determined as a target (step S815). Then, a difference between frames adjacent in time series is generated by the tracking processing unit 140, and target tracking (target tracking) is performed (step S816).
  • moment calculation is performed on the target by the moment generator 150 (step S817).
  • based on the moment generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame.
  • the movement amount and shape of the target are calculated by the totalization processing unit 210 (step S818).
  • the aggregation processing unit 210 detects whether or not the target movement amount and shape calculated in this way match the set values set in the condition setting unit 104. If the condition is met (step S819: YES), the control unit 220 supplies a control signal for outputting a warning sound to the output device 320 (step S821).
  • in this way, in the first application example, the behavior is predicted based on the movement amount and shape of the target object, so that a dangerous action by the target object can be detected.
  • FIG. 6 is a diagram illustrating an example of assisting hearing impairment in the second application example of the embodiment of the present technology.
  • in the second application example, a person is imaged by the image sensor 100, and the aggregation processing unit 210 detects whether the conditions set in the condition setting unit 104 are met. As a result, sign language and lip movements can be read and the corresponding subtitles can be output.
  • the mobile terminal 622 captures an image of the person 621 performing sign language.
  • the portable terminal 622 includes the image sensor 100, and imaging is performed by the imaging unit 110.
  • Target tracking and moment calculation are performed based on the captured image, the finger and arm of the person 621 are detected, and the shape, movement, area, and the like are measured. The motion is then mapped to the corresponding word.
  • a captured person image 623 is displayed on the display unit of the portable terminal 622, and a caption 624 representing the corresponding word is displayed.
  • FIG. 7 is a flowchart showing an example of a sign language analysis processing procedure in the second application example of the embodiment of the present technology.
  • image data is acquired by imaging an object by the imaging unit 110 (step S831).
  • the acquired image data constitutes a time-series frame.
  • the noise is removed from each acquired frame by the filter processing unit 120 (step S832). Further, the feature amount is extracted by the filter processing unit 120 (step S833).
  • binarization is performed by the binarization processing unit 130 based on the color and luminance of the image in the frame (step S834). Thereby, the amount of data to be processed thereafter is reduced.
  • the finger or arm of the person who performs sign language is determined as the target (step S835). Then, a difference between frames adjacent in time series is generated by the tracking processing unit 140, and target tracking (target tracking) is performed (step S836).
  • moment calculation is performed on the target by the moment generator 150 (step S837).
  • based on the moment generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. Further, the aggregation processing unit 210 calculates the target motion (shape, movement, area, and the like) (step S838).
  • the aggregation processing unit 210 detects whether the target motion calculated in this way matches the set values set in the condition setting unit 104. If the condition is met (step S839: Yes), the control unit 220 maps the motion to a word (step S841). Then, a control signal for outputting a subtitle (text information) or the like as the recognition result is supplied to the output device 320 (step S842).
  • FIG. 8 is a flowchart showing an example of a lip reading processing procedure in the second application example of the embodiment of the present technology.
  • Lip reading means reading the utterance content based on the movement and shape of the lips.
  • image data is acquired by imaging an object by the imaging unit 110 (step S851).
  • the acquired image data constitutes a time-series frame.
  • the noise is removed from each acquired frame by the filter processing unit 120 (step S852).
  • the filter processing unit 120 extracts feature amounts (step S853).
  • the binarization processing unit 130 performs binarization based on the color and brightness of the image in the frame (step S854). Thereby, the amount of data to be processed thereafter is reduced.
  • the lips (mouth) of the person are determined as targets (step S855). Then, a difference between frames adjacent in time series is generated by the tracking processing unit 140, and target tracking (target tracking) is performed (step S856).
  • moment calculation is performed on the target by the moment generator 150 (step S857).
  • based on the moment generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. Further, the aggregation processing unit 210 calculates the target motion (shape, movement, area, and the like) (step S858).
  • the aggregation processing unit 210 detects whether or not the target action calculated in this way matches the set value set in the condition setting unit 104.
  • the control unit 220 maps the utterance content corresponding to the motion to a word (step S861).
  • then, a control signal for outputting a subtitle (text information) or the like as the recognition result is supplied to the output device 320 (step S862).
  • the lip reading result is output as text information.
  • sign language information may also be output to the output device 320 based on the lip reading result.
  • in this way, in the second application example, the meaning of a motion is recognized based on the movement of a part of a person, which can assist people with hearing impairment.
  • FIG. 9 is a diagram illustrating an example of detection based on a plurality of conditions in the third application example of the embodiment of the present technology.
  • in the third application example, detection processing for a plurality of conditions is executed in a time-division manner, so that the processing appears to operate in parallel and the speed is increased.
  • the image sensor 100 captures the state of a road, and the aggregation processing unit 210 detects whether the image matches the vehicle type conditions set in the condition setting unit 104, thereby detecting the type of each vehicle. The detected vehicles are then counted by type.
  • in the condition setting unit 104, which type of vehicle the object in the image corresponds to is set as a separate detection condition for each type.
  • These detection conditions are separate and can be detected independently of each other.
  • the detection conditions used for the detection of the face image are separate and can be detected independently of each other.
  • FIG. 10 is a diagram illustrating a relationship example between a plurality of conditions and frames in the third application example of the embodiment of the present technology.
  • four conditions 441 to 444 are set in the condition setting unit 104.
  • a high frame rate of 1000 fps or higher is assumed as the frame rate.
  • the first condition 441 is detected in the first frame, the second condition 442 in the second frame, the third condition 443 in the third frame, and the fourth condition 444 in the fourth frame.
  • These four conditions 441 to 444 are separate and can be detected independently of each other. When a high frame rate is assumed, it is not necessary to check one condition on every frame, and there is little possibility of error even if frames are thinned out. By performing detection in a time-division manner on separate frames as in this example, processing can be performed as if the conditions were operating in parallel; a minimal scheduling sketch of this idea is given after this list.
  • for example, detection for a single condition can be performed at 20 times the speed, or the number of types and objects to be detected can be increased 20-fold.
  • five each of automobiles 631, trucks 632, tuk-tuks 633, and bicycles (samlor) 634 correspond to 20 detections, and these 20 detections can be performed in the same amount of time.
  • in this way, by assuming processing at a high frame rate of 1000 fps or more, detection based on a plurality of conditions can be performed in real time.
  • in this way, according to the embodiment of the present technology, an object is imaged at a high frame rate and detected under the conditions set in the image sensor 100, so that action recognition based on changes in the shape of the object can be performed in real time.
  • the processing procedures described in the above embodiment may be regarded as a method having this series of procedures, or may be regarded as a program for causing a computer to execute this series of procedures or as a recording medium storing the program.
  • a recording medium for example, a CD (Compact Disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), a memory card, a Blu-ray disc (Blu-ray (registered trademark) Disc), or the like can be used.
  • this technique can also take the following structures.
  • an image sensor including: an imaging element that images an object and generates frames of image data arranged in time series;
  • a binarization processing unit that performs binarization processing on each of the frames to generate a binarized frame;
  • a tracking processing unit that generates a difference between the binarized frames adjacent to each other in time series and tracks a change in the position of the object included in the binarized frame;
  • a moment generation unit that calculates a moment of the object included in the binarized frame based on a result of the tracking processing unit;
  • a condition setting unit for setting conditions for detecting a predetermined event from the image data;
  • a detection unit that detects the predetermined event by comparing the moment of the object and the condition set in the condition setting unit;
  • and a control signal supply unit that supplies a control signal to an output device according to the detection result.
  • the detection unit detects a sign language pattern as the predetermined event based on the movement and shape of the object, performs sign language analysis,
  • the detection unit detects the utterance content pattern as the predetermined event based on the movement and shape of the lips on the object, and reads the lips.
  • the image sensor according to any one of (1) to (3), wherein the control signal supply unit supplies the control signal that causes the output device to output text information based on the result of the lip reading.
  • the detection unit detects the utterance content pattern as the predetermined event based on the movement and shape of the lips on the object, and reads the lips.
  • the image sensor according to any one of (1) to (3), wherein the control signal supply unit supplies the control signal that causes the output device to output sign language information based on the result of the lip reading.
  • The image sensor according to any one of (1) to (7), wherein the condition setting unit sets a plurality of the conditions, and the detection unit performs the detection independently for the plurality of conditions.
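The time-division detection referred to above can be pictured with a short sketch. The following Python code is an illustrative assumption, not part of the disclosure: it assigns one detection condition to each incoming frame in round-robin order, so that several independent conditions appear to run in parallel at a high frame rate. The frame source and the per-type predicates are hypothetical stand-ins.

```python
from itertools import cycle
from collections import Counter

def run_time_division_detection(frames, conditions):
    """Round-robin one condition per frame, so several independent detection
    conditions appear to run in parallel at a high frame rate.

    frames     -- iterable of binarized frames (hypothetical source)
    conditions -- list of (name, predicate) pairs; predicate(frame) -> bool
    """
    counts = Counter()
    schedule = cycle(conditions)          # condition 1 on frame 1, condition 2 on frame 2, ...
    for frame in frames:
        name, predicate = next(schedule)  # only one condition is checked per frame
        if predicate(frame):
            counts[name] += 1             # e.g. count detected vehicles by type
    return counts

# Hypothetical usage: four vehicle-type conditions checked on successive frames.
# counts = run_time_division_detection(frame_stream, [
#     ("automobile", is_automobile),
#     ("truck", is_truck),
#     ("tuk-tuk", is_tuk_tuk),
#     ("bicycle", is_bicycle),
# ])
```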

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The purpose of the present invention is to increase the speed of recognition processing using image data. An imaging element images an object and generates frames of image data arranged in time series. A binarization processing unit performs binarization processing on each of the frames to generate binarized frames. A tracking processing unit generates differences between adjacent binarized frames in time series and tracks the change in the position of the object included in the binarized frames. A moment generation unit calculates the moment of the object included in the binarized frames on the basis of the results from the tracking processing unit. A condition setting unit sets conditions for detecting a prescribed event from the image data. A detection unit detects the prescribed event by comparing the moment of the object with the conditions set by the condition setting unit. A control signal supply unit supplies a control signal to an output device in accordance with the detection results.

Description

Image sensor
 The present technology relates to an image sensor. More specifically, it relates to an image sensor for detecting an event using captured image data.
 Conventionally, various recognition processes have been performed using image data captured by an image sensor. The data flow in a system that performs such recognition processing is roughly divided into a data flow that outputs the image data as a display image and a data flow that extracts necessary information from the image data and performs recognition processing. In order to speed up the data flow for recognition processing, for example, an image detection processing device in which a processing element is provided for each pixel of the image sensor has been proposed (see, for example, Patent Document 1).
Patent Document 1: JP 2001-195564 A
 In the above-described conventional technology, image data is transferred and the processing data necessary for calculating moments and the like is output to an external processor. In that case, however, further processing is required in the external processor in order to calculate the centroid position and the like. To display an image, a processing speed of about 30 to 120 fps (frames per second) is generally sufficient, but this is insufficient for advanced recognition processing.
 The present technology was created in view of such circumstances, and aims to speed up recognition processing using image data.
 The present technology has been made to solve the above-described problems. Its first aspect is an image sensor including: an imaging element that captures an object and generates frames of image data arranged in time series; a binarization processing unit that performs binarization processing on each of the frames to generate binarized frames; a tracking processing unit that generates differences between binarized frames adjacent in the time series and tracks changes in the position of the object included in the binarized frames; a moment generation unit that calculates a moment of the object included in the binarized frames based on the result of the tracking processing unit; a condition setting unit that sets a condition for detecting a predetermined event from the image data; a detection unit that detects the predetermined event by comparing the moment of the object with the condition set in the condition setting unit; and a control signal supply unit that supplies a control signal to an output device according to the detection result. This has the effect that a control signal is supplied to the output device according to a result detected under the condition set in the condition setting unit within the image sensor.
 In this first aspect, the image sensor may further include a filter processing unit that performs filter processing on each of the frames, and the binarization processing unit may perform the binarization processing on each of the filtered frames. This has the effect that filter processing is applied to each of the frames.
 In this first aspect, the image sensor may further include a centroid position generation unit that generates the centroid position of the object included in the binarized frame based on the moment generated by the moment generation unit. This has the effect of generating the centroid position of the object included in the binarized frame.
 In this first aspect, the detection unit may detect a dangerous action as the predetermined event based on the movement amount and shape of the object, and the control signal supply unit may supply the control signal that causes the output device to output a warning when the dangerous action is detected. This has the effect of detecting a dangerous action and causing the output device to output a warning.
 In this first aspect, the detection unit may detect a sign language pattern as the predetermined event based on the movement and shape of the object and perform sign language analysis, and the control signal supply unit may supply the control signal that causes the output device to output text information based on the sign language analysis. This has the effect that sign language analysis is performed and text information based on the analysis is output to the output device.
 In this first aspect, the detection unit may detect an utterance content pattern as the predetermined event based on the movement and shape of the lips of the object and perform lip reading, and the control signal supply unit may supply the control signal that causes the output device to output text information based on the result of the lip reading. This has the effect that lip reading is performed and text information based on it is output to the output device.
 In this first aspect, the detection unit may detect an utterance content pattern as the predetermined event based on the movement and shape of the lips of the object and perform lip reading, and the control signal supply unit may supply the control signal that causes the output device to output sign language information based on the result of the lip reading. This has the effect that lip reading is performed and sign language information based on it is output to the output device.
 In this first aspect, the condition setting unit may set a plurality of the conditions, and the detection unit may perform the detection independently for each of the plurality of conditions. This has the effect that detection is performed independently for the plurality of conditions set in the condition setting unit.
 According to the present technology, an excellent effect of speeding up recognition processing using image data can be achieved. Note that the effects described here are not necessarily limited, and may be any of the effects described in the present disclosure.
FIG. 1 is a diagram illustrating an example of the overall configuration of a detection system according to an embodiment of the present technology.
FIG. 2 is a diagram illustrating a configuration example of the image sensor 100 according to the embodiment of the present technology.
FIG. 3 is a diagram illustrating an example of the operation of the detection phase 101 according to the embodiment of the present technology.
FIG. 4 is a diagram illustrating an example of dangerous action detection in the first application example of the embodiment of the present technology.
FIG. 5 is a flowchart illustrating an example of a dangerous action detection processing procedure in the first application example of the embodiment of the present technology.
FIG. 6 is a diagram illustrating an example of assistance for hearing impairment in the second application example of the embodiment of the present technology.
FIG. 7 is a flowchart illustrating an example of a sign language analysis processing procedure in the second application example of the embodiment of the present technology.
FIG. 8 is a flowchart illustrating an example of a lip reading processing procedure in the second application example of the embodiment of the present technology.
FIG. 9 is a diagram illustrating an example of detection based on a plurality of conditions in the third application example of the embodiment of the present technology.
FIG. 10 is a diagram illustrating an example of the relationship between a plurality of conditions and frames in the third application example of the embodiment of the present technology.
 Hereinafter, modes for carrying out the present technology (hereinafter referred to as embodiments) will be described. The description will be given in the following order.
 1. Embodiment (configuration example of the detection system)
 2. First application example (dangerous action detection)
 3. Second application example (assistance for hearing impairment)
 4. Third application example (detection under multiple conditions)
 <1. Embodiment>
 [Detection system]
 FIG. 1 is a diagram illustrating an example of the overall configuration of a detection system according to an embodiment of the present technology. This detection system includes a camera 410, a control unit 420, a behavior learning device 430, a condition holding unit 440, an image sensor 100, an operation input device 310, and an output device 320. The camera 410, the control unit 420, and the behavior learning device 430 constitute the learning phase 401. The image sensor 100, the operation input device 310, and the output device 320 constitute the detection phase 101. The learning result of the learning phase 401 is held in the condition holding unit 440 and is referenced at the time of detection in the detection phase 101.
 The camera 410 is an imaging device used to capture images to be learned in the learning phase 401. The camera 410 includes an imaging unit 411. The imaging unit 411 is an imaging element that captures an image of a subject including an object. The object may be a living thing such as a person or an animal, or a non-living object. Image data captured by the camera 410 is output to the control unit 420. In this example, a camera separate from the image sensor 100 is assumed, but the learning in the learning phase 401 may also be performed using the image sensor 100 as the camera 410.
 The control unit 420 controls the operation of the camera 410 and supplies the captured image data to the behavior learning device 430.
 The behavior learning device 430 performs behavior learning based on the image data captured by the camera 410. By deep learning, this behavior learning device 430 can learn not only the classifier but also the feature extraction at the same time. In addition, learning that achieves better performance than existing recognizers such as boosting is possible simply by preparing a large number of data sets. Filter coefficients, detection conditions, target determination conditions, and the like are obtained from this behavior learning device 430 as learning results.
 The condition holding unit 440 holds, as learning results, the various conditions obtained by behavior learning in the behavior learning device 430. The conditions held in the condition holding unit 440 are referenced as conditions for detecting a predetermined event in the detection phase 101.
 The image sensor 100 captures an image of a subject including an object, and detects a predetermined event according to the conditions held in the condition holding unit 440. The operation input device 310 receives operation inputs from the outside. The output device 320 outputs information obtained by the image sensor 100.
 FIG. 2 is a diagram illustrating a configuration example of the image sensor 100 according to the embodiment of the present technology. The image sensor 100 includes a condition setting unit 104, an imaging unit 110, a filter processing unit 120, a binarization processing unit 130, a tracking processing unit 140, a moment generation unit 150, and a centroid position generation unit 160. The image sensor 100 also includes an aggregation processing unit 210, a control unit 220, and an interface 230.
 The imaging unit 110 is an imaging element that captures an image of a subject including an object. The imaging unit 110 generates frames of image data arranged in time series at a predetermined frame rate. Here, a high frame rate of 1000 frames per second (1000 fps) or more is assumed. Not all of the frames of image data captured by the imaging unit 110 need to be supplied to the outside of the image sensor 100. The high frame rate image data is intended for the detection described below, and a lower frame rate is sufficient for display. In other words, by keeping the high frame rate image data for internal reference within the image sensor 100, the bandwidth of the image sensor 100 can be used effectively. The imaging unit 110 is an example of the imaging element described in the claims.
 The filter processing unit 120 performs filter processing on each frame of image data captured by the imaging unit 110. As the filter processing in the filter processing unit 120, for example, noise removal using a moving average filter or a median filter, contour detection using a Sobel filter, and edge detection using a Laplacian filter are assumed. The number of objects included in an image can also be calculated by obtaining the Euler number of the image in the filter processing unit 120; the Euler number is the number of connected components minus the number of holes. The filter processing unit 120 can also extract other feature amounts of the image data.
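The actual filter coefficients are learned in the learning phase and set through the condition setting unit 104; purely as a minimal sketch of the kind of preprocessing named above, the following NumPy code applies a 3x3 moving-average filter for noise removal and a Sobel operator for contour detection. The kernel sizes and the use of NumPy are assumptions for illustration, not the disclosed implementation.

```python
import numpy as np

def moving_average_3x3(frame):
    """Simple noise removal: average each pixel with its 8 neighbours."""
    h, w = frame.shape
    padded = np.pad(frame.astype(np.float32), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return out / 9.0

def sobel_magnitude(frame):
    """Contour detection: gradient magnitude from horizontal/vertical Sobel kernels."""
    h, w = frame.shape
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    padded = np.pad(frame.astype(np.float32), 1, mode="edge")
    gx = np.zeros((h, w), dtype=np.float32)
    gy = np.zeros((h, w), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            window = padded[dy : dy + h, dx : dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)
```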
 The binarization processing unit 130 performs binarization processing on each of the frames filtered by the filter processing unit 120. The binarization processing unit 130 binarizes the image data based on the luminance and color histogram information contained in the image data of each frame, and generates a binarized frame consisting of the binarized data.
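As an illustrative sketch of histogram-based binarization, the code below derives a luminance threshold from the frame histogram with Otsu's method and produces a binarized frame of 0/1 values. Otsu's method is used here only as a stand-in; the disclosure does not specify the thresholding rule.

```python
import numpy as np

def binarize_by_histogram(gray_frame):
    """Binarize an 8-bit grayscale frame (integer values 0-255) using an
    Otsu-style threshold derived from its luminance histogram."""
    hist = np.bincount(gray_frame.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    bins = np.arange(256)

    weight_bg = np.cumsum(hist)                 # pixels at or below each candidate threshold
    weight_fg = total - weight_bg
    cum_intensity = np.cumsum(hist * bins)
    mean_bg = cum_intensity / np.maximum(weight_bg, 1)
    mean_fg = (cum_intensity[-1] - cum_intensity) / np.maximum(weight_fg, 1)

    between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
    threshold = int(np.argmax(between_var))     # threshold maximizing between-class variance
    return (gray_frame > threshold).astype(np.uint8)  # 1 = object, 0 = background
```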
 The tracking processing unit 140 detects the object included in the binarized frames by generating differences between frames adjacent in the time series for the binarized frames generated by the binarization processing unit 130, and tracks changes in the position of the object. When detecting the object, a specific region in the image can be designated as the measurement target.
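A minimal sketch of the inter-frame difference used for tracking, assuming binarized frames are available as NumPy arrays of 0/1 values; the optional region argument is a simplified stand-in for the measurement-target designation mentioned above.

```python
import numpy as np

def frame_difference(prev_bin, curr_bin, roi=None):
    """Difference between two adjacent binarized frames.

    roi -- optional (y0, y1, x0, x1) region restricting the measurement target.
    Returns the difference mask and the number of changed pixels.
    """
    diff = np.bitwise_xor(prev_bin, curr_bin)
    if roi is not None:
        y0, y1, x0, x1 = roi
        mask = np.zeros_like(diff)
        mask[y0:y1, x0:x1] = diff[y0:y1, x0:x1]
        diff = mask
    return diff, int(diff.sum())

def changed_region(diff):
    """Bounding box of changed pixels, usable to update the tracked target position."""
    ys, xs = np.nonzero(diff)
    if ys.size == 0:
        return None                      # no motion between the two frames
    return int(ys.min()), int(ys.max()) + 1, int(xs.min()), int(xs.max()) + 1
```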
 The moment generation unit 150 calculates the moments of the two-variable function in the binarized frame based on the result of the tracking processing unit 140. The zeroth-order moment represents the amount of change in the area of the object included in the binarized frame, and is a value that is invariant to rotation and scaling of the image.
 The centroid position generation unit 160 generates the centroid position of the object included in the binarized frame based on the moments generated by the moment generation unit 150. The values obtained by dividing the first-order moments in the horizontal and vertical directions by the zeroth-order moment represent the centroid position.
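The moment and centroid computation described in the two preceding paragraphs can be written directly: the zeroth-order moment M00 is the sum of the binary pixel values (the object area), the first-order moments M10 and M01 are the coordinate-weighted sums, and the centroid is (M10/M00, M01/M00). A minimal NumPy sketch, assuming a 0/1 binarized frame:

```python
import numpy as np

def moments_and_centroid(bin_frame):
    """Zeroth/first-order moments and centroid of a 0/1 binarized frame.

    M00 = sum f(x, y)                     (area of the object)
    M10 = sum x * f(x, y), M01 = sum y * f(x, y)
    centroid = (M10 / M00, M01 / M00)
    """
    ys, xs = np.nonzero(bin_frame)
    m00 = float(ys.size)
    if m00 == 0:
        return 0.0, None                  # no object pixels in this frame
    m10 = float(xs.sum())
    m01 = float(ys.sum())
    return m00, (m10 / m00, m01 / m00)    # (area, (centroid_x, centroid_y))
```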
 The aggregation processing unit 210 performs aggregation processing based on the various data obtained by the image sensor 100 and detects a predetermined event. As shown in the application examples below, the aggregation processing unit 210 performs the processing required by the running application. The control unit 220 controls the operation of each unit of the image sensor 100. The interface 230 handles the interface with the outside. In this example, the interface 230 is connected to the output device 320 and causes the output device 320 to display information obtained by the image sensor 100. The aggregation processing unit 210 is an example of the detection unit described in the claims, and the interface 230 is an example of the control signal supply unit described in the claims.
 In this figure, only the route by which the output of the centroid position generation unit 160 is supplied to the aggregation processing unit 210 and the control unit 220 is shown explicitly, but routes for supplying various data from each unit of the image sensor 100 to the aggregation processing unit 210 may be provided as necessary.
 The condition setting unit 104 sets conditions for detecting a predetermined event from the image data in the detection phase 101. In the condition setting unit 104, set values for each behavior are set as the result of the behavior learning in the learning phase 401. As the conditions set in the condition setting unit 104, for example, filter coefficients, detection conditions, and target determination conditions are assumed.
 The filter coefficients determine how the image data captured by the imaging unit 110 is processed, and are coefficients of filters that facilitate extraction of desired information from the image data. The filter coefficients are used mainly by the filter processing unit 120, but are also used by the binarization processing unit 130 and the like.
 The detection condition is the condition to be detected in the detection phase 101. For example, events such as the dangerous actions described later and sign language patterns correspond to this.
 The target determination condition specifies the target of the condition to be detected in the detection phase 101. For example, if the detection condition is a dangerous action, the person or animal performing the action corresponds to this; if the detection condition is a sign language pattern, the arms or mouth used for sign language correspond to this.
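To make the three kinds of learned set values concrete, the following hypothetical record groups filter coefficients, a target determination rule, and a detection rule for one condition. The field names and types are illustrative assumptions only; the disclosure does not define such a data structure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class DetectionCondition:
    """Hypothetical record of the learned set values held per condition."""
    name: str                                         # e.g. "claw sharpening", "sign language"
    filter_coefficients: np.ndarray                   # kernels used by the filter processing unit 120
    target_rule: Callable[[dict], bool]               # target determination (person, cat, lips, ...)
    detection_rule: Callable[[Sequence[dict]], bool]  # event test over the tracked measurements
```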
 FIG. 3 is a diagram illustrating an example of the operation of the detection phase 101 according to the embodiment of the present technology.
 In the condition setting unit 104, conditions for detecting a predetermined event from the image data in the detection phase 101 are set. The image data is converted by the filter operation in the filter processing unit 120 and the binarization processing in the binarization processing unit 130 so that the shape of the target can be recognized easily. For example, feature points are extracted and the amount of data is reduced.
The aggregation processing unit 210 determines the position of the object as a target based on the converted image data and designates that target to the tracking processing unit 140. The tracking processing unit 140 captures the designated target, updates the target position information, and continues tracking. The aggregation processing unit 210 monitors the movement amount and shape of the target and checks whether the shape matches the set values in the condition setting unit 104. When the aggregation processing unit 210 detects a match with the set values in the condition setting unit 104, a control signal is supplied to the output device 320 via the control unit 220.
With this configuration, behavior can be recognized in real time from changes in the shape of the object according to the conditions set in the condition setting unit 104 inside the image sensor 100, and control can be performed so that the detection result is output.
<2. First application example>
FIG. 4 is a diagram illustrating an example of dangerous act detection in the first application example of the embodiment of the present technology. To perform dangerous act detection using the detection system of the present embodiment, the image sensor 100 images the object, and the aggregation processing unit 210 detects whether or not the conditions set in the condition setting unit 104 are met. By predicting a dangerous act and issuing a warning in this way, the dangerous act can be deterred before it occurs.
Dangerous acts in this example include, for example, mischief, dangerous movements, and harm to people. Mischief includes, for example, pets damaging a room, such as a cat sharpening its claws on furniture. Dangerous movements include acts that can endanger life, such as a child trying to climb over a balcony railing. Harm to people includes, for example, crows causing damage to people by scavenging through trash.
As shown in part a of the figure, when a cat is about to sharpen its claws on furniture, the cat is captured as a target, and it is judged from its movement amount and change in shape whether this constitutes claw sharpening. In this case, when claw sharpening is detected, a warning sound is output from the output device 320.
On the other hand, as shown in part b of the figure, when the cat is playing with pet toys or with a person, the cat is still captured as a target, but its movement amount and change in shape are judged not to constitute claw sharpening, and no warning is issued.
Likewise, when crows flock to a trash collection site, the crows are captured as targets, and it is judged from their movement amount and change in shape whether this constitutes trash scavenging. In this case, when trash scavenging is detected, a warning sound is output from the output device 320.
On the other hand, when an animal such as a dog, cat, or penguin merely passes the trash collection site during a walk, the animal is captured as a target, but its movement amount and change in shape are judged not to constitute trash scavenging, and no warning is issued.
When a cat jumps onto a balcony railing, whether or not to issue a warning may be made selectable as appropriate. In practice, to prevent such behavior it is too late once the animal is already on the railing; it must be stopped at the stage when it is about to jump.
FIG. 5 is a flowchart showing an example of the dangerous act detection processing procedure in the first application example of the embodiment of the present technology.
First, image data is acquired by imaging the object with the imaging unit 110 (step S811). The acquired image data constitutes time-series frames.
Noise is removed (noise reduction) from each acquired frame by the filter processing unit 120 (step S812). The filter processing unit 120 also extracts feature amounts (step S813).
Then, the binarization processing unit 130 binarizes each frame based on the color and luminance of the image (step S814). This reduces the amount of data to be processed in the subsequent stages.
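As a concrete illustration of this step, a frame can be binarized by thresholding its luminance, optionally biased by a color channel. The snippet below is a minimal NumPy sketch under that assumption; the threshold values and the red bias are arbitrary illustrative choices, not values from the specification.

```python
import numpy as np

def binarize(frame_rgb, luma_threshold=0.5, red_bias=0.0):
    """Binarize an RGB frame (H x W x 3, floats in [0, 1]) by luminance and color.

    Pixels brighter than the threshold (optionally biased toward red content)
    become 1 and everything else 0, which sharply reduces the data volume
    handled by the later stages. Thresholds here are illustrative only.
    """
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    luma = 0.299 * r + 0.587 * g + 0.114 * b          # ITU-R BT.601 luminance
    return ((luma + red_bias * r) > luma_threshold).astype(np.uint8)

# Example: a random 8x8 "frame"
frame = np.random.rand(8, 8, 3)
binary = binarize(frame)
```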
In the binarized frame, an object such as a cat is determined as the target (step S815). Then, the tracking processing unit 140 generates the difference between frames adjacent in the time series and performs target tracking (step S816).
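One simple reading of this step is that the pixels that change between two consecutive binarized frames indicate where the target has moved, and the target position is updated to the center of that changed region. The sketch below assumes that interpretation; it is a stand-in heuristic, not the tracking algorithm of the specification itself.

```python
import numpy as np

def update_target_position(prev_binary, curr_binary, prev_pos):
    """Update the target position from the difference of two binarized frames.

    If nothing changed, the previous position is kept; otherwise the centroid
    of the changed region becomes the new position (illustrative heuristic).
    """
    diff = np.logical_xor(prev_binary, curr_binary)
    ys, xs = np.nonzero(diff)
    if len(xs) == 0:
        return prev_pos
    return (float(xs.mean()), float(ys.mean()))

prev = np.zeros((8, 8), dtype=np.uint8); prev[2:4, 2:4] = 1   # target in upper-left
curr = np.zeros((8, 8), dtype=np.uint8); curr[4:6, 4:6] = 1   # target moved toward center
print(update_target_position(prev, curr, (2.5, 2.5)))
```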
The moment generation unit 150 also performs a moment calculation on the target (step S817). Based on the moments generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. The aggregation processing unit 210 then calculates the movement amount and shape of the target (step S818).
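The moment computation and the centroid derived from it can be written down directly: for a binarized frame, the zeroth-order moment M00 is the target area and the centroid is (M10/M00, M01/M00). The following sketch is a plain NumPy rendering of those textbook formulas, not the circuit-level implementation in the sensor.

```python
import numpy as np

def raw_moments(binary):
    """Return the raw image moments M00, M10, M01 of a binarized frame (H x W of 0/1)."""
    ys, xs = np.nonzero(binary)
    m00 = float(len(xs))          # area of the target
    m10 = float(xs.sum())         # sum of x coordinates
    m01 = float(ys.sum())         # sum of y coordinates
    return m00, m10, m01

def centroid(binary):
    """Centroid (center of gravity) of the target: (M10/M00, M01/M00)."""
    m00, m10, m01 = raw_moments(binary)
    if m00 == 0:
        return None               # no target pixels in this frame
    return (m10 / m00, m01 / m00)

binary = np.zeros((8, 8), dtype=np.uint8)
binary[4:6, 4:6] = 1
print(raw_moments(binary), centroid(binary))   # area 4, centroid (4.5, 4.5)
```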
The aggregation processing unit 210 detects whether the movement amount and shape of the target calculated in this way match the set values in the condition setting unit 104. If the condition is met (step S819: Yes), the control unit 220 supplies the output device 320 with a control signal for outputting a warning sound (step S821).
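The final comparison of steps S819 and S821 amounts to checking the computed movement amount and a shape score against the set values and, on a match, issuing a control signal. The snippet below sketches that comparison; the threshold names, the comparison directions, and the warning payload are all assumptions made for illustration.

```python
def check_and_warn(movement, shape_score, set_values):
    """Compare measured movement/shape against set values and return a control signal.

    `set_values` mirrors the hypothetical thresholds of the condition setting unit;
    the returned dictionary stands in for the control signal sent to the output device.
    """
    if movement <= set_values["max_move"] and shape_score >= set_values["shape_score"]:
        return {"action": "warning_sound"}   # supplied to the output device 320
    return None

print(check_and_warn(movement=12.0, shape_score=0.85,
                     set_values={"max_move": 40.0, "shape_score": 0.8}))
```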
These processes are repeated for each frame of the image data arranged in the time series.
As described above, in this first application example, a high frame rate of 1000 fps or more is assumed, and by predicting behavior based on the movement amount and shape of the object, a dangerous act by the object can be detected.
<3. Second application example>
FIG. 6 is a diagram illustrating an example of hearing-impairment assistance in the second application example of the embodiment of the present technology. To assist a hearing-impaired user using the detection system of the present embodiment, the image sensor 100 images a person, and the aggregation processing unit 210 detects whether or not the conditions set in the condition setting unit 104 are met. Sign language or lip movements can thereby be read, and corresponding captions or the like can be output.
Here, sign language analysis is shown as an example of hearing-impairment assistance. In this example, a person 621 performing sign language is imaged by a portable terminal 622. The portable terminal 622 contains the image sensor 100, and the imaging is performed by its imaging unit 110. Target tracking and moment calculation are performed on the captured images, the fingers and arms of the person 621 are detected, and their shape, movement, area, and the like are measured. The motion is then mapped to the corresponding word.
As a result, the captured image 623 of the person is displayed on the display unit of the portable terminal 622 together with a caption 624 representing the corresponding word.
FIG. 7 is a flowchart showing an example of the sign language analysis processing procedure in the second application example of the embodiment of the present technology.
First, image data is acquired by imaging the object with the imaging unit 110 (step S831). The acquired image data constitutes time-series frames.
Noise is removed from each acquired frame by the filter processing unit 120 (step S832). The filter processing unit 120 also extracts feature amounts (step S833).
Then, the binarization processing unit 130 binarizes each frame based on the color and luminance of the image (step S834). This reduces the amount of data to be processed in the subsequent stages.
In the binarized frame, the fingers and arms of the person performing sign language are determined as the target (step S835). Then, the tracking processing unit 140 generates the difference between frames adjacent in the time series and performs target tracking (step S836).
The moment generation unit 150 also performs a moment calculation on the target (step S837). Based on the moments generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. The aggregation processing unit 210 then calculates the motion of the target (shape, movement, area, and the like) (step S838).
The aggregation processing unit 210 detects whether the motion of the target calculated in this way matches the set values in the condition setting unit 104. If the condition is met (step S839: Yes), the control unit 220 maps the motion to a word (step S841). Then, a control signal for outputting a caption (text information) or the like as the recognition result is supplied to the output device 320 (step S842).
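Step S841 maps a recognized motion to a word. A very small stand-in for that mapping is a lookup from a quantized motion descriptor to a vocabulary entry, as sketched below; the descriptors and the tiny dictionary are invented for illustration and are not part of the specification.

```python
# Hypothetical mapping from quantized motion descriptors to words.
SIGN_VOCABULARY = {
    ("right_hand", "circle", "chest"): "thank you",
    ("both_hands", "apart", "front"):  "big",
}

def motion_to_word(descriptor):
    """Return the word for a recognized motion descriptor, or None if unmatched."""
    return SIGN_VOCABULARY.get(descriptor)

word = motion_to_word(("right_hand", "circle", "chest"))
print(word or "<no match>")   # the matched word becomes the caption (text information)
```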
These processes are repeated for each frame of the image data arranged in the time series.
FIG. 8 is a flowchart showing an example of the lip reading processing procedure in the second application example of the embodiment of the present technology. The preceding flowchart showed an example of sign language analysis; as another example of hearing-impairment assistance, lip reading may be performed. Lip reading means reading the content of an utterance from the movement and shape of the lips.
First, image data is acquired by imaging the object with the imaging unit 110 (step S851). The acquired image data constitutes time-series frames.
Noise is removed from each acquired frame by the filter processing unit 120 (step S852). The filter processing unit 120 also extracts feature amounts (step S853).
Then, the binarization processing unit 130 binarizes each frame based on the color and luminance of the image (step S854). This reduces the amount of data to be processed in the subsequent stages.
In the binarized frame, the lips (mouth) of the person are determined as the target (step S855). Then, the tracking processing unit 140 generates the difference between frames adjacent in the time series and performs target tracking (step S856).
The moment generation unit 150 also performs a moment calculation on the target (step S857). Based on the moments generated by the moment generation unit 150, the centroid position generation unit 160 generates the centroid position of the target included in the binarized frame. The aggregation processing unit 210 then calculates the motion of the target (shape, movement, area, and the like) (step S858).
The aggregation processing unit 210 detects whether the motion of the target calculated in this way matches the set values in the condition setting unit 104. If the condition is met (step S859: Yes), the control unit 220 maps the utterance content corresponding to the motion to a word (step S861). Then, a control signal for outputting a caption (text information) or the like as the recognition result is supplied to the output device 320 (step S862).
In this example, the lip reading result is output as text information; however, sign language information may instead be output by the output device 320 based on the lip reading result.
As described above, in this second application example, a high frame rate of 1000 fps or more is assumed, and by recognizing the meaning of a person's behavior from the movement of a part of the person, hearing-impairment assistance can be provided.
<4. Third application example>
FIG. 9 is a diagram illustrating an example of detection under a plurality of conditions in the third application example of the embodiment of the present technology.
In a survey of street conditions in a city, for example, cars and pedestrians move freely and do not necessarily face the camera. To discriminate and detect them accurately, it is useful to set a plurality of detection conditions and check them at high speed. In this example, by assuming a high frame rate, the detection processes are executed in a time-division manner, as if they were running in parallel, to increase throughput.
On the downtown road in this example, automobiles 631, trucks 632, tuk-tuks 633, and bicycles (samlors) 634 are all traveling together. In a traffic volume survey, the image sensor 100 images the road, and the aggregation processing unit 210 detects whether the image matches the vehicle-type conditions set in the condition setting unit 104, thereby detecting the type of each vehicle. The detected vehicles are then counted by type.
In this case, the vehicle type of an object in the image is set in the condition setting unit 104 as a separate detection condition for each type. These detection conditions are independent of one another, and detection can be performed for each of them independently.
A wanted person can also be detected by a similar technique. The feature amounts of persons in an image of the downtown area are calculated, and the detected face images are compared with images in a database to detect the wanted person. In this case as well, the detection conditions used for face image detection are separate and can be checked independently of one another.
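The wanted-person example boils down to comparing a feature vector computed from a detected face against feature vectors stored in a database and reporting a hit when the distance falls below a threshold. The sketch below assumes a Euclidean distance on precomputed feature vectors; the feature extraction itself, the metric, and the threshold value are outside the specification and purely illustrative.

```python
import numpy as np

def match_face(query, database, max_distance=0.6):
    """Return the name of the closest database entry within max_distance, else None.

    `database` maps names to feature vectors of the same length as `query`.
    The distance metric and threshold are illustrative assumptions.
    """
    best_name, best_dist = None, float("inf")
    for name, feat in database.items():
        dist = float(np.linalg.norm(query - feat))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

db = {"suspect_A": np.array([0.1, 0.9, 0.3]), "suspect_B": np.array([0.7, 0.2, 0.5])}
print(match_face(np.array([0.12, 0.88, 0.31]), db))   # -> "suspect_A"
```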
FIG. 10 is a diagram illustrating an example of the relationship between a plurality of conditions and frames in the third application example of the embodiment of the present technology. Here, it is assumed that four conditions 441 to 444 are set in the condition setting unit 104 and that a high frame rate of 1000 fps or more is used.
In this example, the first condition 441 is checked in the first frame, the second condition 442 in the second frame, the third condition 443 in the third frame, and the fourth condition 444 in the fourth frame. These four conditions 441 to 444 are independent of one another and can be detected independently. At a high frame rate, a single condition does not need to be checked in every frame, and thinning out frames is unlikely to introduce errors. By performing detection on separate frames in a time-division manner as in this example, the processing can behave as if the conditions were being checked in parallel.
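Interleaving several independent detection conditions over consecutive frames can be expressed as a simple round-robin: frame i is checked against condition i mod N. The sketch below shows that scheduling together with per-type counters and the throughput intuition discussed next; the `detect` argument stands in for one full detection pass and is an assumption, not part of the specification.

```python
def run_time_division(frames, conditions, detect):
    """Check condition i % N on frame i, emulating N detectors running in parallel.

    `detect(frame, condition)` is a placeholder for one complete detection pass
    (filtering, binarization, tracking, moments, comparison) supplied by the caller;
    it returns True on a match.
    """
    counts = {c: 0 for c in conditions}                # e.g. per-vehicle-type counters
    for i, frame in enumerate(frames):
        condition = conditions[i % len(conditions)]    # round-robin over conditions
        if detect(frame, condition):
            counts[condition] += 1
    return counts

# Throughput intuition: at 1000 fps each of 4 conditions is still checked 250 times
# per second, far more often than a single condition at a conventional 50 fps.
frames = range(1000)                                   # stand-in for one second of frames
conditions = ["car", "truck", "tuk-tuk", "samlor"]
print(run_time_division(frames, conditions, detect=lambda f, c: f % 100 == 0))
```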
For example, taking 50 fps operation as a reference, operating at 1000 fps allows detection to be performed 20 times faster, so the number and variety of items detected can be increased by a factor of 20. In the example above, five each of the automobiles 631, trucks 632, tuk-tuks 633, and bicycles (samlors) 634 correspond to 20 detections. If a single detection was handled at 50 fps, operating at 1000 fps allows these 20 detections to be performed in the same amount of time.
As described above, in this third application example, by assuming processing at a high frame rate of 1000 fps or more, detection under a plurality of conditions can be performed in real time.
As explained so far, according to the embodiment of the present technology, by imaging an object at a high frame rate and detecting events under the conditions set inside the image sensor 100, behavior recognition based on changes in the shape of the object can be performed in real time.
The embodiment described above is an example for embodying the present technology, and the matters in the embodiment correspond to the invention-specifying matters in the claims. Likewise, the invention-specifying matters in the claims correspond to the matters given the same names in the embodiment of the present technology. However, the present technology is not limited to the embodiment and can be embodied with various modifications to the embodiment without departing from the gist thereof.
The processing procedures described in the above embodiment may be regarded as a method having this series of steps, as a program for causing a computer to execute the series of steps, or as a recording medium storing that program. As the recording medium, for example, a CD (Compact Disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), a memory card, or a Blu-ray (registered trademark) Disc can be used.
The effects described in this specification are merely examples and are not limiting; other effects may also be obtained.
The present technology can also be configured as follows.
(1) An image sensor including: an imaging element that images an object and generates frames of image data arranged in a time series; a binarization processing unit that performs binarization processing on each of the frames to generate binarized frames; a tracking processing unit that generates a difference between binarized frames adjacent in the time series and tracks a change in the position of the object included in the binarized frames; a moment generation unit that calculates a moment of the object included in the binarized frames based on the result from the tracking processing unit; a condition setting unit that sets a condition for detecting a predetermined event from the image data; a detection unit that detects the predetermined event by comparing the moment of the object with the condition set in the condition setting unit; and a control signal supply unit that supplies a control signal to an output device according to the result of the detection.
(2) The image sensor according to (1), further including a filter processing unit that performs filter processing on each of the frames, in which the binarization processing unit performs the binarization processing on each of the filtered frames.
(3) The image sensor according to (1) or (2), further including a centroid position generation unit that generates the centroid position of the object included in the binarized frames based on the moments generated by the moment generation unit.
(4) The image sensor according to any one of (1) to (3), in which the detection unit detects a dangerous act as the predetermined event based on the movement amount and shape of the object, and the control signal supply unit supplies the control signal that causes the output device to output a warning when the dangerous act is detected.
(5) The image sensor according to any one of (1) to (3), in which the detection unit detects a sign language pattern as the predetermined event based on the movement and shape of the object and performs sign language analysis, and the control signal supply unit supplies the control signal that causes the output device to output text information based on the sign language analysis.
(6) The image sensor according to any one of (1) to (3), in which the detection unit detects an utterance-content pattern as the predetermined event based on the movement and shape of the lips of the object and performs lip reading, and the control signal supply unit supplies the control signal that causes the output device to output text information based on the result of the lip reading.
(7) The image sensor according to any one of (1) to (3), in which the detection unit detects an utterance-content pattern as the predetermined event based on the movement and shape of the lips of the object and performs lip reading, and the control signal supply unit supplies the control signal that causes the output device to output sign language information based on the result of the lip reading.
(8) The image sensor according to any one of (1) to (7), in which the condition setting unit sets a plurality of the conditions, and the detection unit performs the detection independently for the plurality of conditions.
DESCRIPTION OF SYMBOLS
100 Image sensor
101 Detection phase
104 Condition setting unit
110 Imaging unit
120 Filter processing unit
130 Binarization processing unit
140 Tracking processing unit
150 Moment generation unit
160 Centroid position generation unit
210 Aggregation processing unit
220 Control unit
230 Interface
310 Operation input device
320 Output device
401 Learning phase
410 Camera
411 Imaging unit
420 Control unit
430 Behavior learning device
440 Condition holding unit
621 Person
622 Portable terminal
624 Caption
631 Automobile
632 Truck
633 Tuk-tuk
634 Bicycle (samlor)

Claims (8)

1. An image sensor comprising:
an imaging element that images an object and generates frames of image data arranged in a time series;
a binarization processing unit that performs binarization processing on each of the frames to generate binarized frames;
a tracking processing unit that generates a difference between binarized frames adjacent in the time series and tracks a change in the position of the object included in the binarized frames;
a moment generation unit that calculates a moment of the object included in the binarized frames based on the result from the tracking processing unit;
a condition setting unit that sets a condition for detecting a predetermined event from the image data;
a detection unit that detects the predetermined event by comparing the moment of the object with the condition set in the condition setting unit; and
a control signal supply unit that supplies a control signal to an output device according to the result of the detection.
2. The image sensor according to claim 1, further comprising a filter processing unit that performs filter processing on each of the frames, wherein the binarization processing unit performs the binarization processing on each of the filtered frames.
3. The image sensor according to claim 1, further comprising a centroid position generation unit that generates the centroid position of the object included in the binarized frames based on the moments generated by the moment generation unit.
4. The image sensor according to claim 1, wherein the detection unit detects a dangerous act as the predetermined event based on the movement amount and shape of the object, and the control signal supply unit supplies the control signal that causes the output device to output a warning when the dangerous act is detected.
5. The image sensor according to claim 1, wherein the detection unit detects a sign language pattern as the predetermined event based on the movement and shape of the object and performs sign language analysis, and the control signal supply unit supplies the control signal that causes the output device to output text information based on the sign language analysis.
6. The image sensor according to claim 1, wherein the detection unit detects an utterance-content pattern as the predetermined event based on the movement and shape of the lips of the object and performs lip reading, and the control signal supply unit supplies the control signal that causes the output device to output text information based on the result of the lip reading.
7. The image sensor according to claim 1, wherein the detection unit detects an utterance-content pattern as the predetermined event based on the movement and shape of the lips of the object and performs lip reading, and the control signal supply unit supplies the control signal that causes the output device to output sign language information based on the result of the lip reading.
8. The image sensor according to claim 1, wherein the condition setting unit sets a plurality of the conditions, and the detection unit performs the detection independently for the plurality of conditions.
PCT/JP2017/037867 2016-12-07 2017-10-19 Image sensor WO2018105246A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016237175A JP2018092494A (en) 2016-12-07 2016-12-07 Image sensor
JP2016-237175 2016-12-07

Publications (1)

Publication Number Publication Date
WO2018105246A1 true WO2018105246A1 (en) 2018-06-14

Family

ID=62491822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/037867 WO2018105246A1 (en) 2016-12-07 2017-10-19 Image sensor

Country Status (2)

Country Link
JP (1) JP2018092494A (en)
WO (1) WO2018105246A1 (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07129781A (en) * 1993-11-05 1995-05-19 Ken Ishihara Device and method for extracting time series image information
JP2001076156A (en) * 1999-09-03 2001-03-23 Mitsubishi Electric Corp Device for monitoring image
JP2008287594A (en) * 2007-05-18 2008-11-27 Nippon Hoso Kyokai <Nhk> Specific movement determination device, reference data generation device, specific movement determination program and reference data generation program
WO2010084902A1 (en) * 2009-01-22 2010-07-29 株式会社日立国際電気 Intrusion alarm video processing device
JP2012238293A (en) * 2011-04-28 2012-12-06 Nextedge Technology Inc Input device
JP2013037675A (en) * 2011-06-23 2013-02-21 Omek Interactive Ltd System and method for close-range movement tracking

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7437983B2 (en) 2020-03-11 2024-02-26 日本放送協会 Conversion device and program

Also Published As

Publication number Publication date
JP2018092494A (en) 2018-06-14

Similar Documents

Publication Publication Date Title
CN102997900B (en) Vehicle systems, devices, and methods for recognizing external worlds
CN106952303B (en) Vehicle distance detection method, device and system
JP4173901B2 (en) Vehicle periphery monitoring device
KR101116273B1 (en) Apparatus and Method for Traffic Accident Recognition
US10210420B2 (en) Image processing method, image processing apparatus, and recording medium
US8879786B2 (en) Method for detecting and/or tracking objects in motion in a scene under surveillance that has interfering factors; apparatus; and computer program
US20120147188A1 (en) Vehicle vicinity monitoring apparatus
CN112513873A Identification of pedestrian's movement intention from camera images
JP2005354597A (en) Vehicle vicinity monitoring apparatus
JP4852355B2 (en) Abandoned object detection device and abandoned object detection method
CN110730966A (en) System and method for pedestrian detection
JP6436357B2 (en) Pedestrian motion identification device for vehicle
KR20140004291A (en) Forward collision warning system and forward collision warning method
US10282634B2 (en) Image processing method, image processing apparatus, and recording medium for reducing variation in quality of training data items
JP2007310805A (en) Object recognizing device
CN111626170A (en) Image identification method for railway slope rockfall invasion limit detection
Dozza et al. Recognizing Safetycritical Events from Naturalistic Driving Data
KR101256873B1 (en) A method for tracking object, an object tracking apparatus and a traffic watching system
Ramesh et al. An automated vision-based method to detect elephants for mitigation of human-elephant conflicts
KR20170137273A (en) Apparatus and Method for Pedestrian Detection using Deformable Part Model
CN115083199B (en) Parking space information determining method and related equipment thereof
WO2018105246A1 (en) Image sensor
CN111832450B (en) Knife holding detection method based on image recognition
Irhebhude et al. Speed breakers, road marking detection and recognition using image processing techniques
KR20170048108A (en) Method and system for recognizing object and environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17879082

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17879082

Country of ref document: EP

Kind code of ref document: A1