TWI749870B - Device of handling video content analysis - Google Patents

Device of handling video content analysis

Info

Publication number
TWI749870B
Authority
TW
Taiwan
Prior art keywords
frames
deep learning
learning module
computing device
frame
Prior art date
Application number
TW109139742A
Other languages
Chinese (zh)
Other versions
TW202139134A (en)
Inventor
李威諭
Original Assignee
四零四科技股份有限公司
Priority date
Filing date
Publication date
Priority claimed from US16/927,945 (US11386656B2)
Application filed by 四零四科技股份有限公司
Publication of TW202139134A
Application granted
Publication of TWI749870B

Classifications

    • G06N 3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/215 — Image analysis; analysis of motion; motion-based segmentation
    • G06T 2207/10016 — Image acquisition modality; video; image sequence
    • G06T 2207/20081 — Special algorithmic details; training; learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A computing device for handling video content analysis, comprises a preprocessing module, for receiving a first plurality of frames and for determining whether to delete at least one of the first plurality of frames according to an event detection, to generate a second plurality of frames according to the determination for the first plurality of frames; a first deep learning module, coupled to the preprocessing module, for receiving the second plurality of frames and for determining whether to delete at least one of the second plurality of frames according to a plurality of features of the second plurality of frames, to generate a third plurality of frames according to the determination for the second plurality of frames; and a second deep learning module, coupled to the first deep learning module, for receiving the third plurality of frames, to generate a plurality of prediction outputs of the third plurality of frames.

Description

Device for handling video content analysis

The present invention relates to a device for a multimedia system, and more particularly, to a device for handling video content analysis.

Video content analysis aims to identify (e.g., detect or determine) temporal events and/or spatial events (e.g., objects) in frames of a video or a stream. Video content analysis has been applied to problems in several application areas, such as detection (e.g., tamper detection and anomaly detection), video tracking (e.g., person re-identification (person re-ID)), and traffic monitoring (e.g., people/vehicle counting). In the prior art, several methods for video content analysis have been proposed, but these methods achieve better performance at the cost of high computational complexity. When the computing power of the hardware is insufficient, it is difficult for the hardware to process all frames of a video with these methods. Thus, a method for handling video content analysis with low computational complexity remains an urgent problem to be solved.

The present invention provides a method and a device for handling video content analysis, to solve the above-mentioned problem.

The present invention discloses a computing device for handling video content analysis, comprising: a preprocessing module, for receiving a first plurality of frames and determining, according to an event detection, whether to delete at least one of the first plurality of frames, to generate a second plurality of frames according to the determination for the first plurality of frames; a first deep learning module, coupled to the preprocessing module, for receiving the second plurality of frames and determining, according to a plurality of features of the second plurality of frames, whether to delete at least one of the second plurality of frames, to generate a third plurality of frames according to the determination for the second plurality of frames; and a second deep learning module, coupled to the first deep learning module, for receiving the third plurality of frames, to generate a plurality of prediction outputs of the third plurality of frames.

10: computing device
100: preprocessing module
110: first deep learning module
120: second deep learning module
20: computing device
202: first buffer
204: second buffer
206: adaptive buffer
30: flowchart
300~318: steps
400: frame
410: centerline
420: track
500: frame
502: track
504: fastener
506: bolt
508: marking line
60: flowchart
600~618: steps
70: flowchart
700~718: steps
80: computing device
802: third deep learning module
804: fourth deep learning module
90: flowchart
900~918: steps

FIG. 1 is a schematic diagram of a computing device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a computing device according to an embodiment of the present invention.
FIG. 3 is a flowchart of a process for railway component anomaly detection according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a frame according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a frame according to an embodiment of the present invention.
FIG. 6 is a flowchart of a process for person re-identification according to an embodiment of the present invention.
FIG. 7 is a flowchart of a process for traffic monitoring according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of a computing device according to an embodiment of the present invention.
FIG. 9 is a flowchart of a process according to an embodiment of the present invention.

FIG. 1 is a schematic diagram of a computing device 10 according to an embodiment of the present invention. The computing device 10 includes a preprocessing module 100, a first deep learning module 110 and a second deep learning module 120. In detail, the preprocessing module 100 receives a first plurality of frames, and determines whether to delete at least one of the first plurality of frames according to an event detection. According to the determination for the first plurality of frames, the preprocessing module 100 generates a second plurality of frames. That is, the preprocessing module 100 processes (e.g., checks) the first plurality of frames to determine, according to the event detection, whether the first plurality of frames include at least one event. If a frame of the first plurality of frames includes at least one event, the preprocessing module 100 does not delete the frame, and generates a frame of the second plurality of frames according to it. If a frame of the first plurality of frames does not include any event, the preprocessing module 100 deletes the frame and does not generate a corresponding frame of the second plurality of frames. In other words, the second plurality of frames include events.

The first deep learning module 110 is coupled to the preprocessing module 100, and receives the second plurality of frames. According to a first plurality of features of the second plurality of frames, the first deep learning module 110 determines whether to delete at least one of the second plurality of frames, and generates a third plurality of frames according to the determination for the second plurality of frames. That is, the first deep learning module 110 processes (e.g., checks) the second plurality of frames to extract the first plurality of features. According to the first plurality of features, the first deep learning module 110 determines whether an event of the second plurality of frames belongs to a first target event. If at least one event of a frame of the second plurality of frames belongs to the first target event, the first deep learning module 110 does not delete the frame, and generates a frame of the third plurality of frames according to it. If at least one event of a frame of the second plurality of frames does not belong to the first target event, the first deep learning module 110 deletes the frame and does not generate a corresponding frame of the third plurality of frames. In other words, a frame of the third plurality of frames is a frame of the second plurality of frames that includes at least one event, and the at least one event belongs to the first target event.

The second deep learning module 120 is coupled to the first deep learning module 110, and receives the third plurality of frames. The second deep learning module 120 generates a plurality of prediction outputs (e.g., prediction results) of the third plurality of frames. That is, the second deep learning module 120 processes the third plurality of frames to extract a second plurality of features of the third plurality of frames. According to the second plurality of features, the second deep learning module 120 determines whether events of the third plurality of frames belong to a second target event, to generate the plurality of prediction outputs. If at least one event of a frame of the third plurality of frames belongs to the second target event, the second deep learning module 120 generates a prediction output indicating that the at least one event belongs to the second target event. If at least one event of a frame of the third plurality of frames does not belong to the second target event, the second deep learning module 120 generates a prediction output indicating that the at least one event does not belong to the second target event.
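The three-stage cascade above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the event test, target test and predictor are placeholder callables, since the patent does not fix their internals.

```python
def cascade(frames, has_event, is_target, predict):
    """Three-stage cascade: each stage only sees frames kept by the previous one."""
    second = [f for f in frames if has_event(f)]   # preprocessing module: event detection
    third = [f for f in second if is_target(f)]    # first deep learning module: target filter
    return [predict(f) for f in third]             # second deep learning module: prediction

# Toy usage: frames are integers; "event" = nonzero, "target" = even, prediction = a label.
outputs = cascade([0, 1, 2, 3, 4],
                  has_event=lambda f: f != 0,
                  is_target=lambda f: f % 2 == 0,
                  predict=lambda f: f"label-{f}")
```

Only two of the five input frames reach the most expensive stage, which is the source of the complexity reduction the patent targets.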

In an embodiment, the first target event and the second target event may be different. For example, when the computing device 10 is used for railway component anomaly detection, the first target event is a well-captured component, and the second target event is a normal state of the component. In an embodiment, the first target event and the second target event may be the same. For example, when the computing device 10 is used for person re-identification (person Re-ID), the first target event and the second target event are the same target pedestrian. It should be noted that the computational complexity of the first deep learning module 110 is lower than that of the second deep learning module 120. Thus, even if the first target event and the second target event are the same, the determination of the second deep learning module 120 may be more accurate than that of the first deep learning module 110.

In an embodiment, the computing device 10 may further include an adaptive buffer. The adaptive buffer is coupled to the second deep learning module 120, and is used for storing the plurality of prediction outputs. A size of the adaptive buffer is determined according to a number of at least one frame that includes an event, where the at least one frame is included in the first plurality of frames.

FIG. 2 is a schematic diagram of a computing device 20 according to an embodiment of the present invention. The computing device 20 includes the preprocessing module 100, the first deep learning module 110, the second deep learning module 120, a first buffer 202, a second buffer 204 and an adaptive buffer 206. The preprocessing module 100 generates the second plurality of frames, and transmits them to the first buffer 202. The first buffer 202 is coupled to the preprocessing module 100 and the first deep learning module 110, and stores the second plurality of frames. The first deep learning module 110 receives the second plurality of frames from the first buffer 202, generates the third plurality of frames, and transmits them to the second buffer 204. The second buffer 204 is coupled to the first deep learning module 110 and the second deep learning module 120, and stores the third plurality of frames. The second deep learning module 120 receives the third plurality of frames from the second buffer 204, generates the plurality of prediction outputs, and transmits them to the adaptive buffer 206. The adaptive buffer 206 is coupled to the second deep learning module 120, and stores the plurality of prediction outputs.

It should be noted that, depending on the frame rate, successive frames in the first plurality of frames may include the same event, but the second deep learning module 120 may generate different prediction outputs associated with the same event (ideally, prediction outputs associated with the same event should be identical). Thus, the prediction outputs associated with the same event may be averaged, to prevent a wrong operation (e.g., a false alarm) from being performed according to a wrong prediction output generated by the second deep learning module 120.
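The per-event averaging can be sketched as below. The grouping of frames by event and the use of scalar scores are illustrative assumptions; the patent only states that outputs for the same event are averaged.

```python
from collections import defaultdict

def average_predictions(outputs):
    """Average per-frame prediction scores that belong to the same event.

    `outputs` is a list of (event_id, score) pairs, one per processed frame.
    Returns one averaged score per event, smoothing out occasional wrong
    per-frame predictions that could otherwise trigger a false alarm."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event_id, score in outputs:
        sums[event_id] += score
        counts[event_id] += 1
    return {e: sums[e] / counts[e] for e in sums}

# Three successive frames of event "a", two of event "b":
avg = average_predictions([("a", 0.9), ("a", 0.7), ("a", 0.8), ("b", 0.2), ("b", 0.4)])
```

A single outlier frame (e.g., the 0.7 for event "a") then shifts the decision far less than it would if each frame were acted on individually.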

In addition, the size of the adaptive buffer 206 is determined (e.g., changed) according to the number of at least one frame that includes an event, where the at least one frame is included in the first plurality of frames. For example, the preprocessing module 100 evaluates the first plurality of frames, and calculates a relative speed of the event (i.e., the number of the at least one frame including the event). According to the relative speed (i.e., the number), the preprocessing module 100 generates a result, and transmits the result to the adaptive buffer 206. If a number of at least one first frame including a new event is smaller than a number of at least one second frame including a current event, the size of the adaptive buffer 206 is decreased. If the number of the at least one first frame including the new event is larger than the number of the at least one second frame including the current event, the size of the adaptive buffer 206 is increased. The at least one first frame and the at least one second frame are in the first plurality of frames.
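The resizing rule can be sketched as follows. The patent only specifies the direction of the change (decrease when the new event spans fewer frames, increase when it spans more); the fixed step size and minimum are illustrative assumptions.

```python
def adjust_buffer_size(current_size, new_event_frames, current_event_frames,
                       step=1, minimum=1):
    """Resize the adaptive buffer from the event's relative speed.

    A new event spanning fewer frames than the current one moves faster, so
    fewer predictions per event will arrive and the buffer shrinks; a new
    event spanning more frames moves slower, so the buffer grows."""
    if new_event_frames < current_event_frames:
        return max(minimum, current_size - step)
    if new_event_frames > current_event_frames:
        return current_size + step
    return current_size

smaller = adjust_buffer_size(4, new_event_frames=2, current_event_frames=5)  # 3
larger = adjust_buffer_size(4, new_event_frames=8, current_event_frames=5)   # 5
```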

In an embodiment, the first buffer 202, the second buffer 204 and the adaptive buffer 206 may be file systems or memory systems.

In an embodiment, when no event is detected, the preprocessing module 100 and the first deep learning module 110 may be paused, and the second deep learning module 120 may continue to process (e.g., receive) at least one frame of the third plurality of frames stored in the second buffer 204, to generate at least one of the plurality of prediction outputs.

In an embodiment, a time point at which the preprocessing module 100 performs its operation, a time point at which the first deep learning module 110 performs its operation, and a time point at which the second deep learning module 120 performs its operation may be the same. That is, the preprocessing module 100, the first deep learning module 110 and the second deep learning module 120 may operate in parallel (i.e., simultaneously).
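The parallel, buffer-connected operation can be sketched with threads and queues; the queues stand in for the first buffer 202 and second buffer 204. This is one possible realization, not the claimed one, and the filter predicates are toy placeholders.

```python
import threading
from queue import Queue

SENTINEL = object()  # marks end of the stream

def stage(inbox, outbox, keep):
    """Consume frames from inbox; forward those passing `keep` to outbox."""
    while True:
        f = inbox.get()
        if f is SENTINEL:
            outbox.put(SENTINEL)
            return
        if keep(f):
            outbox.put(f)

source, q1, q2 = Queue(), Queue(), Queue()
t1 = threading.Thread(target=stage, args=(source, q1, lambda f: f != 0))  # preprocessing
t2 = threading.Thread(target=stage, args=(q1, q2, lambda f: f % 2 == 0))  # first DL module
t1.start(); t2.start()

for f in [0, 1, 2, 3, 4]:
    source.put(f)
source.put(SENTINEL)
t1.join(); t2.join()

kept = []
while True:
    f = q2.get()
    if f is SENTINEL:
        break
    kept.append(f)
# kept == [2, 4]: the frames the expensive second deep learning module would receive
```

Because each stage runs in its own thread and blocks on its input queue, a stage never busy-waits for its predecessor, matching the pipelined behavior described above.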

In an embodiment, the first plurality of frames may be generated (e.g., captured) by a video recorder (e.g., a camera). In an embodiment, the first plurality of frames may be generated (e.g., captured) by different video recorders (e.g., different cameras).

In an embodiment, the first plurality of frames are generated for streaming or for a video (e.g., a video clip). That is, while the video recorder is running (e.g., operating or capturing), the preprocessing module 100 may immediately receive and process the first plurality of frames. In addition, the preprocessing module 100 may receive and process the first plurality of frames after the video recorder stops running.

In an embodiment, the first plurality of frames may be color images (e.g., RGB images). In an embodiment, the first plurality of frames may be grayscale images.

In an embodiment, the event detection includes at least one of a motion detection and an object detection. In an embodiment, the motion detection may include a scene subtraction. The preprocessing module 100 may perform the motion detection by using the scene subtraction, to determine whether to delete at least one of the first plurality of frames. For example, the preprocessing module 100 subtracts a frame of the first plurality of frames from a neighboring frame, to generate a matrix with the same size as the frame. The preprocessing module 100 sums all elements of the matrix to generate a value. Then, if the value is greater than a threshold, the preprocessing module 100 determines not to delete the frame. If the value is smaller than the threshold, the preprocessing module 100 determines to delete the frame.
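The scene-subtraction test can be sketched in a few lines. Taking the absolute value of each element-wise difference is an assumption added here so the sum reflects the magnitude of change regardless of its sign; the patent only describes subtracting, summing, and thresholding.

```python
def keep_by_scene_subtraction(frame, neighbor, threshold):
    """Sum the absolute element-wise differences between a frame and its
    neighboring frame; keep the frame only if the total exceeds the threshold."""
    diff = sum(abs(a - b)
               for row_a, row_b in zip(frame, neighbor)
               for a, b in zip(row_a, row_b))
    return diff > threshold

static = [[10, 10], [10, 10]]  # 2x2 grayscale frame, no motion
moved = [[10, 90], [10, 10]]   # one pixel changed strongly

keep_moved = keep_by_scene_subtraction(moved, static, threshold=50)    # True: keep
keep_static = keep_by_scene_subtraction(static, static, threshold=50)  # False: delete
```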

In an embodiment, the object detection includes a feature extraction. The preprocessing module 100 may perform the object detection by using the feature extraction, to determine whether to delete at least one of the first plurality of frames. The feature extraction may include at least one traditional computer vision method (e.g., a low-level computer vision method), such as an edge detection (e.g., a Hough transform). For example, if the preprocessing module 100 determines according to the feature extraction that a frame of the first plurality of frames includes at least one object, the preprocessing module 100 does not delete the frame. If the preprocessing module 100 determines according to the feature extraction that the frame does not include any object, the preprocessing module 100 deletes the frame.

In an embodiment, if the preprocessing module 100 determines that a frame of the first plurality of frames is not to be deleted, the preprocessing module 100 processes the frame. That is, after performing the determination for the first plurality of frames, the preprocessing module 100 processes the frames that are not deleted, to generate the second plurality of frames. In an embodiment, the operation of processing a frame of the first plurality of frames includes at least one of a noise reduction, a downscaling operation, an adaptive histogram equalization, an image quality enhancement and a cropping operation.

In an embodiment, a first time interval between the preprocessing module 100 receiving the first plurality of frames and generating the second plurality of frames is smaller than a second time interval between the first deep learning module 110 receiving the second plurality of frames and generating the third plurality of frames. In an embodiment, the second time interval is smaller than a third time interval between the second deep learning module 120 receiving the third plurality of frames and generating the plurality of prediction outputs. That is, the preprocessing module 100 operates the fastest, and the first deep learning module 110 operates faster than the second deep learning module 120. Thus, the first deep learning module 110 and the second deep learning module 120 are not idle waiting for the inputs (i.e., the second plurality of frames and the third plurality of frames) generated by the previous modules (i.e., the preprocessing module 100 and the first deep learning module 110).

In an embodiment, the first time interval between the preprocessing module 100 receiving the first plurality of frames and generating the second plurality of frames is equal to or smaller than a fourth time interval between successive frames of the first plurality of frames (i.e., the capture time interval between two successive frames). In an embodiment, the second time interval between the first deep learning module 110 receiving the second plurality of frames and generating the third plurality of frames is equal to or smaller than the fourth time interval. That is, the preprocessing module 100 and the first deep learning module 110 operate at a speed equal to or faster than the frame rate, and their operations are real-time.

In an embodiment, the third time interval between the second deep learning module 120 receiving the third plurality of frames and generating the plurality of prediction outputs may be equal to or smaller than the fourth time interval between successive frames of the first plurality of frames. In an embodiment, the third time interval may be larger than the fourth time interval. That is, the second deep learning module 120 may or may not operate faster than the frame rate, and its operation may or may not be real-time.

In an embodiment, the computational complexity of the preprocessing module 100 is lower than that of the second deep learning module 120. In an embodiment, the computational complexity of the first deep learning module 110 is lower than that of the second deep learning module 120. In an embodiment, the preprocessing module 100 and the first deep learning module 110 determine whether to delete the received frames (i.e., at least one of the first plurality of frames and at least one of the second plurality of frames) according to different methods. That is, although the computational complexity of the second deep learning module 120 is higher than those of the preprocessing module 100 and the first deep learning module 110, the number of frames processed (e.g., received) by the second deep learning module 120 (i.e., the number of the third plurality of frames) is smaller than the numbers of the first plurality of frames and the second plurality of frames. Thus, the computational complexity of the hardware is reduced.

In an embodiment, the number of the first plurality of frames received by the preprocessing module 100 is larger than the number of the second plurality of frames received by the first deep learning module 110. In an embodiment, the number of the second plurality of frames received by the first deep learning module 110 is larger than the number of the third plurality of frames received by the second deep learning module 120.

在一實施例中，第一深度學習模組110及第二深度學習模組120為卷積神經網路(convolutional neural networks,CNNs)。在一實施例中，卷積神經網路包含有一卷積層、一最大池化層(max pooling layer)、一啟動函數層(activation function layer)及一全連接層(fully connected layer)中至少一者。在一實施例中，卷積神經網路另包含有至少一恆等映射(例如恆等映射捷徑(identity mapping shortcut))。 In an embodiment, the first deep learning module 110 and the second deep learning module 120 are convolutional neural networks (CNNs). In an embodiment, a convolutional neural network includes at least one of a convolutional layer, a max pooling layer, an activation function layer, and a fully connected layer. In an embodiment, the convolutional neural network further includes at least one identity mapping (for example, an identity mapping shortcut).

在一實施例中,第一深度學習模組110及第二深度學習模組120的損失函數(例如目標函數)可為相同或不同。在一實施例中,損失函數可為交叉熵(cross-entropy)損失函數。在一實施例中,不同卷積層的核映射(kernel maps)的大小可為不同。在一實施例中,當第一深度學習模組110及/或第二深度學習模組120被訓練時,丟棄法(dropout)可被使用來減少由過度擬合(overfitting)所造成的影響。在一實施例中,在第一深度學習模組110及第二深度學習模組120的每一層後,批次正規化(batch normalization)可被使用。 In an embodiment, the loss functions (for example, objective functions) of the first deep learning module 110 and the second deep learning module 120 may be the same or different. In an embodiment, the loss function may be a cross-entropy loss function. In an embodiment, the size of the kernel maps of different convolutional layers may be different. In an embodiment, when the first deep learning module 110 and/or the second deep learning module 120 are trained, a dropout method can be used to reduce the impact caused by overfitting. In one embodiment, after each layer of the first deep learning module 110 and the second deep learning module 120, batch normalization may be used.
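As an illustrative sketch of the dropout technique mentioned above: each activation is zeroed with some probability during training and the survivors are rescaled so the expected activation is unchanged (inverted dropout). The drop probability, the seeded random generator, and the pure-Python formulation are assumptions for demonstration; the patent does not specify an implementation.

```python
import random

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each activation with probability p_drop and
    scale survivors by 1/(1 - p_drop) so the expected value is unchanged."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                                  # seeded for reproducibility
out = dropout([1.0, 1.0, 1.0, 1.0], p_drop=0.5, rng=rng)
# each entry is either dropped (0.0) or rescaled to 1.0 / 0.5 = 2.0
```

At inference time dropout is disabled; the rescaling during training is what keeps the two regimes consistent.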

在一實施例中,當第一深度學習模組110及第二深度學習模組120被訓練時,適應矩估計最佳化器(Adaptive Moment Estimation optimizer,ADAM optimizer)被使用來更新第一深度學習模組110及第二深度學習模組120的參數。 In an embodiment, when the first deep learning module 110 and the second deep learning module 120 are trained, an adaptive moment estimation optimizer (Adaptive Moment Estimation optimizer, ADAM optimizer) is used to update the first deep learning The parameters of the module 110 and the second deep learning module 120.
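A minimal single-parameter sketch of the Adam update rule referred to above. The hyperparameter values are the commonly used defaults and are assumptions here, since the patent does not specify them.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (theta, m, v)."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for step t (1-indexed)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):                      # three updates with a constant gradient of 2
    theta, m, v = adam_step(theta, 2.0, m, v, t)
# with a constant gradient, each bias-corrected step moves theta by about lr
```

With a constant gradient the bias-corrected moments settle immediately, so three steps move the parameter by roughly three times the learning rate.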

在一實施例中，複數個預測輸出為單熱向量(one-hot vectors)。在一實施例中，複數個預測輸出用來指示第三複數個幀的事件是否為異常(anomaly)。在一實施例中，複數個預測輸出用來追蹤(track)第三複數個幀的事件。在一實施例中，複數個預測輸出用來計數第三複數個幀的事件的數量。 In an embodiment, the plurality of prediction outputs are one-hot vectors. In an embodiment, the plurality of prediction outputs are used to indicate whether events of the third plurality of frames are anomalies. In an embodiment, the plurality of prediction outputs are used to track events of the third plurality of frames. In an embodiment, the plurality of prediction outputs are used to count the number of events of the third plurality of frames.

在一實施例中,上述實施例的事件可為目標(例如物件、鐵道元件、人或車輛)、車禍或交通阻塞(traffic jam)。 In an embodiment, the event of the above-mentioned embodiment may be a target (for example, an object, a railway element, a person or a vehicle), a car accident, or a traffic jam.

根據本發明,計算裝置10及/或20可被應用在不同應用領域來解決不同的問題。下列實施例被用來說明計算裝置10及20的運作方式。 According to the present invention, the computing device 10 and/or 20 can be used in different application fields to solve different problems. The following embodiments are used to illustrate the operation of the computing devices 10 and 20.

第3圖為本發明實施例用於鐵道元件異常檢測的一流程的流程圖30。流程圖30可用來實現用於鐵道元件異常檢測的流程的計算裝置10及/或20,以及包含有以下步驟。 FIG. 3 is a flowchart 30 of a process for detecting abnormalities of railway components according to an embodiment of the present invention. The flowchart 30 can be used to implement the computing device 10 and/or 20 for the process of detecting abnormalities of railway components, and includes the following steps.

步驟300:開始。 Step 300: Start.

步驟302:一預處理模組(例如預處理模組100)接收第一複數個幀的一幀。 Step 302: A preprocessing module (for example, the preprocessing module 100) receives one frame of the first plurality of frames.

步驟304:根據一移動偵測,該預處理模組決定當一相機擷取該第一複數個幀的該幀時,該相機是否在移動。若否,執行步驟306;否則,移到步驟308。 Step 304: According to a motion detection, the preprocessing module determines whether the camera is moving when a camera captures the frame of the first plurality of frames. If not, go to step 306; otherwise, go to step 308.

步驟306:該預處理模組刪除該第一複數個幀的該幀。 Step 306: The preprocessing module deletes the first plurality of frames.

步驟308:根據該第一複數個幀的該幀,該預處理模組產生第二複數個幀的一幀。 Step 308: According to the frame of the first plurality of frames, the preprocessing module generates a frame of the second plurality of frames.

步驟310:一第一深度學習模組(例如第一深度學習模組110)決定該第二複數個幀的該幀的一元件是否被該相機完美擷取。若否,執行步驟312;否則,移到步驟314。 Step 310: A first deep learning module (such as the first deep learning module 110) determines whether an element of the second plurality of frames is perfectly captured by the camera. If not, go to step 312; otherwise, go to step 314.

步驟312:該第一深度學習模組刪除該第二複數個幀的該幀。 Step 312: The first deep learning module deletes the frame of the second plurality of frames.

步驟314:根據該第二複數個幀的該幀,該第一深度學習模組產生第三複數個幀的一幀。 Step 314: According to the frame of the second plurality of frames, the first deep learning module generates a frame of the third plurality of frames.

步驟316:一第二深度學習模組(例如第二深度學習模組120)產生該第三複數個幀的該幀的一預測輸出。 Step 316: A second deep learning module (for example, the second deep learning module 120) generates a prediction output of the third plurality of frames.

步驟318:結束。 Step 318: End.

根據流程圖30，一預處理模組接收第一複數個幀的一幀，以及根據一移動偵測，決定當一相機擷取該第一複數個幀的該幀時，該相機是否在移動。當該相機擷取該第一複數個幀的該幀時，若該相機被決定為沒有在移動，該預處理模組刪除第一複數個幀的幀。否則，根據該第一複數個幀的該幀，該預處理模組產生第二複數個幀的一幀。一第一深度學習模組接收該第二複數個幀的該幀，以及決定該第二複數個幀的該幀的一元件（例如目標元件）是否被該相機完美擷取（例如在第二複數個幀的幀中，元件是完整的）。若該元件被決定為不完美的被擷取，該第一深度學習模組刪除該第二複數個幀的該幀。否則，根據該第二複數個幀的該幀，該第一深度學習模組產生第三複數個幀的一幀。一第二深度學習模組接收該第三複數個幀的該幀，以及產生該第三複數個幀的該幀的一預測輸出（例如根據元件的狀態）。也就是說，當計算裝置10及/或20用於鐵道元件異常檢測，根據流程圖30，第一複數個幀被處理。 According to the flowchart 30, a preprocessing module receives a frame of the first plurality of frames, and determines, according to a motion detection, whether a camera was moving when the camera captured the frame of the first plurality of frames. If the camera is determined not to have been moving when capturing the frame, the preprocessing module deletes the frame of the first plurality of frames. Otherwise, the preprocessing module generates a frame of the second plurality of frames according to the frame of the first plurality of frames. A first deep learning module receives the frame of the second plurality of frames, and determines whether an element (e.g., a target element) of the frame of the second plurality of frames is perfectly captured by the camera (e.g., the element is complete in the frame of the second plurality of frames). If the element is determined to be imperfectly captured, the first deep learning module deletes the frame of the second plurality of frames. Otherwise, the first deep learning module generates a frame of the third plurality of frames according to the frame of the second plurality of frames. A second deep learning module receives the frame of the third plurality of frames, and generates a prediction output of the frame of the third plurality of frames (e.g., according to the state of the element). That is, when the computing device 10 and/or 20 is used for railway component anomaly detection, the first plurality of frames are processed according to the flowchart 30.

在一實施例中，第一深度學習模組產生第三複數個幀的幀的運作可被取代為第一深度學習模組傳送第二複數個幀的幀到第二深度學習模組的運作。也就是說，若第二複數個幀的幀的元件被決定為完美的被擷取，第三複數個幀的幀為第二複數個幀的幀。 In an embodiment, the operation of the first deep learning module generating the frame of the third plurality of frames may be replaced with the operation of the first deep learning module transmitting the frame of the second plurality of frames to the second deep learning module. That is, if the element of the frame of the second plurality of frames is determined to be perfectly captured, the frame of the third plurality of frames is the frame of the second plurality of frames.

在一實施例中，藉由相機，第一複數個幀被產生（例如擷取）。在一實施例中，具有光源的相機可被設置在火車的底部或在軌道（例如鐵道）上的檢測裝置。在不同的路徑上，火車或監測裝置可具有不同速度。路徑可為直的或彎曲的。在一實施例中，元件可為被安裝在軌道上的扣件及螺栓等。 In an embodiment, the first plurality of frames are generated (e.g., captured) by the camera. In an embodiment, a camera with a light source may be installed on the bottom of a train or on a detection device on a track (e.g., a railway). The train or the monitoring device may have different speeds on different paths, and a path may be straight or curved. In an embodiment, the elements may be fasteners, bolts, etc. installed on the track.

在一實施例中，移動偵測包含有場景相減。預處理模組使用場景相減來執行移動偵測，以決定當相機擷取第一複數個幀的幀時，相機是否在移動。舉例來說，預處理模組將第一複數個幀的幀的相鄰幀減去第一複數個幀的幀，以產生具有與幀相同大小的矩陣。預處理模組加總矩陣的所有元素，以產生一數值。接著，若數值大於一臨界值，預處理模組決定當相機擷取第一複數個幀的幀時，相機在移動。若數值小於臨界值，預處理模組決定當相機擷取第一複數個幀的幀時，相機沒有在移動，以及刪除第一複數個幀的幀。 In an embodiment, the motion detection includes scene subtraction. The preprocessing module performs the motion detection by using scene subtraction, to determine whether the camera was moving when the camera captured the frame of the first plurality of frames. For example, the preprocessing module subtracts the frame of the first plurality of frames from an adjacent frame, to generate a matrix with the same size as the frame, and sums all elements of the matrix to produce a value. If the value is greater than a threshold, the preprocessing module determines that the camera was moving when it captured the frame. If the value is less than the threshold, the preprocessing module determines that the camera was not moving when it captured the frame, and deletes the frame of the first plurality of frames.
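The scene subtraction described above can be sketched as follows. Summing the absolute pixel differences and the particular threshold value are illustrative choices; the patent only specifies subtracting adjacent frames, summing the resulting matrix, and comparing against a threshold.

```python
def camera_moving(frame, prev_frame, threshold):
    """Scene subtraction: sum the absolute per-pixel difference between two
    grayscale frames (2-D lists) and compare it against a threshold."""
    diff_sum = sum(
        abs(a - b)
        for row_a, row_b in zip(frame, prev_frame)
        for a, b in zip(row_a, row_b)
    )
    return diff_sum > threshold

prev = [[10, 10], [10, 10]]
still = [[10, 11], [10, 10]]    # nearly identical: camera assumed static
moved = [[90, 10], [10, 80]]    # large difference: camera assumed moving
```

A frame judged static would then be deleted by the preprocessing module instead of being passed on.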

在一實施例中，預處理模組可執行以下步驟來產生第二複數個幀的幀：根據軌道定位運作，剪裁第一複數個幀的幀；處理第一複數個幀的幀；以及根據剪裁運作及處理運作，產生第二複數個幀的幀。 In an embodiment, the preprocessing module may perform the following steps to generate the frame of the second plurality of frames: cropping the frame of the first plurality of frames according to a track positioning operation; processing the frame of the first plurality of frames; and generating the frame of the second plurality of frames according to the cropping operation and the processing operation.

第4圖為本發明實施例一幀400的示意圖。具有中心線410的三種情況(a)~(c)被用來說明本發明的可能場景，但不限於此。幀400可用來實現在第3圖中預處理模組所接收的第一複數個幀。幀400包含有軌道420，以及在三種情況中，軌道420可為具有不同曲線的軌道的一部分。中心線410在幀400的中間。在情況(a)中，中心線410及軌道420的位置相比，軌道420在幀400的中間。在情況(b)中，中心線410及軌道420的位置相比，在幀400的左側中軌道420的像素多於在幀400的右側中軌道420的像素。在情況(c)中，中心線410及軌道420的位置相比，在幀400的右側中軌道420的像素多於在幀400的左側中軌道420的像素。也就是說，因為火車的晃動、火車的不同速度及軌道的不同曲線，軌道難以在第一複數個幀的每一幀的中間。因此，預處理模組需軌道定位運作來追蹤軌道的偏移(shift)，以找出剪裁座標。 FIG. 4 is a schematic diagram of a frame 400 according to an embodiment of the present invention. Three cases (a)-(c) with a center line 410 are used to illustrate possible scenarios of the present invention, but are not limited thereto. The frame 400 may be used to realize the first plurality of frames received by the preprocessing module in FIG. 3. The frame 400 includes a track 420, and in the three cases the track 420 may be part of tracks with different curves. The center line 410 is in the middle of the frame 400. In case (a), comparing the positions of the center line 410 and the track 420, the track 420 is in the middle of the frame 400. In case (b), comparing the positions of the center line 410 and the track 420, the track 420 has more pixels on the left side of the frame 400 than on the right side of the frame 400. In case (c), comparing the positions of the center line 410 and the track 420, the track 420 has more pixels on the right side of the frame 400 than on the left side of the frame 400. That is, because of the shaking of the train, the different speeds of the train, and the different curves of the track, the track can hardly stay in the middle of every frame of the first plurality of frames. Therefore, the preprocessing module needs the track positioning operation to track the shift of the track, to find the cropping coordinates.

在一實施例中，軌道定位運作可包含有二值化(binarization)、質心計算及移動平均(moving average)。詳細來說，根據一對比，預處理模組藉由使用一臨界值來二值化第一複數個幀的幀（因為相較於其他元件，軌道具有更高亮度），以及計算第一複數個幀的幀的質心座標。接著，預處理模組使用移動平均，以將質心座標與先前所計算的第一複數個幀的質心座標平滑。移動平均用來確保質心座標是正確的，以及確保質心座標不受不確定因素（例如部分生鏽軌道）所影響。 In an embodiment, the track positioning operation may include binarization, centroid calculation, and a moving average. In detail, according to a contrast, the preprocessing module binarizes the frame of the first plurality of frames by using a threshold (because the track is brighter than the other elements), and calculates centroid coordinates of the frame of the first plurality of frames. Then, the preprocessing module uses the moving average to smooth the centroid coordinates with previously calculated centroid coordinates of the first plurality of frames. The moving average ensures that the centroid coordinates are correct and are not affected by uncertain factors (e.g., a partly rusty track).
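A sketch of the track positioning steps above (binarization, centroid calculation, moving average). The pixel values, brightness threshold, and averaging window are illustrative assumptions; the patent does not fix them.

```python
def track_centroid(frame, brightness_threshold):
    """Binarize a grayscale frame and return the mean column (centroid x)
    of the bright pixels; the rail is brighter than the other elements."""
    xs = [x for row in frame for x, pixel in enumerate(row)
          if pixel > brightness_threshold]
    return sum(xs) / len(xs)

def smooth(history, new_value, window=3):
    """Moving average of the newest centroid with previously computed ones,
    so a partly rusty (dark) section does not throw the coordinate off."""
    history.append(new_value)
    recent = history[-window:]
    return sum(recent) / len(recent)

frame = [[0, 0, 200, 210, 0],
         [0, 0, 220, 205, 0]]
history = [2.4, 2.6]                      # centroids computed from earlier frames
center = smooth(history, track_centroid(frame, brightness_threshold=128))
```

The smoothed centroid would then serve as the cropping coordinate for this frame.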

在一實施例中，預處理模組處理第一複數個幀的幀的運作可包含有雜訊降低、降尺度運作及可適性直方圖等化中至少一者。在一實施例中，雜訊降低可為高斯平滑運作。在一實施例中，預處理模組對第一複數個幀的幀執行降維度運作，以減少計算複雜度。 In an embodiment, the operation of the preprocessing module processing the frame of the first plurality of frames may include at least one of noise reduction, a downscaling operation, and adaptive histogram equalization. In an embodiment, the noise reduction may be a Gaussian smoothing operation. In an embodiment, the preprocessing module performs a dimension reduction operation on the frame of the first plurality of frames, to reduce the computational complexity.
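As one possible form of the downscaling operation mentioned above (the patent does not name a specific method), a 2x2 block-averaging sketch:

```python
def downscale_2x2(frame):
    """Downscale a grayscale frame by averaging each 2x2 block of pixels;
    assumes the frame dimensions are even."""
    return [
        [
            (frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4
            for c in range(0, len(frame[0]), 2)
        ]
        for r in range(0, len(frame), 2)
    ]

small = downscale_2x2([[0, 4, 8, 8],
                       [4, 8, 8, 8],
                       [1, 1, 0, 0],
                       [1, 1, 0, 0]])
```

Halving each dimension cuts the pixel count to a quarter, which is one way the dimension reduction lowers the computational load of the later modules.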

在一實施例中，第一深度學習模組處理（例如檢查）第二複數個幀的幀，以提取第二複數個幀的幀的至少一第一特徵。根據至少一第一特徵，第一深度學習模組決定第二複數個幀的幀的元件是否被相機完美擷取。 In an embodiment, the first deep learning module processes (e.g., checks) the frame of the second plurality of frames, to extract at least one first feature of the frame of the second plurality of frames. According to the at least one first feature, the first deep learning module determines whether the element of the frame of the second plurality of frames is perfectly captured by the camera.

在一實施例中，第二深度學習模組處理（例如檢查）第三複數個幀的幀，以提取第三複數個幀的幀的至少一第二特徵。根據至少一第二特徵，第二深度學習模組決定元件是否為異常，其中至少一第二特徵代表第三複數個幀的幀的元件的狀況。根據決定，第二深度學習模組產生預測輸出，其中預測輸出指示元件是否為異常。舉例來說，根據在扣件的螺栓上標線及軌道之間的角度是否大於N度，第二深度學習模組可決定扣件是否為異常。 In an embodiment, the second deep learning module processes (e.g., checks) the frame of the third plurality of frames, to extract at least one second feature of the frame of the third plurality of frames. According to the at least one second feature, which represents the condition of the element in the frame of the third plurality of frames, the second deep learning module determines whether the element is abnormal. According to the determination, the second deep learning module generates the prediction output, where the prediction output indicates whether the element is abnormal. For example, the second deep learning module may determine whether a fastener is abnormal according to whether the angle between a marking line on a bolt of the fastener and the track is greater than N degrees.

第5圖為本發明實施例一幀500的示意圖。幀500可用來實現在第3圖中第二深度學習模組所接收的第三複數個幀。幀500包含有軌道502、扣件504、螺栓506及標線508。螺栓506用來將扣件504固定於軌道502。標線508位於螺栓506上。雖然不同軌道系統可用於不同的應用,但在第5圖中的每一個元件可被用於具有不同材質及不同形狀的任一基礎軌道系統。3種情況(a)~(c)被用於說明本發明的可能場景,但不限於此。 Figure 5 is a schematic diagram of a frame 500 according to an embodiment of the present invention. The frame 500 can be used to implement the third plurality of frames received by the second deep learning module in Figure 3. The frame 500 includes a rail 502, a fastener 504, a bolt 506, and a marking 508. The bolt 506 is used to fix the fastener 504 to the rail 502. The marking line 508 is located on the bolt 506. Although different rail systems can be used for different applications, each element in Figure 5 can be used for any basic rail system with different materials and different shapes. Three cases (a) to (c) are used to illustrate the possible scenarios of the present invention, but they are not limited thereto.

在情況(a)中，標線508及軌道502間的角度為0度，以及可被視為理想角度。因此，根據角度，第二深度學習模組可決定幀500的扣件504為正常。在情況(b)中，標線508及軌道502間具有角度A1。根據角度A1，第二深度學習模組可決定幀500的扣件504是否為異常。舉例來說，若角度A1大於N度，第二深度學習模組可決定幀500的扣件504為異常，以及產生用來指示幀500的扣件504為異常的預測輸出。若角度A1小於N度，第二深度學習模組可決定幀500的扣件504為正常，以及產生用來指示幀500的扣件504為正常的預測輸出。在情況(c)中，標線508及軌道502間具有角度A2。根據角度A2，第二深度學習模組可決定幀500的扣件504是否為異常。舉例來說，若角度A2大於N度，第二深度學習模組可決定幀500的扣件504為異常，以及產生用來指示幀500的扣件504為異常的預測輸出。若角度A2小於N度，第二深度學習模組可決定幀500的扣件504為正常，以及產生用來指示幀500的扣件504為正常的預測輸出。 In case (a), the angle between the marking line 508 and the track 502 is 0 degrees, and may be regarded as an ideal angle. Thus, according to the angle, the second deep learning module may determine that the fastener 504 of the frame 500 is normal. In case (b), there is an angle A1 between the marking line 508 and the track 502. According to the angle A1, the second deep learning module may determine whether the fastener 504 of the frame 500 is abnormal. For example, if the angle A1 is greater than N degrees, the second deep learning module may determine that the fastener 504 of the frame 500 is abnormal, and generate a prediction output indicating that the fastener 504 of the frame 500 is abnormal. If the angle A1 is less than N degrees, the second deep learning module may determine that the fastener 504 of the frame 500 is normal, and generate a prediction output indicating that the fastener 504 of the frame 500 is normal. In case (c), there is an angle A2 between the marking line 508 and the track 502. According to the angle A2, the second deep learning module may determine whether the fastener 504 of the frame 500 is abnormal. For example, if the angle A2 is greater than N degrees, the second deep learning module may determine that the fastener 504 of the frame 500 is abnormal, and generate a prediction output indicating that the fastener 504 of the frame 500 is abnormal. If the angle A2 is less than N degrees, the second deep learning module may determine that the fastener 504 of the frame 500 is normal, and generate a prediction output indicating that the fastener 504 of the frame 500 is normal.

在一實施例中，當第一深度學習模組及第二深度學習模組被訓練時，第一深度學習模組及第二深度學習模組的訓練資料庫的內容被分為正常資料及異常資料。舉例來說，針對第一深度學習模組，正常資料為包含有完整扣件的影像，以及異常資料為包含有不完整扣件或不包含有任何扣件的影像。針對第二深度學習模組，正常資料為包含有完整扣件的影像，其中在扣件的螺栓上的標線及軌道間的角度小於N度。異常資料為包含有完整扣件的影像，其中標線及軌道間的角度大於N度。也就是說，第一深度學習模組及第二深度學習模組之間的訓練資料庫不互相共享，以及第一深度學習模組及第二深度學習模組的標記(例如label)運作為不同的。 In an embodiment, when the first deep learning module and the second deep learning module are trained, the contents of the training databases of the first deep learning module and the second deep learning module are divided into normal data and abnormal data. For example, for the first deep learning module, the normal data are images containing a complete fastener, and the abnormal data are images containing an incomplete fastener or no fastener at all. For the second deep learning module, the normal data are images containing a complete fastener in which the angle between the marking line on the bolt of the fastener and the track is less than N degrees, and the abnormal data are images containing a complete fastener in which the angle between the marking line and the track is greater than N degrees. That is, the training databases of the first deep learning module and the second deep learning module are not shared with each other, and the labeling operations of the first deep learning module and the second deep learning module are different.

在一實施例中，預測輸出為單熱向量。舉例來說，預測輸出可為向量[0 1]或向量[1 0]。向量[0 1]表示第二深度學習模組決定元件為異常，以及向量[1 0]表示第二深度學習模組決定元件為正常。 In an embodiment, the prediction output is a one-hot vector. For example, the prediction output may be the vector [0 1] or the vector [1 0]. The vector [0 1] indicates that the second deep learning module determines the element to be abnormal, and the vector [1 0] indicates that the second deep learning module determines the element to be normal.
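Combining the angle rule and the one-hot output format described above, a hypothetical decision function; the threshold N is left unspecified by the patent, so it appears here only as a parameter.

```python
def classify_fastener(angle_degrees, n_degrees):
    """Map the angle between the marking line and the track to a one-hot
    prediction output: [0, 1] means abnormal, [1, 0] means normal."""
    return [0, 1] if angle_degrees > n_degrees else [1, 0]

abnormal = classify_fastener(15.0, n_degrees=10.0)   # angle exceeds N degrees
normal = classify_fastener(3.0, n_degrees=10.0)      # angle within N degrees
```

In practice the second deep learning module learns this decision from labeled images rather than computing the angle explicitly; the function only illustrates the output convention.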

在一實施例中,在第二深度學習模組產生用來指示元件為異常的預測輸出後,根據預測輸出及元件的位置,其他裝置可執行對應的運作(例如修復元件)。 In one embodiment, after the second deep learning module generates a prediction output indicating that the component is abnormal, other devices can perform corresponding operations (for example, repairing the component) based on the prediction output and the location of the component.

在一實施例中，第一深度學習模組及第二深度學習模組為卷積神經網路。在一實施例中，卷積神經網路包含有卷積層、最大池化層、啟動函數層及全連接層中至少一者，其中啟動函數層為洩露整流線性單元(leaky Rectified Linear Unit,Leaky ReLU)函數層。在一實施例中，不同卷積層的核映射的大小可為不同的。在一實施例中，第一深度學習模組及第二深度學習模組的損失函數為交叉熵損失函數。在一實施例中，卷積神經網路另包含有至少一恆等映射（例如恆等映射捷徑）。 In an embodiment, the first deep learning module and the second deep learning module are convolutional neural networks. In an embodiment, a convolutional neural network includes at least one of a convolutional layer, a max pooling layer, an activation function layer, and a fully connected layer, where the activation function layer is a leaky rectified linear unit (Leaky ReLU) function layer. In an embodiment, the sizes of the kernel maps of different convolutional layers may be different. In an embodiment, the loss functions of the first deep learning module and the second deep learning module are cross-entropy loss functions. In an embodiment, the convolutional neural network further includes at least one identity mapping (e.g., an identity mapping shortcut).

第6圖為本發明實施例用於行人重新識別的一流程的流程圖60。流程圖60可用來實現用來處理行人重新識別的計算裝置10及/或20,以及包含有以下步驟。 FIG. 6 is a flowchart 60 of a process for pedestrian re-identification according to an embodiment of the present invention. The flowchart 60 can be used to implement the computing device 10 and/or 20 for processing pedestrian re-identification, and includes the following steps.

步驟600:開始。 Step 600: Start.

步驟602:一預處理模組(例如預處理模組100)接收第一複數個幀的一幀。 Step 602: A preprocessing module (for example, the preprocessing module 100) receives one frame of the first plurality of frames.

步驟604：根據一目標偵測，該預處理模組決定該第一複數個幀的該幀是否包含有至少一行人。若否，執行步驟606；否則，移到步驟608。 Step 604: According to a target detection, the preprocessing module determines whether the frame of the first plurality of frames contains at least one pedestrian. If not, perform Step 606; otherwise, go to Step 608.

步驟606:該預處理模組刪除該第一複數個幀的該幀。 Step 606: The preprocessing module deletes the first plurality of frames.

步驟608:根據該第一複數個幀的該幀,該預處理模組產生第二複數個幀的一幀。 Step 608: According to the frame of the first plurality of frames, the preprocessing module generates a frame of the second plurality of frames.

步驟610:一第一深度學習模組(例如第一深度學習模組110)決定該第二複數個幀的該幀的至少一第一特徵是否匹配在一資料庫中的複數個特徵。若否,執行步驟612;否則,移到步驟614。 Step 610: A first deep learning module (such as the first deep learning module 110) determines whether at least one first feature of the second plurality of frames matches a plurality of features in a database. If not, go to step 612; otherwise, go to step 614.

步驟612:該第一深度學習模組刪除該第二複數個幀的該幀,以及將該至少一第一特徵儲存到該資料庫。 Step 612: The first deep learning module deletes the frames of the second plurality of frames, and stores the at least one first feature in the database.

步驟614:根據該第二複數個幀的該幀,該第一深度學習模組產生第三複數個幀的一幀。 Step 614: According to the frame of the second plurality of frames, the first deep learning module generates a frame of the third plurality of frames.

步驟616:一第二深度學習模組(例如第二深度學習模組120)產生該第三複數個幀的該幀的一預測輸出。 Step 616: A second deep learning module (for example, the second deep learning module 120) generates a prediction output of the third plurality of frames.

步驟618:結束。 Step 618: End.

根據流程圖60，一預處理模組接收第一複數個幀的一幀，以及根據一目標偵測（例如人物偵測），決定該第一複數個幀的該幀是否包含有至少一行人。若該第一複數個幀的該幀被決定為不包含有任何行人，該預處理模組刪除該第一複數個幀的該幀。否則，根據該第一複數個幀的該幀，該預處理模組產生第二複數個幀的一幀。一第一深度學習模組接收該第二複數個幀的該幀，以及決定該第二複數個幀的該幀的至少一第一特徵（例如至少一行人的至少一第一特徵）是否匹配在一資料庫中的複數個特徵。若該第二複數個幀的該幀的至少一第一特徵被決定為與在該資料庫中的該複數個特徵不匹配，該第一深度學習模組刪除該第二複數個幀的該幀，以及將該至少一第一特徵儲存到該資料庫。否則，根據該第二複數個幀的幀，該第一深度學習模組產生第三複數個幀的一幀。一第二深度學習模組接收該第三複數個幀的該幀，以及產生該第三複數個幀的該幀的一預測輸出。該預測輸出指示至少一行人是否為在資料庫中的行人（例如目標行人）。也就是說，當計算裝置10及/或20用於行人重新識別，根據流程圖60，第一複數個幀被處理。 According to the flowchart 60, a preprocessing module receives a frame of the first plurality of frames, and determines, according to a target detection (e.g., person detection), whether the frame of the first plurality of frames contains at least one pedestrian. If the frame of the first plurality of frames is determined to contain no pedestrian, the preprocessing module deletes the frame of the first plurality of frames. Otherwise, the preprocessing module generates a frame of the second plurality of frames according to the frame of the first plurality of frames. A first deep learning module receives the frame of the second plurality of frames, and determines whether at least one first feature of the frame of the second plurality of frames (e.g., at least one first feature of the at least one pedestrian) matches the plurality of features in a database. If the at least one first feature of the frame of the second plurality of frames is determined not to match the plurality of features in the database, the first deep learning module deletes the frame of the second plurality of frames, and stores the at least one first feature into the database. Otherwise, the first deep learning module generates a frame of the third plurality of frames according to the frame of the second plurality of frames. A second deep learning module receives the frame of the third plurality of frames, and generates a prediction output of the frame of the third plurality of frames. The prediction output indicates whether the at least one pedestrian is a pedestrian in the database (e.g., a target pedestrian). That is, when the computing device 10 and/or 20 is used for pedestrian re-identification, the first plurality of frames are processed according to the flowchart 60.

在一實施例中，第一深度學習模組產生第三複數個幀的幀的運作可被取代為第一深度學習模組傳送第二複數個幀的幀到第二深度學習模組的運作。也就是說，若第一深度學習模組決定至少一第一特徵與在資料庫中的複數個特徵匹配，第三複數個幀的幀為第二複數個幀的幀。 In an embodiment, the operation of the first deep learning module generating the frame of the third plurality of frames may be replaced with the operation of the first deep learning module transmitting the frame of the second plurality of frames to the second deep learning module. That is, if the first deep learning module determines that the at least one first feature matches the plurality of features in the database, the frame of the third plurality of frames is the frame of the second plurality of frames.

在一實施例中,藉由不同視訊記錄器(例如不同相機),第一複數個幀可被產生(例如擷取)。 In one embodiment, by using different video recorders (for example, different cameras), the first plurality of frames can be generated (for example, captured).

在一實施例中，目標偵測包含有特徵提取。預處理模組可使用特徵提取來執行目標偵測，以決定第一複數個幀的幀是否包含有至少一行人。特徵提取包含有至少一傳統電腦視覺方法（例如低階電腦視覺方法），例如邊緣偵測（例如霍夫轉換）。 In an embodiment, the target detection includes feature extraction. The preprocessing module may perform the target detection by using feature extraction, to determine whether the frame of the first plurality of frames contains at least one pedestrian. The feature extraction includes at least one traditional computer vision method (e.g., a low-level computer vision method), such as edge detection (e.g., Hough transform).

在一實施例中，在執行目標偵測前，根據移動偵測，預處理模組可決定是否刪除第一複數個幀的幀。移動偵測可包含有場景相減。預處理模組可使用場景相減來執行移動偵測，以決定第一複數個幀的幀與其相鄰幀是否相同。也就是說，預處理模組可執行目標偵測及移動偵測，以決定是否刪除第一複數個幀的幀。 In an embodiment, before performing the target detection, the preprocessing module may determine whether to delete the frame of the first plurality of frames according to motion detection. The motion detection may include scene subtraction. The preprocessing module may perform the motion detection by using scene subtraction, to determine whether the frame of the first plurality of frames is the same as its adjacent frame. That is, the preprocessing module may perform both the target detection and the motion detection, to determine whether to delete the frame of the first plurality of frames.

在一實施例中，預處理模組可執行以下步驟來產生第二複數個幀的幀：根據目標定位運作，剪裁第一複數個幀的幀；處理第一複數個幀的幀；以及根據剪裁運作及處理運作，產生第二複數個幀的幀。 In an embodiment, the preprocessing module may perform the following steps to generate the frame of the second plurality of frames: cropping the frame of the first plurality of frames according to a target positioning operation; processing the frame of the first plurality of frames; and generating the frame of the second plurality of frames according to the cropping operation and the processing operation.

在一實施例中，目標定位運作可包含有特徵提取。預處理模組使用特徵提取來定位至少一行人的至少一位置，以剪裁至少一行人的至少一定界框(bounding box)。特徵提取包含有至少一傳統電腦視覺方法（例如低階電腦視覺方法），例如邊緣偵測（例如霍夫轉換）。 In an embodiment, the target positioning operation may include feature extraction. The preprocessing module uses feature extraction to locate at least one position of the at least one pedestrian, to crop at least one bounding box of the at least one pedestrian. The feature extraction includes at least one traditional computer vision method (e.g., a low-level computer vision method), such as edge detection (e.g., Hough transform).
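A simplistic stand-in for the cropping step above: deriving a bounding box from thresholded pixels. A real system would use the edge-detection or Hough-based feature extraction the text names; this threshold-based sketch only illustrates how a box is obtained from located pixels.

```python
def bounding_box(frame, threshold):
    """Top-left and bottom-right corners of the region whose grayscale
    pixels exceed a threshold (a crude stand-in for target localization)."""
    coords = [(r, c)
              for r, row in enumerate(frame)
              for c, v in enumerate(row)
              if v > threshold]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

frame = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 0, 0],
         [0, 0, 0, 0]]
box = bounding_box(frame, threshold=5)   # (top, left, bottom, right)
```

The cropped box would then be passed through the processing steps (noise reduction, downscaling, enhancement) before reaching the first deep learning module.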

在一實施例中，預處理模組處理第一複數個幀的幀的運作可包含有雜訊降低、降尺度運作及影像品質增強（例如彩色對比增強）中至少一者。在一實施例中，雜訊降低可為高斯平滑運作。在一實施例中，預處理模組對第一複數個幀的幀執行降維度運作，以減少計算複雜度。 In an embodiment, the operation of the preprocessing module processing the frame of the first plurality of frames may include at least one of noise reduction, a downscaling operation, and image quality enhancement (e.g., color contrast enhancement). In an embodiment, the noise reduction may be a Gaussian smoothing operation. In an embodiment, the preprocessing module performs a dimension reduction operation on the frame of the first plurality of frames, to reduce the computational complexity.

在一實施例中，資料庫包含有複數個人物的複數個特徵。在一實施例中，第一深度學習模組處理（例如檢查）第二複數個幀的幀，以提取第二複數個幀的幀的至少一第一特徵。第一深度學習模組決定至少一第一特徵是否匹配在資料庫中複數個特徵。 In an embodiment, the database contains a plurality of features of a plurality of persons. In an embodiment, the first deep learning module processes (e.g., checks) the frame of the second plurality of frames, to extract at least one first feature of the frame of the second plurality of frames. The first deep learning module determines whether the at least one first feature matches the plurality of features in the database.

在一實施例中，第二深度學習模組處理（例如檢查）第三複數個幀的幀，以提取第三複數個幀的幀的至少一第二特徵（例如至少一行人的至少一第二特徵）。第二深度學習模組決定至少一第二特徵是否匹配在資料庫中複數個特徵以產生預測輸出。預測輸出指示至少一行人是否為在資料庫中的行人（例如目標行人）。需注意的是，第一深度學習模組的計算複雜度小於第二深度學習模組的計算複雜度。因此，第二深度學習模組所提取的至少一第二特徵可不同於第一深度學習模組所提取的至少一第一特徵。相較於第一深度學習模組的決定，第二深度學習模組的決定可更準確。 In an embodiment, the second deep learning module processes (e.g., checks) the frame of the third plurality of frames, to extract at least one second feature of the frame of the third plurality of frames (e.g., at least one second feature of the at least one pedestrian). The second deep learning module determines whether the at least one second feature matches the plurality of features in the database, to generate the prediction output. The prediction output indicates whether the at least one pedestrian is a pedestrian in the database (e.g., a target pedestrian). It should be noted that the computational complexity of the first deep learning module is less than that of the second deep learning module. Thus, the at least one second feature extracted by the second deep learning module may be different from the at least one first feature extracted by the first deep learning module, and the determination of the second deep learning module may be more accurate than that of the first deep learning module.
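A sketch of matching an extracted feature against the database. Cosine similarity and the match threshold are assumptions for illustration; the patent does not specify the matching metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def matches_database(feature, database, threshold=0.9):
    """True if the extracted feature is similar enough to any stored feature."""
    return any(cosine_similarity(feature, stored) > threshold
               for stored in database)

database = [[1.0, 0.0, 0.2],   # stored features of known pedestrians
            [0.1, 1.0, 0.0]]
```

A frame whose feature fails the cheap first-stage match would be deleted and its feature stored, while a match forwards the frame to the more accurate second stage.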

在一實施例中，若用來指示至少一行人為在資料庫中的行人（例如目標行人）的預測輸出被產生，第三複數個幀的幀被標記以儲存在資料庫中。根據新資料庫（例如包含有第三複數個幀的幀），第一深度學習模組及第二深度學習模組的參數被訓練。也就是說，若第二深度學習模組決定至少一第二特徵與複數個特徵匹配，根據至少一第二特徵，在資料庫中的複數個特徵被精化(refine)。因此，根據本發明，第一深度學習模組及第二深度學習模組可更穩健。 In an embodiment, if the prediction output indicating that the at least one pedestrian is a pedestrian in the database (e.g., a target pedestrian) is generated, the frame of the third plurality of frames is labeled and stored in the database. The parameters of the first deep learning module and the second deep learning module are then trained according to the new database (e.g., including the frame of the third plurality of frames). That is, if the second deep learning module determines that the at least one second feature matches the plurality of features, the plurality of features in the database are refined according to the at least one second feature. Thus, according to the present invention, the first deep learning module and the second deep learning module can be more robust.
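One way the refinement step above could look: blending a matched feature into the stored one as an exponential moving average. The blending factor and the averaging scheme are assumptions; the patent only states that the stored features are refined according to the matched second feature.

```python
def refine(stored_feature, new_feature, alpha=0.1):
    """Blend a newly matched feature into the stored feature
    (exponential moving average with blending factor alpha)."""
    return [(1 - alpha) * s + alpha * n
            for s, n in zip(stored_feature, new_feature)]

refined = refine([1.0, 0.0], [0.0, 1.0], alpha=0.1)
```

A small alpha keeps the stored feature stable while slowly absorbing appearance changes, which is one way the modules could become more robust over time.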

在一實施例中，預測輸出為單熱向量。 In an embodiment, the prediction output is a one-hot vector.

在一實施例中，在第二深度學習模組產生用來指示至少一行人為在資料庫中的行人(例如目標行人)的預測輸出後，根據預測輸出及視訊記錄器的位置，其他裝置(或計算裝置10及/或20的其他模組)可追蹤至少一行人的軌跡。 In an embodiment, after the second deep learning module generates the prediction output indicating that the at least one pedestrian is a pedestrian in the database (for example, a target pedestrian), another device (or another module of the computing device 10 and/or 20) can track a trajectory of the at least one pedestrian according to the prediction output and a position of the video recorder.

在一實施例中,第一深度學習模組及第二深度學習模組為卷積神經網路。在一實施例中,卷積神經網路包含有卷積層、最大池化層、啟動函數層及全連接層中至少一者。在一實施例中,不同卷積層的核映射的大小可為相同或不同。在一實施例中,第一深度學習模組及第二深度學習模組的損失函數可為相同或不同。 In one embodiment, the first deep learning module and the second deep learning module are convolutional neural networks. In an embodiment, the convolutional neural network includes at least one of a convolutional layer, a maximum pooling layer, an activation function layer, and a fully connected layer. In an embodiment, the sizes of the kernel maps of different convolutional layers may be the same or different. In an embodiment, the loss functions of the first deep learning module and the second deep learning module may be the same or different.
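As an illustration of one of the layer types listed above, the following sketch implements 2x2 max pooling with stride 2 in plain Python. It is a toy example only, not the patent's implementation; a real convolutional neural network would combine convolutional, pooling, activation, and fully connected layers in a framework such as PyTorch or TensorFlow.

```python
def max_pool_2x2(feature_map):
    # 2x2 max pooling with stride 2 over a 2D feature map,
    # keeping the maximum value in each non-overlapping 2x2 block.
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]
```

Pooling halves each spatial dimension, which is one way the modules reduce computational complexity between layers.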

第7圖為本發明實施例用於交通監測的一流程的流程圖70。流程圖70可用來實現用來處理交通監測的計算裝置10及/或20,以及包含有以下步驟。 FIG. 7 is a flowchart 70 of a process for traffic monitoring according to an embodiment of the present invention. The flowchart 70 can be used to implement the computing device 10 and/or 20 for processing traffic monitoring, and includes the following steps.

步驟700:開始。 Step 700: Start.

步驟702:一預處理模組(例如預處理模組100)接收第一複數個幀的一幀。 Step 702: A preprocessing module (for example, the preprocessing module 100) receives one frame of the first plurality of frames.

步驟704:根據一事件偵測,該預處理模組決定該第一複數個幀的該幀是否包含有至少一事件。若否,執行步驟706;否則,移到步驟708。 Step 704: According to an event detection, the preprocessing module determines whether the frame of the first plurality of frames contains at least one event. If not, go to step 706; otherwise, go to step 708.

步驟706：該預處理模組刪除該第一複數個幀的該幀。 Step 706: The preprocessing module deletes the frame of the first plurality of frames.

步驟708:根據該第一複數個幀的該幀,該預處理模組產生第二複數個幀的一幀。 Step 708: According to the frame of the first plurality of frames, the preprocessing module generates a frame of the second plurality of frames.

步驟710：一第一深度學習模組(例如第一深度學習模組110)決定該第二複數個幀的該幀的至少一第一特徵是否匹配在一資料庫中的複數個特徵。若否，執行步驟712；否則，移到步驟714。 Step 710: A first deep learning module (for example, the first deep learning module 110) determines whether at least one first feature of the frame of the second plurality of frames matches a plurality of features in a database. If not, perform step 712; otherwise, go to step 714.

步驟712：該第一深度學習模組刪除該第二複數個幀的該幀，以及將該至少一第一特徵儲存到該資料庫。 Step 712: The first deep learning module deletes the frame of the second plurality of frames, and stores the at least one first feature in the database.

步驟714:根據該第二複數個幀的該幀,該第一深度學習模組產生第三複數個幀的一幀。 Step 714: According to the frame of the second plurality of frames, the first deep learning module generates a frame of the third plurality of frames.

步驟716：一第二深度學習模組(例如第二深度學習模組120)產生該第三複數個幀的該幀的一預測輸出。 Step 716: A second deep learning module (for example, the second deep learning module 120) generates a prediction output of the frame of the third plurality of frames.

步驟718:結束。 Step 718: End.

根據流程圖70，一預處理模組接收第一複數個幀的一幀，以及根據一事件偵測，決定該第一複數個幀的該幀是否包含有至少一事件。若該第一複數個幀的該幀被決定為不包含有任何事件，該預處理模組刪除該第一複數個幀的該幀。否則，根據該第一複數個幀的該幀，該預處理模組產生第二複數個幀的一幀。一第一深度學習模組接收該第二複數個幀的該幀，以及決定該第二複數個幀的該幀的至少一第一特徵(例如至少一事件的至少一第一特徵)是否與在一資料庫中的複數個特徵匹配。若該第二複數個幀的該幀的該至少一第一特徵被決定為與在該資料庫中的該複數個特徵不匹配，該第一深度學習模組刪除該第二複數個幀的該幀，以及將該至少一第一特徵儲存到該資料庫。否則，根據該第二複數個幀的該幀，該第一深度學習模組產生第三複數個幀的一幀。一第二深度學習模組接收該第三複數個幀的該幀，以及產生該第三複數個幀的該幀的預測輸出。該預測輸出指示該至少一事件是否為在該資料庫中的目標事件。也就是說，當計算裝置10及/或20用於交通監測，根據流程圖70，第一複數個幀被處理。 According to the flowchart 70, a preprocessing module receives a frame of a first plurality of frames, and determines whether the frame of the first plurality of frames includes at least one event according to an event detection. If the frame of the first plurality of frames is determined not to include any event, the preprocessing module deletes the frame of the first plurality of frames. Otherwise, the preprocessing module generates a frame of a second plurality of frames according to the frame of the first plurality of frames. A first deep learning module receives the frame of the second plurality of frames, and determines whether at least one first feature of the frame of the second plurality of frames (for example, at least one first feature of the at least one event) matches a plurality of features in a database. If the at least one first feature of the frame of the second plurality of frames is determined not to match the plurality of features in the database, the first deep learning module deletes the frame of the second plurality of frames, and stores the at least one first feature in the database. Otherwise, the first deep learning module generates a frame of a third plurality of frames according to the frame of the second plurality of frames. A second deep learning module receives the frame of the third plurality of frames, and generates a prediction output of the frame of the third plurality of frames. The prediction output indicates whether the at least one event is a target event in the database. That is, when the computing device 10 and/or 20 is used for traffic monitoring, the first plurality of frames are processed according to the flowchart 70.
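The cascade of flowchart 70 can be sketched as follows. The predicate and predictor callables are placeholders (assumptions, not defined by the patent) standing in for the event detection, the first deep learning module's feature match, and the second deep learning module's prediction.

```python
def run_pipeline(frames, has_event, first_match, second_predict):
    # Flowchart 70 as a filter cascade: the preprocessing stage drops
    # frames without events, the first deep learning module drops frames
    # whose features do not match the database, and the second deep
    # learning module produces a prediction output for survivors.
    outputs = []
    for frame in frames:
        if not has_event(frame):       # steps 704/706: drop the frame
            continue
        if not first_match(frame):     # steps 710/712: drop the frame
            continue
        outputs.append(second_predict(frame))  # step 716
    return outputs
```

Because the expensive `second_predict` stage only sees frames that survived both cheap filters, the overall computational load is reduced, which is the stated goal of the design.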

在一實施例中，第一深度學習模組產生第三複數個幀的幀的運作可被取代為第一深度學習模組傳送第二複數個幀的幀到第二深度學習模組的運作。也就是說，若第一深度學習模組決定至少一第一特徵與在資料庫中的複數個特徵匹配，第三複數個幀的幀為第二複數個幀的幀。 In an embodiment, the operation that the first deep learning module generates the frame of the third plurality of frames may be replaced by an operation that the first deep learning module transmits the frame of the second plurality of frames to the second deep learning module. That is, if the first deep learning module determines that the at least one first feature matches the plurality of features in the database, the frame of the third plurality of frames is the frame of the second plurality of frames.

在一實施例中,藉由一視訊記錄器(例如相機),第一複數個幀可被產生(例如擷取)。在一實施例中,藉由不同視訊記錄器(例如不同相機),第一複數個幀可被產生(例如擷取)。 In one embodiment, with a video recorder (such as a camera), the first plurality of frames can be generated (such as captured). In one embodiment, by using different video recorders (for example, different cameras), the first plurality of frames can be generated (for example, captured).

在一實施例中,事件偵測包含有目標偵測及移動偵測,以決定第一複數個幀的幀是否包含有至少一事件。 In one embodiment, the event detection includes target detection and motion detection to determine whether the first plurality of frames includes at least one event.

在一實施例中,目標偵測包含有特徵提取。預處理模組可使用特徵提取來執行目標偵測,以決定第一複數個幀的幀是否包含有至少一目標。特徵提取包含有至少一傳統電腦視覺方法(例如低階電腦視覺方法),例如邊緣偵測(例如霍夫轉換)。在一實施例中,移動偵測包含有場景相減。預處理模組可使用場景相減來執行移動偵測。也就是說,預處理模組可執行目標偵測及移動偵測,以決定是否刪除第一複數個幀的幀。 In one embodiment, target detection includes feature extraction. The preprocessing module can use feature extraction to perform target detection to determine whether the first plurality of frames contains at least one target. Feature extraction includes at least one traditional computer vision method (such as a low-level computer vision method), such as edge detection (such as Hough transform). In one embodiment, motion detection includes scene subtraction. The preprocessing module can use scene subtraction to perform motion detection. In other words, the preprocessing module can perform target detection and motion detection to determine whether to delete the first plurality of frames.
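A minimal sketch of motion detection by scene subtraction, on grayscale frames represented as nested lists. The pixel-difference threshold and the "any changed pixel" criterion are illustrative assumptions; the patent does not fix a particular algorithm, and a production system would typically use a library routine (for example, OpenCV background subtraction) instead.

```python
def scene_subtraction(prev_frame, curr_frame, threshold=30):
    # Count pixels whose absolute difference from the previous frame
    # exceeds the threshold; motion is flagged when any pixel changed.
    changed = sum(
        1
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for p, c in zip(prev_row, curr_row)
        if abs(c - p) > threshold
    )
    return changed > 0
```

A frame for which `scene_subtraction` returns False (and which also fails target detection) would be deleted by the preprocessing module.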

在一實施例中,預處理模組可執行以下步驟來產生第二複數個幀的幀:根據目標定位運作,剪裁第一複數個幀的幀;處理第一複數個幀的幀;以及根據剪裁運作及處理運作,產生第二複數個幀的幀。 In one embodiment, the preprocessing module may perform the following steps to generate the frames of the second plurality of frames: according to the target positioning operation, crop the frames of the first plurality of frames; process the frames of the first plurality of frames; and according to the cropping Operation and processing operations to generate a second plurality of frames.
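The crop-then-process steps above can be sketched as below. The bounding-box format and the naive every-other-pixel downscaling are assumptions for illustration only; a real system would filter (e.g., Gaussian smoothing) before subsampling.

```python
def crop(frame, top, left, height, width):
    # Crop the bounding box located by the target positioning operation.
    return [row[left:left + width] for row in frame[top:top + height]]

def downscale_2x(frame):
    # Keep every other pixel in both dimensions (a simple downscaling
    # operation; real systems would low-pass filter first).
    return [row[::2] for row in frame[::2]]

def preprocess(frame, box):
    # Crop, then downscale, to produce a frame of the second plurality.
    top, left, height, width = box
    return downscale_2x(crop(frame, top, left, height, width))
```

The resulting smaller frame is what the first deep learning module receives, which is one reason the preprocessing stage can run faster than the frame rate.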

在一實施例中,目標定位運作可包含有特徵提取。預處理模組可使用特徵提取來定位至少一目標的至少一位置,以剪裁至少一目標的至少一定界框。特徵提取可包含有至少一傳統電腦視覺方法(例如低階電腦視覺方法),例如邊緣偵測(例如霍夫轉換)。 In one embodiment, the target positioning operation may include feature extraction. The preprocessing module can use feature extraction to locate at least one position of the at least one target, so as to tailor at least a certain bounding box of the at least one target. Feature extraction may include at least one traditional computer vision method (such as a low-level computer vision method), such as edge detection (such as Hough transform).

在一實施例中,預處理模組處理第一複數個幀的幀的運作可包含有雜訊降低、降尺度運作及影像品質增強(例如彩色對比增強)中至少一者。在一實施例中,雜訊降低可為高斯平滑運作。在一實施例中,預處理模組對第一複數個幀的幀執行降維度運作,以減少計算複雜度。 In one embodiment, the operation of the preprocessing module to process the frames of the first plurality of frames may include at least one of noise reduction, downscaling operation, and image quality enhancement (such as color contrast enhancement). In one embodiment, the noise reduction can be Gaussian smoothing operation. In an embodiment, the preprocessing module performs a dimensionality reduction operation on the first plurality of frames to reduce computational complexity.

在一實施例中,資料庫包含有複數個事件的複數個特徵。在一實施例中,第一深度學習模組處理(例如檢查)第二複數個幀的幀,以提取第二複數個幀的幀的至少一第一特徵。第一深度學習模組決定至少一第一特徵是否匹配在資料庫中複數個特徵。 In one embodiment, the database contains a plurality of characteristics of a plurality of events. In an embodiment, the first deep learning module processes (eg, checks) the frames of the second plurality of frames to extract at least one first feature of the frames of the second plurality of frames. The first deep learning module determines whether at least one first feature matches a plurality of features in the database.

在一實施例中，第二深度學習模組處理(例如檢查)第三複數個幀的幀，以提取第三複數個幀的幀的至少一第二特徵(例如至少一事件的至少一第二特徵)。第二深度學習模組決定至少一第二特徵是否匹配在資料庫中的複數個特徵，以產生預測輸出。預測輸出指示至少一事件是否為在資料庫中的目標事件。需注意的是，第一深度學習模組的計算複雜度小於第二深度學習模組的計算複雜度。因此，第二深度學習模組所提取的至少一第二特徵可不同於第一深度學習模組所提取的至少一第一特徵。相較於第一深度學習模組的決定，第二深度學習模組的決定可更準確。 In an embodiment, the second deep learning module processes (for example, checks) the frame of the third plurality of frames to extract at least one second feature of the frame of the third plurality of frames (for example, at least one second feature of the at least one event). The second deep learning module determines whether the at least one second feature matches the plurality of features in the database, to generate a prediction output. The prediction output indicates whether the at least one event is a target event in the database. It should be noted that the computational complexity of the first deep learning module is smaller than that of the second deep learning module. Thus, the at least one second feature extracted by the second deep learning module may be different from the at least one first feature extracted by the first deep learning module. Compared with the determination of the first deep learning module, the determination of the second deep learning module may be more accurate.

在一實施例中，若用來指示至少一事件為在資料庫中的目標事件的預測輸出被產生，第三複數個幀的幀被標記以儲存在資料庫中。根據新資料庫(例如包含有第三複數個幀的幀)，第一深度學習模組及第二深度學習模組的參數被訓練。也就是說，若第二深度學習模組決定至少一第二特徵與複數個特徵匹配，根據至少一第二特徵，在資料庫中的複數個特徵被精化。因此，根據本發明，第一深度學習模組及第二深度學習模組可更穩健。 In an embodiment, if a prediction output indicating that the at least one event is a target event in the database is generated, the frame of the third plurality of frames is labeled to be stored in the database. According to the new database (for example, including the frame of the third plurality of frames), the parameters of the first deep learning module and the second deep learning module are trained. That is, if the second deep learning module determines that the at least one second feature matches the plurality of features, the plurality of features in the database are refined according to the at least one second feature. Therefore, according to the present invention, the first deep learning module and the second deep learning module can be more robust.

在一實施例中,至少一事件可為車輛、行人、車禍或交通阻塞中一者。在一實施例中,複數個預測輸出為單熱向量。 In an embodiment, the at least one event may be one of a vehicle, a pedestrian, a car accident, or a traffic jam. In one embodiment, the plurality of prediction outputs are one-hot vectors.
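A one-hot prediction output over the four example event classes can be encoded as follows; this is the standard one-hot encoding, shown here for illustration only.

```python
EVENT_CLASSES = ["vehicle", "pedestrian", "car accident", "traffic jam"]

def to_one_hot(class_index, num_classes):
    # Encode a predicted class index as a one-hot vector:
    # all zeros except a single one at the predicted position.
    vector = [0] * num_classes
    vector[class_index] = 1
    return vector
```

For example, a prediction of "car accident" (index 2) would be represented as `[0, 0, 1, 0]`.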

在一實施例中，在第二深度學習模組產生用來指示至少一事件為在資料庫中的目標事件的預測輸出後，根據預測輸出及視訊記錄器的位置，其他裝置(或計算裝置10及/或20的其他模組)可執行關聯於交通監控的運作。關聯於交通監控的運作可為計算車輛的數量、搜尋特定車輛、改善(例如重新排程)交通計畫或回報車禍。 In an embodiment, after the second deep learning module generates the prediction output indicating that the at least one event is a target event in the database, another device (or another module of the computing device 10 and/or 20) can perform an operation related to traffic monitoring according to the prediction output and a position of the video recorder. The operation related to traffic monitoring may be counting a number of vehicles, searching for a specific vehicle, improving (for example, rescheduling) a traffic plan, or reporting a car accident.

在一實施例中,第一深度學習模組及第二深度學習模組為卷積神經網路。在一實施例中,卷積神經網路包含有卷積層、最大池化層、啟動函數層及全連接層中至少一者。在一實施例中,不同卷積層的核映射的大小可為相同或不同。在一實施例中,第一深度學習模組及第二深度學習模組的損失函數可為相同或不同。 In one embodiment, the first deep learning module and the second deep learning module are convolutional neural networks. In an embodiment, the convolutional neural network includes at least one of a convolutional layer, a maximum pooling layer, an activation function layer, and a fully connected layer. In an embodiment, the sizes of the kernel maps of different convolutional layers may be the same or different. In an embodiment, the loss functions of the first deep learning module and the second deep learning module may be the same or different.

第8圖為本發明實施例一計算裝置80的示意圖。計算裝置80包含有預處理模組100、第一深度學習模組110、第二深度學習模組120、第三深度學習模組802及第四深度學習模組804。詳細來說，第三深度學習模組802耦接於預處理模組100，以及從預處理模組100接收第二複數個幀。第三深度學習模組802決定是否刪除第二複數個幀的至少一幀，以及根據用於第二複數個幀的決定，產生第四複數個幀。第四深度學習模組804耦接於第三深度學習模組802，以及從第三深度學習模組802接收第四複數個幀。第四深度學習模組804產生第四複數個幀的新複數個預測輸出。上述用於第一深度學習模組110及第二深度學習模組120的實施例可對應地應用到第三深度學習模組802及第四深度學習模組804，在此不贅述。因此，第三深度學習模組802及第四深度學習模組804的組合以及第一深度學習模組110及第二深度學習模組120的組合接收相同的幀(即第二複數個幀)，以產生用於不同應用的預測輸出。 FIG. 8 is a schematic diagram of a computing device 80 according to an embodiment of the present invention. The computing device 80 includes a preprocessing module 100, a first deep learning module 110, a second deep learning module 120, a third deep learning module 802 and a fourth deep learning module 804. In detail, the third deep learning module 802 is coupled to the preprocessing module 100, and receives the second plurality of frames from the preprocessing module 100. The third deep learning module 802 determines whether to delete at least one frame of the second plurality of frames, and generates a fourth plurality of frames according to the determination for the second plurality of frames. The fourth deep learning module 804 is coupled to the third deep learning module 802, and receives the fourth plurality of frames from the third deep learning module 802. The fourth deep learning module 804 generates a new plurality of prediction outputs of the fourth plurality of frames. The above embodiments for the first deep learning module 110 and the second deep learning module 120 may be applied to the third deep learning module 802 and the fourth deep learning module 804 correspondingly, and are not narrated herein. Therefore, the combination of the third deep learning module 802 and the fourth deep learning module 804 and the combination of the first deep learning module 110 and the second deep learning module 120 receive the same frames (i.e., the second plurality of frames), to generate prediction outputs for different applications.

在一實施例中,複數個預測輸出及新複數個預測輸出被應用來解決不同應用領域的問題。舉例來說,當計算裝置80用於交通監控,複數個預測輸出及新複數個預測輸出可分別應用於計算車輛的數量及搜尋特定車輛。 In an embodiment, a plurality of prediction outputs and a new plurality of prediction outputs are applied to solve problems in different application fields. For example, when the computing device 80 is used for traffic monitoring, the plurality of prediction outputs and the new plurality of prediction outputs can be used to calculate the number of vehicles and search for specific vehicles, respectively.

在一實施例中,計算裝置80可另包含有緩衝器在模組之間,其中緩衝器可為檔案系統或記憶系統。 In one embodiment, the computing device 80 may further include a buffer between the modules, where the buffer may be a file system or a memory system.

在一實施例中,計算裝置80可另包含有兩個深度學習模組的M種組合,其中M種組合耦接於相同的預處理模組(例如預處理模組100),以及彼此相互並聯。也就是說,不同的組合接收相同的幀(例如第二複數個幀),以產生用於不同應用的預測輸出。 In one embodiment, the computing device 80 may further include M combinations of two deep learning modules, wherein the M combinations are coupled to the same preprocessing module (for example, the preprocessing module 100), and are connected in parallel with each other . That is, different combinations receive the same frame (for example, a second plurality of frames) to generate prediction output for different applications.
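The M parallel combinations sharing one preprocessing module can be sketched as a simple fan-out. Each `(keep, predict)` pair is a placeholder standing in for one first/second deep learning module combination; the structure and names are assumptions for illustration, not the patent's implementation.

```python
def fan_out(preprocessed_frames, combinations):
    # Feed the same second plurality of frames to every module
    # combination; each combination filters the frames its own way and
    # yields prediction outputs for its own application.
    return [
        [predict(f) for f in preprocessed_frames if keep(f)]
        for keep, predict in combinations
    ]
```

For example, in traffic monitoring one combination could count vehicles while another searches for a specific vehicle, both driven by the same preprocessed frames.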

計算裝置10的運作在上述實施例中可被歸納為第9圖中的一流程圖90。流程圖90可被實現在計算裝置10中，以及包含有以下步驟： The operation of the computing device 10 in the above embodiments can be summarized into a flowchart 90 in FIG. 9. The flowchart 90 can be implemented in the computing device 10, and includes the following steps:

步驟900:開始。 Step 900: Start.

步驟902:一預處理模組接收第一複數個幀的一幀。 Step 902: A preprocessing module receives one frame of the first plurality of frames.

步驟904：根據一事件偵測，該預處理模組決定是否刪除該第一複數個幀的該幀。若是，執行步驟906；否則，移到步驟908。 Step 904: According to an event detection, the preprocessing module determines whether to delete the frame of the first plurality of frames. If yes, perform step 906; otherwise, go to step 908.

步驟906：該預處理模組刪除該第一複數個幀的該幀。 Step 906: The preprocessing module deletes the frame of the first plurality of frames.

步驟908:根據該第一複數個幀的該幀,該預處理模組產生第二複數個幀的一幀。 Step 908: According to the frame of the first plurality of frames, the preprocessing module generates a frame of the second plurality of frames.

步驟910：根據該第二複數個幀的該幀的至少一特徵，一第一深度學習模組決定是否刪除該第二複數個幀的該幀。若是，執行步驟912；否則，移到步驟914。 Step 910: According to at least one feature of the frame of the second plurality of frames, a first deep learning module determines whether to delete the frame of the second plurality of frames. If yes, perform step 912; otherwise, go to step 914.

步驟912:該第一深度學習模組刪除該第二複數個幀的該幀。 Step 912: The first deep learning module deletes the frame of the second plurality of frames.

步驟914:根據該第二複數個幀的該幀,該第一深度學習模組產生第三複數個幀的一幀。 Step 914: According to the frame of the second plurality of frames, the first deep learning module generates a frame of the third plurality of frames.

步驟916：一第二深度學習模組產生該第三複數個幀的該幀的一預測輸出。 Step 916: A second deep learning module generates a prediction output of the frame of the third plurality of frames.

步驟918:結束。 Step 918: End.

上述運作中所描述的“刪除”可被替換成“丟棄(drop)”運作。上述運作中所描述的“決定(determine)”可被替換成“識別(identify)”、“區分(distinguish)”、“決定(decide)”、“確認(confirm)”或“辨別(discriminate)”。 The "delete" described in the above operation can be replaced with a "drop" operation. The "determine" described in the above operation can be replaced with "identify", "distinguish", "decide", "confirm" or "discriminate" .

本領域具通常知識者當可依本發明的精神加以結合、修飾及/或變化以上所述的實施例,而不限於此。前述的預處理模組、深度學習模組、陳述、函數、模組及/或流程(包含建議步驟)可透過裝置實現,裝置可為硬體、軟體、韌體(為硬體裝置與電腦指令與資料的結合,且電腦指令與資料屬於硬體裝置上的唯讀軟體)、電子系統、或上述裝置的組合。 Those with ordinary knowledge in the art can combine, modify, and/or change the above-mentioned embodiments according to the spirit of the present invention, but are not limited thereto. The aforementioned preprocessing modules, deep learning modules, statements, functions, modules, and/or processes (including recommended steps) can be implemented through devices, which can be hardware, software, firmware (for hardware devices and computer commands) Combination with data, and computer commands and data belong to the read-only software on a hardware device, an electronic system, or a combination of the above devices.

硬體的實施例可包含有類比電路、數位電路及/或混合電路。舉例來說,硬體可包含有特定應用積體電路(application-specific integrated circuit(s),ASIC(s))、場域可程式化閘陣列(field programmable gate array(s),FPGA(s))、可程式化邏輯裝置(programmable logic device(s))、耦合硬體元件(coupled hardware components)、或上述裝置的組合。在一實施例中,硬體包含有通用處理器(general-purpose processor(s))、微處理器(microprocessor(s))、控制器(controller(s))、數位訊號處理器(digital signal processor(s),DSP(s))、或上述裝置的組合。 Examples of hardware may include analog circuits, digital circuits, and/or hybrid circuits. For example, hardware may include application-specific integrated circuit(s) (ASIC(s)), field programmable gate array(s), FPGA(s) ), programmable logic device(s), coupled hardware components, or a combination of the above devices. In one embodiment, the hardware includes a general-purpose processor (s), a microprocessor (microprocessor (s)), a controller (controller (s)), and a digital signal processor (s). (s), DSP(s)), or a combination of the above devices.

軟體的實施例可包含有程式代碼的集合、指令的集合及/或函數的集合,其可被保留(例如存儲)在存儲單元,例如電腦可讀取介質(computer-readable medium)中。電腦可讀取介質可包含有用戶識別模組(Subscriber Identity Module,SIM)、唯讀式記憶體(Read-Only Memory,ROM)、快閃記憶體(flash memory)、隨機存取記憶體(Random-Access Memory,RAM)、CD-ROM/DVD-ROM/BD-ROM、磁帶(magnetic tape)、硬碟(hard disk)、光學資料儲存裝置(optical data storage device)、非揮發性儲存裝置(non-volatile storage device)、或上述裝置的組合。電腦可讀取介質(例如存儲單元)可在內部(例如集成(integrate))或外部(例如分離(separate))耦合到至少一處理器。包含有一個或多個模組的至少一個處理器可(例如被配置為)執行電腦可讀取介質中的軟體。程式代碼的集合、指令的集合及/或函數的集合可使至少一處理器、模組、硬體及/或電子系統執行相關步驟。 An embodiment of the software may include a collection of program codes, a collection of instructions, and/or a collection of functions, which can be retained (eg stored) in a storage unit, such as a computer-readable medium. Computer readable media can include Subscriber Identity Module (SIM), Read-Only Memory (ROM), flash memory, and random access memory (Random). -Access Memory, RAM), CD-ROM/DVD-ROM/BD-ROM, magnetic tape, hard disk, optical data storage device, non-volatile storage device -volatile storage device), or a combination of the above devices. The computer readable medium (such as a storage unit) may be internally (such as integrated) or externally (such as separate) coupled to the at least one processor. At least one processor including one or more modules can (for example, be configured to) execute software in a computer-readable medium. The collection of program codes, the collection of instructions, and/or the collection of functions can enable at least one processor, module, hardware, and/or electronic system to perform relevant steps.

綜上所述，本發明提供了一種用來處理視訊內容分析(例如鐵道元件異常檢測、行人重新識別及交通監控)的計算裝置。具有低計算複雜度的模組刪除視訊的不重要幀。接著，具有高計算複雜度的模組處理視訊的剩下幀(即重要幀)。因此，硬體的計算複雜度被減少。 In summary, the present invention provides a computing device for handling video content analysis (for example, railway component anomaly detection, pedestrian re-identification and traffic monitoring). A module with low computational complexity deletes unimportant frames of a video. Then, a module with high computational complexity processes the remaining frames (i.e., the important frames) of the video. Thus, the computational complexity of the hardware is reduced.

以上所述僅為本發明之較佳實施例,凡依本發明申請專利範圍所做之均等變化與修飾,皆應屬本發明之涵蓋範圍。 The foregoing descriptions are only preferred embodiments of the present invention, and all equivalent changes and modifications made in accordance with the scope of the patent application of the present invention shall fall within the scope of the present invention.

10:計算裝置 10: Computing device

100:預處理模組 100: preprocessing module

110:第一深度學習模組 110: The first deep learning module

120:第二深度學習模組 120: The second deep learning module

Claims (18)

一種計算裝置(computing device),用來處理視訊內容分析(video content analysis),包含有:一預處理模組(preprocessing module),用來接收第一複數個幀,以及根據一事件偵測(event detection),決定是否刪除該第一複數個幀的至少一幀,以根據用於該第一複數個幀的該決定,產生第二複數個幀;一第一深度學習模組(deep learning module),耦接於該預處理模組,用來接收該第二複數個幀,以及根據該第二複數個幀的複數個特徵,決定是否刪除該第二複數個幀的至少一幀,以根據用於該第二複數個幀的該決定,產生第三複數個幀;以及一第二深度學習模組,耦接於該第一深度學習模組,用來接收該第三複數個幀,以產生該第三複數個幀的複數個預測輸出。 A computing device (computing device) for processing video content analysis (video content analysis), including: a preprocessing module (preprocessing module) for receiving the first plurality of frames, and according to an event detection (event detection), determining whether to delete at least one frame of the first plurality of frames, so as to generate a second plurality of frames according to the decision for the first plurality of frames; a first deep learning module (deep learning module) , Coupled to the preprocessing module, used to receive the second plurality of frames, and according to the plurality of characteristics of the second plurality of frames, determine whether to delete at least one frame of the second plurality of frames, according to the use In the determination of the second plurality of frames, a third plurality of frames are generated; and a second deep learning module, coupled to the first deep learning module, is used to receive the third plurality of frames to generate The plurality of prediction outputs of the third plurality of frames. 如請求項1所述的計算裝置,另包含有:一可適性緩衝器(adaptive buffer),耦接於該第二深度學習模組,用來儲存該複數個預測輸出,其中根據包含有一事件的至少一幀的一數量,該可適性緩衝器的一大小(size)被決定,以及該至少一幀被包含在該第一複數個幀中。 The computing device according to claim 1, further comprising: an adaptive buffer, coupled to the second deep learning module, for storing the plurality of prediction outputs, wherein according to the information including an event A number of at least one frame, a size of the adaptability buffer are determined, and the at least one frame is included in the first plurality of frames. 
如請求項1所述的計算裝置,其中該第一複數個幀被產生用於一串流(streaming)或一視訊。 The computing device according to claim 1, wherein the first plurality of frames are generated for a streaming or a video. 如請求項1所述的計算裝置,其中該事件偵測包含有一移動偵測(motion detection)及一目標偵測(object detection)中至少一者。 The computing device according to claim 1, wherein the event detection includes at least one of a motion detection and an object detection. 如請求項1所述的計算裝置,其中若該第一複數個幀的一幀被決定不被刪除,該預處理模組處理該第一複數個幀的該幀。 The computing device according to claim 1, wherein if a frame of the first plurality of frames is determined not to be deleted, the preprocessing module processes the frame of the first plurality of frames. 如請求項5所述的計算裝置,其中處理該第一複數個幀的該幀的運作包含有一雜訊降低(noise reduction)、一降尺度運作(downscaling operation)、一可適性直方圖等化(adaptive histogram equalization)、一影像品質增強(image quality enhancement)及一剪裁運作(cropping operation)中至少一者。 The computing device according to claim 5, wherein the operation of processing the frames of the first plurality of frames includes a noise reduction, a downscaling operation, and an adaptability histogram equalization ( At least one of adaptive histogram equalization, image quality enhancement, and cropping operation. 如請求項1所述的計算裝置,其中該預處理模組接收該第一複數個幀及產生該第二複數個幀之間的一第一時間區間小於該第一深度學習模組接收該第二複數個幀及產生該第三複數個幀之間的一第二時間區間。 The computing device according to claim 1, wherein a first time interval between the preprocessing module receiving the first plurality of frames and generating the second plurality of frames is less than the first deep learning module receiving the first time interval A second time interval between two plural frames and generating the third plural frames. 
如請求項1所述的計算裝置,其中該第一深度學習模組接收該第二複數個幀及產生該第三複數個幀之間的一第二時間區間小於該第二深度學習模組接收該第三複數個幀及產生該複數個預測輸出之間的一第三時間區間。 The computing device according to claim 1, wherein a second time interval between the first deep learning module receiving the second plurality of frames and generating the third plurality of frames is less than that received by the second deep learning module A third time interval between the third plurality of frames and the generation of the plurality of prediction outputs. 如請求項1所述的計算裝置,其中該預處理模組接收該第一複數個幀及產生該第二複數個幀之間的一第一時間區間等於或小於該第一複數個幀的連續(successive)幀之間的一第四時間區間。 The computing device according to claim 1, wherein a first time interval between receiving the first plurality of frames and generating the second plurality of frames is equal to or less than the continuation of the first plurality of frames (successive) A fourth time interval between frames. 如請求項1所述的計算裝置,其中該第一深度學習模組接收該第 二複數個幀及產生該第三複數個幀之間的一第二時間區間等於或小於該第一複數個幀的連續幀之間的一第四時間區間。 The computing device according to claim 1, wherein the first deep learning module receives the first deep learning module A second time interval between two plurality of frames and generating the third plurality of frames is equal to or less than a fourth time interval between consecutive frames of the first plurality of frames. 如請求項1所述的計算裝置,其中該預處理模組的一計算複雜度小於該第二深度學習模組的一計算複雜度。 The computing device according to claim 1, wherein a computational complexity of the preprocessing module is less than a computational complexity of the second deep learning module. 如請求項1所述的計算裝置,其中該第一深度學習模組的一計算複雜度小於該第二深度學習模組的一計算複雜度。 The computing device according to claim 1, wherein a computational complexity of the first deep learning module is less than a computational complexity of the second deep learning module. 如請求項1所述的計算裝置,其中該第一深度學習模組及該第二深度學習模組為卷積神經網路(convolutional neural networks,CNNs)。 The computing device according to claim 1, wherein the first deep learning module and the second deep learning module are convolutional neural networks (CNNs). 
如請求項13所述的計算裝置,其中該卷積神經網路包含有一卷積層、一最大池化層(max pooling layer)、一啟動函數層(activation function layer)及一全連接層(fully connected layer)中至少一者。 The computing device according to claim 13, wherein the convolutional neural network includes a convolutional layer, a max pooling layer, an activation function layer, and a fully connected layer At least one of layer). 如請求項1所述的計算裝置,其中該複數個預測輸出為單熱向量(one-hot vectors)。 The computing device according to claim 1, wherein the plurality of prediction outputs are one-hot vectors. 如請求項1所述的計算裝置,其中該複數個預測輸出用來指示該第三複數個幀的一事件是否為一異常(anomaly)。 The computing device according to claim 1, wherein the plurality of prediction outputs are used to indicate whether an event of the third plurality of frames is anomaly. 如請求項1所述的計算裝置,其中該複數個預測輸出用來追蹤(track)該第三複數個幀的一事件。 The computing device according to claim 1, wherein the plurality of prediction outputs are used to track an event of the third plurality of frames. 如請求項1所述的計算裝置,其中該複數個預測輸出用來計數該第三複數個幀的一事件的一數量。 The computing device according to claim 1, wherein the plurality of prediction outputs are used to count a number of an event in the third plurality of frames.
TW109139742A 2020-04-08 2020-11-13 Device of handling video content analysis TWI749870B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202063006737P 2020-04-08 2020-04-08
US63/006,737 2020-04-08
US16/927,945 US11386656B2 (en) 2020-04-08 2020-07-13 Device and method of handling video content analysis
US16/927,945 2020-07-13

Publications (2)

Publication Number Publication Date
TW202139134A TW202139134A (en) 2021-10-16
TWI749870B true TWI749870B (en) 2021-12-11

Family

ID=77994964

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109139742A TWI749870B (en) 2020-04-08 2020-11-13 Device of handling video content analysis

Country Status (2)

Country Link
CN (1) CN113496188B (en)
TW (1) TWI749870B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886305A (en) * 2014-04-08 2014-06-25 中国人民解放军国防科学技术大学 Specific face searching method for grassroots policing, safeguard stability and counter-terrorism
CN105574550A (en) * 2016-02-02 2016-05-11 北京格灵深瞳信息技术有限公司 Vehicle identification method and device
US20180254070A1 (en) * 2016-08-30 2018-09-06 Oath Inc. Computerized system and method for automatically generating high-quality digital content thumbnails from digital video
TWM592541U (en) * 2019-11-01 2020-03-21 華南商業銀行股份有限公司 Image recognition system

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6954544B2 (en) * 2002-05-23 2005-10-11 Xerox Corporation Visual motion analysis method for detecting arbitrary numbers of moving objects in image sequences
CN104244113B (en) * 2014-10-08 2017-09-22 中国科学院自动化研究所 A kind of video abstraction generating method based on depth learning technology
CN105354548B (en) * 2015-10-30 2018-10-26 武汉大学 A kind of monitor video pedestrian recognition methods again based on ImageNet retrievals
US10083378B2 (en) * 2015-12-28 2018-09-25 Qualcomm Incorporated Automatic detection of objects in video images
US20190130188A1 (en) * 2017-10-26 2019-05-02 Qualcomm Incorporated Object classification in a video analytics system
US20190130583A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Still and slow object tracking in a hybrid video analytics system
CN107945210B (en) * 2017-11-30 2021-01-05 天津大学 Target tracking method based on deep learning and environment self-adaption
KR20200023221A (en) * 2018-08-23 2020-03-04 서울대학교산학협력단 Method and system for real-time target tracking based on deep learning
KR101942808B1 (en) * 2018-11-13 2019-01-29 주식회사 인텔리빅스 Apparatus for CCTV Video Analytics Based on Object-Image Recognition DCNN
CN110147722A (en) * 2019-04-11 2019-08-20 平安科技(深圳)有限公司 A kind of method for processing video frequency, video process apparatus and terminal device

Also Published As

Publication number Publication date
CN113496188A (en) 2021-10-12
TW202139134A (en) 2021-10-16
CN113496188B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN109948582B (en) Intelligent vehicle reverse running detection method based on tracking trajectory analysis
Tian et al. Video processing techniques for traffic flow monitoring: A survey
US8798314B2 (en) Detection of vehicles in images of a night time scene
US9911055B2 (en) Method and system for detection and classification of license plates
KR101731243B1 (en) A video surveillance apparatus for identification and tracking multiple moving objects with similar colors and method thereof
CN106934817B (en) Multi-attribute-based multi-target tracking method and device
KR101653278B1 (en) Face tracking system using colar-based face detection method
US20070292029A1 (en) Cascade plate recognition system
CN107563310A (en) A kind of lane change detection method violating the regulations
Malhi et al. Vision based intelligent traffic management system
CN111079621A (en) Method and device for detecting object, electronic equipment and storage medium
Shi et al. Anomalous driving detection for traffic surveillance video analysis
Wang et al. Vehicle type classification from surveillance videos on urban roads
Suryakala et al. Vision based vehicle/pedestrian detection in traffic surveillance system
Santos et al. Car recognition based on back lights and rear view features
Kalva et al. Smart Traffic monitoring system using YOLO and deep learning techniques
Wang et al. A two-layer night-time vehicle detector
Kumar et al. A novel approach for speed estimation along with vehicle detection counting
TWI749870B (en) Device of handling video content analysis
Zhang et al. Recognition of Front-Vehicle Taillights Based on YOLOv5s
CN116311166A (en) Traffic obstacle recognition method and device and electronic equipment
Lien et al. Real-time traffic flow analysis without background modeling
Yu et al. Length-based vehicle classification in multi-lane traffic flow
US11386656B2 (en) Device and method of handling video content analysis
CN114445787A (en) Non-motor vehicle weight recognition method and related equipment