TWI776489B - Electronic device and method for document segmentation - Google Patents

Electronic device and method for document segmentation

Info

Publication number
TWI776489B
Authority
TW
Taiwan
Prior art keywords
feature map
generate
model
size
decoding
Prior art date
Application number
TW110115669A
Other languages
Chinese (zh)
Other versions
TW202201272A (en)
Inventor
郭景明
張立穎
Original Assignee
國立臺灣科技大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立臺灣科技大學
Priority to US17/344,911 (US11657279B2)
Publication of TW202201272A
Application granted
Publication of TWI776489B

Abstract

An electronic device and a method for document segmentation are provided. The method includes: obtaining a first feature map and a second feature map corresponding to an original document; performing a first upsampling on the second feature map to generate a third feature map; concatenating the first feature map and the third feature map to generate a fourth feature map; inputting the fourth feature map to a first inverted residual block (IRB) and performing a first atrous convolution operation based on a first dilation rate to generate a fifth feature map; inputting the fourth feature map to a second IRB and performing a second atrous convolution operation based on a second dilation rate to generate a sixth feature map; concatenating the fifth feature map and the sixth feature map to generate a seventh feature map; and performing a convolution operation on the seventh feature map to generate a segmented document.

Description

Electronic device and method for document segmentation

The present invention relates to an electronic device and a method for document segmentation.

Currently, document segmentation is a technique receiving attention in the field of semantic segmentation. Document segmentation can be used to identify and label the individual objects in a document (e.g., text content, images, or tables). Although many document segmentation methods based on deep learning have been proposed, the results they produce are still limited by the amount of available computing resources. For example, a convolutional neural network containing fewer convolutional layers may not be able to label the objects in a document very clearly. Accordingly, how to propose a document segmentation method that achieves better results with fewer computing resources is one of the goals of those skilled in the art.

The present invention provides an electronic device and a method for document segmentation that can perform document segmentation on a document using a small amount of computing resources to generate a segmented document.

An electronic device for document segmentation according to the present invention includes a processor, a storage medium, and a transceiver. The transceiver receives an original document. The storage medium stores a neural network model. The processor is coupled to the storage medium and the transceiver, and accesses and executes the neural network model, wherein the neural network model includes a first model, and the first model is configured to: obtain a first feature map of a first size and a second feature map of a second size corresponding to the original document, wherein the first size is larger than the second size; perform a first upsampling on the second feature map to generate a third feature map of a third size, wherein the third size is equal to the first size; concatenate the first feature map and the third feature map to generate a fourth feature map; input the fourth feature map to a first inverted residual block (IRB) to generate a first output, and perform a first atrous convolution operation based on a first dilation rate on the first output to generate a fifth feature map; input the fourth feature map to a second inverted residual block to generate a second output, and perform a second atrous convolution operation based on a second dilation rate on the second output to generate a sixth feature map, wherein the second dilation rate is different from the first dilation rate; concatenate the fifth feature map and the sixth feature map to generate a seventh feature map; and perform a first convolution operation on the seventh feature map to generate a segmented document, wherein the processor outputs the segmented document through the transceiver.
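The claimed first-model steps can be traced as a shape walk-through. The following NumPy sketch is illustrative only: the spatial sizes and channel counts are assumptions, and the two IRB + atrous-convolution branches (which preserve spatial size under "same" padding) are stood in for by plain 1×1 convolutions.

```python
import numpy as np

# Channel-last layout (H, W, C); all sizes here are illustrative.
f1 = np.random.rand(64, 64, 32)   # first feature map, first (larger) size
f2 = np.random.rand(32, 32, 32)   # second feature map, second (smaller) size

# First upsampling: nearest-neighbor, so the third size equals the first size.
f3 = f2.repeat(2, axis=0).repeat(2, axis=1)

# Concatenate along the channel axis to form the fourth feature map.
f4 = np.concatenate([f1, f3], axis=-1)

# A 1x1 convolution is a per-pixel matrix multiply over channels; here it
# stands in for each IRB + atrous-convolution branch, which keeps H and W.
def conv1x1(x, out_ch):
    w = np.random.rand(x.shape[-1], out_ch)
    return x @ w

f5 = conv1x1(f4, 16)              # branch with the first dilation rate (stub)
f6 = conv1x1(f4, 16)              # branch with the second dilation rate (stub)
f7 = np.concatenate([f5, f6], axis=-1)
seg = conv1x1(f7, 4)              # first convolution -> per-pixel class scores

print(f3.shape, f4.shape, f7.shape, seg.shape)
```

The shapes confirm the size constraints in the claim: the third feature map matches the first size, and every later map keeps that spatial size while only the channel count changes.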

In an embodiment of the present invention, the neural network model further includes a second model, wherein the second model is configured to: perform a second upsampling on the second feature map to generate an eighth feature map of a fourth size, wherein the fourth size is equal to the first size; concatenate the first feature map and the eighth feature map to generate a ninth feature map; and perform a second convolution operation on the ninth feature map to generate an output feature map.

In an embodiment of the present invention, the first model corresponds to a first loss function, and the second model corresponds to a second loss function. The processor adds the first loss function and the second loss function to generate a third loss function, and trains the first model and the second model according to the third loss function.

In an embodiment of the present invention, the neural network model further includes an encoding convolutional network, wherein the encoding convolutional network includes a first encoding convolutional layer and a second encoding convolutional layer, and the encoding convolutional network is configured to: generate a first encoding feature map according to the original document and the first encoding convolutional layer; and generate a second encoding feature map according to the first encoding feature map and the second encoding convolutional layer.

In an embodiment of the present invention, the neural network model further includes a decoding convolutional network, wherein the decoding convolutional network includes a first decoding layer and a second decoding layer, and the first decoding layer includes the second encoding convolutional layer and a decoding convolutional layer corresponding to the second encoding convolutional layer. The decoding convolutional network is configured to: generate the second feature map according to the second encoding feature map and the first decoding layer; and generate the first feature map according to the second feature map and the second decoding layer.

In an embodiment of the present invention, the first model is further configured to: add the first feature map and the third feature map to generate a tenth feature map; and concatenate the tenth feature map, the first feature map, and the third feature map to generate the fourth feature map.

In an embodiment of the present invention, the first model is further configured to: add the fifth feature map and the sixth feature map to generate an eleventh feature map; and concatenate the fifth feature map, the sixth feature map, and the eleventh feature map to generate the seventh feature map.

In an embodiment of the present invention, the first model is further configured to: perform the first convolution operation on the seventh feature map to generate a twelfth feature map; and input the twelfth feature map to a squeeze-and-excitation network to generate the segmented document.

In an embodiment of the present invention, the first encoding convolutional layer performs a mobile inverted bottleneck convolution on the original document to generate the first encoding feature map.

A method for document segmentation according to the present invention includes: obtaining an original document and a neural network model, wherein the neural network model includes a first model, and the first model is configured to: obtain a first feature map of a first size and a second feature map of a second size corresponding to the original document, wherein the first size is larger than the second size; perform a first upsampling on the second feature map to generate a third feature map of a third size, wherein the third size is equal to the first size; concatenate the first feature map and the third feature map to generate a fourth feature map; input the fourth feature map to a first inverted residual block to generate a first output, and perform a first atrous convolution operation based on a first dilation rate on the first output to generate a fifth feature map; input the fourth feature map to a second inverted residual block to generate a second output, and perform a second atrous convolution operation based on a second dilation rate on the second output to generate a sixth feature map, wherein the second dilation rate is different from the first dilation rate; concatenate the fifth feature map and the sixth feature map to generate a seventh feature map; and perform a first convolution operation on the seventh feature map to generate a segmented document. The method further includes outputting the segmented document.

Based on the above, the architecture of the neural network model proposed by the present invention can produce results superior to those of conventional document segmentation methods while using fewer computing resources.

In order to make the content of the present invention easier to understand, embodiments are given below as examples by which the present invention can indeed be implemented. In addition, wherever possible, elements/components/steps denoted by the same reference numerals in the drawings and embodiments represent the same or similar parts.

FIG. 1 is a schematic diagram of an electronic device 100 for document segmentation according to an embodiment of the present invention. The electronic device 100 may include a processor 110, a storage medium 120, and a transceiver 130.

The processor 110 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose micro control unit (MCU), microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), graphics processing unit (GPU), image signal processor (ISP), image processing unit (IPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field-programmable gate array (FPGA), another similar element, or a combination of the above. The processor 110 may be coupled to the storage medium 120 and the transceiver 130, and may access and execute a plurality of modules and various application programs stored in the storage medium 120.

The storage medium 120 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), a similar element, or a combination of the above, and is used to store a plurality of modules or various application programs executable by the processor 110. In this embodiment, the storage medium 120 may store a neural network model 200 for performing document segmentation on an original document.

The transceiver 130 transmits and receives signals in a wireless or wired manner. The transceiver 130 may also perform operations such as low-noise amplification, impedance matching, frequency mixing, up or down frequency conversion, filtering, amplification, and the like. The electronic device 100 may receive the original document through the transceiver 130, and may then use the neural network model in the storage medium 120 to perform document segmentation on the original document.

FIG. 2 is a schematic diagram of the neural network model 200 according to an embodiment of the present invention. The neural network model 200 may include an encoding convolutional network 210, a decoding convolutional network 220, a first model 230, and a second model 240, wherein the first model 230 may include a densely joint pyramid module (DJPM) 231. In an embodiment, the first model 230 may further include a squeeze-and-excitation network (SENet) 232. The neural network model 200 may receive an original document 30 and convert the original document 30 into a processed document. FIG. 3 is a schematic diagram of the original document 30 and the processed documents according to an embodiment of the present invention. The processed documents may include a segmented document 40 output by the first model 230 and a segmented document 50 output by the second model 240. As can be seen from FIG. 3, the segmented document 40 (or the segmented document 50) clearly labels the different objects in the original document 30. In other words, the document segmentation performance of the neural network model 200 is excellent.

Referring to FIG. 2, the encoding convolutional network 210 may include a plurality of encoding convolutional layers, wherein the number of encoding convolutional layers may be adjusted as required and is not limited by the present invention. In this embodiment, the encoding convolutional network 210 may include an encoding convolutional layer 211, an encoding convolutional layer 212, an encoding convolutional layer 213, an encoding convolutional layer 214, an encoding convolutional layer 215, an encoding convolutional layer 216, an encoding convolutional layer 217, and an encoding convolutional layer 218.

The encoding convolutional layer 211 may receive the original document 30 and perform a convolution operation on the original document 30 to generate an encoding feature map. The encoding convolutional layer 212 may receive the encoding feature map output by the encoding convolutional layer 211 and perform a convolution operation on it to generate a new encoding feature map. In a similar manner, each encoding convolutional layer in the encoding convolutional network 210 may receive the encoding feature map output by the previous encoding convolutional layer and generate a new encoding feature map from it. After the convolution operations of the plurality of encoding convolutional layers, the encoding convolutional layer 218 may perform a convolution operation on the encoding feature map output by the encoding convolutional layer 217 to generate a new encoding feature map.

The plurality of encoding convolutional layers in the encoding convolutional network 210 may correspond to different sizes. In other words, the encoding feature maps output by different encoding convolutional layers may differ in size. For example, the size of the encoding feature map output by the encoding convolutional layer 211 may differ from the size of the encoding feature map output by the encoding convolutional layer 212. The encoding convolutional network 210 may use multiple encoding convolutional layers of different sizes to extract important features of the original document 30 at multiple scales in time or space.

In an embodiment, the plurality of encoding convolutional layers in the encoding convolutional network 210 may be mobile inverted bottleneck convolution (MBConv) layers. Taking the encoding convolutional layer 211 as an example, it may perform a mobile inverted bottleneck convolution on the original document 30 to generate an encoding feature map. Taking the encoding convolutional layer 212 as an example, it may perform a mobile inverted bottleneck convolution on the encoding feature map output by the encoding convolutional layer 211 to generate a new encoding feature map.
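The general structure of an MBConv-style layer (1×1 expansion, depthwise 3×3 convolution, 1×1 projection, with a residual connection when shapes match) can be sketched in plain NumPy. This is a toy of the layer type named above, not the patent's exact configuration; the channel counts, expansion factor, and random weights are assumptions.

```python
import numpy as np

def conv1x1(x, w):                      # per-pixel channel mixing
    return x @ w

def depthwise3x3(x):                    # per-channel 3x3 conv, 'same' padding
    h, w, c = x.shape
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    k = np.random.rand(3, 3, c)         # one 3x3 kernel per channel
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += p[i:i + h, j:j + w, :] * k[i, j, :]
    return out

def mbconv(x, expand=4):
    c = x.shape[-1]
    h = conv1x1(x, np.random.rand(c, c * expand))   # expand channels
    h = np.maximum(h, 0)                            # ReLU-like nonlinearity
    h = np.maximum(depthwise3x3(h), 0)              # depthwise spatial conv
    h = conv1x1(h, np.random.rand(c * expand, c))   # project back down
    return x + h                                    # residual (shapes match)

x = np.random.rand(8, 8, 16)
y = mbconv(x)
print(y.shape)  # -> (8, 8, 16)
```

The "inverted" part is that the wide layer sits in the middle (expand then project), the opposite of a classical bottleneck; the depthwise step keeps the parameter count low.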

The decoding convolutional network 220 may include a plurality of decoding layers, wherein the number of decoding layers may be adjusted as required and is not limited by the present invention. In this embodiment, the number of decoding layers may be the number of encoding convolutional layers in the encoding convolutional network 210 minus one. The decoding convolutional network 220 may include a decoding layer 221, a decoding layer 222, a decoding layer 223, a decoding layer 224, a decoding layer 225, a decoding layer 226, and a decoding layer 227.

One or more decoding layers in the decoding convolutional network 220 may correspond to one or more encoding convolutional layers in the encoding convolutional network 210. In this embodiment, the decoding layer 221 may correspond to the encoding convolutional layer 217, the decoding layer 222 to the encoding convolutional layer 216, the decoding layer 223 to the encoding convolutional layer 215, the decoding layer 224 to the encoding convolutional layer 214, the decoding layer 225 to the encoding convolutional layer 213, the decoding layer 226 to the encoding convolutional layer 212, and the decoding layer 227 to the encoding convolutional layer 211.

In the decoding convolutional network 220, the one or more decoding layers closer to the encoding convolutional network 210 (i.e., the one or more decoding layers closer to the input end of the encoding convolutional network 210) may include encoding convolutional layers. The encoding convolutional layer in a decoding layer may be located at the input end or the output end of that decoding layer. A decoding layer may be a concatenation of an encoding convolutional layer and the decoding convolutional layer corresponding to that encoding convolutional layer. The concatenation compensates for the loss incurred when the decoding convolutional layers restore the data: because a decoding convolutional layer performs the restoration process starting from the smallest size, details in the data are lost. The present invention therefore compensates for this loss of detail through the concatenation of encoding convolutional layers and decoding convolutional layers. In this embodiment, the decoding layer 221 may be a concatenation of the encoding convolutional layer 217 and the decoding convolutional layer corresponding to the encoding convolutional layer 217. The decoding layer 222 may be a concatenation of the decoding convolutional layer corresponding to the encoding convolutional layer 216 and the encoding convolutional layer 216. The decoding layer 223 may be a concatenation of the decoding convolutional layer corresponding to the encoding convolutional layer 215 and the encoding convolutional layer 215. The decoding layer 224 may be a concatenation of the decoding convolutional layer corresponding to the encoding convolutional layer 214 and the encoding convolutional layer 214. The decoding layer 225 may be a concatenation of the decoding convolutional layer corresponding to the encoding convolutional layer 213 and the encoding convolutional layer 213. The decoding layer 226 may include only the decoding convolutional layer corresponding to the encoding convolutional layer 212. The decoding layer 227 may include only the decoding convolutional layer corresponding to the encoding convolutional layer 211.

The decoding layer 221 may receive the encoding feature map output by the encoding convolutional layer 218 and perform a deconvolution operation on it to generate a new feature map. The decoding layer 222 may receive the feature map output by the decoding layer 221 and perform a deconvolution operation on it to generate a new feature map. In a similar manner, each decoding layer in the decoding convolutional network 220 may receive the feature map output by the previous decoding layer and generate a new feature map from it. After the deconvolution operations of the plurality of decoding layers, the decoding layer 227 may perform a deconvolution operation on the feature map output by the decoding layer 226 to generate a new feature map.

The plurality of decoding layers in the decoding convolutional network 220 may correspond to different sizes. In other words, the feature maps output by different decoding layers may differ in size. For example, the size of the feature map output by the decoding layer 221 may differ from the size of the feature map output by the decoding layer 222. The decoding convolutional network 220 may use multiple decoding layers of different sizes to extract important features of the original document 30 at multiple scales in time or space.

In an embodiment, the plurality of decoding layers in the decoding convolutional network 220 may be mobile inverted bottleneck convolution layers. Taking the decoding layer 221 as an example, it may perform a mobile inverted bottleneck convolution on the feature map output by the encoding convolutional layer 218 to generate a new feature map. Taking the decoding layer 222 as an example, it may perform a mobile inverted bottleneck convolution on the feature map output by the decoding layer 221 to generate a new feature map.

The first model 230 may be a neural network. For example, the first model 230 may be a context segmentation network. The densely joint pyramid module 231 of the first model 230 may generate a segmented document corresponding to the original document 30 according to the outputs of one or more decoding layers in the decoding convolutional network 220. FIG. 4 is a schematic diagram of the flow in which the densely joint pyramid module 231 generates the segmented document 70 according to an embodiment of the present invention. Specifically, in flow (a), the densely joint pyramid module 231 may obtain one or more feature maps output by the one or more decoding layers in the decoding convolutional network 220 that are closer to the densely joint pyramid module 231 (i.e., the one or more decoding layers closer to the output end of the decoding convolutional network 220), wherein the one or more decoding layers may include the decoding layer closest to the densely joint pyramid module 231 (i.e., the decoding layer 227 used to generate the output of the decoding convolutional network 220). The densely joint pyramid module 231 may then perform convolution operations on the obtained feature maps respectively to generate new feature maps.

In this embodiment, the densely joint pyramid module 231 may obtain a feature map 53, a feature map 52, and a feature map 51 from the decoding layer 227, the decoding layer 225, and the decoding layer 224, respectively, wherein the size of the feature map 53 may be larger than that of the feature map 52, and the size of the feature map 52 may be larger than that of the feature map 51. The densely joint pyramid module 231 may perform convolution operations on the feature map 51, the feature map 52, and the feature map 53 to generate a feature map 54, a feature map 55, and a feature map 56, respectively, wherein the size of the feature map 56 may be larger than that of the feature map 55, and the size of the feature map 55 may be larger than that of the feature map 54.

To make the sizes of the feature maps the same, in flow (b), the densely joint pyramid module 231 may upsample the feature maps of smaller sizes. In this embodiment, the densely joint pyramid module 231 may upsample the feature map 54 to generate a feature map 57, wherein the size of the feature map 57 may be the same as that of the feature map 56. The densely joint pyramid module 231 may upsample the feature map 55 to generate a feature map 58, wherein the size of the feature map 58 may be the same as that of the feature map 56.

Next, the densely joint pyramid module 231 may add the feature maps of the same size to generate a new feature map. The densely joint pyramid module 231 may then concatenate (concat) the feature map generated from the respective feature maps with those feature maps to generate a new feature map. Assuming the densely joint pyramid module 231 is to concatenate N+1 feature maps (N being a positive integer), it may concatenate them in the following order: the feature map generated from the respective feature maps, the feature map corresponding to the decoding layer at a first distance from the first model 230, the feature map corresponding to the decoding layer at a second distance from the first model 230, ..., and the feature map corresponding to the decoding layer at an N-th distance from the first model 230, wherein the first distance may be smaller than the second distance, and the second distance may be smaller than the N-th distance. In this embodiment, the densely joint pyramid module 231 may add the feature map 56, the feature map 57, and the feature map 58 to generate a feature map 59. The densely joint pyramid module 231 may then sequentially concatenate the feature map 59, the feature map 56, the feature map 58, and the feature map 57 to generate a feature map 5.
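The add-then-concatenate step above can be sketched in NumPy: same-size maps are summed into a new map, and the sum is then concatenated with the original maps in the stated order. The sizes and channel counts are illustrative assumptions.

```python
import numpy as np

# Same-size maps from flow (b); variable names mirror the figure labels.
m56 = np.random.rand(64, 64, 8)   # from the nearest decoding layer
m58 = np.random.rand(64, 64, 8)   # upsampled, next-nearest layer
m57 = np.random.rand(64, 64, 8)   # upsampled, farthest layer

m59 = m56 + m57 + m58             # elementwise sum of same-size maps
# Concatenation order: sum first, then maps by increasing layer distance.
m5 = np.concatenate([m59, m56, m58, m57], axis=-1)
print(m5.shape)  # -> (64, 64, 32)
```

The spatial size never changes in this step; only the channel dimension grows, here from 8 to 4 × 8 = 32.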

In flow (c), the densely joint pyramid module 231 may input the feature map to an inverted residual block (IRB) to amplify the compensation for the spatial information of the original document. The densely joint pyramid module 231 may perform atrous convolution operations or separable convolution (S-CONV) operations on the outputs of the inverted residual blocks based on different dilation rates to generate a plurality of feature maps. In this embodiment, the densely joint pyramid module 231 may input the feature map 5 to the inverted residual blocks, and perform atrous convolution operations on the outputs of the inverted residual blocks based on a dilation rate of 1 (D=1), a dilation rate of 2 (D=2), a dilation rate of 4 (D=4), and a dilation rate of 8 (D=8) to generate four feature maps: a feature map 61, a feature map 62, a feature map 63, and a feature map 64. That is, the feature map 61 corresponds to the dilation rate of 1, the feature map 62 corresponds to the dilation rate of 2, the feature map 63 corresponds to the dilation rate of 4, and the feature map 64 corresponds to the dilation rate of 8.
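An atrous (dilated) convolution samples its kernel taps D positions apart, so a 3-tap kernel with dilation rate D covers a span of 2D+1 inputs without any extra parameters; this is why the four rates 1, 2, 4, 8 see progressively larger context. A minimal 1-D, pure-Python illustration (the patent's layers are 2-D; this only shows the sampling pattern):

```python
# Valid-mode 1-D dilated convolution: kernel tap j reads input i + j*d.
def dilated_conv1d(x, kernel, d):
    k = len(kernel)
    span = (k - 1) * d          # receptive field grows linearly with d
    return [sum(kernel[j] * x[i + j * d] for j in range(k))
            for i in range(len(x) - span)]

x = list(range(20))
for d in (1, 2, 4, 8):
    print(d, dilated_conv1d(x, [1, 1, 1], d)[:4])
```

With the all-ones kernel, each output sums three inputs spaced d apart, and the number of valid output positions shrinks as the receptive field widens.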

In flow (d), the dense pyramid module 231 may add the feature maps of the same size together to generate a new feature map. The dense pyramid module 231 may then concatenate the individual feature maps with the feature map generated from them to produce another new feature map. In this embodiment, the dense pyramid module 231 may add feature map 61, feature map 62, feature map 63, and feature map 64 to generate feature map 65. Then, the dense pyramid module 231 may concatenate feature map 61, feature map 62, feature map 63, feature map 64, and feature map 65 in sequence to generate feature map 6. The dense pyramid module 231 may perform a convolution operation on feature map 6 to generate the segmented document 70. The processor 110 may output the segmented document 70 through the transceiver 130.

In an embodiment, the first model 230 may further input the segmented document 70 output by the dense pyramid module 231 to the squeeze-and-excitation network 232 to strengthen the features of the segmented document 70. The squeeze-and-excitation network 232 may generate the segmented document 40 from the segmented document 70. The processor 110 may output the segmented document 40 through the transceiver 130.
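A minimal sketch of the squeeze-and-excitation idea behind network 232, assuming the standard squeeze (global average pooling), a two-layer excitation bottleneck, and channel-wise reweighting; the reduction ratio of 4 and the random weights are assumptions for illustration, not the patent's actual parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) map: global-average-pool each
    channel ('squeeze'), pass the channel vector through a small two-layer
    bottleneck ('excite'), then rescale every channel by a weight in (0, 1)."""
    s = x.mean(axis=(1, 2))                      # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))    # excite: (C,) in (0, 1)
    return x * e[:, None, None]                  # channel-wise reweighting

C = 8
rng = np.random.default_rng(0)
x = rng.random((C, 16, 16))
w1 = rng.standard_normal((C // 4, C))  # reduction ratio 4 (an assumption)
w2 = rng.standard_normal((C, C // 4))

y = squeeze_excite(x, w1, w2)
assert y.shape == x.shape
```

Because every excitation weight lies in (0, 1), informative channels are kept near their original magnitude while less useful channels are suppressed.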

The second model 240 may be a neural network, for example, an edge supervision network. The second model 240 may generate a segmented document corresponding to the original document 30 from the outputs of one or more decoding layers in the decoding convolutional network 220. FIG. 5 is a schematic diagram illustrating the flow in which the second model 240 generates the segmented document 50 according to an embodiment of the invention. Specifically, in flow (A), the second model 240 may obtain one or more feature maps output by one or more decoding layers in the decoding convolutional network 220 that are closer to the second model 240 (i.e., closer to the output end of the decoding convolutional network 220), where the one or more decoding layers may include the decoding layer closest to the second model 240 (i.e., the decoding layer 227 that produces the output of the decoding convolutional network 220). Then, the second model 240 may perform convolution operations on the obtained feature maps respectively to generate new feature maps.

In this embodiment, the second model 240 may obtain feature map 83, feature map 82, and feature map 81 from the decoding layer 227, the decoding layer 225, and the decoding layer 224, respectively, where the size of feature map 83 may be larger than that of feature map 82, and the size of feature map 82 may be larger than that of feature map 81. In an embodiment, feature map 81, feature map 82, and feature map 83 may be identical to feature map 51, feature map 52, and feature map 53, respectively. The second model 240 may perform convolution operations on feature map 51, feature map 52, and feature map 53 to generate feature map 84, feature map 85, and feature map 86, respectively, where the size of feature map 86 may be larger than that of feature map 85, and the size of feature map 85 may be larger than that of feature map 84.

To make the sizes of the feature maps identical, in flow (B), the second model 240 may upsample the feature maps of smaller sizes. In this embodiment, the second model 240 may upsample feature map 84 to generate feature map 87, where the size of feature map 87 may be the same as that of feature map 86. The second model 240 may upsample feature map 85 to generate feature map 88, where the size of feature map 88 may be the same as that of feature map 86.
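The size-matching upsampling step can be sketched with nearest-neighbour interpolation; the factor-of-2 size difference and the interpolation method are assumptions for illustration, since the patent does not fix either.

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

small = np.random.rand(16, 32, 32)  # e.g. a lower-resolution decoder output
big = upsample_nn(small, 2)
assert big.shape == (16, 64, 64)    # now matches the larger map's spatial size
```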

Next, the second model 240 may concatenate the feature maps of the same size to generate a new feature map. Assuming the second model 240 is to concatenate M feature maps (M being a positive integer), it may concatenate them in the following order: the feature map corresponding to the decoding layer at a first distance from the second model 240, the feature map corresponding to the decoding layer at a second distance from the second model 240, ..., and the feature map corresponding to the decoding layer at an M-th distance from the second model 240, where the first distance may be greater than the second distance, and the second distance may be greater than the M-th distance. In this embodiment, the second model 240 may concatenate feature map 87, feature map 88, and feature map 86 in sequence to generate feature map 8.

In flow (C), the second model 240 may perform a convolution operation on feature map 8 to generate the feature map 50. The processor 110 may output the feature map 50 through the transceiver 130.

The loss function L of the neural network model 200 is the sum of the loss functions of its two sub-models, L = L1 + L2, where L1 is the loss function of the first model 230, L2 is the loss function of the second model 240, n is the number of training samples, and m is the number of classes. In the per-term equations (given as equation images in the original publication), ŷᵢⱼ denotes the prediction corresponding to the i-th training sample and the j-th class, and yᵢⱼ denotes the corresponding ground truth. The processor 110 may train the neural network model 200 according to the loss function L, adjusting the hyperparameters of the encoding convolutional network 210, the decoding convolutional network 220, the first model 230, and/or the second model 240 to optimize the performance of the neural network model 200.
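A numeric sketch of the combined loss L = L1 + L2, assuming for illustration that each term is a mean cross-entropy over the n training samples and m classes; the patent text only fixes that L is the sum of the two models' losses, so the cross-entropy form is an assumption.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-9):
    """Mean cross-entropy over n samples; y_pred[i, j] is the prediction for
    the i-th training sample and j-th class, y_true[i, j] the ground truth."""
    n = y_true.shape[0]
    return -np.sum(y_true * np.log(y_pred + eps)) / n

# Toy data: n = 2 samples, m = 2 classes (hypothetical values).
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
pred_model1 = np.array([[0.9, 0.1], [0.2, 0.8]])  # first model's predictions
pred_model2 = np.array([[0.8, 0.2], [0.3, 0.7]])  # second model's predictions

L1 = cross_entropy(y_true, pred_model1)
L2 = cross_entropy(y_true, pred_model2)
L = L1 + L2  # the combined loss used to tune the hyperparameters
```

Summing the two terms lets a single backward pass train the shared encoder–decoder against both the segmentation objective and the edge-supervision objective at once.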

FIG. 6 is a flowchart illustrating a method for document segmentation according to an embodiment of the invention, where the method may be implemented by the electronic device 100 shown in FIG. 1. In step S601, an original document and a neural network model are obtained, where the neural network model includes a first model configured to perform: obtaining a first feature map of a first size and a second feature map of a second size corresponding to the original document, the first size being larger than the second size; performing a first upsampling on the second feature map to generate a third feature map of a third size equal to the first size; concatenating the first feature map and the third feature map to generate a fourth feature map; inputting the fourth feature map to a first inverted residual block to generate a first output, and performing a first atrous convolution operation based on a first dilation rate on the first output to generate a fifth feature map; inputting the fourth feature map to a second inverted residual block to generate a second output, and performing a second atrous convolution operation based on a second dilation rate on the second output to generate a sixth feature map, the second dilation rate being different from the first dilation rate; concatenating the fifth feature map and the sixth feature map to generate a seventh feature map; and performing a first convolution operation on the seventh feature map to generate a segmented document. In step S603, the segmented document is output.
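The shape bookkeeping of step S601 can be sketched as follows; all tensor sizes, and the identity stand-ins used for the inverted-residual-block and atrous-convolution branches, are assumptions for illustration only.

```python
import numpy as np

# Shape-only walkthrough of step S601 (sizes are illustrative assumptions).
f1 = np.zeros((32, 64, 64))                   # first feature map, first size
f2 = np.zeros((32, 32, 32))                   # second feature map, smaller size
f3 = f2.repeat(2, axis=1).repeat(2, axis=2)   # first upsampling -> third map
f4 = np.concatenate([f1, f3], axis=0)         # fourth map: channels doubled

# Stand-ins for the two IRB + atrous-convolution branches; each is assumed
# to preserve spatial size and channel count here for simplicity.
f5 = f4.copy()                                # branch with the first dilation rate
f6 = f4.copy()                                # branch with the second dilation rate
f7 = np.concatenate([f5, f6], axis=0)         # seventh map: channels doubled again
assert f7.shape == (128, 64, 64)
# A final 1x1-style convolution over f7 would then produce the segmented document.
```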

In summary, the neural network model of the invention may extract features of the original document through the encoding convolutional network and the decoding convolutional network to generate multiple feature maps. The first model may concatenate multiple feature maps to generate a feature map that contains important features of the original document across multiple temporal or spatial scales. The first model may also increase the number of channels of the feature maps through inverted residual blocks and atrous convolution operations, thereby compensating for the spatial information of the original document. Furthermore, the invention may train the hyperparameters of the neural network model according to the loss functions of the first model and the second model, so that the trained neural network model has better performance. The architecture of the neural network model proposed by the invention can produce more accurate document segmentation results while consuming fewer computing resources.

100: electronic device
110: processor
120: storage medium
200: neural network model
210: encoding convolutional network
220: decoding convolutional network
230: first model
231: dense pyramid module
232: squeeze-and-excitation network
240: second model
130: transceiver
30: original document
211, 212, 213, 214, 215, 216, 217, 218: encoding convolutional layers
221, 222, 223, 224, 225, 226, 227: decoding layers
40, 50, 70: segmented documents
5, 51, 52, 53, 54, 55, 56, 57, 58, 59, 6, 61, 62, 63, 64, 65, 8, 81, 82, 83, 84, 85, 86, 87, 88: feature maps
S601, S603: steps

FIG. 1 is a schematic diagram of an electronic device for document segmentation according to an embodiment of the invention.
FIG. 2 is a schematic diagram of a neural network model according to an embodiment of the invention.
FIG. 3 is a schematic diagram of an original document and a processed document according to an embodiment of the invention.
FIG. 4 is a schematic diagram of the flow in which the dense pyramid module generates a segmented document according to an embodiment of the invention.
FIG. 5 is a schematic diagram of the flow in which the second model generates a segmented document according to an embodiment of the invention.
FIG. 6 is a flowchart of a method for document segmentation according to an embodiment of the invention.

S601, S603: steps

Claims (10)

1. An electronic device for document segmentation, comprising: a transceiver, receiving an original document; a storage medium, storing a neural network model; and a processor, coupled to the storage medium and the transceiver, and accessing and executing the neural network model, wherein the neural network model comprises a first model, and the first model is configured to perform: obtaining a first feature map of a first size and a second feature map of a second size corresponding to the original document, wherein the first size is larger than the second size; performing a first upsampling on the second feature map to generate a third feature map of a third size, wherein the third size is equal to the first size; concatenating the first feature map and the third feature map to generate a fourth feature map; inputting the fourth feature map to a first inverted residual block to generate a first output, and performing a first atrous convolution operation based on a first dilation rate on the first output to generate a fifth feature map; inputting the fourth feature map to a second inverted residual block to generate a second output, and performing a second atrous convolution operation based on a second dilation rate on the second output to generate a sixth feature map, wherein the second dilation rate is different from the first dilation rate; concatenating the fifth feature map and the sixth feature map to generate a seventh feature map; and performing a first convolution operation on the seventh feature map to generate a segmented document, wherein the processor outputs the segmented document through the transceiver.

2. The electronic device according to claim 1, wherein the neural network model further comprises a second model, and the second model is configured to perform: performing a second upsampling on the second feature map to generate an eighth feature map of a fourth size, wherein the fourth size is equal to the first size; concatenating the first feature map and the eighth feature map to generate a ninth feature map; and performing a second convolution operation on the ninth feature map to generate an output feature map.

3. The electronic device according to claim 2, wherein the first model corresponds to a first loss function and the second model corresponds to a second loss function, wherein the processor adds the first loss function and the second loss function to generate a third loss function, and the processor trains the first model and the second model according to the third loss function.

4. The electronic device according to claim 1, wherein the neural network model further comprises an encoding convolutional network, wherein the encoding convolutional network comprises a first encoding convolutional layer and a second encoding convolutional layer, and the encoding convolutional network is configured to perform: generating a first encoded feature map according to the original document and the first encoding convolutional layer; and generating a second encoded feature map according to the first encoded feature map and the second encoding convolutional layer.

5. The electronic device according to claim 4, wherein the neural network model further comprises a decoding convolutional network, wherein the decoding convolutional network comprises a first decoding layer and a second decoding layer, wherein the first decoding layer comprises the second encoding convolutional layer and a decoding convolutional layer corresponding to the second encoding convolutional layer, and the decoding convolutional network is configured to perform: generating the second feature map according to the second encoded feature map and the first decoding layer; and generating the first feature map according to the second feature map and the second decoding layer.

6. The electronic device according to claim 1, wherein the first model is further configured to perform: adding the first feature map and the third feature map to generate a tenth feature map; and concatenating the tenth feature map, the first feature map, and the third feature map to generate the fourth feature map.

7. The electronic device according to claim 1, wherein the first model is further configured to perform: adding the fifth feature map and the sixth feature map to generate an eleventh feature map; and concatenating the fifth feature map, the sixth feature map, and the eleventh feature map to generate the seventh feature map.

8. The electronic device according to claim 1, wherein the first model is further configured to perform: performing the first convolution operation on the seventh feature map to generate a twelfth feature map; and inputting the twelfth feature map to a squeeze-and-excitation network to generate the segmented document.

9. The electronic device according to claim 4, wherein the first encoding convolutional layer performs a mobile inverted bottleneck convolution on the original document to generate the first encoded feature map.

10. A method for document segmentation, comprising: obtaining an original document and a neural network model, wherein the neural network model comprises a first model configured to perform: obtaining a first feature map of a first size and a second feature map of a second size corresponding to the original document, wherein the first size is larger than the second size; performing a first upsampling on the second feature map to generate a third feature map of a third size, wherein the third size is equal to the first size; concatenating the first feature map and the third feature map to generate a fourth feature map; inputting the fourth feature map to a first inverted residual block to generate a first output, and performing a first atrous convolution operation based on a first dilation rate on the first output to generate a fifth feature map; inputting the fourth feature map to a second inverted residual block to generate a second output, and performing a second atrous convolution operation based on a second dilation rate on the second output to generate a sixth feature map, wherein the second dilation rate is different from the first dilation rate; concatenating the fifth feature map and the sixth feature map to generate a seventh feature map; and performing a first convolution operation on the seventh feature map to generate a segmented document; and outputting the segmented document.
TW110115669A 2020-06-16 2021-04-29 Electronic device and method for document segmentation TWI776489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/344,911 US11657279B2 (en) 2020-06-16 2021-06-10 Electronic device and method for document segmentation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063039472P 2020-06-16 2020-06-16
US63/039,472 2020-06-16

Publications (2)

Publication Number Publication Date
TW202201272A TW202201272A (en) 2022-01-01
TWI776489B true TWI776489B (en) 2022-09-01

Family

Family ID: 80787958

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110115669A TWI776489B (en) 2020-06-16 2021-04-29 Electronic device and method for document segmentation

Country Status (1)

Country Link
TW (1) TWI776489B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991611A (en) * 2019-11-29 2020-04-10 北京市眼科研究所 Full convolution neural network based on image segmentation
TW202014984A (en) * 2018-09-15 2020-04-16 大陸商北京市商湯科技開發有限公司 Image processing method, electronic device, and storage medium
US20200160065A1 (en) * 2018-08-10 2020-05-21 Naver Corporation Method for training a convolutional recurrent neural network and for semantic segmentation of inputted video using the trained convolutional recurrent neural network
CN111259983A (en) * 2020-02-13 2020-06-09 电子科技大学 Image semantic segmentation method based on deep learning and storage medium




Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent