TWI766457B - Analysis system and upload method for language - Google Patents
- Publication number
- TWI766457B (application TW109141776A)
- Authority
- TW
- Taiwan
- Prior art keywords
- corpus
- unit
- analysis
- module
- transcription
Landscapes
- Machine Translation (AREA)
Description
The present invention relates to an analysis system, and in particular to a language analysis system that integrates upload, transcription, and analysis functions.
Language development is an important indicator of children's cognitive development and affects how they learn and think. Domestic research has found that the incidence of language disorders is higher at younger ages, and that language disorders are the most prevalent category among special-education students. If problems in a child's language development are detected and treated early, the incidence of language disorders can be reduced, avoiding the social problems and emotional distress they cause.
A child's language development can be assessed by recording the child's everyday language use on audio or video, having a speech-language therapist transcribe the child's utterances into a text file, and then running a language sample analysis on a computer system to compute indicators of language development. The therapist uses these indicators to evaluate the child's current stage of language development.
In recent years, existing language analysis systems that support Mandarin-speaking children have provided a standard transcription format, analysis programs that compute language development indicators, and web-based sharing services. These systems nevertheless have several problems. They are not well integrated and offer no consistent interface. The transcription format they define has complex, strict rules, so first-time users must spend time learning it. Many language analysis systems with differing formats remain on the market, and converting between them costs time. Finally, because written Chinese does not separate words, an additional word-segmentation step is needed when using these systems.
Existing systems therefore remain inconvenient for speech-language therapists who analyze Mandarin language samples from children, and improving and simplifying the intermediate steps is an important issue.
In view of the above shortcomings, the main object of the present invention is to provide a language analysis system and an upload analysis method that bring the SaaS (Software as a Service) model to children's language sample analysis. Using web technology, the functions required by the three analysis steps of collection, transcription, and analysis are integrated so that the whole analysis can be completed through a single interface, reducing the time and complexity of children's language sample analysis for researchers.
To achieve the above object, the present invention provides a language analysis system comprising: a login module used for user login; a collection module, electrically connected to the login module, used to collect and manage corpora from different studies; a transcription module, electrically connected to the collection module, used to convert the uploaded audio and video corpus data; and an analysis module, electrically connected to the transcription module, used to analyze the transcribed corpus files. The collection module further includes a corpus unit for corpus management, which displays the uploaded corpus data in rows and columns and supports web-based management operations including modification, analysis, and deletion; a corpus record unit, electrically connected to the corpus unit, which stores the uploaded corpus data; a corpus adding unit, electrically connected to the corpus unit, which starts the procedure for uploading a new corpus; and a corpus sharing unit, electrically connected to the corpus unit, which lets an individual user share the corpus information that user has uploaded. The transcription module further includes an upload unit, to which the system connects automatically once the corpus adding unit is started and through which audio or video data is uploaded either by drag-and-drop or by opening a file, with the information fields for the case file shown on the same page; and a transcription unit, electrically connected to the upload unit, which assists in transcribing the uploaded data. The analysis module further includes a corpus analysis unit, which analyzes the speech file, has character- and word-segmentation functions, and performs language sample analysis automatically; and an export unit, electrically connected to the corpus analysis unit, which receives the analysis data from the corpus analysis unit and outputs it.
Another object of the present invention is to provide an upload analysis method for language, comprising: selecting data to upload (S1); transcribing according to the type of the uploaded data (S2); if the uploaded data is audio or video, providing a transcription interface for transcription (S3); if the uploaded file is in a specified document format, parsing the document format and transcribing it (S4); after upload and transcription are complete, performing word and sentence segmentation and analysis on the transcribed file (S5); and outputting the analysis result (S6).
To make the above and other objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
1: Login module
11: Member registration unit
12: Member login unit
2: Collection module
21: Corpus unit
22: Corpus record unit
23: Corpus adding unit
24: Corpus sharing unit
3: Transcription module
31: Upload unit
32: Transcription unit
4: Analysis module
41: Corpus analysis unit
42: Export unit
Figure 1 is a system block diagram of the present invention.
Figure 2 is a block diagram of the collection module of the present invention.
Figure 3 is a block diagram of the transcription module of the present invention.
Figure 4 is a block diagram of the analysis module of the present invention.
Figure 5 is a flowchart of the upload analysis method of the present invention.
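Please refer to Figure 1, a system block diagram of the present invention. The language analysis system of the present invention mainly includes a login module 1, a collection module 2, a transcription module 3, and an analysis module 4. The login module 1 is used for user login and further includes a member registration unit 11 and a member login unit 12. The member registration unit 11 lets first-time users enter their data and register as members, and this member data is stored in the system. The member login unit 12 is used to log in to the system by checking whether the user has registered as a member.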
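The registration and login check described for units 11 and 12 could be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not say how member data is stored or compared, so the `MemberStore` class, the in-memory dictionary, and the salted PBKDF2 password hashing are all hypothetical choices made here.

```python
import hashlib
import hmac
import os


class MemberStore:
    """Hypothetical in-memory stand-in for the registration and login units."""

    def __init__(self):
        self._members = {}  # username -> (salt, password_hash)

    def register(self, username: str, password: str) -> bool:
        """Store a new member; return False if the name is already taken."""
        if username in self._members:
            return False
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        self._members[username] = (salt, digest)
        return True

    def login(self, username: str, password: str) -> bool:
        """Return True only if the user has registered and the password matches."""
        record = self._members.get(username)
        if record is None:
            return False  # never registered
        salt, expected = record
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(digest, expected)
```

A real deployment would keep the records in a database rather than a dictionary; the dictionary only keeps the sketch self-contained.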
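Referring again to Figure 1, the login module 1 is electrically connected to the collection module 2; see also the collection module block diagram in Figure 2. The collection module 2 is used to collect and manage corpora from different studies. It further includes a corpus unit 21 for corpus management, which displays the uploaded corpus data in rows and columns and supports web-based management operations including modification, analysis, and deletion. The corpus unit 21 is electrically connected to a corpus record unit 22, a database that stores the uploaded corpus data. The corpus unit 21 is also electrically connected to a corpus adding unit 23 and a corpus sharing unit 24. The corpus adding unit 23 starts the procedure for uploading a new corpus and, in this embodiment, is operated through a web page. The corpus sharing unit 24 lets an individual user share the corpus information that user has uploaded.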
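One way to picture how the corpus unit, record unit, adding unit, and sharing unit relate is the minimal sketch below. The `CorpusEntry` and `CorpusStore` names, the in-memory dictionary standing in for the database, and the rule that only the owner may modify, delete, or share an entry are assumptions made for illustration; the patent does not specify a data model or permission scheme.

```python
from dataclasses import dataclass, field


@dataclass
class CorpusEntry:
    entry_id: int
    owner: str
    title: str
    shared_with: set = field(default_factory=set)


class CorpusStore:
    """Hypothetical stand-in for the corpus record unit (a database in the patent)."""

    def __init__(self):
        self._entries = {}
        self._next_id = 1

    def add(self, owner, title):
        """Corpus adding unit: register a newly uploaded corpus."""
        entry = CorpusEntry(self._next_id, owner, title)
        self._entries[entry.entry_id] = entry
        self._next_id += 1
        return entry

    def _owned(self, entry_id, user):
        entry = self._entries[entry_id]
        if entry.owner != user:
            raise PermissionError("only the owner may change this entry")
        return entry

    def rename(self, entry_id, user, title):
        """Modification operation of the corpus unit."""
        self._owned(entry_id, user).title = title

    def delete(self, entry_id, user):
        """Deletion operation of the corpus unit."""
        self._owned(entry_id, user)
        del self._entries[entry_id]

    def share(self, entry_id, owner, other_user):
        """Corpus sharing unit: let another user see this entry."""
        self._owned(entry_id, owner).shared_with.add(other_user)

    def rows_for(self, user):
        """Rows the corpus unit would list for this user: own and shared entries."""
        return [
            (e.entry_id, e.title, e.owner)
            for e in self._entries.values()
            if e.owner == user or user in e.shared_with
        ]
```

The `rows_for` method returns plain tuples so that a web page could render them directly as table rows.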
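Referring again to Figure 1, the collection module 2 is electrically connected to the transcription module 3, which is used to convert the uploaded audio and video corpus data. As shown in the transcription module block diagram in Figure 3, the transcription module 3 further includes an upload unit 31. Once the corpus adding unit 23 is started, the system automatically connects to the upload unit 31, and the audio or video data is uploaded through the web page either by drag-and-drop or by opening a file. The same page also shows the information fields to be filled in for the case file. In this embodiment these fields include name, age, gender, recording location, context description, participants, and case date. The page is also designed around the information that different language sample analyses require, with interactive web functions for adding speakers, utterance start and end times, utterance content, and utterance annotations. The upload unit 31 is electrically connected to a transcription unit 32, which assists in transcribing the uploaded data. In this embodiment the transcription unit 32 assists transcription with a speech recognition model, such as Google's speech recognition model: after the upload unit 31 finishes uploading the audio or video, voice activity detection identifies the segments of the recording that contain speech, those segments are cut out, and the speech recognition model then recognizes each of them. In addition, so that the transcribed corpus files remain compatible with traditional language sample analysis tools, the transcription module 3 supports exporting the transcribed utterances in a form that conforms to the specification or format of those tools.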
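The segmentation step that precedes speech recognition can be illustrated with a very simple energy-based voice activity detector. This is a sketch under assumptions, not the patent's method: the patent does not say which detection algorithm is used, and the `detect_speech_segments` function, its frame size, threshold, and silence tolerance are all hypothetical. The recognition call itself is left out because the source names the recognizer only by example.

```python
def detect_speech_segments(samples, sample_rate, frame_ms=20,
                           threshold=0.01, min_silence_frames=5):
    """Return (start_sec, end_sec) pairs for stretches of high-energy audio.

    `samples` is a sequence of floats in [-1, 1]. A segment closes once
    `min_silence_frames` consecutive frames fall below `threshold`.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    segments = []
    start = None   # frame index where the current segment began
    silence = 0    # consecutive quiet frames seen inside the segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                end = i - silence + 1  # index of the first quiet frame
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start, silence = None, 0

    if start is not None:  # recording ended while a segment was still open
        end = n_frames - silence
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments
```

Each returned pair could then be used to cut the recording and pass the clip to whatever recognizer the system uses, and the same times could pre-fill the utterance start and end fields of the transcription page.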
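Please refer to Figure 1. The transcription module 3 is electrically connected to the analysis module 4, which is used to analyze the transcribed corpus files. As shown in the analysis module block diagram in Figure 4, the analysis module 4 further includes a corpus analysis unit 41, which analyzes the speech file and the transcribed file. The corpus analysis unit 41 has character- and word-segmentation functions and performs language sample analysis automatically. The analysis computes language development indicators separately for the utterances of the file and of other files; these indicators include the proportion of each part of speech and the mean utterance length. The results can be presented visually on a web page to make the indicators easier to read, and are combined with text mining techniques to show information from different perspectives. The corpus analysis unit 41 is electrically connected to an export unit 42, which receives the analysis data from the corpus analysis unit 41 and exports it in either of two formats, CHILDES or CLSA.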
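The two indicators named here, part-of-speech proportion and mean utterance length, can be computed as in the sketch below. It assumes the utterances have already been word-segmented and tagged, so each utterance is a list of `(word, tag)` pairs; the function names, the tag labels, and the choice to count length in words are assumptions for illustration, since the patent does not define the unit of length or the tag set.

```python
from collections import Counter


def mean_utterance_length(utterances):
    """Average number of words per utterance (0.0 for an empty sample)."""
    if not utterances:
        return 0.0
    return sum(len(u) for u in utterances) / len(utterances)


def pos_proportions(utterances):
    """Share of each part-of-speech tag among all words in the sample."""
    counts = Counter(tag for u in utterances for _, tag in u)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Example sample: two utterances from one hypothetical speaker.
sample = [
    [("我", "PRON"), ("要", "VERB"), ("球", "NOUN")],
    [("球", "NOUN")],
]
```

For this sample, `mean_utterance_length(sample)` is 2.0 and nouns account for half of the four words. The word segmentation and tagging that produce the input are the part the patent assigns to the corpus analysis unit and are not reproduced here.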
Please refer to Figure 5, a flowchart of the upload analysis method of the present invention. The method includes selecting data to upload (S1), where the uploadable data types are a specified document format, audio, or video, and transcribing according to the type of the uploaded data (S2). If the uploaded data is audio or video, a transcription interface is provided for transcription (S3). Through this interface the user enters field data for the audio or video file, including speaker, utterance start and end times, utterance content, and utterance annotations, as well as the personal data associated with the file, including name, age, gender, recording location, context description, participants, and case date. If the uploaded file is in a specified document format, the document format is parsed and transcribed (S4), which includes extracting the roles, utterances, times, and other information listed in the document. After upload and transcription are complete, word and sentence segmentation and analysis are performed on the transcribed file (S5); the analysis computes language development indicators separately for the utterances of the file and of other files, and these indicators include the proportion of each part of speech and the mean utterance length. Finally, the analysis result is output (S6); it can be exported in either of the two specified formats, CHILDES or CLSA, and is also presented visually on a web page.
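The branch between steps S3 and S4 can be sketched as a small dispatcher plus a document parser. Everything specific here is an assumption: the patent does not list which file extensions count as audio, video, or the "specified document format", so the extension sets are hypothetical, and the parser assumes CHAT-style main tiers (lines of the form `*CHI:` followed by the utterance, as used by CHILDES) purely because CHILDES is the format the patent names for export. Timing information is not extracted in this sketch.

```python
import re
from pathlib import Path

# Hypothetical extension sets; the patent does not enumerate them.
AUDIO_VIDEO_EXTS = {".wav", ".mp3", ".mp4", ".mov"}
DOCUMENT_EXTS = {".cha", ".txt"}

# A CHAT-style main tier: "*" + speaker code + ":" + utterance text.
MAIN_TIER = re.compile(r"^\*([A-Z0-9]+):\s*(.*)$")


def route_upload(filename):
    """Step S2: pick the transcription path from the uploaded file's type."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_VIDEO_EXTS:
        return "transcription_interface"  # step S3
    if ext in DOCUMENT_EXTS:
        return "parse_document"           # step S4
    raise ValueError(f"unsupported upload type: {ext or filename}")


def parse_document(text):
    """Step S4: extract (speaker, utterance) pairs, skipping other lines."""
    rows = []
    for line in text.splitlines():
        match = MAIN_TIER.match(line)
        if match:
            rows.append((match.group(1), match.group(2).strip()))
    return rows
```

Header lines beginning with `@` and dependent tiers beginning with `%` are ignored by the regular expression, which keeps only the speaker turns that the later segmentation and analysis step (S5) would consume.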
The embodiments described above are preferred examples only and do not limit the scope of the present invention. Any equivalent change or modification made in accordance with the claims and the description of the present invention falls within the scope of the patent.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109141776A TWI766457B (en) | 2020-11-27 | 2020-11-27 | Analysis system and upload method for language |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202221697A TW202221697A (en) | 2022-06-01 |
TWI766457B (en) | 2022-06-01 |
Family
ID=83062234
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI766457B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW200922518A (en) * | 2007-11-22 | 2009-06-01 | hui-zhen Wei | System and method for examination on development of emotion and intellectual development of infant baby |
US9047872B1 (en) * | 2007-02-12 | 2015-06-02 | West Corporation | Automatic speech recognition tuning management |
CN109819127A (en) * | 2019-03-08 | 2019-05-28 | 周诚 | The management method and system of harassing call |
TW202018529A (en) * | 2018-11-08 | 2020-05-16 | 中華電信股份有限公司 | System for inquiry service and method thereof |
- 2020-11-27: TW109141776A (TW), patent TWI766457B, active