TWI766457B - Analysis system and upload method for language - Google Patents

Info

Publication number
TWI766457B
Authority
TW
Taiwan
Prior art keywords
corpus
unit
analysis
module
transcription
Prior art date
Application number
TW109141776A
Other languages
Chinese (zh)
Other versions
TW202221697A (en)
Inventor
祝國忠
吳幸美
童寶娟
劉佳林
洪世旻
黃思敏
Original Assignee
國立臺北護理健康大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立臺北護理健康大學
Priority to TW109141776A
Application granted
Publication of TW202221697A
Publication of TWI766457B

Landscapes

  • Machine Translation (AREA)

Abstract

An analysis system and method for language includes a login module through which users log in. The login module is electrically connected to a collection module, which collects and manages corpora from various studies and supports management operations through a web page. The management operations include modification, analysis, and deletion. The collection module is electrically connected to a transcription module, which converts the uploaded corpus audio and video data; the information fields to be filled in for the uploaded case file are displayed on the same page. The transcription module is in turn electrically connected to an analysis module, which analyzes the transcribed corpus files. The analysis module provides character and word segmentation and analyzes language samples automatically.

Description

Analysis system and upload analysis method for language

The present invention relates to an analysis system, and in particular to a language analysis system that integrates uploading, transcription, and analysis functions.

Children's language development is an important indicator of cognitive development and affects children's learning and thinking. Domestic research has found that the younger the age group, the higher the incidence of language disorders, and that language disorders are the most prevalent category among students receiving special education. If problems in children's language development are detected and treated early, the incidence of language disorders can be reduced, avoiding the social problems and emotional distress that such disorders cause.

Children's language development can be assessed by making audio or video recordings of a child's everyday language use. A speech therapist then transcribes the child's utterances into a text file, and a computer system performs language sample analysis on it to calculate indicators of the child's language development. The speech therapist evaluates these indicators to determine the child's current level of language development.

In recent years, existing systems that support language analysis for Mandarin-speaking children have provided functions such as a standard format for language transcription, analysis programs for computing language development indicators, and web-based sharing services. These systems nevertheless have several problems. First, they are not well integrated and lack a consistent user interface. Second, the prescribed transcription format follows complex and strict rules, so first-time users must spend time learning it. Third, many language analysis systems with different formats remain on the market, and converting between them costs time. Finally, because written Chinese does not separate words from one another, an additional word segmentation step is required when using these systems.

Existing systems therefore still present many inconveniences for child speech therapists performing Mandarin language sample analysis, and improving and simplifying the intermediate steps is an important issue.

In view of the above shortcomings, the main object of the present invention is to provide an analysis system and an upload analysis method for language that bring SaaS (Software as a Service) into the analysis of children's language samples. Using web technology, the functions required by the three steps of analysis (collection, transcription, and analysis) are integrated into one, so that child language sample analysis can be completed through a single-window interface, reducing the time and complexity researchers face when analyzing children's language samples.

To achieve the above object, the present invention mainly provides an analysis system for language, comprising a login module, a collection module, a transcription module, and an analysis module. The login module is used for user login. The collection module is electrically connected to the login module and is used to collect and manage corpora from various studies. It further includes: a corpus unit for corpus management, which displays the uploaded corpus data in rows and columns and supports management operations through a web page, the management operations including modification, analysis, and deletion; a corpus recording unit, electrically connected to the corpus unit, for storing the uploaded corpus data; a corpus adding unit, electrically connected to the corpus unit, for starting the procedure of uploading a new corpus; and a corpus sharing unit, electrically connected to the corpus unit, through which a single user shares the corpus information that has been uploaded. The transcription module is electrically connected to the collection module and is used to convert the uploaded corpus audio and video data. It further includes: an uploading unit, to which the system connects automatically once the corpus adding unit is activated, and through which the audio or video data is uploaded either by drag and drop or by opening a file, with the information fields to be filled in for the case file displayed on the same page; and a transcription unit, electrically connected to the uploading unit, for assisting transcription of the uploaded semantic data. The analysis module is electrically connected to the transcription module and is used to analyze the transcribed corpus files. It further includes: a corpus analysis unit, which analyzes the voice file, provides character and word segmentation, and performs language sample analysis automatically; and an export unit, electrically connected to the corpus analysis unit, which receives the analysis data from the corpus analysis unit and outputs it.

Another object of the present invention is to provide an upload analysis method for language, comprising: selecting data to upload (S1); transcribing according to the type of the uploaded data (S2); if the uploaded data is audio or video, providing a transcription interface for transcription (S3); if the uploaded file is in a specified document format, parsing the document format and transcribing it (S4); after upload and transcription are complete, performing word and sentence segmentation and analysis on the transcribed file (S5); and outputting the analysis result (S6).
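The branch between steps S3 and S4 can be sketched as a simple dispatch on the uploaded file's type. This is only an illustration: the patent does not name concrete file extensions, so the extension sets and the function name `choose_transcription_path` below are assumptions.

```python
from pathlib import Path

# Hypothetical extension sets; the patent does not list concrete formats.
MEDIA_EXTENSIONS = {".wav", ".mp3", ".mp4"}
DOCUMENT_EXTENSIONS = {".cha", ".txt"}


def choose_transcription_path(filename: str) -> str:
    """Step S2: pick a transcription path from the uploaded file's type."""
    suffix = Path(filename).suffix.lower()
    if suffix in MEDIA_EXTENSIONS:
        # Step S3: audio or video goes to the manual transcription interface.
        return "transcription_interface"
    if suffix in DOCUMENT_EXTENSIONS:
        # Step S4: a specified document format is parsed and transcribed.
        return "parse_document"
    raise ValueError(f"unsupported upload type: {suffix or filename}")
```

Either path ends in a transcribed file, which is then passed on to segmentation and analysis (S5) and output (S6).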

To make the above and other objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

1: Login module

11: Member registration unit

12: Member login unit

2: Collection module

21: Corpus unit

22: Corpus recording unit

23: Corpus adding unit

24: Corpus sharing unit

3: Transcription module

31: Uploading unit

32: Transcription unit

4: Analysis module

41: Corpus analysis unit

42: Export unit

FIG. 1 is a system block diagram of the present invention.

FIG. 2 is a block diagram of the collection module of the present invention.

FIG. 3 is a block diagram of the transcription module of the present invention.

FIG. 4 is a block diagram of the analysis module of the present invention.

FIG. 5 is a flowchart of the upload analysis method of the present invention.

Please refer to FIG. 1, a system block diagram of the present invention. The analysis system for language of the present invention mainly includes a login module 1, a collection module 2, a transcription module 3, and an analysis module 4. The login module 1 is used for user login and further includes a member registration unit 11 and a member login unit 12. The member registration unit 11 lets first-time users enter their information and register as members, and the member information is stored in the system. The member login unit 12 is used to log in to the system and checks whether the user is a registered member.

Continuing with FIG. 1, the login module 1 is electrically connected to the collection module 2; see also the collection module block diagram of FIG. 2. The collection module 2 is used to collect and manage corpora from various studies. It further includes a corpus unit 21 for corpus management, which displays the uploaded corpus data in rows and columns and supports management operations through a web page; the management operations include modification, analysis, and deletion. The corpus unit 21 is electrically connected to a corpus recording unit 22, which is a database for storing the uploaded corpus data. The corpus unit 21 is also electrically connected to a corpus adding unit 23 and a corpus sharing unit 24. The corpus adding unit 23 starts the procedure of uploading a new corpus and, in this embodiment, is operated through a web page. The corpus sharing unit 24 lets a single user share the corpus information that has been uploaded.
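As a rough illustration of how the corpus unit's management operations and the corpus sharing unit might fit together, the sketch below uses an in-memory store in place of the database. All class, method, and field names are hypothetical, and the rule that only the uploader may modify, delete, or share a record is an assumption that the patent does not state.

```python
from dataclasses import dataclass, field


@dataclass
class CorpusRecord:
    """One uploaded corpus entry (field names are illustrative)."""
    record_id: int
    owner: str
    title: str
    shared_with: set[str] = field(default_factory=set)


class CorpusStore:
    """In-memory stand-in for the corpus recording unit (a database in the patent)."""

    def __init__(self) -> None:
        self._records: dict[int, CorpusRecord] = {}
        self._next_id = 1

    def add(self, owner: str, title: str) -> CorpusRecord:
        record = CorpusRecord(self._next_id, owner, title)
        self._records[record.record_id] = record
        self._next_id += 1
        return record

    def modify(self, record_id: int, user: str, title: str) -> None:
        self._owned(record_id, user).title = title

    def delete(self, record_id: int, user: str) -> None:
        self._owned(record_id, user)
        del self._records[record_id]

    def share(self, record_id: int, user: str, other_user: str) -> None:
        self._owned(record_id, user).shared_with.add(other_user)

    def visible_to(self, user: str) -> list[CorpusRecord]:
        """Records a user would see listed: their own plus those shared with them."""
        return [r for r in self._records.values()
                if r.owner == user or user in r.shared_with]

    def _owned(self, record_id: int, user: str) -> CorpusRecord:
        # Assumed access rule: only the uploader manages a record.
        record = self._records[record_id]
        if record.owner != user:
            raise PermissionError("only the uploader may manage this record")
        return record
```

A web front end would render `visible_to(user)` as the row-and-column listing and attach the modify, analyze, and delete actions to each row.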

Continuing with FIG. 1, the collection module 2 is electrically connected to the transcription module 3, which is used to convert the uploaded corpus audio and video data. The transcription module 3 further includes an uploading unit 31, as shown in the transcription module block diagram of FIG. 3. After the corpus adding unit 23 is activated, the system automatically connects to the uploading unit 31, and through web-based operation the audio or video data is uploaded either by drag and drop or by opening a file. The same page also displays the information fields to be filled in for the case file. In this embodiment, these fields include name, age, gender, recording location, description of the situation, participants, and case intake date. The page is also designed around the information required for different kinds of language sample analysis and offers interactive web functions for adding the speaker, the start and end time of each utterance, the utterance content, and utterance annotations. In addition, the uploading unit 31 is electrically connected to a transcription unit 32, which assists transcription of the uploaded semantic data. In this embodiment, the transcription unit 32 assists transcription by means of a speech recognition model, such as the Google speech recognition model: once the uploading unit 31 has finished uploading the audio or video, voice activity detection determines the sections of the recording that contain speech, those sections are cut out, and the speech recognition model then recognizes the speech in them. Furthermore, so that transcribed corpus files remain compatible with traditional language sample analysis tools, the transcription module 3 supports exporting the transcribed utterances in a form that conforms to the specifications or formats of those tools.
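The voice activity detection step can be illustrated with a very simple energy threshold. The patent does not say which detection method is used, so this is a stand-in technique, and the function name, frame size, and threshold are assumptions. The speech recognition call itself is omitted because its interface is not described in the source.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcript row, using the fields the upload page collects."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str = ""
    note: str = ""


def detect_speech_segments(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start, end) times, in seconds, of runs of frames whose RMS
    energy reaches the threshold. Samples are floats in the range [-1, 1]."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    segments = []
    start = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(x * x for x in frame) / frame_len) ** 0.5
        t = i * frame_len / sample_rate
        if rms >= threshold and start is None:
            start = t                    # speech begins
        elif rms < threshold and start is not None:
            segments.append((start, t))  # speech ends
            start = None
    if start is not None:                # recording ends mid-speech
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments
```

Each detected segment could then seed an `Utterance` row with its start and end times filled in, leaving the text to be supplied by the recognizer and corrected by the transcriber.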

Please refer to FIG. 1. The transcription module 3 is electrically connected to the analysis module 4, which analyzes the transcribed corpus files. The analysis module 4 further includes a corpus analysis unit 41, as shown in the analysis module block diagram of FIG. 4. The corpus analysis unit 41 analyzes the voice file and the transcribed file, provides character and word segmentation, and performs language sample analysis automatically. The analysis includes separately calculating language development indicators for the utterances of the other files contained in the file; these indicators include the proportion of each part of speech and the mean utterance length. The results can be presented visually on a web page to make the indicators easier to read and, combined with text mining techniques, to show information from different perspectives. The corpus analysis unit 41 is electrically connected to an export unit 42, which receives the analysis data from the corpus analysis unit 41 and uses either the CHILDES or the CLSA format as the export format.
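To make the segmentation and indicator steps concrete, here is a minimal sketch. The patent does not specify a segmentation algorithm, so forward maximum matching against a toy lexicon is used purely as a stand-in; the lexicon, its part-of-speech tags, and the function names are all hypothetical.

```python
from collections import Counter

# Toy lexicon mapping words to part-of-speech tags; purely illustrative.
LEXICON = {"我": "pronoun", "要": "verb", "喝": "verb", "水": "noun",
           "媽媽": "noun", "牛奶": "noun"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(utterance: str) -> list[str]:
    """Forward maximum matching: take the longest lexicon word at each
    position, falling back to a single character."""
    words, i = [], 0
    while i < len(utterance):
        for size in range(min(MAX_WORD_LEN, len(utterance) - i), 0, -1):
            candidate = utterance[i:i + size]
            if size == 1 or candidate in LEXICON:
                words.append(candidate)
                i += size
                break
    return words


def mean_utterance_length(utterances: list[str]) -> float:
    """Average number of words per utterance."""
    if not utterances:
        raise ValueError("at least one utterance is required")
    return sum(len(segment(u)) for u in utterances) / len(utterances)


def pos_ratios(utterances: list[str]) -> dict[str, float]:
    """Share of each part-of-speech tag among all segmented words."""
    counts = Counter(LEXICON.get(word, "unknown")
                     for u in utterances for word in segment(u))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}
```

A production system would replace the toy lexicon with a full segmenter and tagger, but the indicator arithmetic stays the same: counts over segmented words, divided by the number of utterances or the number of words.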

Please refer to FIG. 5, a flowchart of the upload analysis method of the present invention. The upload analysis method includes selecting data to upload (S1), where the data that can be uploaded is any one of a specified document format, audio, or video, and transcribing according to the type of the uploaded data (S2). If the uploaded data is audio or video, a transcription interface is provided for transcription (S3). The transcription interface accepts field data for the audio or video file, including the speaker, the start and end time of each utterance, the utterance content, and utterance annotations, as well as the personal data associated with the file, including name, age, gender, recording location, description of the situation, participants, and case intake date. If the uploaded file is in a specified document format, the document format is parsed and transcribed (S4), which includes extracting information such as the roles, utterances, and times listed in the document and transcribing it. After upload and transcription are complete, word and sentence segmentation and analysis are performed on the transcribed file (S5); the analysis includes separately calculating language development indicators for the utterances of the other files contained in the file, such as the proportion of each part of speech and the mean utterance length. Finally, the analysis result is output (S6); it can be exported in either of the specified CHILDES or CLSA formats and is also presented visually on a web page.
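As an illustration of the export step, the sketch below renders utterance rows as a simplified transcript loosely modeled on CHILDES CHAT conventions (speaker tiers beginning with `*`, comment tiers beginning with `%com`). It is not a validated CHAT file and omits headers and timing marks that a conforming file would need. The function name and dictionary keys are hypothetical, and the CLSA format is not shown because the source does not describe it.

```python
def to_chat_like(utterances, participants):
    """Render utterance dicts as a simplified, CHAT-inspired transcript.

    utterances: dicts with 'speaker', 'start', 'text' and optional 'note'.
    participants: mapping of speaker code to role, e.g. {"CHI": "Target_Child"}.
    """
    lines = ["@Begin"]
    roles = ", ".join(f"{code} {role}" for code, role in participants.items())
    lines.append(f"@Participants:\t{roles}")
    # Emit utterances in chronological order, one speaker tier each.
    for u in sorted(utterances, key=lambda u: u["start"]):
        lines.append(f"*{u['speaker']}:\t{u['text']}")
        if u.get("note"):
            lines.append(f"%com:\t{u['note']}")
    lines.append("@End")
    return "\n".join(lines)
```

The speaker, start time, content, and annotation fields used here correspond to the field data entered through the transcription interface in step S3.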

The embodiments described above are preferred examples only and shall not limit the scope of implementation of the present invention. Any equivalent change or modification made in accordance with the claims and the description of the present invention shall fall within the scope covered by the following claims.

Claims (8)

1. An analysis system for language, comprising: a login module for user login; a collection module, electrically connected to the login module and used to collect and manage corpora from various studies, the collection module further comprising: a corpus unit for corpus management, which displays the uploaded corpus data in rows and columns and supports management operations through a web page, the management operations including modification, analysis, and deletion; a corpus recording unit, electrically connected to the corpus unit, for storing the uploaded corpus data; a corpus adding unit, electrically connected to the corpus unit, for starting the procedure of uploading a new corpus; and a corpus sharing unit, electrically connected to the corpus unit, through which a single user shares the corpus information that has been uploaded; a transcription module, electrically connected to the collection module and used to convert the uploaded corpus audio and video data, the transcription module further comprising: an uploading unit, which is connected automatically after the corpus adding unit is activated and through which the audio or video data is uploaded to the analysis system either by drag and drop or by opening a file, with the information fields to be filled in for the case file displayed on the same page; and a transcription unit, electrically connected to the uploading unit, for assisting transcription of the uploaded semantic data; and an analysis module, electrically connected to the transcription module and used to analyze the transcribed corpus files, the analysis module further comprising: a corpus analysis unit, which analyzes the voice file and the transcribed file and provides character and word segmentation in order to perform language sample analysis; and an export unit, electrically connected to the corpus analysis unit, which receives the analysis data from the corpus analysis unit and outputs it to an electronic device.

2. The analysis system of claim 1, wherein the fields include name, age, gender, recording location, description of the situation, participants, and case intake date, and the information fields are also designed around the information required for different kinds of language sample analysis, including the addition of the speaker, the start and end time of each utterance, the utterance content, and utterance annotations.

3. The analysis system of claim 1, wherein the export format is either the CHILDES format or the CLSA format.

4. An upload analysis method for language, comprising: a. selecting data to upload through an uploading unit (S1); b. transcribing, through a transcription unit, according to the type of the uploaded data (S2); c. if the uploaded data is audio or video, providing a transcription interface for transcription (S3); d. if the file of the uploaded data is in a specified document format, parsing the document format and transcribing it (S4); e. after upload and transcription are complete, performing word and sentence segmentation and analysis on the transcribed file through a corpus analysis unit (S5); and f. outputting the analysis result to an electronic device through an export unit (S6).

5. The upload analysis method of claim 4, wherein in step a the type of the uploaded data is any one of a specified document format, audio, or video.

6. The upload analysis method of claim 4, wherein in step c the transcription interface accepts field data for the audio or video file, including the speaker, the start and end time of each utterance, the utterance content, and utterance annotations, as well as the personal data associated with the file, including name, age, gender, recording location, description of the situation, participants, and case intake date.

7. The upload analysis method of claim 4, wherein in step d information such as the roles, utterances, and times listed in the document is extracted and transcribed.

8. The upload analysis method of claim 4, wherein in step f the analysis result can be exported in either of the specified CHILDES or CLSA formats.
TW109141776A 2020-11-27 2020-11-27 Analysis system and upload method for language TWI766457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109141776A TWI766457B (en) 2020-11-27 2020-11-27 Analysis system and upload method for language

Publications (2)

Publication Number Publication Date
TW202221697A (en) 2022-06-01
TWI766457B (en) 2022-06-01

Family

ID=83062234

Country Status (1)

Country Link
TW (1) TWI766457B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200922518A (en) * 2007-11-22 2009-06-01 hui-zhen Wei System and method for examination on development of emotion and intellectual development of infant baby
US9047872B1 (en) * 2007-02-12 2015-06-02 West Corporation Automatic speech recognition tuning management
CN109819127A (en) * 2019-03-08 2019-05-28 周诚 The management method and system of harassing call
TW202018529A (en) * 2018-11-08 2020-05-16 中華電信股份有限公司 System for inquiry service and method thereof

Also Published As

Publication number Publication date
TW202221697A (en) 2022-06-01
