TW201520996A - Audiovisual automatic scoring and training system for presentation skill - Google Patents

Audiovisual automatic scoring and training system for presentation skill

Info

Publication number
TW201520996A
TW201520996A TW102143729A TW102143729A TW201520996A TW 201520996 A TW201520996 A TW 201520996A TW 102143729 A TW102143729 A TW 102143729A TW 102143729 A TW102143729 A TW 102143729A TW 201520996 A TW201520996 A TW 201520996A
Authority
TW
Taiwan
Prior art keywords
feature
image
audio
database
processor
Prior art date
Application number
TW102143729A
Other languages
Chinese (zh)
Other versions
TWI528336B (en)
Inventor
Yow-Jyy Lee
Original Assignee
Nat Taichung University Science & Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nat Taichung University Science & Technology
Priority to TW102143729A
Publication of TW201520996A
Application granted
Publication of TWI528336B

Abstract

This invention provides an audiovisual automatic scoring and training system for presentation skill, which is characterized by having an image capturing module, a sound capturing module, a processor, a database, and a display. Image and sound characteristics of a target object are captured by the image capturing module and the sound capturing module and then transmitted to the processor for analysis. The processor further matches and compares the image and the tone quality of the target object with a sample image and a sample sound source in the database and then generates a score based on a predetermined parameter provided by the database. The score is stored in the database and then shown on the display.

Description

Audiovisual automatic scoring and training system for presentation skills

The present invention relates to an audiovisual automatic scoring and training system for presentation skills that automatically scores speeches and presentations.

In the language departments of a college of liberal arts, the simplest way to gauge what students have learned is to have them give a speech. A speech is the most efficient way to assess a student's language ability, poise, and depth of understanding of the topic.

At present, speeches given class by class are time-consuming and laborious: it takes roughly 6 to 8 class periods to hear every speech. When a single teacher scores each speech after listening to the whole class, the impression left by the preceding speech easily influences the score, which makes the scoring unfair.

Moreover, a teacher usually teaches more than one class. Listening to a large number of speeches is a burden on the teacher and easily leads to mistakes.

In view of this, the problem the present invention seeks to solve is how to provide an audiovisual automatic scoring and training system for presentation skills that scores speeches automatically and quickly, without taking much time.

The object of the present invention is to provide an audiovisual automatic scoring and training system for presentation skills that automatically scores speeches and presentations.

To solve the above problems and achieve the object of the present invention, the technical means adopted is an audiovisual automatic scoring and training system for presentation skills, characterized by comprising an image capturing module (1), a sound capturing module (2), a processor (3), a database (4), and a display (5), wherein:

the image capturing module (1) comprises a photographing device (11) that detects a target image within a specific area and transmits the features (12) captured from the target image to the processor (3) for analysis;

the sound capturing module (2) detects a sound source (21) within a specific area, captures the spectrum magnitude and the speech passages of the sound source (21), and transmits them to the processor (3) for analysis;

the processor (3) is connected to the image capturing module (1), the sound capturing module (2), the database (4), and the display (5), and provides logic operation, integration, input, and output functions; it matches and compares the target image and timbre, assigns a score value according to a preset parameter provided by the database (4), stores the score value in the database (4), and shows the score on the display (5);

the database (4) provides storage and retrieval and contains a sample audio, a speaker audio, a sample speaker image, and a speaker image; the speaker can enter preset parameters for the sample audio and the sample speaker image into the database (4) in advance, and the database (4) also stores the speaker audio and image received by the sound capturing module and the image capturing module and feeds them back to the processor for cross-comparison.

In this way, the invention can be used to evaluate whether a speaker's on-stage performance is steady, and can in turn be developed into guiding principles for a training course.

More preferably, the feature (12) of the target image is one of the following: a facial feature, a torso feature, or a gesture feature.

More preferably, when the feature (12) is a facial feature, the processor (3) judges the facial feature by one or a combination of the following: whether the expression is stiff or unnatural, whether the speaker is smiling, whether the gaze is unsteady, whether the gaze avoids the audience, and whether the gaze takes in the entire audience.

More preferably, when the feature (12) is a torso feature, the processor (3) judges the torso feature by one or a combination of the following: whether the stance is upright, whether the chest is lifted and in line with the shoulders, whether the upper body is relaxed yet straight, whether the lower body stands stably, whether the speaker sways aimlessly, and whether the attire is formal and appropriate.

More preferably, when the feature (12) is a gesture feature, the processor (3) judges the gesture feature by one or a combination of the following: whether the gestures are fluid, whether the hands touch the face, and whether the hands touch the torso.

More preferably, the sound source (21) is composed of the following features: a tone feature, a timbre feature, and a word-meaning (semantic) feature.

More preferably, when judging the sound source (21), the processor (3) subdivides the judged features as follows: whether the tone is steady, whether the articulation is clear, whether the volume is sufficient, and whether the speaking rate stays between 110 wpm and 130 wpm.

More preferably, the database (4) can additionally store a large number of speech recordings, which raters grade on a five-level scale; from these the invention generates a set of scoring units so that raters can score speeches quickly.

More preferably, the database (4) can record the features (12) that frequently appear in the target image and store them in the database (4) as a basis for the next scoring.

More preferably, the database (4) can record the characteristics of the target's sound source (21) and store them in the database (4) as a basis for the next scoring.

More preferably, the processor (3) and the database (4) can additionally connect to the Internet for remote operation; through the Internet, the database (4) can work with a remotely located photographing device (11) and sound capturing module (2), feed the image features (12) and the sound source (21) that they capture at the remote site back to the processor (3) for cross-comparison, and show the score on a remotely located display (5).

More preferably, the display (5) not only shows the score but also plays back, in real time, the video recorded by the image capturing module (1), so that the speaker can correct his or her own poise.

Compared with the prior art, the functions and effects of the present invention are as follows:

First: in the present invention, the image capturing module (1) works with the sound capturing module (2) to capture the speaker's image and sound. The processor (3) then compares them against the contents of the database (4) to score the speech, and the display (5) shows the score to the judge or the speaker. Speeches are thus scored automatically and quickly without taking much time, which reduces the judge's burden and avoids unfair scoring.

Second: in the present invention, the processor (3) working with the database (4) can not only compare the speaker's image and sound but also edit the recorded speaker image and sound, so that scoring criteria can be set for the features (12) and the sound source (21) and scores assigned according to those criteria.

1: image capturing module

11: photographing device

12: feature

2: sound capturing module

21: sound source

3: processor

4: database

5: display

Fig. 1 is a schematic view of the first embodiment of the present invention.

Fig. 2 is a system architecture diagram of the first embodiment of the present invention.

Fig. 3 is a flow chart of the first embodiment of the present invention.

Fig. 4 is a schematic view of the first embodiment of the present invention in use.

The invention is described in detail below with reference to the embodiment shown in the drawings. As shown in Figs. 1 to 4, the drawings disclose an audiovisual automatic scoring and training system for presentation skills, characterized by comprising an image capturing module (1), a sound capturing module (2), a processor (3), a database (4), and a display (5), wherein:

the image capturing module (1) comprises a photographing device (11) that detects a target image within a specific area and transmits the features (12) captured from the target image to the processor (3) for analysis;

the sound capturing module (2) detects a sound source (21) within a specific area, captures the spectrum magnitude and the speech passages of the sound source (21), and transmits them to the processor (3) for analysis;

the processor (3) is connected to the image capturing module (1), the sound capturing module (2), the database (4), and the display (5), and provides logic operation, integration, input, and output functions; it matches and compares the target image and timbre, assigns a score value according to a preset parameter provided by the database (4), stores the score value in the database (4), and shows the score on the display (5);

the database (4) provides storage and retrieval and contains a sample audio, a speaker audio, a sample speaker image, and a speaker image; the speaker can enter preset parameters for the sample audio and the sample speaker image into the database (4) in advance, and the database (4) also stores the speaker audio and image received by the sound capturing module and the image capturing module and feeds them back to the processor for cross-comparison.

In this way, the invention can be used to evaluate whether a speaker's on-stage performance is steady, and can in turn be developed into guiding principles for a training course.
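The scoring step described above (match against samples, then score from a preset parameter) can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: the function name `score_from_matches`, the per-feature match ratios in the range 0 to 1, and the use of per-feature weights as the "preset parameter" are all hypothetical.

```python
from typing import Mapping


def score_from_matches(
    matches: Mapping[str, float],
    weights: Mapping[str, float],
    max_score: float = 100.0,
) -> float:
    """Combine per-feature match ratios (0..1) into one score.

    `weights` stands in for the preset parameter read from the database (4);
    the weighted-average rule is an assumption, not taken from the patent.
    """
    for name, ratio in matches.items():
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"match ratio for {name!r} must be within 0..1")
    total_weight = sum(weights[name] for name in matches)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * matches[name] for name in matches)
    return round(max_score * weighted / total_weight, 1)


# Hypothetical match ratios produced by comparing speaker data with samples.
example = score_from_matches(
    {"face": 1.0, "torso": 0.5, "voice": 0.75},
    {"face": 1.0, "torso": 1.0, "voice": 2.0},
)
print(example)  # 75.0
```

A weighted average keeps the score on a fixed scale regardless of how many features are compared, which is one plausible reading of "a preset parameter provided by the database".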

Here, the image capturing module (1) and the sound capturing module (2) allow users to record video and sound themselves. The processor (3) then sets the scoring data based on that video and sound and stores the data in the database (4) for use in subsequent scoring.
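Storing scoring data so that it is available for the next scoring could look like the sketch below. The patent does not specify a storage technology; SQLite, the table name `scores`, and the column layout are assumptions chosen only to make the idea concrete.

```python
import sqlite3
from typing import List, Tuple


def open_score_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical score table standing in for the database (4)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores ("
        "speaker TEXT NOT NULL, feature TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn


def save_score(conn: sqlite3.Connection, speaker: str, feature: str, score: float) -> None:
    """Persist one score; the `with` block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO scores (speaker, feature, score) VALUES (?, ?, ?)",
            (speaker, feature, score),
        )


def previous_scores(conn: sqlite3.Connection, speaker: str) -> List[Tuple[str, float]]:
    """Return earlier (feature, score) pairs for a speaker, oldest first."""
    rows = conn.execute(
        "SELECT feature, score FROM scores WHERE speaker = ? ORDER BY rowid",
        (speaker,),
    )
    return rows.fetchall()
```

A later scoring pass could call `previous_scores` to compare a new result with the speaker's earlier ones.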

In addition, the image capturing module (1) and the sound capturing module (2) can connect to the processor (3) and the database (4) through a Bluetooth device, so a mobile phone can take the place of the image capturing module (1) and the sound capturing module (2), which makes the invention easy to carry and install.

As stated above, the feature (12) is one of the following: a facial feature, a torso feature, or a gesture feature.

When people are nervous or excited they show it in many ways. The features (12) that are easiest to judge are subdivided into face, torso, and gestures; these are used to judge whether the speaker is emotionally steady and, at the same time, to score the speaker.

As stated above, when the feature (12) is a facial feature, the processor (3) judges the facial feature by one or a combination of the following: whether the expression is stiff or unnatural, whether the speaker is smiling, whether the gaze is unsteady, whether the gaze avoids the audience, and whether the gaze takes in the entire audience.

Here, judging by the distance between the two pupils and by how often the pupils disappear from the screen makes it simple to detect whether the gaze is unsteady, whether the gaze avoids the audience, and whether the gaze takes in the entire audience. Combined with a judgment of the mouth expression, this yields a score for the speaker's overall expression.
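The two gaze measurements named above (distance between the pupils, and how often the pupils disappear) can be computed from per-frame pupil positions roughly as follows. The pupil detector itself is outside this sketch; the `PupilFrame` type and both function names are hypothetical, and how the resulting numbers map to a score is left open because the patent does not say.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
# One entry per video frame: (left pupil, right pupil), or None when the
# (hypothetical) detector found no pupils in that frame.
PupilFrame = Optional[Tuple[Point, Point]]


def pupil_loss_rate(frames: Sequence[PupilFrame]) -> float:
    """Fraction of frames in which the pupils were not detected."""
    if not frames:
        raise ValueError("at least one frame is required")
    return sum(frame is None for frame in frames) / len(frames)


def interpupil_distances(frames: Sequence[PupilFrame]) -> List[float]:
    """Pixel distance between the two pupils for each frame that has them."""
    return [math.dist(frame[0], frame[1]) for frame in frames if frame is not None]
```

A high loss rate would suggest the speaker often looks away from the camera, while changes in the projected pupil distance may indicate the head turning; both interpretations are assumptions for illustration.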

As stated above, when the feature (12) is a torso feature, the processor (3) judges the torso feature by one or a combination of the following: whether the stance is upright, whether the chest is lifted and in line with the shoulders, whether the upper body is relaxed yet straight, whether the lower body stands stably, whether the speaker sways aimlessly, and whether the attire is formal and appropriate.

Nervousness makes the torso sway and tremble involuntarily as an outlet for the tension, so the processor (3) judges the torso feature in order to score the speaker.
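One simple way to quantify the swaying described above is the spread of the shoulder midpoint's horizontal position over time. This is a sketch under assumptions: the per-frame shoulder midpoint is taken as given (for example from a pose estimator), and the 15-pixel threshold is an arbitrary illustrative value, not one from the patent.

```python
from statistics import pstdev
from typing import Sequence


def sway_amount(shoulder_mid_x: Sequence[float]) -> float:
    """Population standard deviation of the shoulder midpoint's x position (pixels)."""
    if len(shoulder_mid_x) < 2:
        raise ValueError("at least two frames are required")
    return pstdev(shoulder_mid_x)


def is_swaying(shoulder_mid_x: Sequence[float], threshold_px: float = 15.0) -> bool:
    """True when the horizontal spread exceeds the (hypothetical) threshold."""
    return sway_amount(shoulder_mid_x) > threshold_px
```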

As stated above, when the feature (12) is a gesture feature, the processor (3) judges the gesture feature by one or a combination of the following: whether the gestures are fluid, whether the hands touch the face, and whether the hands touch the torso.

Here, the distance between the two hands serves as the basis for judgment, and the frequency of hand movements can be scored in relation to the rise and fall of the voice.
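Relating gesture frequency to the rise and fall of the voice could be approximated by counting peaks in two time series: the hand-to-hand distance and the voice volume. The peak-counting rule and the ratio below are assumptions made for illustration; the patent does not define how the two signals are compared.

```python
from typing import Optional, Sequence


def count_peaks(series: Sequence[float]) -> int:
    """Count strict local maxima (a sample larger than both neighbours)."""
    return sum(
        1
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] > series[i + 1]
    )


def gestures_per_emphasis(
    hand_distance: Sequence[float], volume: Sequence[float]
) -> Optional[float]:
    """Ratio of gesture peaks to volume peaks, or None when the voice has no peaks."""
    volume_peaks = count_peaks(volume)
    if volume_peaks == 0:
        return None
    return count_peaks(hand_distance) / volume_peaks
```

A ratio near 1 would mean roughly one gesture per vocal emphasis; whether that is desirable is a scoring decision the sketch leaves open.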

As stated above, the sound source (21) is composed of the following features: a tone feature, a timbre feature, and a word-meaning (semantic) feature.

Here, the timbre of the voice, its tone, and the clarity of articulation serve as the key features for identifying the sound source (21).

As stated above, when judging the sound source (21), the processor (3) subdivides the judged features as follows: whether the tone is steady, whether the articulation is clear, whether the volume is sufficient, and whether the speaking rate stays between 110 wpm and 130 wpm. [Note: wpm stands for words per minute.]
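The speaking-rate criterion is simple arithmetic: words spoken divided by minutes elapsed, checked against the 110 to 130 wpm range. In the sketch below the word count is assumed to come from some transcript or speech recognizer, which the patent does not specify; the function names are hypothetical.

```python
def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Speaking rate in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return word_count * 60.0 / duration_seconds


def rate_in_target_range(wpm: float, low: float = 110.0, high: float = 130.0) -> bool:
    """True when the rate lies within the range stated in the description."""
    return low <= wpm <= high


rate = words_per_minute(240, 120.0)  # 240 words in two minutes
print(rate, rate_in_target_range(rate))  # 120.0 True
```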

Here, the processor (3) judges the features of the sound source (21) by comparing it against the speech tone samples, the timbre samples, and the word-meaning module in the database (4), and assigns scores or establishes scoring parameters according to those features.
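Comparing a speaker's audio features against a stored sample can be illustrated with cosine similarity between two feature vectors, for example spectrum magnitudes per frequency band. Cosine similarity is a swapped-in, commonly used measure; the patent does not name the comparison method, and the vectors here are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(sample: Sequence[float], speaker: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero feature vectors."""
    if len(sample) != len(speaker) or not sample:
        raise ValueError("vectors must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(sample, speaker))
    norm_sample = math.sqrt(sum(a * a for a in sample))
    norm_speaker = math.sqrt(sum(b * b for b in speaker))
    if norm_sample == 0 or norm_speaker == 0:
        raise ValueError("vectors must not be all zeros")
    return dot / (norm_sample * norm_speaker)
```

For non-negative spectrum magnitudes the result lies between 0 (no overlap) and 1 (same spectral shape), so it could serve directly as a match ratio.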

As stated above, the database (4) can additionally store a large number of speech recordings, which raters grade on a five-level scale; from these the invention generates a set of scoring units so that raters can score speeches quickly.
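Mapping a numeric score onto a five-level scale might be done with equal-width bands, as sketched below. The band boundaries and the function name `to_five_level` are assumptions; the patent only states that raters use a five-level rating.

```python
def to_five_level(score: float, max_score: float = 100.0) -> int:
    """Map a score in 0..max_score to a level from 1 (lowest) to 5 (highest)."""
    if not 0 <= score <= max_score:
        raise ValueError("score out of range")
    band = max_score / 5  # equal-width bands: an illustrative choice
    return min(5, int(score // band) + 1)
```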

Here, videos are stored directly into the database (4) through the image capturing module (1) and the sound capturing module (2), then edited through the processor (3) so that the various features and sound sources are converted into data and given scores. This lets speakers use the invention to define their own sample speaker and sample audio.

As stated above, the processor (3) and the database (4) can additionally connect to the Internet for remote operation; through the Internet, the database (4) can work with a remotely located photographing device (11) and sound capturing module (2), feed the image features (12) and the sound source (21) that they capture at the remote site back to the processor (3) for cross-comparison, and show the score on a remotely located display (5).

Here, connecting the processor (3) and the database (4) directly over the Internet serves distance teaching and lets students practice speeches remotely over the network.

In addition, the network connection makes the invention more convenient to apply; it can even be implemented on a mobile phone in the form of a mobile application (mobile app, or app).

As stated above, the display (5) not only shows the score but also plays back, in real time, the video recorded by the image capturing module (1), so that the speaker can correct his or her own poise.

Here, the image capturing module (1) shows the video on the display (5) while it is being recorded, so the speaker can correct his or her poise in real time from the image on the display (5).

The construction, features, and effects of the present invention have been described in detail above with reference to the embodiment shown in the drawings. The foregoing is only a preferred embodiment of the present invention and is not intended to limit the scope of implementation; all modifications consistent with the spirit of the invention and within the scope of equivalents shall fall within the scope of the patent.

Claims (12)

1. An audiovisual automatic scoring and training system for presentation skills, characterized by comprising: an image capturing module (1), a sound capturing module (2), a processor (3), a database (4), and a display (5); the image capturing module (1) comprises a photographing device (11) that detects a target image within a specific area and transmits the features (12) captured from the target image to the processor (3) for analysis; the sound capturing module (2) detects a sound source (21) within a specific area, captures the spectrum magnitude and the speech passages of the sound source (21), and transmits them to the processor (3) for analysis; the processor (3) is connected to the image capturing module (1), the sound capturing module (2), the database (4), and the display (5), provides logic operation, integration, input, and output functions, matches and compares the target image and timbre, assigns a score value according to a preset parameter provided by the database (4), stores the score value in the database (4), and shows the score on the display (5); the database (4) provides storage and retrieval and contains a sample audio, a speaker audio, a sample speaker image, and a speaker image, the speaker being able to enter preset parameters for the sample audio and the sample speaker image into the database (4) in advance, and the database (4) storing the speaker audio and image received by the sound capturing module and the image capturing module and feeding them back to the processor for cross-comparison; whereby the system can be used to evaluate whether a speaker's on-stage performance is steady, and can in turn be developed into guiding principles for a training course.

2. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 1, wherein the feature (12) of the target image is one of the following: a facial feature, a torso feature, or a gesture feature.

3. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 2, wherein, when the feature (12) is a facial feature, the processor (3) judges the facial feature by one or a combination of the following: whether the expression is stiff or unnatural, whether the speaker is smiling, whether the gaze is unsteady, whether the gaze avoids the audience, and whether the gaze takes in the entire audience.

4. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 2, wherein, when the feature (12) is a torso feature, the processor (3) judges the torso feature by one or a combination of the following: whether the stance is upright, whether the chest is lifted and in line with the shoulders, whether the upper body is relaxed yet straight, whether the lower body stands stably, whether the speaker sways aimlessly, and whether the attire is formal and appropriate.

5. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 2, wherein, when the feature (12) is a gesture feature, the processor (3) judges the gesture feature by one or a combination of the following: whether the gestures are fluid, whether the hands touch the face, and whether the hands touch the torso.

6. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 1, wherein the sound source (21) is composed of the following features: a tone feature, a timbre feature, and a word-meaning (semantic) feature.

7. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 6, wherein, when judging the sound source (21), the processor (3) subdivides the judged features as follows: whether the tone is steady, whether the articulation is clear, whether the volume is sufficient, and whether the speaking rate stays between 110 wpm and 130 wpm.

8. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 1, wherein the database (4) can additionally store a large number of speech recordings, which raters grade on a five-level scale, from which a set of scoring units is generated so that raters can score speeches quickly.

9. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 1, wherein the database (4) can record the features (12) that frequently appear in the target image and store them in the database (4) as a basis for the next scoring.

10. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 1, wherein the database (4) can record the characteristics of the target's sound source (21) and store them in the database (4) as a basis for the next scoring.

11. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 1, wherein the processor (3) and the database (4) can additionally connect to the Internet for remote operation; through the Internet, the database (4) can work with a remotely located photographing device (11) and sound capturing module (2), feed the image features (12) and the sound source (21) that they capture at the remote site back to the processor (3) for cross-comparison, and show the score on a remotely located display (5).

12. The audiovisual automatic scoring and training system for presentation skills as claimed in claim 1, wherein the display (5) not only shows the score but also plays back, in real time, the video recorded by the image capturing module (1), so that the speaker can correct his or her own poise.
TW102143729A 2013-11-29 2013-11-29 Speech skills of audio and video automatic assessment and training system TWI528336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW102143729A TWI528336B (en) 2013-11-29 2013-11-29 Speech skills of audio and video automatic assessment and training system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW102143729A TWI528336B (en) 2013-11-29 2013-11-29 Speech skills of audio and video automatic assessment and training system

Publications (2)

Publication Number Publication Date
TW201520996A (en) 2015-06-01
TWI528336B (en) 2016-04-01

Family

ID=53935096

Family Applications (1)

Application Number Title Priority Date Filing Date
TW102143729A TWI528336B (en) 2013-11-29 2013-11-29 Speech skills of audio and video automatic assessment and training system

Country Status (1)

Country Link
TW (1) TWI528336B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583363A (en) * 2018-11-27 2019-04-05 湖南视觉伟业智能科技有限公司 The method and system of speaker's appearance body movement are improved based on human body critical point detection
CN111612352A (en) * 2020-05-22 2020-09-01 北京易华录信息技术股份有限公司 Student expression ability assessment method and device
TWI784243B (en) * 2020-03-03 2022-11-21 國立臺灣師範大學 Method of taekwondo poomsae movement detection and comparison

Also Published As

Publication number Publication date
TWI528336B (en) 2016-04-01

Similar Documents

Publication Publication Date Title
CN107203953B (en) Teaching system based on internet, expression recognition and voice recognition and implementation method thereof
WO2021232775A1 (en) Video processing method and apparatus, and electronic device and storage medium
US10395545B2 (en) Analyzing speech delivery
WO2019024247A1 (en) Data exchange network-based online teaching evaluation system and method
CN107945625A (en) A kind of pronunciation of English test and evaluation system
US10026329B2 (en) Intralingual supertitling in language acquisition
JP2003228272A (en) Educational material learning system
CN109727167A (en) A kind of teaching auxiliary system
CN109889881B (en) Teacher classroom teaching data acquisition system
US20110082698A1 (en) Devices, Systems and Methods for Improving and Adjusting Communication
US20210287561A1 (en) Lecture support system, judgement apparatus, lecture support method, and program
Teófilo et al. Exploring virtual reality to enable deaf or hard of hearing accessibility in live theaters: A case study
CN109817244A (en) Oral evaluation method, apparatus, equipment and storage medium
TWI528336B (en) Speech skills of audio and video automatic assessment and training system
US10580434B2 (en) Information presentation apparatus, information presentation method, and non-transitory computer readable medium
CN109754653B (en) Method and system for personalized teaching
TW202008293A (en) System and method for monitoring qualities of teaching and learning
JP7066115B2 (en) Public speaking support device and program
WO2017143951A1 (en) Expression feedback method and smart robot
CN110853624A (en) Speech rehabilitation training system
JP7130290B2 (en) information extractor
CN206348971U (en) One kind speech training electronics
US10593366B2 (en) Substitution method and device for replacing a part of a video sequence
JP6285377B2 (en) Communication skill evaluation feedback device, communication skill evaluation feedback method, and communication skill evaluation feedback program
JP2022075662A (en) Information extraction apparatus

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees