WO2022198798A1 - Intelligent children accompanying education robot - Google Patents

Intelligent children accompanying education robot

Info

Publication number
WO2022198798A1
Authority
WO
WIPO (PCT)
Prior art keywords
mouth
opening angle
mouth shape
training text
image data
Prior art date
Application number
PCT/CN2021/098302
Other languages
French (fr)
Chinese (zh)
Inventor
阳传红
Original Assignee
湖南中凯智创科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 湖南中凯智创科技有限公司 filed Critical 湖南中凯智创科技有限公司
Publication of WO2022198798A1 publication Critical patent/WO2022198798A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to the technical field of robots, and discloses an intelligent children accompanying education robot that provides mouth-shape training and interaction during a child's pronunciation practice. When executing the corresponding computer program, the robot's processor implements the following steps: during training, audio and video data are collected synchronously, and the face image data stream is sliced in step with the audio slices, ensuring the accuracy of the face image data stream slicing; considering that the beginning and end of each Chinese character's pronunciation are transition stages of onset and release, the most expressive middle segment of the image data stream is selected for the mouth opening angle calculations, and the temporally continuous mouth opening angle data sequence is compared with the standard opening angle data sequence of the training text according to the variation trends of the opening angle between adjacent Chinese characters and between adjacent sentences. This ensures the validity and reliability of the final determination result.

Description

Intelligent Children Accompanying Education Robot
Technical Field
The present invention relates to the field of robotics, and in particular to an intelligent children accompanying education robot.
Background Art
At present, with the continuing maturation of face recognition, speech and image recognition, video interaction, and big-data analysis technologies, these technologies can be tightly coupled with the main application scenarios of home robots to provide users with a good experience. At the same time, technological progress keeps lowering the production cost of robots, making large-scale adoption possible.
2019 is regarded as the first year of children's robots; since then, children's robots have become widely known and have grown explosively. Unit prices of children's companion robots range from a few hundred to tens of thousands of yuan. Children's education hinges on content and on the way of interacting. The voice dialogue of traditional children's toys is mainly limited to storytelling; marketed under the slogan of companionship, it offers little real functionality. Intelligent robots, by contrast, add more human-centred functions, interact fully with children, fit children's behavioural habits, and support voice dialogue, storytelling, reciting classical poems, singing children's songs, and other interactions; they overturn traditional early education, improve children's abilities in expression, logic, music, art, and more, and act as a child's close companion and tutor.
At present, speech recognition and interaction technology is very mature. However, in speech, hosting, and other interests widely pursued by children, the mouth shape during pronunciation is also very important: different Chinese characters often correspond to different mouth shapes, and even the same character can differ in mouth shape because of polyphonic readings and changes of emotional colouring and tone in different usage scenarios. Current robots still lack a function for training and interacting with children's mouth shapes during pronunciation.
Summary of the Invention
The purpose of the present invention is to disclose an intelligent children accompanying education robot that trains and interacts with the mouth shape during a child's pronunciation.
To achieve the above purpose, the present invention discloses an intelligent children accompanying education robot comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps:
retrieving a training text and displaying it to the user on the display screen, the training text including at least two sentences whose overall mouth-shape variation differs, each sentence including at least two Chinese characters with different mouth shapes;
collecting the user's synchronized audio data stream and a face image data stream that includes the mouth shape;
slicing the audio data stream according to the distribution of Chinese characters and punctuation in the training text and the standard audio features of each character, and obtaining timestamp information for each audio data slice in one-to-one correspondence with a single character;
slicing the face image data stream according to the timestamp information of each audio data slice, and establishing a mapping between each image data slice and the corresponding character;
for each image data slice, selecting the image data frames in the middle third of its time span, identifying and extracting the open-mouth contour information from the selected frames, and determining the coordinate positions of the feature points from the mouth contour information, the feature points including at least points A and B at the inner corners of the mouth on either side and points C and D at the middle of the inner edges of the upper and lower lips; calculating the mouth opening angle of each image frame from the coordinates of points A, B, C, and D; and taking the average of the mouth opening angles computed within the same slice as the final mouth opening angle of the mapped character;
arranging the calculated final mouth opening angles in chronological order to form a mouth opening angle data sequence corresponding to the training text;
comparing the actual mouth opening angle data sequence with the standard opening angle data sequence of the training text according to the variation trend of the opening angle between adjacent characters and between adjacent sentences, determining the individual characters and sentences whose mouth shape needs correction, and outputting and displaying the result to the user on the display screen; where the overall mouth opening angle of a single sentence is the mean or root mean square of the absolute values of the adjacent angle changes of the characters it contains, and the mouth opening angle is any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB in the rhombus formed by points A, B, C, and D.
Preferably, in the comparative analysis based on the variation trend of the mouth opening angle, the present invention first compares the variation trend between adjacent sentences to obtain the sentences whose mouth shape needs correction, then, within those sentences, obtains the characters whose mouth shape needs correction from the variation trend between adjacent characters; finally, whether the characters whose mouth shape needs correction are also derived for the remaining sentences, again from the variation trend between adjacent characters, depends on whether a corresponding request is received from the user.
Preferably, the robot processor of the present invention further implements the following step when executing the computer program:
calculating the correlation between the actual mouth opening angle data sequence and the standard opening angle data sequence, and giving an evaluation result for the entire training text according to the correlation result. For example, the evaluation result is a rating and score computed from statistical correlation value ranges and gradients.
Preferably, the training text is downloaded remotely over the network, and when the training text is downloaded, the standard opening angle data sequence of the training text, the standard audio features of each character, and the standard mouth-shape explanation videos for individual characters and sentences are downloaded synchronously, so that the intelligent children accompanying education robot can perform the comparative analysis locally on a single machine. The processor further implements the following step when executing the computer program:
after determining the characters and sentences whose mouth shape needs correction, preloading the standard mouth-shape explanation videos of the corresponding characters and sentences into memory so that the corresponding corrective content can be played in real time according to the user's selection instruction.
The present invention has the following beneficial effects:
During training, audio and video data are collected synchronously, and the face image data stream is sliced in step with the audio slices, ensuring the accuracy of the face image data stream slicing. Considering that the beginning and end of each Chinese character's pronunciation are transition stages of onset and release, the most expressive middle segment of the image data stream is selected for the mouth opening angle calculations, and the temporally continuous mouth opening angle data sequence is compared with the standard opening angle data sequence of the training text according to the variation trends of the opening angle between adjacent characters and between adjacent sentences. This ensures the validity and reliability of the final judgment result.
The present invention is described in further detail below with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which form a part of this application, provide further understanding of the present invention; the exemplary embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of the steps implemented by the processor of an intelligent children accompanying education robot in a preferred embodiment of the present invention when executing the corresponding computer program.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the present invention can be implemented in many different ways as defined and covered by the claims.
Embodiment 1
This embodiment discloses an intelligent children accompanying education robot, including a memory, a processor, and a computer program stored in the memory and executable on the processor. As shown in FIG. 1, the robot processor of this embodiment implements the following steps when executing the computer program:
Step S1: Retrieve a training text and display it to the user on the display screen. The training text includes at least two sentences whose overall mouth-shape variation differs, and each sentence includes at least two Chinese characters with different mouth shapes.
In this embodiment, the training text is content carefully arranged by experts in related fields such as acoustics and lip-reading so that the training effect can be clearly evaluated and followed up, that is, selected content in which the variation of the mouth opening angle between adjacent characters and between adjacent sentences is pronounced. It can usually be downloaded from a cloud server based on a client/server architecture.
Preferably, the training text of this embodiment is downloaded remotely over the network, and when it is downloaded, the standard opening angle data sequence of the training text, the standard audio features of each character, and the standard mouth-shape explanation videos for individual characters and sentences are downloaded synchronously, so that the intelligent children accompanying education robot can carry out the data processing of the subsequent steps, such as the comparative analysis, locally on a single machine. Preferably, the standard opening angle data sequence of the training text can likewise be obtained from recordings by real experts in related fields such as acoustics and lip-reading, followed by background data calibration. As a variant, the standard opening angle data sequence in this step can also be computed by the method used in multimodal interaction for converting audio information into mouth-shape marker points.
Step S2: Collect the user's synchronized audio data stream and a face image data stream that includes the mouth shape.
In this step, the audio data stream can be collected through a microphone, and the face image data stream can be collected through the video recording function of the camera module.
Step S3: Slice the audio data stream according to the distribution of Chinese characters and punctuation in the training text and the standard audio features of each character, and obtain timestamp information for each audio data slice in one-to-one correspondence with a single character.
In this step, the characters and punctuation of the training text are known, and the standard audio features of each character are also known; combined with the spectrum analysis and slicing techniques used for speech-to-character conversion in existing speech recognition, the timestamp information of the audio data slice corresponding to each character can be obtained quickly. A simplified sketch of this slicing is given below.
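As an illustrative sketch only (not part of the patent text), assume the sentence time spans have already been located from the pauses at the punctuation; per-character timestamps can then be approximated by dividing each sentence span evenly among its characters. The patent's spectrum-analysis alignment would replace this even split; all names below are hypothetical.

```python
def naive_char_timestamps(sentences, sentence_spans):
    """Rough per-character timestamps: divide each spoken sentence span evenly
    among its Chinese characters (a stand-in for spectrum-based alignment).

    sentences      -- list of sentence strings from the training text
    sentence_spans -- list of (t_start, t_end) tuples in seconds, one per sentence
    """
    punctuation = "，。！？、；：,.!?;: "
    timestamps = []
    for text, (t0, t1) in zip(sentences, sentence_spans):
        chars = [c for c in text if c not in punctuation]
        step = (t1 - t0) / max(len(chars), 1)
        for i, ch in enumerate(chars):
            timestamps.append((ch, t0 + i * step, t0 + (i + 1) * step))
    return timestamps
```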
Step S4: Slice the face image data stream according to the timestamp information of each audio data slice, and establish a mapping between each image data slice and the corresponding character.
In this step, because the audio data stream and the face image data stream were collected synchronously, slicing the face image data stream at the timestamps of the audio slices yields image slices whose one-to-one mapping to the corresponding characters is likewise accurate. A sketch of this mapping follows.
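For illustration only, assuming a fixed camera frame rate and the per-character timestamps produced in step S3, the video slicing of step S4 might look like the following sketch (the data structures and function names are hypothetical):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ImageSlice:
    char: str                 # Chinese character mapped to this slice
    frame_indices: List[int]  # indices into the synchronized video frame buffer

def slice_video_by_audio(char_timestamps: List[Tuple[str, float, float]],
                         fps: float) -> List[ImageSlice]:
    """Map per-character audio timestamps (seconds) to frame-index slices."""
    slices = []
    for ch, t_start, t_end in char_timestamps:
        first = int(round(t_start * fps))
        last = max(first, int(round(t_end * fps)) - 1)
        slices.append(ImageSlice(char=ch, frame_indices=list(range(first, last + 1))))
    return slices

# Example: a 30 fps face video and three characters
demo = slice_video_by_audio([("你", 0.00, 0.35), ("好", 0.35, 0.70), ("吗", 0.80, 1.20)], fps=30)
```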
Step S5: For each image data slice, select the image data frames in the middle third of its time span, identify and extract the open-mouth contour information from the selected frames, and determine the coordinate positions of the feature points from the mouth contour information, the feature points including at least points A and B at the inner corners of the mouth on either side and points C and D at the middle of the inner edges of the upper and lower lips. Calculate the mouth opening angle of each image frame from the coordinates of points A, B, C, and D, and take the average of the mouth opening angles computed within the same slice as the final mouth opening angle of the mapped character.
In this step, selecting the image data frames in the middle third of each image data slice is equivalent to dividing the slice into three equal parts, discarding the head and tail, and keeping the most expressive middle segment of the image data stream for the mouth opening angle calculations. Typically a face model has 68 feature points, of which only 20 are key mouth-shape points. Usually, during training, points A and B are symmetric about the midpoint O of the mouth contour, and points C and D are likewise symmetric about O; in the present invention, ABCD is therefore treated as a rhombus, and the mouth opening angle can accordingly be defined as any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB. Extracting the mouth contour information from a face image is a technique well known to those skilled in the art and is not described further here. A sketch of the angle calculation is given below.
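As a minimal sketch, assuming the four landmark coordinates A, B, C, and D have already been extracted per frame (for instance from a 68-point face model), the middle-third selection and angle averaging of step S5 could be implemented as follows; the choice of ∠ACB here is just one of the four equivalent options named above:

```python
import numpy as np

def opening_angle(vertex, p1, p2):
    """Angle (radians) at `vertex` between the rays vertex->p1 and vertex->p2."""
    v1 = np.asarray(p1, float) - np.asarray(vertex, float)
    v2 = np.asarray(p2, float) - np.asarray(vertex, float)
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def slice_opening_angle(landmarks_per_frame):
    """Final mouth opening angle for one character slice.

    landmarks_per_frame -- one dict per frame, {"A": (x, y), "B": ..., "C": ..., "D": ...}
    Only the middle third of the frames is used, as described in step S5.
    """
    n = len(landmarks_per_frame)
    middle = landmarks_per_frame[n // 3: 2 * n // 3] or landmarks_per_frame
    angles = [opening_angle(f["C"], f["A"], f["B"]) for f in middle]  # angle ACB
    return float(np.mean(angles))
```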
Step S6: Arrange the calculated final mouth opening angles in chronological order to form a mouth opening angle data sequence corresponding to the training text.
Step S7: Compare the actual mouth opening angle data sequence with the standard opening angle data sequence of the training text according to the variation trend of the opening angle between adjacent characters and between adjacent sentences, determine the characters and sentences whose mouth shape needs correction, and output and display the result to the user on the display screen.
In this step, the overall mouth opening angle of a single sentence is the mean or root mean square of the absolute values of the adjacent angle changes of the characters it contains. In this embodiment, the comparative analysis of the opening angle variation trend can proceed as follows: a two-dimensional coordinate system is established with the chronological order of the individual characters or sentences on the abscissa and the corresponding character-level or sentence-level opening angle on the ordinate; the actually sampled opening angle trend curve (usually a polyline of several connected segments) is then compared with the standard trend curve in this coordinate system to obtain the characters or sentences to be corrected.
Optionally, different thresholds can be set for the comparative analysis of the trend between adjacent sentences and for the comparative analysis of the trend between adjacent characters within a sentence; when the sampled actual variation between adjacent characters or sentences deviates from the standard variation by more than the set threshold of the allowed deviation ratio, the character or sentence is judged to need correction. A sketch of this thresholded comparison follows.
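Purely as an illustration (the threshold value and function names are hypothetical, not taken from the patent), the sentence-level aggregation and the thresholded trend comparison of step S7 might be sketched as:

```python
import numpy as np

def sentence_angle(char_angles, use_rms=False):
    """Sentence-level opening angle: mean (or RMS) of the absolute adjacent
    changes of the character-level angles within the sentence."""
    deltas = np.abs(np.diff(np.asarray(char_angles, float)))
    return float(np.sqrt(np.mean(deltas ** 2)) if use_rms else np.mean(deltas))

def flag_for_correction(actual, standard, deviation_ratio=0.3):
    """Indices of elements whose adjacent-change trend deviates from the
    standard trend by more than the allowed ratio (illustrative default)."""
    actual_trend = np.diff(np.asarray(actual, float))
    standard_trend = np.diff(np.asarray(standard, float))
    flagged = []
    for i, (a, s) in enumerate(zip(actual_trend, standard_trend)):
        if abs(a - s) > deviation_ratio * max(abs(s), 1e-6):
            flagged.append(i + 1)   # the later element of the adjacent pair
    return flagged
```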
Preferably, in this embodiment, in the comparative analysis based on the variation trend of the mouth opening angle, the trend between adjacent sentences is compared first to obtain the sentences whose mouth shape needs correction; then, within those sentences, the characters whose mouth shape needs correction are obtained from the trend between adjacent characters; finally, whether the remaining sentences are also processed depends on whether a corresponding request is received from the user (that is, processing continues only when the user clicks to issue the corresponding instruction; otherwise the subsequent step is skipped), in which case the characters whose mouth shape needs correction are likewise obtained from the trend between adjacent characters. In this way, the different needs of different users can be distinguished; the sentences that users generally care about, and the key characters to be corrected within them, can be located and responded to quickly, while memory load and CPU resource consumption are also effectively reduced.
Preferably, the robot processor of the present invention further implements the following steps when executing the computer program:
Step S8: Calculate the correlation between the actual mouth opening angle data sequence and the standard opening angle data sequence, and give an evaluation result for the entire training text according to the correlation result. For example, the evaluation result is a rating and score computed from statistical correlation value ranges and gradients. Optionally, the Pearson correlation coefficient method can be used for the correlation calculation; a sketch is given below.
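As a hedged illustration (the grade bands and the 0-100 mapping are placeholders, not values given in the patent), a Pearson-based evaluation of the whole training text could look like:

```python
import numpy as np

def pearson_evaluation(actual, standard,
                       bands=((0.9, "A"), (0.75, "B"), (0.5, "C"))):
    """Rate the whole training text from the Pearson correlation of the actual
    and standard opening angle sequences (band thresholds are illustrative)."""
    a = np.asarray(actual, float)
    s = np.asarray(standard, float)
    r = float(np.corrcoef(a, s)[0, 1])     # Pearson correlation coefficient
    score = int(round(max(r, 0.0) * 100))  # simple 0-100 score mapping
    grade = next((g for threshold, g in bands if r >= threshold), "D")
    return r, score, grade
```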
Step S9: After determining the characters and sentences whose mouth shape needs correction, preload the standard mouth-shape explanation videos of the corresponding characters and sentences into memory so that the corresponding corrective content can be played in real time according to the user's selection instruction.
In summary, the technical solution disclosed in the embodiments of the present invention has at least the following beneficial effects:
During training, audio and video data are collected synchronously, and the face image data stream is sliced in step with the audio slices, ensuring the accuracy of the face image data stream slicing. Considering that the beginning and end of each Chinese character's pronunciation are transition stages of onset and release, the most expressive middle segment of the image data stream is selected for the mouth opening angle calculations, and the temporally continuous mouth opening angle data sequence is compared with the standard opening angle data sequence of the training text according to the variation trends of the opening angle between adjacent characters and between adjacent sentences. This ensures the validity and reliability of the final judgment result.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

  1. An intelligent children accompanying education robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer program:
    retrieving a training text and displaying it to the user on a display screen, the training text including at least two sentences whose overall mouth-shape variation differs, each sentence including at least two Chinese characters with different mouth shapes;
    collecting the user's synchronized audio data stream and a face image data stream that includes the mouth shape;
    slicing the audio data stream according to the distribution of Chinese characters and punctuation in the training text and the standard audio features of each character, and obtaining timestamp information for each audio data slice in one-to-one correspondence with a single character;
    slicing the face image data stream according to the timestamp information of each audio data slice, and establishing a mapping between each image data slice and the corresponding character;
    for each image data slice, selecting the image data frames in the middle third of its time span, identifying and extracting the open-mouth contour information from the selected frames, and determining the coordinate positions of the feature points from the mouth contour information, the feature points including at least points A and B at the inner corners of the mouth on either side and points C and D at the middle of the inner edges of the upper and lower lips; calculating the mouth opening angle of each image frame from the coordinates of points A, B, C, and D; and taking the average of the mouth opening angles computed within the same slice as the final mouth opening angle of the mapped character;
    arranging the calculated final mouth opening angles in chronological order to form a mouth opening angle data sequence corresponding to the training text;
    comparing the actual mouth opening angle data sequence with the standard opening angle data sequence of the training text according to the variation trend of the opening angle between adjacent characters and between adjacent sentences, determining the individual characters and sentences whose mouth shape needs correction, and outputting and displaying the result to the user on the display screen; wherein the overall mouth opening angle of a single sentence is the mean or root mean square of the absolute values of the adjacent angle changes of the characters it contains, and the mouth opening angle is any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB in the rhombus formed by points A, B, C, and D.
  2. The intelligent children accompanying education robot according to claim 1, characterized in that, in the comparative analysis based on the variation trend of the mouth opening angle, the variation trend between adjacent sentences is compared first to obtain the sentences whose mouth shape needs correction; then, within those sentences, the characters whose mouth shape needs correction are obtained from the variation trend between adjacent characters; finally, whether the characters whose mouth shape needs correction are also derived for the remaining sentences from the variation trend between adjacent characters is determined by whether a corresponding request is received from the user.
  3. The intelligent children accompanying education robot according to claim 2, characterized in that the processor further implements the following step when executing the computer program:
    calculating the correlation between the actual mouth opening angle data sequence and the standard opening angle data sequence, and giving an evaluation result for the entire training text according to the correlation result.
  4. The intelligent children accompanying education robot according to claim 3, characterized in that the evaluation result is specifically a rating and score computed from statistical correlation value ranges and gradients.
  5. The intelligent children accompanying education robot according to any one of claims 1 to 4, characterized in that the training text is downloaded remotely via a network, and when the training text is downloaded, the standard opening angle data sequence corresponding to the training text, the standard audio features of each character, and the standard mouth-shape explanation videos for individual characters and sentences are downloaded synchronously, for the intelligent children accompanying education robot to perform the comparative analysis locally on a single machine; and the processor further implements the following step when executing the computer program:
    after determining the characters and sentences whose mouth shape needs correction, preloading the standard mouth-shape explanation videos of the corresponding characters and sentences into memory so that the corresponding corrective content can be played in real time according to the user's selection instruction.
PCT/CN2021/098302 2021-03-22 2021-06-04 Intelligent children accompanying education robot WO2022198798A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110304626.1A CN112949554B (en) 2021-03-22 2021-03-22 Intelligent children accompanying education robot
CN202110304626.1 2021-03-22

Publications (1)

Publication Number Publication Date
WO2022198798A1 true WO2022198798A1 (en) 2022-09-29

Family

ID=76227595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098302 WO2022198798A1 (en) 2021-03-22 2021-06-04 Intelligent children accompanying education robot

Country Status (2)

Country Link
CN (1) CN112949554B (en)
WO (1) WO2022198798A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359059A (en) * 2022-10-20 2022-11-18 一道新能源科技(衢州)有限公司 Solar cell performance testing method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9548048B1 (en) * 2015-06-19 2017-01-17 Amazon Technologies, Inc. On-the-fly speech learning and computer model generation using audio-visual synchronization
CN111429885A (en) * 2020-03-02 2020-07-17 北京理工大学 Method for mapping audio clip to human face-mouth type key point
CN112037788A (en) * 2020-09-10 2020-12-04 中航华东光电(上海)有限公司 Voice correction fusion technology

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940939A (en) * 2017-03-16 2017-07-11 牡丹江师范学院 Oral English Teaching servicing unit and its method
CN107424450A (en) * 2017-08-07 2017-12-01 英华达(南京)科技有限公司 Pronunciation correction system and method
CN108492641A (en) * 2018-03-26 2018-09-04 贵州西西沃教育科技股份有限公司 A kind of English phonetic learning system
CN109034037A (en) * 2018-07-19 2018-12-18 江苏黄金屋教育发展股份有限公司 On-line study method based on artificial intelligence
CN109389098B (en) * 2018-11-01 2020-04-28 重庆中科云从科技有限公司 Verification method and system based on lip language identification
CN111951629A (en) * 2019-05-16 2020-11-17 上海流利说信息技术有限公司 Pronunciation correction system, method, medium and computing device
CN111950327A (en) * 2019-05-16 2020-11-17 上海流利说信息技术有限公司 Mouth shape correcting method, mouth shape correcting device, mouth shape correcting medium and computing equipment
CN112001323A (en) * 2020-08-25 2020-11-27 成都威爱新经济技术研究院有限公司 Digital virtual human mouth shape driving method based on pinyin or English phonetic symbol reading method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9548048B1 (en) * 2015-06-19 2017-01-17 Amazon Technologies, Inc. On-the-fly speech learning and computer model generation using audio-visual synchronization
CN111429885A (en) * 2020-03-02 2020-07-17 北京理工大学 Method for mapping audio clip to human face-mouth type key point
CN112037788A (en) * 2020-09-10 2020-12-04 中航华东光电(上海)有限公司 Voice correction fusion technology

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359059A (en) * 2022-10-20 2022-11-18 一道新能源科技(衢州)有限公司 Solar cell performance testing method and system

Also Published As

Publication number Publication date
CN112949554B (en) 2022-02-08
CN112949554A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN106448288A (en) Interactive English learning system and method
CN111915707B (en) Mouth shape animation display method and device based on audio information and storage medium
CN107133481A (en) The estimation of multi-modal depression and sorting technique based on DCNN DNN and PV SVM
CN110717018A (en) Industrial equipment fault maintenance question-answering system based on knowledge graph
US9443193B2 (en) Systems and methods for generating automated evaluation models
CN109326162A (en) A kind of spoken language exercise method for automatically evaluating and device
CN108549628A (en) The punctuate device and method of streaming natural language information
WO2022198798A1 (en) Intelligent children accompanying education robot
CN107240394A (en) A kind of dynamic self-adapting speech analysis techniques for man-machine SET method and system
TWI294107B (en) A pronunciation-scored method for the application of voice and image in the e-learning
CN104347071A (en) Method and system for generating oral test reference answer
CN104505089B (en) Spoken error correction method and equipment
CN114936787A (en) Online student teaching intelligent analysis management cloud platform based on artificial intelligence
CN102339605A (en) Fundamental frequency extraction method and system based on prior surd and sonant knowledge
CN111785236A (en) Automatic composition method based on motivational extraction model and neural network
CN111681680B (en) Method, system, device and readable storage medium for acquiring audio frequency by video recognition object
CN115240710A (en) Neural network-based multi-scale fusion pronunciation evaluation model optimization method
CN114241835A (en) Student spoken language quality evaluation method and device
CN113593326A (en) English pronunciation teaching device and method
Ping English Speech Recognition Method Based on HMM Technology
He et al. Automatic generation algorithm analysis of dance movements based on music–action association
CN111128181A (en) Recitation question evaluation method, device and equipment
TWI743798B (en) Method and apparatus for chinese multiple speech recognition
TW201411577A (en) Voice processing method of point-to-read device
Xie The Application of Intelligent Speech Recognition Technology in Japanese Learning System

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932414

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932414

Country of ref document: EP

Kind code of ref document: A1