CN112949554B - Intelligent children accompanying education robot - Google Patents
- Publication number: CN112949554B (application CN202110304626.1A)
- Authority: CN (China)
- Prior art keywords: mouth; opening angle; Chinese characters; sentences; mouth opening
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/161 — Detection; Localisation; Normalisation (G—Physics › G06—Computing › G06V—Image or video recognition or understanding › G06V40/00 Biometric, human-related or animal-related patterns › G06V40/10 Human or animal bodies › G06V40/16 Human faces)
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting (G06F—Electric digital data processing › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/21 Design or setup of recognition systems)
- G06F40/211 — Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (G06F40/00 Handling natural language data › G06F40/20 Natural language analysis › G06F40/205 Parsing)
- G06V40/171 — Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships (G06V40/16 Human faces › G06V40/168 Feature extraction; Face representation)
- G06V40/172 — Classification, e.g. identification (G06V40/16 Human faces)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Computational Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention relates to the technical field of robots and discloses an intelligent child accompanying education robot for training and interacting with a child's mouth shape during pronunciation. The robot's processor, when executing the corresponding computer program, implements the following: during training, audio and video data are acquired synchronously, and the face image data stream is sliced based on the audio slices, which ensures the accuracy of the image slicing. Meanwhile, because the pronunciation of each Chinese character rises and falls at its beginning and end, the middle segment of the image stream, which carries the most linguistic expression, is selected for the serial calculation of mouth opening angles. The resulting continuous opening-angle data sequence is then compared against the standard opening-angle sequence of the training text, analysing the opening-angle variation trend between adjacent Chinese characters and between adjacent sentences, which ensures the validity and reliability of the final judgment.
Description
Technical Field
The invention relates to the technical field of robots, in particular to an intelligent child accompanying education robot.
Background
Currently, as face recognition, voice and image recognition, video interaction, and big-data analysis technologies continue to mature, they couple well with the main application scenarios of home robots and provide a good user experience. Meanwhile, technical progress keeps driving down production costs, making large-scale production feasible.
2019 is widely regarded as the inaugural year of the child companion robot market, which has since developed explosively. Unit prices of child companion robots range from a few hundred to tens of thousands of RMB. Early-childhood education focuses on content and interaction patterns. The voice dialogue of traditional children's toys is limited to telling stories and playing jingles and offers little real function. Intelligent robots, by contrast, add more humanized functions and rich interaction that matches children's behavioural habits: voice conversation, storytelling, reciting ancient poems, singing nursery rhymes, and interactive games. They overturn traditional early education, promote children's abilities in expression, logic, music, and art, and serve as attentive companions and family tutors.
At present, speech recognition and interaction technology is mature. However, in activities that interest many children, such as public speaking and hosting, the mouth shape during pronunciation matters greatly: different Chinese characters correspond to different mouth shapes, and even the same character takes different mouth shapes in different contexts because of polyphones, emotional colouring, and tone. Current robots lack a function for training and interacting with children's pronunciation mouth shapes.
Disclosure of Invention
The invention aims to disclose an intelligent child accompanying education robot for training and interacting with a child's mouth shape during pronunciation.
To achieve the above object, the invention discloses an intelligent child accompanying education robot comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
calling a training text and displaying it to the user on a display screen, wherein the training text contains at least two sentences whose overall mouth-shape variation amplitudes differ, and each sentence contains at least two Chinese characters with different mouth shapes;
collecting, in synchrony, the user's audio data stream and a face image data stream that includes the mouth shape;
slicing the audio data stream according to the distribution of Chinese characters and punctuation marks in the training text and the standard audio features of each character, obtaining timestamp information for each audio data slice, one slice per Chinese character;
slicing the face image data stream according to the timestamp information of each audio data slice, and establishing a mapping between each image data slice and its corresponding Chinese character;
for each image data slice, selecting the image frames in the middle 1/3 of its time span; identifying and extracting the open-mouth contour from each selected frame and determining the coordinates of the feature points, which include at least points A and B at the inner mouth corners and points C and D at the inner mid-points of the upper and lower lips; calculating the mouth opening angle of each frame from the coordinates of points A, B, C, and D; and taking the mean of the opening angles calculated within the same slice as the final opening-angle value of the mapped Chinese character;
forming the calculated final opening-angle values, in time order, into a mouth opening-angle data sequence corresponding to the training text;
comparing the actual mouth opening-angle data sequence against the standard opening-angle sequence of the training text according to the opening-angle variation trend between adjacent Chinese characters and between adjacent sentences, judging the single characters and sentences whose mouth shape is to be corrected, and displaying the judgment result to the user on the display screen; the overall mouth opening angle of a single sentence is the mean or the root mean square of the absolute adjacent-angle variations of the characters it governs, and the mouth opening angle is any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB in the quadrilateral with vertices A, C, B, and D.
Preferably, in the comparative analysis based on the opening-angle variation trend, the trend between adjacent sentences is analysed first to find the sentences whose mouth shape needs correction; then, within those sentences, the trend between adjacent Chinese characters yields the characters to be corrected. Finally, whether the remaining sentences are also searched for characters to correct is decided by whether a corresponding request from the user is received.
Preferably, the robot processor, when executing the computer program, further implements the following step:
calculating the correlation between the actual and standard opening-angle data sequences and giving an evaluation result for the whole training text according to the correlation. For example, the evaluation result is a grade and score computed from statistical ranges and gradients of the correlation value.
Preferably, the training text is downloaded remotely over a network, and the standard opening-angle data sequence for the text, the standard audio features of each Chinese character, and standard mouth-shape commentary videos for single characters and sentences are downloaded with it; the intelligent child accompanying education robot then performs the comparative analysis locally on a single machine. The processor, when executing the computer program, further implements the following step:
after the characters and sentences to be corrected are judged, preloading their standard mouth-shape commentary videos into memory so that the corresponding correction content can be played in real time upon the user's selection instruction.
The invention has the following beneficial effects:
In the training process, audio and video data are acquired synchronously, and the face image data stream is sliced based on the audio slices, which ensures the accuracy of the image slicing. Meanwhile, because the pronunciation of each Chinese character rises and falls at its beginning and end, the middle segment of the image stream, which carries the most linguistic expression, is selected for the serial calculation of mouth opening angles. The resulting continuous opening-angle data sequence is then compared against the standard opening-angle sequence of the training text, analysing the opening-angle variation trend between adjacent Chinese characters and between adjacent sentences, which ensures the validity and reliability of the final judgment.
The present invention will be described in further detail below with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart illustrating steps implemented when a processor of an intelligent child accompanying education robot executes a corresponding computer program according to a preferred embodiment of the present invention.
Detailed Description
The embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
Example 1
The embodiment discloses an intelligent child accompanying education robot which comprises a memory, a processor and a computer program, wherein the computer program is stored on the memory and can run on the processor. As shown in fig. 1, the robot processor of the present embodiment implements the following steps when executing the computer program:
and step S1, calling a training text and displaying the training text to a user through a display screen, wherein the training text comprises at least two sentences with different mouth overall change amplitudes, and each sentence comprises at least two Chinese characters with different mouth shapes.
In this embodiment, the training text is elaborately arranged content (i.e., selected content corresponding to subsequent adjacent Chinese characters and adjacent sentences with an obvious mouth opening angle variation trend) based on which the training effect can be obviously evaluated and followed by experts in the relevant fields such as acoustics and lip language expressiveness. And can be downloaded from a cloud server based on a C/S architecture.
Preferably, the training text of this embodiment is downloaded remotely over a network together with the standard opening-angle data sequence for the text, the standard audio features of each Chinese character, and standard mouth-shape commentary videos for single characters and sentences; the robot then performs the subsequent comparison, analysis, and other data processing locally on a single machine. Preferably, the standard opening-angle data sequence can also be obtained by recording experts in fields such as acoustics and lip-language expressiveness and calibrating the data in the background. As a variant, the standard opening-angle data sequence in this step can also be computed from audio information converted into mouth-shape marker points, as in multimodal interaction.
Step S2: collecting, in synchrony, the user's audio data stream and a face image data stream that includes the mouth shape.
In this step, the audio data stream can be collected through a microphone, and the face image data stream through the video-recording function of a camera module.
Step S3: slicing the audio data stream according to the distribution of Chinese characters and punctuation marks in the training text and the standard audio features of each character, obtaining timestamp information for each audio data slice, one slice per Chinese character.
In this step, the characters and punctuation of the training text are known, as are the standard audio features of each character; by combining spectrum analysis with the slicing techniques used in existing speech-to-text recognition, the timestamp information of the audio slice for each character can be obtained quickly.
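The patent does not spell out the slicing algorithm itself; as a loose illustration, the sketch below segments an audio stream into per-character slices using a simple RMS-energy voice-activity threshold, a crude stand-in for the spectrum-analysis alignment described above. The function name, frame size, and threshold value are all illustrative assumptions, not taken from the patent.

```python
import numpy as np

def slice_audio_by_energy(samples, rate, n_chars,
                          frame_ms=20, energy_thresh=0.02):
    """Split an audio stream into n_chars voiced slices.

    Frames whose RMS energy exceeds the threshold are treated as
    voiced; contiguous voiced runs become candidate slices, and the
    n_chars longest runs are mapped one-to-one, in time order, to the
    training text's characters. Returns (start_sec, end_sec) pairs.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames.astype(float) ** 2).mean(axis=1))
    voiced = rms > energy_thresh

    runs, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, n_frames))

    # Keep the n_chars longest voiced runs, restored to temporal order.
    runs = sorted(sorted(runs, key=lambda r: r[0] - r[1])[:n_chars])
    sec = frame_ms / 1000
    return [(a * sec, b * sec) for a, b in runs]
```

In a real system the run boundaries would be refined against the known standard audio features of each character; the energy gate here only illustrates the timestamp bookkeeping.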
Step S4: slicing the face image data stream according to the timestamp information of each audio data slice, and establishing the mapping between each image data slice and its corresponding Chinese character.
In this step, the captured audio and face-image streams are synchronous; therefore, slicing the image stream by the timestamps of the audio slices yields an accurate one-to-one mapping to the corresponding Chinese characters.
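Because the two streams are synchronous, step S4 reduces to converting each audio slice's timestamps into frame-index ranges. A minimal sketch, assuming a constant frame rate; the function name and return shape are illustrative assumptions:

```python
def slice_frames_by_timestamps(timestamps, fps, chars):
    """Map each audio slice's (start_sec, end_sec) pair to frame
    indices of the synchronously captured face-image stream, pairing
    each resulting image slice with its Chinese character.

    Assumes a constant frame rate fps.
    Returns {char: (first_frame, last_frame)}.
    """
    mapping = {}
    for char, (t0, t1) in zip(chars, timestamps):
        mapping[char] = (int(round(t0 * fps)), int(round(t1 * fps)))
    return mapping
```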
Step S5: for each image data slice, selecting the image frames in the middle 1/3 of its time span; identifying and extracting the open-mouth contour from each selected frame and determining the coordinates of the feature points, which include at least points A and B at the inner mouth corners and points C and D at the inner mid-points of the upper and lower lips; calculating the mouth opening angle of each frame from the coordinates of points A, B, C, and D; and taking the mean of the opening angles calculated within the same slice as the final opening-angle value of the mapped Chinese character.
In this step, filtering the frames in the middle 1/3 of each image slice means dividing the slice into three equal parts and discarding the head and tail, keeping the middle segment, which carries the most linguistic expression, for the serial opening-angle calculation. In general, a human face has 68 feature points, of which only 20 are key points of the mouth. During training, points A and B are generally symmetric about the centre point O of the mouth contour, as are points C and D; that is, in the quadrilateral with vertices A, C, B, and D, side AC is approximately equal in length to side BC, and side AD to side BD. Accordingly, the mouth opening angle can be defined as any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB. Extracting the mouth contour from a face image is a technique well known to those skilled in the art and is not described in detail.
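The geometry above can be sketched directly: keep the middle third of a slice's frames, compute ∠CAD at inner corner A from the four inner-lip landmarks, and average over the kept frames. Landmark detection itself (e.g., with a 68-point face model) is assumed to happen elsewhere; all names are illustrative assumptions.

```python
import math

def mouth_opening_angle(A, B, C, D):
    """Opening angle ∠CAD at inner mouth corner A.

    A, B are the inner mouth corners; C, D the inner mid-points of
    the upper and lower lip. Each point is an (x, y) tuple.
    """
    def angle_at(p, q, r):
        # Angle at vertex p between rays p->q and p->r, in degrees.
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))
    return angle_at(A, C, D)

def final_angle_for_slice(frames_landmarks):
    """Average ∠CAD over the middle third of a slice's frames.

    frames_landmarks: list of (A, B, C, D) tuples, one per frame.
    """
    n = len(frames_landmarks)
    mid = frames_landmarks[n // 3 : n - n // 3]  # drop head and tail thirds
    angles = [mouth_opening_angle(*f) for f in mid]
    return sum(angles) / len(angles)
```

Per the patent, ∠CBD, ∠ACB, or ∠ADB could be substituted for ∠CAD without changing the rest of the pipeline.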
Step S6: forming the calculated final opening-angle values, in time order, into a mouth opening-angle data sequence corresponding to the training text.
Step S7: comparing the actual mouth opening-angle data sequence against the standard opening-angle sequence of the training text according to the opening-angle variation trend between adjacent Chinese characters and between adjacent sentences, judging the single characters and sentences whose mouth shape is to be corrected, and displaying the judgment result to the user on the display screen.
In this step, the overall mouth opening angle of a single sentence is the mean or the root mean square of the absolute adjacent-angle variations of the characters it governs. In this embodiment, the comparative analysis of the opening-angle variation trend specifically comprises: establishing a two-dimensional coordinate system whose abscissa is the time-ordered sequence of single characters or sentences and whose ordinate is the opening-angle value of the corresponding character or whole sentence; and comparing the actually sampled opening-angle trend curve (usually a polyline) with the standard trend curve in this coordinate system, thereby obtaining the single characters or sentences to be corrected.
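The sentence-level aggregate defined above (mean or root mean square of the absolute adjacent-angle variations) can be sketched as follows; the function name and flag are illustrative assumptions:

```python
import math

def sentence_overall_angle(char_angles, use_rms=False):
    """Overall opening angle of a sentence: the mean (or root mean
    square) of the absolute adjacent-angle variations of the
    per-character opening angles it governs."""
    deltas = [abs(b - a) for a, b in zip(char_angles, char_angles[1:])]
    if use_rms:
        return math.sqrt(sum(d * d for d in deltas) / len(deltas))
    return sum(deltas) / len(deltas)
```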
Optionally, in this step, different thresholds can be set for the trend comparison between adjacent sentences and for the trend comparison between adjacent Chinese characters within a sentence; when the actual variation trend obtained by sampling deviates from the standard trend by more than the set proportional threshold, the character or sentence is judged as needing correction.
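A minimal sketch of the threshold test just described: form the adjacent-angle deltas for both the measured and the standard sequences, and flag any unit (character or sentence) whose delta deviates from the standard by more than a set proportion. The default threshold value is an illustrative assumption.

```python
def flag_units_to_correct(actual, standard, max_dev=0.30):
    """Compare adjacent opening-angle deltas against the standard.

    actual, standard: opening-angle sequences (one value per Chinese
    character, or one overall value per sentence). Returns indices of
    units whose delta from the previous unit deviates from the
    standard delta by more than max_dev (relative proportion).
    """
    flagged = []
    for i in range(1, len(actual)):
        d_act = actual[i] - actual[i - 1]
        d_std = standard[i] - standard[i - 1]
        ref = max(abs(d_std), 1e-9)  # guard against a zero standard delta
        if abs(d_act - d_std) / ref > max_dev:
            flagged.append(i)
    return flagged
```

Run once on sentence-level values, then again on the character-level values inside any flagged sentence, matching the coarse-to-fine order the patent prefers.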
Preferably, in the trend-based comparative analysis, the opening-angle variation trend between adjacent sentences is analysed first to obtain the sentences whose mouth shape is to be corrected; then, within those sentences, the trend between adjacent Chinese characters yields the characters to be corrected. Finally, whether the remaining sentences are also searched for characters to correct depends on whether a corresponding request from the user is received (i.e., the user responds by clicking to generate the instruction; otherwise the subsequent steps are skipped). This distinguishes the needs of different users, responds quickly to locate the sentences and key characters the user most cares about, and effectively reduces memory load and CPU consumption.
Preferably, the robot processor of the present invention, when executing the computer program, further performs the steps of:
and step S8, calculating the correlation between the actual mouth opening angle data sequence and the standard opening angle data sequence, and giving the evaluation result corresponding to the whole training text according to the correlation calculation result. For example: and the evaluation result is specifically grade and score calculation according to the statistical relevance value range and gradient. Alternatively, the calculation method of the specific correlation may employ a pearson correlation coefficient method. And
Step S9: after the characters and sentences to be corrected are judged, preloading their standard mouth-shape commentary videos into memory so that the corresponding correction content can be played in real time upon the user's selection instruction.
In summary, the technical solution disclosed in the embodiment of the present invention has at least the following beneficial effects:
In the training process, audio and video data are acquired synchronously, and the face image data stream is sliced based on the audio slices, which ensures the accuracy of the image slicing. Meanwhile, because the pronunciation of each Chinese character rises and falls at its beginning and end, the middle segment of the image stream, which carries the most linguistic expression, is selected for the serial calculation of mouth opening angles. The resulting continuous opening-angle data sequence is then compared against the standard opening-angle sequence of the training text, analysing the opening-angle variation trend between adjacent Chinese characters and between adjacent sentences, which ensures the validity and reliability of the final judgment.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (5)
1. An intelligent child companion educational robot comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of:
calling a training text and displaying it to the user on a display screen, wherein the training text contains at least two sentences whose overall mouth-shape variation amplitudes differ, and each sentence contains at least two Chinese characters with different mouth shapes;
collecting, in synchrony, the user's audio data stream and a face image data stream that includes the mouth shape;
slicing the audio data stream according to the distribution of Chinese characters and punctuation marks in the training text and the standard audio features of each character, obtaining timestamp information for each audio data slice, one slice per Chinese character;
slicing the face image data stream according to the timestamp information of each audio data slice, and establishing a mapping between each image data slice and its corresponding Chinese character;
for each image data slice, selecting the image frames in the middle 1/3 of its time span; identifying and extracting the open-mouth contour from each selected frame and determining the coordinates of the feature points, which include at least points A and B at the inner mouth corners and points C and D at the inner mid-points of the upper and lower lips; calculating the mouth opening angle of each frame from the coordinates of points A, B, C, and D; and taking the mean of the opening angles calculated within the same slice as the final opening-angle value of the mapped Chinese character;
forming the calculated final opening-angle values, in time order, into a mouth opening-angle data sequence corresponding to the training text;
comparing the actual mouth opening-angle data sequence against the standard opening-angle sequence of the training text according to the opening-angle variation trend between adjacent Chinese characters and between adjacent sentences, judging the single characters and sentences whose mouth shape is to be corrected, and displaying the judgment result to the user on the display screen, wherein the overall mouth opening angle of a single sentence is the mean or the root mean square of the absolute adjacent-angle variations of the characters it governs, and the mouth opening angle is any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB in the quadrilateral with vertices A, C, B, and D;
wherein the comparative analysis of the opening-angle variation trend specifically comprises: establishing a two-dimensional coordinate system whose abscissa is the time-ordered sequence of single characters or sentences and whose ordinate is the opening-angle value of the corresponding character or whole sentence; comparing the actually sampled opening-angle trend curve with the standard trend curve in this coordinate system, thereby obtaining the single characters or sentences to be corrected; and, in the trend comparison between adjacent sentences or between adjacent characters within a sentence, setting different thresholds respectively, a character or sentence being judged as needing correction when its actual trend, obtained by sampling, deviates from the standard trend by more than the set proportional threshold.
2. The intelligent child accompanying education robot of claim 1, wherein, in the trend-based comparative analysis, the opening-angle variation trend between adjacent sentences is analysed first to obtain the sentences whose mouth shape is to be corrected, and the characters to be corrected are then obtained within those sentences from the trend between adjacent Chinese characters; finally, whether the remaining sentences are also searched for characters to correct is determined by whether a corresponding request from the user is received.
3. The intelligent child companion educational robot of claim 2, wherein the processor, when executing the computer program, further performs the steps of:
and calculating the correlation between the actual mouth flare angle data sequence and the standard flare angle data sequence, and giving an evaluation result corresponding to the whole training text according to the correlation calculation result.
4. The intelligent child accompanying education robot of claim 3, wherein the evaluation result is a grade and score computed from statistical ranges and gradients of the correlation value.
5. The intelligent child companion educational robot of any one of claims 1 to 4, wherein the training text is downloaded remotely via a network, and, when the training text is downloaded, the standard opening angle data sequence corresponding to the training text, the standard audio feature information corresponding to each Chinese character, and the standard mouth shape commentary videos for single characters and sentences are downloaded synchronously; the robot performs the comparative analysis locally on a single machine; and the processor, when executing the computer program, further implements the following step:
after the single characters and sentences whose mouth shape is to be corrected have been determined, the standard mouth shape commentary videos corresponding to those characters and sentences are preloaded into memory, so that the corresponding corrective content can be played in real time according to the user's selection instructions.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110304626.1A CN112949554B (en) | 2021-03-22 | 2021-03-22 | Intelligent children accompanying education robot |
PCT/CN2021/098302 WO2022198798A1 (en) | 2021-03-22 | 2021-06-04 | Intelligent children accompanying education robot |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110304626.1A CN112949554B (en) | 2021-03-22 | 2021-03-22 | Intelligent children accompanying education robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112949554A CN112949554A (en) | 2021-06-11 |
CN112949554B true CN112949554B (en) | 2022-02-08 |
Family
ID=76227595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110304626.1A Active CN112949554B (en) | 2021-03-22 | 2021-03-22 | Intelligent children accompanying education robot |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112949554B (en) |
WO (1) | WO2022198798A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115359059B (en) * | 2022-10-20 | 2023-01-31 | 一道新能源科技(衢州)有限公司 | Solar cell performance test method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106940939A (en) * | 2017-03-16 | 2017-07-11 | 牡丹江师范学院 | Oral English Teaching servicing unit and its method |
CN107424450A (en) * | 2017-08-07 | 2017-12-01 | 英华达(南京)科技有限公司 | Pronunciation correction system and method |
CN108492641A (en) * | 2018-03-26 | 2018-09-04 | 贵州西西沃教育科技股份有限公司 | A kind of English phonetic learning system |
CN109389098A (en) * | 2018-11-01 | 2019-02-26 | 重庆中科云丛科技有限公司 | A kind of verification method and system based on lip reading identification |
CN111429885A (en) * | 2020-03-02 | 2020-07-17 | 北京理工大学 | Method for mapping audio clip to human face-mouth type key point |
CN111951629A (en) * | 2019-05-16 | 2020-11-17 | 上海流利说信息技术有限公司 | Pronunciation correction system, method, medium and computing device |
CN111950327A (en) * | 2019-05-16 | 2020-11-17 | 上海流利说信息技术有限公司 | Mouth shape correcting method, mouth shape correcting device, mouth shape correcting medium and computing equipment |
CN112037788A (en) * | 2020-09-10 | 2020-12-04 | 中航华东光电(上海)有限公司 | Voice correction fusion technology |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9548048B1 (en) * | 2015-06-19 | 2017-01-17 | Amazon Technologies, Inc. | On-the-fly speech learning and computer model generation using audio-visual synchronization |
CN109034037A (en) * | 2018-07-19 | 2018-12-18 | 江苏黄金屋教育发展股份有限公司 | On-line study method based on artificial intelligence |
CN112001323A (en) * | 2020-08-25 | 2020-11-27 | 成都威爱新经济技术研究院有限公司 | Digital virtual human mouth shape driving method based on pinyin or English phonetic symbol reading method |
2021
- 2021-03-22 CN CN202110304626.1A patent/CN112949554B/en active Active
- 2021-06-04 WO PCT/CN2021/098302 patent/WO2022198798A1/en active Application Filing
Non-Patent Citations (4)
Title |
---|
Development of Novel Lip-Reading Recognition Algorithm; Bor-Shing Lin et al.; IEEE Access; 2017-01-09; vol. 5; pp. 794-801 *
Recognizing collaborators using a flexible approach based on face and voice biometrics; Jesús Salvador Martínez-Delgado et al.; 2013 10th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE); 2013-12-02; pp. 324-359 *
A Review of Research Progress in Lip Reading; Zhang Zeliang et al.; Computer Engineering and Design; 2014-06-30; vol. 35, no. 6; pp. 2135-2141 *
Research on Teaching with YouTube Chinese-Error Videos (title truncated in source); Piao Baola; China Master's Theses Full-text Database, Philosophy and Humanities Series; 2020-10-15; vol. 2020, no. 10; F084-74 *
Also Published As
Publication number | Publication date |
---|---|
WO2022198798A1 (en) | 2022-09-29 |
CN112949554A (en) | 2021-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108665492A (en) | A kind of Dancing Teaching data processing method and system based on visual human | |
CN104183171A (en) | Electronic music-based system and method for precisely judging instrument performance level | |
CN106898363A (en) | A kind of vocality study electron assistant articulatory system | |
Sargin et al. | Analysis of head gesture and prosody patterns for prosody-driven head-gesture animation | |
CN109583443B (en) | Video content judgment method based on character recognition | |
CN106448701A (en) | Vocal integrated training system | |
CN107436921A (en) | Video data handling procedure, device, equipment and storage medium | |
CN109326162A (en) | A kind of spoken language exercise method for automatically evaluating and device | |
CN112001323A (en) | Digital virtual human mouth shape driving method based on pinyin or English phonetic symbol reading method | |
CN109582952A (en) | Poem generation method, device, computer equipment and medium | |
CN110600033A (en) | Learning condition evaluation method and device, storage medium and electronic equipment | |
CN107578004A (en) | Learning method and system based on image recognition and interactive voice | |
CN112949554B (en) | Intelligent children accompanying education robot | |
Kagirov et al. | TheRuSLan: Database of Russian sign language | |
JP2023552854A (en) | Human-computer interaction methods, devices, systems, electronic devices, computer-readable media and programs | |
CN113610680A (en) | AI-based interactive reading material personalized recommendation method and system | |
TWI294107B (en) | A pronunciation-scored method for the application of voice and image in the e-learning | |
CN110245253A (en) | A kind of Semantic interaction method and system based on environmental information | |
CN114936787A (en) | Online student teaching intelligent analysis management cloud platform based on artificial intelligence | |
Naert et al. | Lsf-animal: A motion capture corpus in french sign language designed for the animation of signing avatars | |
CN113837907A (en) | Man-machine interaction system and method for English teaching | |
KR20180012192A (en) | Infant Learning Apparatus and Method Using The Same | |
KR102042503B1 (en) | Method for providing advertisement using video-type avatar, and computer-readable recording medium with providing program of the same | |
Stefanov et al. | A kinect corpus of swedish sign language signs | |
CN116168134B (en) | Digital person control method, digital person control device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||