WO2022198798A1 - Intelligent children accompanying education robot - Google Patents

Intelligent children accompanying education robot

Info

Publication number
WO2022198798A1
Authority
WO
WIPO (PCT)
Prior art keywords
mouth
opening angle
mouth shape
training text
image data
Prior art date
Application number
PCT/CN2021/098302
Other languages
French (fr)
Chinese (zh)
Inventor
阳传红
Original Assignee
湖南中凯智创科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 湖南中凯智创科技有限公司 filed Critical 湖南中凯智创科技有限公司
Publication of WO2022198798A1 publication Critical patent/WO2022198798A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to the technical field of robots, and discloses an intelligent children accompanying education robot that provides mouth-shape training and interaction during a child's pronunciation practice. When executing the corresponding computer program, the robot's processor implements the following steps: during training, audio and video data are collected synchronously, and the face image data stream is sliced in step with the audio slices, ensuring the accuracy of the face image data stream slicing; considering that the beginning and end of each Chinese character's pronunciation are transition stages of onset and release, the most expressive middle segment of the image data stream is selected for the mouth opening angle calculations, and the temporally continuous mouth opening angle data sequence is compared with the standard opening angle data sequence of the training text according to the variation trends of the opening angle between adjacent Chinese characters and between adjacent sentences. This ensures the validity and reliability of the final determination result.

Description

Intelligent Children Accompanying Education Robot
Technical Field
The present invention relates to the field of robotics, and in particular to an intelligent children accompanying education robot.
Background Art
At present, with the continuing maturation of face recognition, speech and image recognition, video interaction, and big-data analysis technologies, these technologies can be tightly coupled with the main application scenarios of home robots to provide users with a good experience. At the same time, technological progress keeps lowering the production cost of robots, making large-scale adoption possible.
2019 is regarded as the first year of children's robots; since then, children's robots have become widely known and have grown explosively. Unit prices of children's companion robots range from a few hundred to tens of thousands of yuan. Children's education hinges on content and on the way of interacting. The voice dialogue of traditional children's toys is mainly limited to storytelling; marketed under the slogan of companionship, it offers little real functionality. Intelligent robots, by contrast, add more human-centred functions, interact fully with children, fit children's behavioural habits, and support voice dialogue, storytelling, reciting classical poems, singing children's songs, and other interactions; they overturn traditional early education, improve children's abilities in expression, logic, music, art, and more, and act as a child's close companion and tutor.
At present, speech recognition and interaction technology is very mature. However, in speech, hosting, and other interests widely pursued by children, the mouth shape during pronunciation is also very important: different Chinese characters often correspond to different mouth shapes, and even the same character can differ in mouth shape because of polyphonic readings and changes of emotional colouring and tone in different usage scenarios. Current robots still lack a function for training and interacting with children's mouth shapes during pronunciation.
Summary of the Invention
The purpose of the present invention is to disclose an intelligent children accompanying education robot that trains and interacts with the mouth shape during a child's pronunciation.
To achieve the above purpose, the present invention discloses an intelligent children accompanying education robot comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps:
retrieving a training text and displaying it to the user on the display screen, the training text including at least two sentences whose overall mouth-shape variation differs, each sentence including at least two Chinese characters with different mouth shapes;
collecting the user's synchronized audio data stream and a face image data stream that includes the mouth shape;
slicing the audio data stream according to the distribution of Chinese characters and punctuation in the training text and the standard audio features of each character, and obtaining timestamp information for each audio data slice in one-to-one correspondence with a single character;
slicing the face image data stream according to the timestamp information of each audio data slice, and establishing a mapping between each image data slice and the corresponding character;
for each image data slice, selecting the image data frames in the middle third of its time span, identifying and extracting the open-mouth contour information from the selected frames, and determining the coordinate positions of the feature points from the mouth contour information, the feature points including at least points A and B at the inner corners of the mouth on either side and points C and D at the middle of the inner edges of the upper and lower lips; calculating the mouth opening angle of each image frame from the coordinates of points A, B, C, and D; and taking the average of the mouth opening angles computed within the same slice as the final mouth opening angle of the mapped character;
arranging the calculated final mouth opening angles in chronological order to form a mouth opening angle data sequence corresponding to the training text;
comparing the actual mouth opening angle data sequence with the standard opening angle data sequence of the training text according to the variation trend of the opening angle between adjacent characters and between adjacent sentences, determining the individual characters and sentences whose mouth shape needs correction, and outputting and displaying the result to the user on the display screen; where the overall mouth opening angle of a single sentence is the mean or root mean square of the absolute values of the adjacent angle changes of the characters it contains, and the mouth opening angle is any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB in the rhombus formed by points A, B, C, and D.
Preferably, in the comparative analysis based on the variation trend of the mouth opening angle, the present invention first compares the variation trend between adjacent sentences to obtain the sentences whose mouth shape needs correction, then, within those sentences, obtains the characters whose mouth shape needs correction from the variation trend between adjacent characters; finally, whether the characters whose mouth shape needs correction are also derived for the remaining sentences, again from the variation trend between adjacent characters, depends on whether a corresponding request is received from the user.
Preferably, the robot processor of the present invention further implements the following step when executing the computer program:
calculating the correlation between the actual mouth opening angle data sequence and the standard opening angle data sequence, and giving an evaluation result for the entire training text according to the correlation result. For example, the evaluation result is a rating and score computed from statistical correlation value ranges and gradients.
Preferably, the training text is downloaded remotely over the network, and when the training text is downloaded, the standard opening angle data sequence of the training text, the standard audio features of each character, and the standard mouth-shape explanation videos for individual characters and sentences are downloaded synchronously, so that the intelligent children accompanying education robot can perform the comparative analysis locally on a single machine. The processor further implements the following step when executing the computer program:
after determining the characters and sentences whose mouth shape needs correction, preloading the standard mouth-shape explanation videos of the corresponding characters and sentences into memory so that the corresponding corrective content can be played in real time according to the user's selection instruction.
The present invention has the following beneficial effects:
During training, audio and video data are collected synchronously, and the face image data stream is sliced in step with the audio slices, ensuring the accuracy of the face image data stream slicing. Considering that the beginning and end of each Chinese character's pronunciation are transition stages of onset and release, the most expressive middle segment of the image data stream is selected for the mouth opening angle calculations, and the temporally continuous mouth opening angle data sequence is compared with the standard opening angle data sequence of the training text according to the variation trends of the opening angle between adjacent characters and between adjacent sentences. This ensures the validity and reliability of the final judgment result.
The present invention is described in further detail below with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which form a part of this application, provide further understanding of the present invention; the exemplary embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of the steps implemented by the processor of an intelligent children accompanying education robot in a preferred embodiment of the present invention when executing the corresponding computer program.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the present invention can be implemented in many different ways as defined and covered by the claims.
Embodiment 1
This embodiment discloses an intelligent children accompanying education robot, including a memory, a processor, and a computer program stored in the memory and executable on the processor. As shown in FIG. 1, the robot processor of this embodiment implements the following steps when executing the computer program:
Step S1: Retrieve a training text and display it to the user on the display screen. The training text includes at least two sentences whose overall mouth-shape variation differs, and each sentence includes at least two Chinese characters with different mouth shapes.
In this embodiment, the training text is content carefully arranged by experts in related fields such as acoustics and lip-reading so that the training effect can be clearly evaluated and followed up, that is, selected content in which the variation of the mouth opening angle between adjacent characters and between adjacent sentences is pronounced. It can usually be downloaded from a cloud server based on a client/server architecture.
Preferably, the training text of this embodiment is downloaded remotely over the network, and when it is downloaded, the standard opening angle data sequence of the training text, the standard audio features of each character, and the standard mouth-shape explanation videos for individual characters and sentences are downloaded synchronously, so that the intelligent children accompanying education robot can carry out the data processing of the subsequent steps, such as the comparative analysis, locally on a single machine. Preferably, the standard opening angle data sequence of the training text can likewise be obtained from recordings by real experts in related fields such as acoustics and lip-reading, followed by background data calibration. As a variant, the standard opening angle data sequence in this step can also be computed by the method used in multimodal interaction for converting audio information into mouth-shape marker points.
Step S2: Collect the user's synchronized audio data stream and a face image data stream that includes the mouth shape.
In this step, the audio data stream can be collected through a microphone, and the face image data stream can be collected through the video recording function of the camera module.
Step S3: Slice the audio data stream according to the distribution of Chinese characters and punctuation in the training text and the standard audio features of each character, and obtain timestamp information for each audio data slice in one-to-one correspondence with a single character.
In this step, the characters and punctuation of the training text are known, and the standard audio features of each character are also known; combined with the spectrum analysis and slicing techniques used for speech-to-character conversion in existing speech recognition, the timestamp information of the audio data slice corresponding to each character can be obtained quickly. A simplified sketch of this slicing is given below.
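As an illustrative sketch only (not part of the patent text), assume the sentence time spans have already been located from the pauses at the punctuation; per-character timestamps can then be approximated by dividing each sentence span evenly among its characters. The patent's spectrum-analysis alignment would replace this even split; all names below are hypothetical.

```python
def naive_char_timestamps(sentences, sentence_spans):
    """Rough per-character timestamps: divide each spoken sentence span evenly
    among its Chinese characters (a stand-in for spectrum-based alignment).

    sentences      -- list of sentence strings from the training text
    sentence_spans -- list of (t_start, t_end) tuples in seconds, one per sentence
    """
    punctuation = "，。！？、；：,.!?;: "
    timestamps = []
    for text, (t0, t1) in zip(sentences, sentence_spans):
        chars = [c for c in text if c not in punctuation]
        step = (t1 - t0) / max(len(chars), 1)
        for i, ch in enumerate(chars):
            timestamps.append((ch, t0 + i * step, t0 + (i + 1) * step))
    return timestamps
```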
Step S4: Slice the face image data stream according to the timestamp information of each audio data slice, and establish a mapping between each image data slice and the corresponding character.
In this step, because the audio data stream and the face image data stream were collected synchronously, slicing the face image data stream at the timestamps of the audio slices yields image slices whose one-to-one mapping to the corresponding characters is likewise accurate. A sketch of this mapping follows.
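For illustration only, assuming a fixed camera frame rate and the per-character timestamps produced in step S3, the video slicing of step S4 might look like the following sketch (the data structures and function names are hypothetical):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ImageSlice:
    char: str                 # Chinese character mapped to this slice
    frame_indices: List[int]  # indices into the synchronized video frame buffer

def slice_video_by_audio(char_timestamps: List[Tuple[str, float, float]],
                         fps: float) -> List[ImageSlice]:
    """Map per-character audio timestamps (seconds) to frame-index slices."""
    slices = []
    for ch, t_start, t_end in char_timestamps:
        first = int(round(t_start * fps))
        last = max(first, int(round(t_end * fps)) - 1)
        slices.append(ImageSlice(char=ch, frame_indices=list(range(first, last + 1))))
    return slices

# Example: a 30 fps face video and three characters
demo = slice_video_by_audio([("你", 0.00, 0.35), ("好", 0.35, 0.70), ("吗", 0.80, 1.20)], fps=30)
```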
Step S5: For each image data slice, select the image data frames in the middle third of its time span, identify and extract the open-mouth contour information from the selected frames, and determine the coordinate positions of the feature points from the mouth contour information, the feature points including at least points A and B at the inner corners of the mouth on either side and points C and D at the middle of the inner edges of the upper and lower lips. Calculate the mouth opening angle of each image frame from the coordinates of points A, B, C, and D, and take the average of the mouth opening angles computed within the same slice as the final mouth opening angle of the mapped character.
In this step, selecting the image data frames in the middle third of each image data slice is equivalent to dividing the slice into three equal parts, discarding the head and tail, and keeping the most expressive middle segment of the image data stream for the mouth opening angle calculations. Typically a face model has 68 feature points, of which only 20 are key mouth-shape points. Usually, during training, points A and B are symmetric about the midpoint O of the mouth contour, and points C and D are likewise symmetric about O; in the present invention, ABCD is therefore treated as a rhombus, and the mouth opening angle can accordingly be defined as any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB. Extracting the mouth contour information from a face image is a technique well known to those skilled in the art and is not described further here. A sketch of the angle calculation is given below.
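As a minimal sketch, assuming the four landmark coordinates A, B, C, and D have already been extracted per frame (for instance from a 68-point face model), the middle-third selection and angle averaging of step S5 could be implemented as follows; the choice of ∠ACB here is just one of the four equivalent options named above:

```python
import numpy as np

def opening_angle(vertex, p1, p2):
    """Angle (radians) at `vertex` between the rays vertex->p1 and vertex->p2."""
    v1 = np.asarray(p1, float) - np.asarray(vertex, float)
    v2 = np.asarray(p2, float) - np.asarray(vertex, float)
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def slice_opening_angle(landmarks_per_frame):
    """Final mouth opening angle for one character slice.

    landmarks_per_frame -- one dict per frame, {"A": (x, y), "B": ..., "C": ..., "D": ...}
    Only the middle third of the frames is used, as described in step S5.
    """
    n = len(landmarks_per_frame)
    middle = landmarks_per_frame[n // 3: 2 * n // 3] or landmarks_per_frame
    angles = [opening_angle(f["C"], f["A"], f["B"]) for f in middle]  # angle ACB
    return float(np.mean(angles))
```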
Step S6: Arrange the calculated final mouth opening angles in chronological order to form a mouth opening angle data sequence corresponding to the training text.
Step S7: Compare the actual mouth opening angle data sequence with the standard opening angle data sequence of the training text according to the variation trend of the opening angle between adjacent characters and between adjacent sentences, determine the characters and sentences whose mouth shape needs correction, and output and display the result to the user on the display screen.
In this step, the overall mouth opening angle of a single sentence is the mean or root mean square of the absolute values of the adjacent angle changes of the characters it contains. In this embodiment, the comparative analysis of the opening angle variation trend can proceed as follows: a two-dimensional coordinate system is established with the chronological order of the individual characters or sentences on the abscissa and the corresponding character-level or sentence-level opening angle on the ordinate; the actually sampled opening angle trend curve (usually a polyline of several connected segments) is then compared with the standard trend curve in this coordinate system to obtain the characters or sentences to be corrected.
Optionally, different thresholds can be set for the comparative analysis of the trend between adjacent sentences and for the comparative analysis of the trend between adjacent characters within a sentence; when the sampled actual variation between adjacent characters or sentences deviates from the standard variation by more than the set threshold of the allowed deviation ratio, the character or sentence is judged to need correction. A sketch of this thresholded comparison follows.
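Purely as an illustration (the threshold value and function names are hypothetical, not taken from the patent), the sentence-level aggregation and the thresholded trend comparison of step S7 might be sketched as:

```python
import numpy as np

def sentence_angle(char_angles, use_rms=False):
    """Sentence-level opening angle: mean (or RMS) of the absolute adjacent
    changes of the character-level angles within the sentence."""
    deltas = np.abs(np.diff(np.asarray(char_angles, float)))
    return float(np.sqrt(np.mean(deltas ** 2)) if use_rms else np.mean(deltas))

def flag_for_correction(actual, standard, deviation_ratio=0.3):
    """Indices of elements whose adjacent-change trend deviates from the
    standard trend by more than the allowed ratio (illustrative default)."""
    actual_trend = np.diff(np.asarray(actual, float))
    standard_trend = np.diff(np.asarray(standard, float))
    flagged = []
    for i, (a, s) in enumerate(zip(actual_trend, standard_trend)):
        if abs(a - s) > deviation_ratio * max(abs(s), 1e-6):
            flagged.append(i + 1)   # the later element of the adjacent pair
    return flagged
```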
Preferably, in this embodiment, in the comparative analysis based on the variation trend of the mouth opening angle, the trend between adjacent sentences is compared first to obtain the sentences whose mouth shape needs correction; then, within those sentences, the characters whose mouth shape needs correction are obtained from the trend between adjacent characters; finally, whether the remaining sentences are also processed depends on whether a corresponding request is received from the user (that is, processing continues only when the user clicks to issue the corresponding instruction; otherwise the subsequent step is skipped), in which case the characters whose mouth shape needs correction are likewise obtained from the trend between adjacent characters. In this way, the different needs of different users can be distinguished; the sentences that users generally care about, and the key characters to be corrected within them, can be located and responded to quickly, while memory load and CPU resource consumption are also effectively reduced.
Preferably, the robot processor of the present invention further implements the following steps when executing the computer program:
Step S8: Calculate the correlation between the actual mouth opening angle data sequence and the standard opening angle data sequence, and give an evaluation result for the entire training text according to the correlation result. For example, the evaluation result is a rating and score computed from statistical correlation value ranges and gradients. Optionally, the Pearson correlation coefficient method can be used for the correlation calculation; a sketch is given below.
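As a hedged illustration (the grade bands and the 0-100 mapping are placeholders, not values given in the patent), a Pearson-based evaluation of the whole training text could look like:

```python
import numpy as np

def pearson_evaluation(actual, standard,
                       bands=((0.9, "A"), (0.75, "B"), (0.5, "C"))):
    """Rate the whole training text from the Pearson correlation of the actual
    and standard opening angle sequences (band thresholds are illustrative)."""
    a = np.asarray(actual, float)
    s = np.asarray(standard, float)
    r = float(np.corrcoef(a, s)[0, 1])     # Pearson correlation coefficient
    score = int(round(max(r, 0.0) * 100))  # simple 0-100 score mapping
    grade = next((g for threshold, g in bands if r >= threshold), "D")
    return r, score, grade
```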
Step S9: After determining the characters and sentences whose mouth shape needs correction, preload the standard mouth-shape explanation videos of the corresponding characters and sentences into memory so that the corresponding corrective content can be played in real time according to the user's selection instruction.
In summary, the technical solution disclosed in the embodiments of the present invention has at least the following beneficial effects:
During training, audio and video data are collected synchronously, and the face image data stream is sliced in step with the audio slices, ensuring the accuracy of the face image data stream slicing. Considering that the beginning and end of each Chinese character's pronunciation are transition stages of onset and release, the most expressive middle segment of the image data stream is selected for the mouth opening angle calculations, and the temporally continuous mouth opening angle data sequence is compared with the standard opening angle data sequence of the training text according to the variation trends of the opening angle between adjacent characters and between adjacent sentences. This ensures the validity and reliability of the final judgment result.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

  1. An intelligent children accompanying education robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer program:
    retrieving a training text and displaying it to the user on a display screen, the training text including at least two sentences whose overall mouth-shape variation differs, each sentence including at least two Chinese characters with different mouth shapes;
    collecting the user's synchronized audio data stream and a face image data stream that includes the mouth shape;
    slicing the audio data stream according to the distribution of Chinese characters and punctuation in the training text and the standard audio features of each character, and obtaining timestamp information for each audio data slice in one-to-one correspondence with a single character;
    slicing the face image data stream according to the timestamp information of each audio data slice, and establishing a mapping between each image data slice and the corresponding character;
    for each image data slice, selecting the image data frames in the middle third of its time span, identifying and extracting the open-mouth contour information from the selected frames, and determining the coordinate positions of the feature points from the mouth contour information, the feature points including at least points A and B at the inner corners of the mouth on either side and points C and D at the middle of the inner edges of the upper and lower lips; calculating the mouth opening angle of each image frame from the coordinates of points A, B, C, and D; and taking the average of the mouth opening angles computed within the same slice as the final mouth opening angle of the mapped character;
    arranging the calculated final mouth opening angles in chronological order to form a mouth opening angle data sequence corresponding to the training text;
    comparing the actual mouth opening angle data sequence with the standard opening angle data sequence of the training text according to the variation trend of the opening angle between adjacent characters and between adjacent sentences, determining the individual characters and sentences whose mouth shape needs correction, and outputting and displaying the result to the user on the display screen; wherein the overall mouth opening angle of a single sentence is the mean or root mean square of the absolute values of the adjacent angle changes of the characters it contains, and the mouth opening angle is any one of ∠CAD, ∠CBD, ∠ACB, or ∠ADB in the rhombus formed by points A, B, C, and D.
  2. The intelligent children accompanying education robot according to claim 1, characterized in that, in the comparative analysis based on the variation trend of the mouth opening angle, the variation trend between adjacent sentences is compared first to obtain the sentences whose mouth shape needs correction; then, within those sentences, the characters whose mouth shape needs correction are obtained from the variation trend between adjacent characters; finally, whether the characters whose mouth shape needs correction are also derived for the remaining sentences from the variation trend between adjacent characters is determined by whether a corresponding request is received from the user.
  3. The intelligent children accompanying education robot according to claim 2, characterized in that the processor further implements the following step when executing the computer program:
    calculating the correlation between the actual mouth opening angle data sequence and the standard opening angle data sequence, and giving an evaluation result for the entire training text according to the correlation result.
  4. The intelligent children accompanying education robot according to claim 3, characterized in that the evaluation result is specifically a rating and score computed from statistical correlation value ranges and gradients.
  5. The intelligent children accompanying education robot according to any one of claims 1 to 4, characterized in that the training text is downloaded remotely via a network, and when the training text is downloaded, the standard opening angle data sequence corresponding to the training text, the standard audio features of each character, and the standard mouth-shape explanation videos for individual characters and sentences are downloaded synchronously, for the intelligent children accompanying education robot to perform the comparative analysis locally on a single machine; and the processor further implements the following step when executing the computer program:
    after determining the characters and sentences whose mouth shape needs correction, preloading the standard mouth-shape explanation videos of the corresponding characters and sentences into memory so that the corresponding corrective content can be played in real time according to the user's selection instruction.
PCT/CN2021/098302 2021-03-22 2021-06-04 Intelligent children accompanying education robot WO2022198798A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110304626.1A CN112949554B (en) 2021-03-22 2021-03-22 Intelligent children accompanying education robot
CN202110304626.1 2021-03-22

Publications (1)

Publication Number Publication Date
WO2022198798A1 true WO2022198798A1 (en) 2022-09-29

Family

ID=76227595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098302 WO2022198798A1 (en) 2021-03-22 2021-06-04 Intelligent children accompanying education robot

Country Status (2)

Country Link
CN (1) CN112949554B (en)
WO (1) WO2022198798A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359059A (en) * 2022-10-20 2022-11-18 一道新能源科技(衢州)有限公司 Solar cell performance testing method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9548048B1 (en) * 2015-06-19 2017-01-17 Amazon Technologies, Inc. On-the-fly speech learning and computer model generation using audio-visual synchronization
CN111429885A (en) * 2020-03-02 2020-07-17 北京理工大学 Method for mapping audio clip to human face-mouth type key point
CN112037788A (en) * 2020-09-10 2020-12-04 中航华东光电(上海)有限公司 Voice correction fusion technology

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940939A (en) * 2017-03-16 2017-07-11 牡丹江师范学院 Oral English Teaching servicing unit and its method
CN107424450A (en) * 2017-08-07 2017-12-01 英华达(南京)科技有限公司 Pronunciation correction system and method
CN108492641A (en) * 2018-03-26 2018-09-04 贵州西西沃教育科技股份有限公司 A kind of English phonetic learning system
CN109034037A (en) * 2018-07-19 2018-12-18 江苏黄金屋教育发展股份有限公司 On-line study method based on artificial intelligence
CN109389098B (en) * 2018-11-01 2020-04-28 重庆中科云从科技有限公司 Verification method and system based on lip language identification
CN111951629A (en) * 2019-05-16 2020-11-17 上海流利说信息技术有限公司 Pronunciation correction system, method, medium and computing device
CN111950327A (en) * 2019-05-16 2020-11-17 上海流利说信息技术有限公司 Mouth shape correcting method, mouth shape correcting device, mouth shape correcting medium and computing equipment
CN112001323A (en) * 2020-08-25 2020-11-27 成都威爱新经济技术研究院有限公司 Digital virtual human mouth shape driving method based on pinyin or English phonetic symbol reading method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9548048B1 (en) * 2015-06-19 2017-01-17 Amazon Technologies, Inc. On-the-fly speech learning and computer model generation using audio-visual synchronization
CN111429885A (en) * 2020-03-02 2020-07-17 北京理工大学 Method for mapping audio clip to human face-mouth type key point
CN112037788A (en) * 2020-09-10 2020-12-04 中航华东光电(上海)有限公司 Voice correction fusion technology

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359059A (en) * 2022-10-20 2022-11-18 一道新能源科技(衢州)有限公司 Solar cell performance testing method and system

Also Published As

Publication number Publication date
CN112949554B (en) 2022-02-08
CN112949554A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN106448288A (en) Interactive English learning system and method
CN111915707B (en) Mouth shape animation display method and device based on audio information and storage medium
CN107133481A (en) The estimation of multi-modal depression and sorting technique based on DCNN DNN and PV SVM
CN110717018A (en) Industrial equipment fault maintenance question-answering system based on knowledge graph
US9443193B2 (en) Systems and methods for generating automated evaluation models
CN109326162A (en) A kind of spoken language exercise method for automatically evaluating and device
CN108549628A (en) The punctuate device and method of streaming natural language information
WO2022198798A1 (en) Intelligent children accompanying education robot
CN107240394A (en) A kind of dynamic self-adapting speech analysis techniques for man-machine SET method and system
TWI294107B (en) A pronunciation-scored method for the application of voice and image in the e-learning
CN104347071A (en) Method and system for generating oral test reference answer
CN104505089B (en) Spoken error correction method and equipment
CN114936787A (en) Online student teaching intelligent analysis management cloud platform based on artificial intelligence
CN102339605A (en) Fundamental frequency extraction method and system based on prior surd and sonant knowledge
CN111785236A (en) Automatic composition method based on motivational extraction model and neural network
CN111681680B (en) Method, system, device and readable storage medium for acquiring audio frequency by video recognition object
CN115240710A (en) Neural network-based multi-scale fusion pronunciation evaluation model optimization method
CN114241835A (en) Student spoken language quality evaluation method and device
CN113593326A (en) English pronunciation teaching device and method
Ping English Speech Recognition Method Based on HMM Technology
He et al. Automatic generation algorithm analysis of dance movements based on music–action association
CN111128181A (en) Recitation question evaluation method, device and equipment
TWI743798B (en) Method and apparatus for chinese multiple speech recognition
TW201411577A (en) Voice processing method of point-to-read device
Xie The Application of Intelligent Speech Recognition Technology in Japanese Learning System

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932414

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932414

Country of ref document: EP

Kind code of ref document: A1