CN111081089A - Dictation control method and device based on facial feature information - Google Patents

Dictation control method and device based on facial feature information

Info

Publication number
CN111081089A
Authority
CN
China
Prior art keywords
dictation
feature information
facial feature
content
sub
Prior art date
Legal status
Granted
Application number
CN201910387691.8A
Other languages
Chinese (zh)
Other versions
CN111081089B (en)
Inventor
崔颖
Current Assignee
TCL China Star Optoelectronics Technology Co Ltd
Original Assignee
Shenzhen China Star Optoelectronics Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen China Star Optoelectronics Technology Co Ltd filed Critical Shenzhen China Star Optoelectronics Technology Co Ltd
Priority to CN201910387691.8A
Publication of CN111081089A
Application granted
Publication of CN111081089B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Abstract

The invention discloses a dictation control method and device based on facial feature information. The method comprises: outputting, by voice, dictation content corresponding to a detected dictation instruction; collecting facial feature information of the dictating person while the dictation content is output by voice; judging whether the collected facial feature information matches preset facial feature information, the preset facial feature information being the facial feature information exhibited when the dictating person has not confirmed the dictation content; and, if they match, outputting target prompt information corresponding to the dictation content to prompt the dictating person. By implementing the embodiment of the invention, whether the dictating person knows the specific dictation content can be judged intelligently from the facial feature information collected while the dictation content is output by voice; if not, outputting prompt information corresponding to the dictation content lets the dictating person identify the dictation content quickly and accurately, improving the dictation effect and dictation accuracy of the dictating person.

Description

Dictation control method and device based on facial feature information
Technical Field
The invention relates to the technical field of intelligent terminal equipment, in particular to a dictation control method and device based on facial feature information.
Background
At present, because traditional dictation requires at least one reader to read the dictation content aloud, it cannot satisfy a dictating person's need to practice anywhere and at any time. For this reason, dictation applications, dictation terminals, learning applications with a dictation function, and the like have appeared on the market, bringing the dictating person an intelligent dictation mode that allows practice anytime and anywhere. Taking a dictation terminal as an example, when a dictating person needs to practice dictation, he or she only needs to open the dictation terminal and select the dictation content; the dictation terminal automatically reads the dictation content aloud, and the dictating person completes the dictation exercise.
Practice shows that, because some content sounds similar to the dictation content but has a different meaning, because the dictating person's dictation ability is limited, and for other reasons, the current intelligent dictation mode may leave the dictating person unable to distinguish the actual dictation content. For example, the dictation terminal reads aloud "plant", but the dictating person takes it to be the homophone "job" (in Chinese, both are pronounced zhiwu), which reduces the dictation effect and dictation accuracy.
Disclosure of Invention
The embodiment of the invention discloses a dictation control method and device based on facial feature information, which can improve the dictation effect and dictation accuracy of a dictating person.
The first aspect of the embodiment of the invention discloses a dictation control method based on facial feature information, which comprises the following steps:
according to the detected dictation instruction, outputting dictation contents corresponding to the dictation instruction in a voice mode;
collecting facial feature information of a dictating person in the process of outputting the dictation contents by voice;
judging whether the collected facial feature information is matched with preset facial feature information, wherein the preset facial feature information is facial feature information when the dictating person does not confirm the dictation content;
and when the collected facial feature information is judged to be matched with the preset facial feature information, outputting target prompt information corresponding to the dictation content so as to prompt the dictation content to the dictating person.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after it is judged that the collected facial feature information matches the preset facial feature information and before the target prompt information corresponding to the dictation content is output to prompt the dictating person, the method further comprises:
determining learning ability information of the dictating person, and screening at least one piece of prompt information matched with the learning ability information from all pieces of prompt information which are acquired in advance and correspond to the dictation content to serve as target prompt information corresponding to the dictation content;
wherein the prompt information matched with different learning ability information is different.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the dictation content includes at least one sub-dictation content;
after it is judged that the collected facial feature information matches the preset facial feature information and before the target prompt information corresponding to the dictation content is output to prompt the dictating person, the method further comprises the following steps:
determining target sub dictation contents matched with the identification time of the facial feature information from all sub dictation contents included in the dictation contents;
and the outputting of the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person comprises:
and outputting prompt information corresponding to the target sub-dictation content so as to prompt the target sub-dictation content to the dictating person.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the determining, from all sub-dictation contents included in the dictation contents, a target sub-dictation content that matches the identification time of the facial feature information includes:
determining, from all the sub-dictation contents included in the dictation content, the sub-dictation content being output by voice when the facial feature information is recognized, as the target sub-dictation content matching the recognition time of the facial feature information; or,
determining, from all the sub-dictation contents included in the dictation content, the set of sub-dictation contents already output by voice, determining the voice output time of each sub-dictation content in the set, screening out a target voice output time that is earlier than the recognition time of the facial feature information and closest to it, and determining the sub-dictation content output by voice at the target voice output time as the target sub-dictation content matching the recognition time of the facial feature information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after collecting the facial feature information of the dictating person, the method further comprises:
judging whether the facial feature information comprises feature information of a target feature in all facial features of the dictating person, wherein the target feature comprises at least one of a mouth feature, an eye feature and an eyebrow feature;
when the facial feature information is judged to comprise the feature information of the target feature, triggering and executing the operation of judging whether the collected facial feature information is matched with preset facial feature information;
when the facial feature information is judged not to include the feature information of the target feature, triggering and executing the operation of collecting the facial feature information of the dictating person and the operation of judging whether the facial feature information includes the feature information of the target feature in all the facial features of the dictating person.
The second aspect of the embodiment of the invention discloses a dictation control device based on facial feature information, which comprises a voice output module, an acquisition module, a judgment module and a prompt information output module, wherein:
the voice output module is used for outputting dictation contents corresponding to the dictation instruction in a voice mode according to the detected dictation instruction;
the acquisition module is used for acquiring facial feature information of the dictating person in the process of outputting the dictation content by the voice output module;
the judging module is used for judging whether the collected facial feature information is matched with preset facial feature information, wherein the preset facial feature information is facial feature information when the dictating person does not confirm the dictation content;
and the prompt information output module is used for outputting the target prompt information corresponding to the dictation content when the judging module judges that the collected facial feature information matches the preset facial feature information, so as to prompt the dictation content to the dictating person.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the apparatus further includes a first determining module and a screening module, where:
the first determining module is used for determining the learning ability information of the dictating person after the judging module judges that the collected facial feature information is matched with the preset facial feature information;
the screening module is used for screening at least one piece of prompt information matched with the learning ability information from all pieces of prompt information corresponding to the dictation content, which are acquired in advance, and taking the prompt information as target prompt information corresponding to the dictation content, wherein the prompt information matched with different pieces of learning ability information is different;
wherein, the prompt information output module is specifically configured to:
output the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person after the judging module judges that the collected facial feature information matches the preset facial feature information and the screening module has screened out, from all the pre-acquired prompt information corresponding to the dictation content, at least one piece of prompt information matching the learning ability information as the target prompt information corresponding to the dictation content.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the dictation content includes at least one sub-dictation content;
the apparatus further comprises a second determining module, wherein:
the second determining module is configured to determine a target sub dictation content matched with the recognition time of the facial feature information from all sub dictation contents included in the dictation content;
the specific way in which the prompt information output module outputs the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person is as follows:
and outputting prompt information corresponding to the target sub-dictation content so as to prompt the target sub-dictation content to the dictating person.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the specific manner of determining, by the second determining module, the target sub dictation content that matches the identification time of the facial feature information from all the sub dictation contents included in the dictation content is:
determining, from all the sub-dictation contents included in the dictation content, the sub-dictation content being output by voice when the facial feature information is recognized, as the target sub-dictation content matching the recognition time of the facial feature information; or,
determining, from all the sub-dictation contents included in the dictation content, the set of sub-dictation contents already output by voice, determining the voice output time of each sub-dictation content in the set, screening out a target voice output time that is earlier than the recognition time of the facial feature information and closest to it, and determining the sub-dictation content output by voice at the target voice output time as the target sub-dictation content matching the recognition time of the facial feature information.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the determining module is further configured to determine, after the acquiring module acquires facial feature information of a dictating person, whether the facial feature information includes feature information of a target feature in all facial features of the dictating person, where the target feature includes at least one of a mouth feature, an eye feature, and an eyebrow feature;
the mode that the judging module judges whether the collected facial feature information is matched with the preset facial feature information specifically comprises the following steps:
when the facial feature information is judged to comprise the feature information of the target feature, judging whether the collected facial feature information is matched with preset facial feature information;
the acquisition module is further configured to acquire the facial feature information of the dictating person when the judgment module judges that the facial feature information does not include the feature information of the target feature.
A third aspect of the embodiments of the present invention discloses another dictation control apparatus based on facial feature information, the apparatus including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute all or part of the steps of any one of the methods disclosed in the first aspect of the embodiments of the present invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium, which is characterized by storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute all or part of the steps in any one of the methods disclosed in the first aspect of the embodiments of the present invention.
A fifth aspect of embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, dictation content corresponding to a detected dictation instruction is output by voice; the facial feature information of the dictating person is collected while the dictation content is output by voice; whether the collected facial feature information matches preset facial feature information is judged, the preset facial feature information being the facial feature information exhibited when the dictating person has not confirmed the dictation content; and, if they match, target prompt information corresponding to the dictation content is output to prompt the dictating person. Therefore, by implementing the embodiment of the invention, whether the dictating person knows the specific dictation content can be judged intelligently from the facial feature information collected while the dictation content is output by voice; if not, outputting prompt information corresponding to the dictation content lets the dictating person identify the dictation content quickly and accurately, improving the dictation effect and dictation accuracy and further improving the dictation experience of the dictating person.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a dictation control method based on facial feature information according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another dictation control method based on facial feature information according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a dictation control device based on facial feature information according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another dictation control device based on facial feature information according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of yet another dictation control device based on facial feature information according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, of embodiments of the present invention are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a dictation control method and device based on facial feature information, which can intelligently judge, from the facial feature information collected while the dictation content is output by voice, whether the dictating person knows the specific dictation content; if not, outputting prompt information corresponding to the dictation content lets the dictating person identify the dictation content quickly and accurately, improving the dictation effect and dictation accuracy and further improving the dictation experience of the dictating person. Detailed descriptions follow.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of a dictation control method based on facial feature information according to an embodiment of the present invention. The method described in fig. 1 may be applied to any user terminal with a dictation control function, such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a smart wearable device, and a Mobile Internet Device (MID), which is not limited in the embodiment of the present invention. As shown in fig. 1, the dictation control method based on facial feature information may include the operations of:
101. and the user terminal outputs the dictation content corresponding to the dictation instruction in a voice mode according to the detected dictation instruction.
In the embodiment of the present invention, the dictation instruction may be triggered by the dictating person according to his or her own dictation needs, triggered by a parent or teacher of the dictating person through a management terminal bound to the user terminal, or generated automatically by the user terminal when it judges that the current time is a dictation trigger time; the embodiment of the present invention is not limited in this regard. The dictation content corresponding to the dictation instruction may be a single item, i.e., one dictation instruction triggers the voice output (also called "reading aloud") of one dictation content; or it may be a set of multiple sub-dictation contents (for example, all the English words of a certain unit). In the latter case, the dictation instruction may include a preset duration of the voice output interval, where the voice output interval is the time interval between the end of the voice output of one sub-dictation content and the start of the voice output of the next sub-dictation content.
Optionally, when the dictation content corresponding to the dictation instruction is a set of multiple sub-dictation contents and the dictation instruction does not include a preset duration of the voice output interval, the user terminal may automatically calculate the duration of the voice output interval according to the content parameters of the sub-dictation content currently being output by voice; that is, the user terminal may perform the following operations:
the user terminal determines content parameters of the current sub dictation content output by voice, wherein the content parameters can comprise at least one of the content length of the current sub dictation content, the content type of the current sub dictation content and the content complexity of the current sub dictation content;
the user terminal calculates the time length of the voice output interval according to the content parameters of the current sub-dictation content and a predetermined time length calculation formula;
and under the condition that the prompt information does not need to be output, the user terminal starts timing from the moment when the voice output of the current sub-dictation content is finished, and when the timing duration is equal to the duration of the calculated voice output interval, the next sub-dictation content of the current sub-dictation content is output in a voice mode.
The longer the content length of the sub-dictation content, the larger the voice output interval (i.e., the longer its corresponding duration); the higher the content complexity of the sub-dictation content, the larger the voice output interval; and the voice output interval for formula-type sub-dictation content is larger than that for English-type content, which in turn is larger than that for Chinese-character-type content. It should be noted that the content parameters of a sub-dictation content are the content parameters of the written content matching it: the content length of the sub-dictation content is the content length of the matching written content, its content complexity is the complexity of the matching written content, and its content type is the type of the matching written content. For example, a formula-type sub-dictation content is one whose matching written content is a formula. The content length may be measured by the number of characters in the matching written content, and the content complexity by the complexity of those characters. Adaptively adjusting the duration of the voice output interval according to the parameters of the matching written content reserves an appropriate writing time (namely, the calculated duration of the voice output interval) for the dictating person, which reduces both inaccurate writing caused by insufficient writing time and inefficient dictation caused by an overly long writing time.
Optionally, the predetermined time calculation formula may be:
T=(k1*a+k2*b)*t;
where k1 and k2 are the weight coefficients for content length and content complexity respectively, both greater than 0.5; t is the reference duration corresponding to a reference sub-dictation content of the same content type as the current sub-dictation content; a is the number of characters of the written content matching the current sub-dictation content divided by the number of characters of the written content matching the reference sub-dictation content; and b is the complexity of the written content matching the current sub-dictation content relative to the written content matching the reference sub-dictation content. It should be noted that k1 and k2 differ across content types: the k1 for formula-type sub-dictation content is greater than the k1 for English-type content, which is greater than the k1 for Chinese-character-type content, and the same ordering holds for k2; the embodiment of the present invention is not limited in this regard.
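As an illustration, a minimal Python sketch of this interval calculation follows. The concrete coefficient values, reference durations, and function names are assumptions made for illustration only; the embodiment fixes only that k1 and k2 are greater than 0.5 and that the coefficients for formula-type content exceed those for English-type content, which exceed those for Chinese-character-type content.

    # Sketch of T = (k1*a + k2*b) * t with assumed, illustrative parameters.
    PARAMS = {
        # content type: (k1, k2, reference duration t in seconds)
        "formula": (0.9, 0.9, 8.0),
        "english": (0.7, 0.7, 5.0),
        "chinese": (0.6, 0.6, 4.0),
    }

    def voice_output_interval(content_type: str, num_chars: int,
                              ref_chars: int, rel_complexity: float) -> float:
        """Duration T of the voice output interval for one sub-dictation content.

        a = num_chars / ref_chars is the length of the matching written content
        relative to the reference content of the same type; rel_complexity is b.
        """
        k1, k2, t = PARAMS[content_type]
        a = num_chars / ref_chars
        return (k1 * a + k2 * rel_complexity) * t

    # Example: written content twice as long and 1.5x as complex as the reference.
    print(voice_output_interval("chinese", num_chars=4, ref_chars=2,
                                rel_complexity=1.5))  # (0.6*2 + 0.6*1.5)*4 = 8.4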
102. And in the process of outputting the dictation contents by voice, the user terminal collects the facial feature information of the dictating person.
In the embodiment of the present invention, collecting the facial feature information of the dictating person while the dictation content is output by voice may include:
when the dictation content is a single item, the user terminal starts timing at the moment the dictation content begins to be output by voice and collects the facial feature information of the dictating person until the timed duration reaches a target duration;
when the dictation content is a set of multiple sub-dictation contents, the user terminal starts timing at the moment the current sub-dictation content begins to be output by voice and collects the facial feature information of the dictating person until the timed duration reaches the target duration.
Here, the target duration equals the voice output duration plus the duration of the voice output interval corresponding to the content being output by voice.
In the embodiment of the present invention, the user terminal may collect the facial feature information of the dictating person through an image acquisition device (e.g., a camera) on the user terminal. The facial feature information may include a change of a target feature among all the facial features of the dictating person; the change of the target feature is used to determine the facial expression of the dictating person, and the target feature may be at least one of a mouth feature, an eye feature, and an eyebrow feature.
In an alternative embodiment, if the user terminal does not collect facial feature information of the dictating person, the user terminal may perform the following operations:
if the image acquisition device on the user terminal is rotatable, the user terminal controls the image acquisition device to rotate until it captures the facial feature information of the dictating person; further, the captured face of the dictating person may be kept at the center of the field of view of the image acquisition device, which reserves the largest possible movement range for the dictating person and reduces the chance that the device loses the facial feature information because of slight movement;
if the image acquisition device on the user terminal is not rotatable, the user terminal outputs a voice prompt asking the dictating person to move, or to move the user terminal, until the image acquisition device can collect the facial feature information of the dictating person.
Therefore, the alternative embodiment can improve the reliability and accuracy of the facial feature information of the dictating person.
103. The user terminal judges whether the collected facial feature information is matched with preset facial feature information, if the judgment result of the step 103 is yes, the step 104 is triggered to be executed, and if the judgment result of the step 103 is no, the process can be ended.
In an embodiment of the present invention, the preset facial feature information is the facial feature information exhibited when the dictating person has not confirmed the dictation content output by voice. It may be at least one item of facial feature information of the dictating person in doubt and/or at least one item of facial feature information of the dictating person in incomprehension, pre-stored in the user terminal. Specifically, the user terminal judging whether the collected facial feature information matches the preset facial feature information may include:
the user terminal judges whether the collected facial feature information matches any one item of the preset facial feature information, and if so, determines that the collected facial feature information matches the preset facial feature information.
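To make the matching step concrete, the following is a minimal sketch that assumes the facial feature information is encoded as a numeric feature vector and that matching means falling within a distance threshold of any pre-stored "doubtful" or "uncomprehending" template. Both the representation and the threshold rule are assumptions; the embodiment does not prescribe a particular matching algorithm.

    import numpy as np

    MATCH_THRESHOLD = 0.35  # illustrative value, not from the patent

    def matches_preset(collected: np.ndarray, presets: list) -> bool:
        """True if the collected facial features match any pre-stored item of
        preset facial feature information (step 103)."""
        return any(np.linalg.norm(collected - template) < MATCH_THRESHOLD
                   for template in presets)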
It should be noted that, in an alternative embodiment, when the judgment result of step 103 is negative, the user terminal may automatically output the next dictation content by voice. The starting moment of the voice output of the next dictation content may be the moment at which timing, started when the voice output of the previous dictation content finished, reaches the duration of the voice output interval corresponding to the previous dictation content.
104. The user terminal outputs the target prompt information corresponding to the dictation content so as to prompt the dictation content to the dictating person.
In the embodiment of the invention, the user terminal may output the target prompt information by voice and/or as text. When output as text, the target prompt information may be shown in a pop-up text box. Optionally, when the target prompt information is output as text and contains part or all of the written content matching the dictation content, the portion identical to the written content is processed in a preset manner, for example by replacing it with other characters or covering it with a mosaic.
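A minimal sketch of this "preset processing" follows, masking with asterisks any part of a text prompt identical to the written content; asterisk masking is one assumed choice among the character substitutions and mosaics the embodiment mentions.

    def mask_written_content(prompt: str, written: str, mask: str = "*") -> str:
        """Hide the written answer inside a text prompt so the prompt does not
        reveal it directly."""
        return prompt.replace(written, mask * len(written))

    print(mask_written_content("A plant is one of the main forms of life.", "plant"))
    # -> "A ***** is one of the main forms of life."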
In this embodiment of the present invention, the target prompt information corresponding to the dictation content may include prompt information indicating the part of speech of the written content matching the dictation content, prompt information indicating an application scenario of that written content, and prompt information explaining its meaning (a word group containing it, a specific explanation of it, a sentence made with it, a prompt excluding homophones with different meanings, the original text sentence it comes from, and the like). The written content matching the dictation content may or may not be the dictation content itself; for example, when the dictation content is "blue sky", the matching written content is "blue sky", whereas when the dictation content is "please write the antonym of careless", the matching written content is "serious".
Taking Chinese-character dictation as an example, if the dictation content is the word "plant": after "plant" is output by voice, if a doubtful expression of the dictating person is detected, the terminal outputs by voice "the word is a noun", "it is used in greening scenarios", or "it is one of the main forms of life, including trees, shrubs, vines, grasses, ferns, green algae, and the like".
In an alternative embodiment, when the determination result in step 103 is yes, before step 104 is executed, the dictation control method based on facial feature information may further include the following operations:
the user terminal determines the learning ability information of the dictating person, screens, from all the pre-acquired prompt information corresponding to the dictation content, at least one piece of prompt information matching the learning ability information as the target prompt information corresponding to the dictation content, and triggers the execution of step 104. For the same dictation content, the prompt information matching different learning ability information differs: the stronger the learning ability, the simpler the prompt information and the lower its degree of prompting; the weaker the learning ability, the more detailed the prompt information and the higher its degree of prompting.
Wherein, the determining, by the user terminal, the learning capability information of the dictating person may include:
the user terminal determines the learning ability information of the dictating person according to the age bracket of the dictating person, the grade of the dictating person, the dictating person's results in the subject to which the dictation content belongs, or the dictating person's past dictation accuracy.
Therefore, the optional embodiment can output the prompt information matched with the dictation content and the learning ability of the dictating person, and is beneficial to improving the probability that the dictating person confirms the writing content matched with the dictation content through the prompt information.
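A minimal sketch of this learning-ability-based screening follows. The three ability tiers, the accuracy thresholds, and the example prompts are illustrative assumptions; the embodiment only states that stronger ability receives simpler, less revealing prompts and weaker ability receives more detailed ones.

    # Prompts for one dictation content, ordered from least to most revealing.
    PROMPTS = {
        "strong": ["The word is a noun."],
        "medium": ["The word is a noun.", "It is used in greening scenarios."],
        "weak": ["The word is a noun.", "It is used in greening scenarios.",
                 "It is one of the main forms of life: trees, shrubs, vines..."],
    }

    def learning_ability(past_accuracy: float) -> str:
        """Map past dictation accuracy to an ability tier (assumed thresholds)."""
        if past_accuracy >= 0.9:
            return "strong"
        if past_accuracy >= 0.7:
            return "medium"
        return "weak"

    def target_prompt_info(past_accuracy: float) -> list:
        """Prompt information matching the dictating person's learning ability."""
        return PROMPTS[learning_ability(past_accuracy)]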
In an alternative embodiment, after finishing step 104, the dictation control method based on facial feature information may further include the following operations:
the user terminal starts timing from the moment the target prompt information is output and collects the facial feature information of the dictating person; if the collected facial feature information matches one item of the preset facial feature information, the user terminal outputs other prompt information corresponding to the dictation content to prompt the dictating person.
In the embodiment of the present invention, the user terminal may output different prompt information corresponding to the dictation content several times. Each time it outputs prompt information corresponding to the dictation content, the user terminal may first screen, according to a preset screening rule, prompt information that has not yet been output, either from all the prompt information corresponding to the dictation content or from all the prompt information corresponding to the dictation content that matches the learning ability information of the dictating person. The screening rule may order candidates from a low to a high degree of prompting, or from a low to a high level of detail; the embodiment of the present invention is not limited in this regard. Prompting the dictation content through several outputs, from simple to detailed, realizes progressive guidance of the dictating person: it not only prompts the dictation content but also triggers thinking, so that the dictating person can learn the knowledge related to the dictation content.
Taking Chinese-character dictation as an example, if the dictation content is the word "plant": after "plant" is output by voice, if a doubtful expression of the dictating person is collected, the terminal outputs by voice "the word is a noun"; if a doubtful expression is still collected after that, it outputs by voice "it is used in greening scenarios"; and if a doubtful expression is collected once more, it outputs by voice "it is one of the main forms of life, including trees, shrubs, vines, grasses, ferns, green algae, and the like".
In this optional embodiment, further optionally, if the collected facial feature information of the dictating person indicates that the dictating person still cannot confirm the dictation content, the dictation control method based on facial feature information may further include the following operations before outputting by voice further prompt information corresponding to the dictation content:
the user terminal judges whether the total number of times prompt information corresponding to the dictation content has been output reaches a preset count threshold (e.g., 3 times); if the total number has not reached the preset count threshold, other prompt information corresponding to the dictation content is output; if the total number has reached the preset count threshold, a dictation label is set for the dictation content and/or the dictation content is added to a pre-generated dictation content set, and the next dictation content is output by voice. The dictation label indicates that the dictating person has not mastered the written content matching the dictation content, and the dictation content set stores all the dictation contents whose matching written content the user failed to master during the dictation process.
Therefore, by limiting the maximum number of prompts, this alternative embodiment can also avoid the low dictation efficiency caused by a dictating person being unable to confirm the dictation content for a long time.
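The multi-round prompting and the maximum prompt count can be combined as in the following sketch. The helper names speak and still_doubtful are hypothetical stand-ins for the terminal's text-to-speech output and facial-feature check; the threshold of 3 follows the example above.

    MAX_PROMPTS = 3  # preset count threshold from the example above

    def prompt_until_confirmed(prompts, content, review_set, speak, still_doubtful):
        """Output prompts from simple to detailed; after MAX_PROMPTS rounds,
        label the content as not mastered and move on."""
        for count, prompt in enumerate(prompts, start=1):
            speak(prompt)
            if not still_doubtful():
                return True              # dictating person confirmed the content
            if count >= MAX_PROMPTS:
                review_set.add(content)  # dictation label / dictation content set
                return False             # proceed to the next dictation content
        return False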
Still further optionally, the dictation control method based on facial feature information may further include the operations of:
after the dictation process is finished, the user terminal outputs, as text, the written content matching the target dictation content, where the target dictation content may include the dictation content provided with the dictation label or the dictation content included in the dictation content set; the embodiment of the invention is not limited in this regard.
Still further optionally, after the user terminal outputs the written content matched with the target dictation content in a text mode, the user terminal may also output prompt information corresponding to the target dictation content in a text mode and/or a voice mode.
Therefore, this alternative embodiment can also promptly output, as text after dictation finishes, the content the dictating person has not mastered, and further output the corresponding prompt information, so that the dictating person can deepen the impression of that content.
It should be noted that, when the dictation content is a set of multiple sub-dictation contents, the dictation content in the above steps is the current sub-dictation content output by the user terminal. If no facial feature information matching the preset facial feature information is collected after the current sub-dictation content is output by voice, timing starts at the moment the voice output of the current sub-dictation content finishes, and when the timed duration reaches the calculated voice output interval corresponding to the current sub-dictation content, the next sub-dictation content is automatically output by voice. If matching facial feature information is collected, the user terminal starts timing after outputting the prompt information corresponding to the current sub-dictation content; if no matching facial feature information is collected during this timing, the next sub-dictation content is automatically output by voice when the timed duration reaches the calculated voice output interval corresponding to the current sub-dictation content.
It should be noted that recognizing, mastering, or confirming the dictation content means that the user recognizes, masters, or confirms the written content matching the dictation content.
Therefore, by implementing the dictation control method based on facial feature information described in fig. 1, whether the dictating person knows the specific dictation content can be judged intelligently from the facial feature information collected while the dictation content is output by voice; if not, intelligently outputting prompt information corresponding to the dictation content lets the dictating person identify the dictation content quickly and accurately, improving the dictation effect and dictation accuracy and further improving the dictation experience of the dictating person.
Example two
Referring to fig. 2, fig. 2 is a schematic flow chart of another dictation control method based on facial feature information according to an embodiment of the present invention. The method described in fig. 2 may be applied to any user terminal with a dictation control function, such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a smart wearable device, and a Mobile Internet Device (MID), which is not limited in the embodiment of the present invention. As shown in fig. 2, the dictation control method based on facial feature information may include the operations of:
201. and the user terminal outputs the dictation content corresponding to the dictation instruction in a voice mode according to the detected dictation instruction.
In an embodiment of the present invention, the dictation content includes at least one sub-dictation content.
202. And in the process of outputting the dictation contents by voice, the user terminal collects the facial feature information of the dictating person.
203. The user terminal judges whether the collected facial feature information is matched with the preset facial feature information, if yes, the step 204 is triggered, and if no, the step 202 may be triggered.
In an embodiment of the present invention, the preset facial feature information is facial feature information when the dictating person does not confirm the dictation content.
In the embodiment of the present invention, other detailed descriptions for steps 201 to 202 may refer to the corresponding descriptions for steps 101 to 102 in the first embodiment, and are not described again in the embodiment of the present invention.
204. The user terminal determines the target sub dictation content matched with the identification time of the facial feature information from all the sub dictation contents included in the dictation content.
205. And the user terminal determines the learning capability information of the dictating person, and screens at least one piece of prompt information matched with the learning capability information from all pieces of prompt information which are acquired in advance and correspond to the target sub-dictation content to serve as target prompt information corresponding to the target sub-dictation content.
206. And the user terminal outputs target prompt information corresponding to the target sub-dictation content so as to prompt the dictation content to a dictating person.
In the embodiment of the present invention, it should be noted that after the step 204 is executed, the step 206 may also be directly triggered to be executed, which is not limited in the embodiment of the present invention.
In an alternative embodiment, the determining, by the user terminal, the target sub-dictation content that matches the recognition time of the facial feature information from all the sub-dictation contents included in the dictation content may include:
the user terminal determines, from all the sub-dictation contents included in the dictation content, the sub-dictation content being output by voice when the facial feature information is recognized, as the target sub-dictation content matching the recognition time of the facial feature information; or,
the user terminal determines, from all the sub-dictation contents included in the dictation content, the set of sub-dictation contents already output by voice, determines the voice output time of each sub-dictation content in the set, screens out a target voice output time that is earlier than the recognition time of the facial feature information and closest to it, and determines the sub-dictation content output by voice at the target voice output time as the target sub-dictation content matching the recognition time of the facial feature information.
Therefore, the optional embodiment can accurately identify the sub-dictation contents output by voice when the dictating person presents the facial feature information matched with the preset facial feature information when a plurality of sub-dictation contents need to be dictated, and is favorable for improving the reliability of the output prompt information.
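A minimal sketch of the second strategy above follows, choosing among the sub-dictation contents already output by voice the one whose output time is latest while still earlier than the recognition time of the facial feature information. Representing timestamps as float seconds is an assumption.

    from dataclasses import dataclass

    @dataclass
    class SubContent:
        text: str
        output_time: float  # moment this sub-content was output by voice

    def target_sub_content(outputted, recognition_time):
        """Sub-dictation content output closest before the recognition time of
        the facial feature information."""
        earlier = [s for s in outputted if s.output_time < recognition_time]
        return max(earlier, key=lambda s: s.output_time, default=None)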
In another alternative embodiment, after completing step 202, the dictation control method based on facial feature information may further include the following operations:
the user terminal judges whether the facial feature information includes the feature information of a target feature among all the facial features of the dictating person, and triggers the execution of step 203 when judging that the facial feature information includes the feature information of the target feature;
when judging that the facial feature information does not include the feature information of the target feature, the user terminal triggers steps 201 to 202 again.
Wherein the target feature comprises at least one of a mouth feature, an eye feature, and an eyebrow feature.
Therefore, by ensuring that the collected facial feature information contains the feature information of the target feature, this alternative embodiment can also improve the accuracy of judging whether the dictating person has mastered or confirmed the written content matching the dictation content.
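A minimal sketch of this target-feature check follows, assuming the collected facial feature information is a mapping from feature names to extracted feature data; the feature names are illustrative.

    TARGET_FEATURES = ("mouth", "eyes", "eyebrows")  # at least one must be present

    def has_target_feature(facial_features: dict) -> bool:
        """True if the collected information contains feature information for any
        target feature; otherwise steps 201 to 202 should run again."""
        return any(facial_features.get(name) is not None
                   for name in TARGET_FEATURES)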
It can be seen that, by implementing the dictation control method based on facial feature information described in fig. 2, whether the dictating person knows the specific dictation content can be judged intelligently from the facial feature information collected while the dictation content is output by voice; if not, intelligently outputting prompt information corresponding to the dictation content lets the dictating person identify the dictation content quickly and accurately, improving the dictation effect and dictation accuracy and further improving the dictation experience of the dictating person. In addition, when multiple sub-dictation contents are to be dictated, the method can accurately identify the sub-dictation content being output by voice when the dictating person presents facial feature information matching the preset facial feature information, which improves the reliability of the output prompt information; and by ensuring that the collected facial feature information contains the feature information of the target feature, it improves the accuracy of judging whether the dictating person has mastered or confirmed the written content matching the dictation content.
Example three
Referring to fig. 3, fig. 3 is a schematic structural diagram of a dictation control device based on facial feature information according to an embodiment of the present invention. The apparatus described in fig. 3 may be applied to any user terminal such as a smart phone (e.g., an Android phone, an iOS phone), a tablet computer, a palmtop computer, a smart wearable device, and a Mobile Internet Device (MID), and the embodiment of the present invention is not limited thereto. As shown in fig. 3, the dictation control apparatus based on facial feature information may include:
and the voice output module 301 is configured to output the dictation content corresponding to the dictation instruction in a voice manner according to the detected dictation instruction.
The collecting module 302 is configured to collect facial feature information of the dictating person in the process of outputting the dictation content by the voice output module 301.
The judging module 303 is configured to judge whether the facial feature information collected by the collecting module 302 matches preset facial feature information, where the preset facial feature information is the facial feature information exhibited when the dictating person has not confirmed the dictation content.
And a prompt information output module 304, configured to output target prompt information corresponding to the dictation content to prompt the dictation content to a dictating person when the judgment module 303 judges that the collected facial feature information matches preset facial feature information.
It can be seen that, by implementing the dictation control device based on facial feature information described in fig. 3, whether the dictating person knows the specific dictation content can be judged intelligently from the facial feature information collected while the dictation content is output by voice; if not, intelligently outputting prompt information corresponding to the dictation content lets the dictating person identify the dictation content quickly and accurately, improving the dictation effect and dictation accuracy and further improving the dictation experience of the dictating person.
In an alternative embodiment, as shown in fig. 4, the dictation control apparatus based on facial feature information may further include a first determining module 305 and a filtering module 306, wherein:
a first determining module 305, configured to determine learning ability information of the dictating person after the judging module 303 judges that the collected facial feature information matches the preset facial feature information.
The filtering module 306 is configured to filter at least one piece of prompt information matching the learning ability information from all pieces of prompt information corresponding to the dictation content, which are acquired in advance, as target prompt information corresponding to the dictation content, where the prompt information matching different pieces of learning ability information is different.
The prompt information output module 304 is specifically configured to:
output the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person after the judging module 303 judges that the collected facial feature information matches the preset facial feature information and the screening module 306 has screened out, from all the pre-acquired prompt information corresponding to the dictation content, at least one piece of prompt information matching the learning ability information as the target prompt information corresponding to the dictation content.
It can be seen that the dictation control device based on facial feature information described in fig. 4 can also output prompt information matched with both the dictation content and the learning ability of the dictating person, which is beneficial to improving the probability that the dictating person, guided by the prompt information, writes content matching the dictation content.
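As a rough illustration of the first determining module 305 and the screening module 306, the sketch below tags each candidate prompt with an ability level and screens out the prompts matching the dictating person. The three-level ability scale and the dictionary layout are assumptions made for this sketch only.

```python
def screen_prompts(all_prompts, ability):
    # Screening module 306: keep the prompt(s) whose ability tag matches
    # the learning ability information determined by module 305.
    selected = [p["text"] for p in all_prompts if p["ability"] == ability]
    # If no prompt is tagged for this learner, fall back to all prompts
    # rather than leaving the dictating person without a hint.
    return selected or [p["text"] for p in all_prompts]

prompts_for_apple = [
    {"ability": "low",  "text": "a red or green fruit; the word starts with 'a'"},
    {"ability": "mid",  "text": "a common fruit"},
    {"ability": "high", "text": "one a day keeps the doctor away"},
]
print(screen_prompts(prompts_for_apple, "low"))
```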
In another alternative embodiment, the dictation content may include at least one sub-dictation content, and as shown in fig. 4, the dictation control apparatus based on facial feature information may further include a second determining module 307, wherein:
The second determining module 307 is configured to determine, from all the sub-dictation contents included in the dictation content, a target sub-dictation content matched with the recognition time of the facial feature information.
The specific manner in which the prompt information output module 304 outputs the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person is:
outputting the prompt information corresponding to the target sub-dictation content to prompt the target sub-dictation content to the dictating person.
In this optional embodiment, further optionally, the specific manner in which the second determining module 307 determines, from all the sub-dictation contents included in the dictation content, the target sub-dictation content matched with the recognition time of the facial feature information may be:
determining, from all the sub-dictation contents included in the dictation content, the sub-dictation content that was being output by voice when the facial feature information was recognized, and using it as the target sub-dictation content matched with the recognition time of the facial feature information; alternatively,
determining, from all the sub-dictation contents included in the dictation content, a set of sub-dictation contents that have already been output by voice; determining the voice output time of each sub-dictation content in the set; screening out a target voice output time that is earlier than the recognition time of the facial feature information and separated from the recognition time by the shortest duration; and determining the sub-dictation content output by voice at the target voice output time as the target sub-dictation content matched with the recognition time of the facial feature information.
It can be seen that the dictation control device based on facial feature information described in fig. 4 can also, when a plurality of sub-dictation contents need to be dictated, accurately identify the sub-dictation content that was being output by voice when the dictating person presented facial feature information matching the preset facial feature information, which is beneficial to improving the reliability of the output prompt information.
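The second strategy above amounts to choosing, among the sub-dictation contents already output by voice, the one whose output time is not later than the recognition time and closest to it. A minimal sketch follows, with timestamps in seconds and the pair layout assumed for illustration.

```python
def target_sub_content(voice_outputs, recognition_time):
    # voice_outputs: (sub_content, voice_output_time) pairs for the
    # sub-dictation contents that have already been output by voice.
    earlier = [(c, t) for c, t in voice_outputs if t <= recognition_time]
    if not earlier:
        return None
    # The target voice output time is the one separated from the
    # recognition time by the shortest duration, i.e. the latest one.
    return max(earlier, key=lambda pair: pair[1])[0]

outputs = [("apple", 0.0), ("banana", 3.0), ("cherry", 6.0)]
print(target_sub_content(outputs, 4.5))  # prints "banana"
```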
In yet another optional embodiment, the judging module 303 may be further configured to judge, after the collecting module 302 collects the facial feature information of the dictating person, whether the collected facial feature information includes feature information of a target feature among all the facial features of the dictating person, where the target feature includes at least one of a mouth feature, an eye feature and an eyebrow feature.
In this further alternative embodiment, the specific manner in which the judging module 303 judges whether the facial feature information collected by the collecting module 302 matches the preset facial feature information is:
when it is judged that the facial feature information collected by the collecting module 302 includes the feature information of the target feature, judging whether the collected facial feature information matches the preset facial feature information.
In this further alternative embodiment, the collecting module 302 may be further configured to re-collect the facial feature information of the dictating person when the judging module 303 judges that the facial feature information does not include the feature information of the target feature.
It can be seen that the dictation control apparatus based on facial feature information described in fig. 4 can also improve the accuracy of judging whether the dictating person grasps or confirms the written content matching the dictation content, by ensuring that the collected facial feature information contains the feature information of the target feature.
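A minimal sketch of the re-collection gate described above is given below, assuming the collected facial feature information arrives as a dictionary keyed by feature name; the retry cap is an assumption added for this sketch, as the embodiment does not specify one.

```python
TARGET_FEATURES = {"mouth", "eye", "eyebrow"}

def collect_valid_features(capture, max_tries=5):
    # capture() stands in for the collecting module 302 and returns a
    # dictionary mapping feature names to feature data.
    for _ in range(max_tries):
        info = capture()
        # Proceed to the matching step only if at least one target feature
        # (mouth, eye or eyebrow) is present; otherwise re-collect.
        if TARGET_FEATURES & info.keys():
            return info
    return None

frames = iter([{}, {"nose": [0.2]}, {"mouth": [0.7], "eye": [0.4]}])
print(collect_valid_features(lambda: next(frames)))
```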
EXAMPLE four
Referring to fig. 5, fig. 5 is a schematic structural diagram of another dictation control device based on facial feature information according to an embodiment of the present invention. As shown in fig. 5, the dictation control apparatus based on facial feature information may include:
a memory 501 in which executable program code is stored;
a processor 502 coupled to the memory 501;
the processor 502 calls the executable program code stored in the memory 501 to execute the steps in the dictation control method based on facial feature information described in fig. 1 or fig. 2.
EXAMPLE five
An embodiment of the present invention discloses a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the steps in the dictation control method based on facial feature information described in fig. 1 or fig. 2.
EXAMPLE six
An embodiment of the present invention discloses a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the steps in the dictation control method based on facial feature information described in fig. 1 or fig. 2.
In the various embodiments of the present invention, it should be understood that the sequence numbers of the above processes do not imply a necessary order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute all or part of the steps of the methods of the embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
The dictation control method and device based on facial feature information disclosed by the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A dictation control method based on facial feature information, the method comprising:
according to the detected dictation instruction, outputting dictation contents corresponding to the dictation instruction in a voice mode;
collecting facial feature information of a dictating person in the process of outputting the dictation contents by voice;
judging whether the collected facial feature information is matched with preset facial feature information, wherein the preset facial feature information is facial feature information when the dictating person does not confirm the dictation content;
and when the collected facial feature information is judged to be matched with the preset facial feature information, outputting target prompt information corresponding to the dictation content so as to prompt the dictation content to the dictating person.
2. The dictation control method based on facial feature information as claimed in claim 1, wherein after determining that the collected facial feature information matches the preset facial feature information and before outputting the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person, the method further comprises:
determining learning ability information of the dictating person, and screening at least one piece of prompt information matched with the learning ability information from all pieces of prompt information which are acquired in advance and correspond to the dictation content to serve as target prompt information corresponding to the dictation content;
wherein the prompt information matched with different learning ability information is different.
3. The dictation control method based on facial feature information as claimed in claim 1, wherein the dictation content comprises at least one sub-dictation content;
after judging that the collected facial feature information matches the preset facial feature information and before outputting the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person, the method further comprises:
determining, from all the sub-dictation contents included in the dictation content, a target sub-dictation content matched with the recognition time of the facial feature information;
and the outputting of the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person comprises:
and outputting prompt information corresponding to the target sub-dictation content so as to prompt the target sub-dictation content to the dictating person.
4. The dictation control method based on facial feature information as claimed in claim 3, wherein the determining, from all the sub-dictation contents included in the dictation content, of the target sub-dictation content matched with the recognition time of the facial feature information comprises:
determining, from all the sub-dictation contents included in the dictation content, the sub-dictation content that was being output by voice when the facial feature information was recognized, as the target sub-dictation content matched with the recognition time of the facial feature information; alternatively,
determining, from all the sub-dictation contents included in the dictation content, a set of sub-dictation contents that have already been output by voice; determining the voice output time of each sub-dictation content in the set; screening out a target voice output time that is earlier than the recognition time of the facial feature information and separated from the recognition time by the shortest duration; and determining the sub-dictation content output by voice at the target voice output time as the target sub-dictation content matched with the recognition time of the facial feature information.
5. The dictation control method based on facial feature information as claimed in any one of claims 1-4, wherein after the collecting of the facial feature information of the dictating person, the method further comprises:
judging whether the facial feature information comprises feature information of a target feature in all facial features of the dictating person, wherein the target feature comprises at least one of a mouth feature, an eye feature and an eyebrow feature;
when the facial feature information is judged to comprise the feature information of the target feature, triggering and executing the operation of judging whether the collected facial feature information is matched with preset facial feature information;
when the facial feature information is judged not to include the feature information of the target feature, triggering and executing the operation of collecting the facial feature information of the dictating person and the operation of judging whether the facial feature information includes the feature information of the target feature in all the facial features of the dictating person.
6. A dictation control device based on facial feature information, characterized in that the device comprises a voice output module, a collection module, a judging module and a prompt information output module, wherein:
the voice output module is used for outputting dictation contents corresponding to the dictation instruction in a voice mode according to the detected dictation instruction;
the collection module is used for collecting facial feature information of the dictating person in the process that the voice output module outputs the dictation content by voice;
the judging module is used for judging whether the collected facial feature information is matched with preset facial feature information, wherein the preset facial feature information is facial feature information when the dictating person does not confirm the dictation content;
and the prompt information output module is used for outputting the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person when the judging module judges that the collected facial feature information matches the preset facial feature information.
7. The dictation control device based on facial feature information as claimed in claim 6, characterized in that the device further comprises a first determining module and a screening module, wherein:
the first determining module is used for determining the learning ability information of the dictating person after the judging module judges that the collected facial feature information is matched with the preset facial feature information;
the screening module is used for screening out, from all pieces of prompt information corresponding to the dictation content acquired in advance, at least one piece of prompt information matched with the learning ability information as the target prompt information corresponding to the dictation content, wherein the prompt information matched with different learning ability information is different;
wherein, the prompt information output module is specifically configured to:
when the judging module judges that the collected facial feature information matches the preset facial feature information, and after the screening module screens out, from all pieces of prompt information corresponding to the dictation content acquired in advance, at least one piece of prompt information matched with the learning ability information as the target prompt information corresponding to the dictation content, output the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person.
8. The dictation control device based on facial feature information as claimed in claim 6, wherein the dictation content comprises at least one sub-dictation content;
the apparatus further comprises a second determining module, wherein:
the second determining module is configured to determine, from all the sub-dictation contents included in the dictation content, a target sub-dictation content matched with the recognition time of the facial feature information;
the specific manner in which the prompt information output module outputs the target prompt information corresponding to the dictation content to prompt the dictation content to the dictating person is:
and outputting prompt information corresponding to the target sub-dictation content so as to prompt the target sub-dictation content to the dictating person.
9. The dictation control device based on facial feature information as claimed in claim 8, wherein the specific manner in which the second determining module determines, from all the sub-dictation contents included in the dictation content, the target sub-dictation content matched with the recognition time of the facial feature information is:
determining, from all the sub-dictation contents included in the dictation content, the sub-dictation content that was being output by voice when the facial feature information was recognized, as the target sub-dictation content matched with the recognition time of the facial feature information; alternatively,
determining, from all the sub-dictation contents included in the dictation content, a set of sub-dictation contents that have already been output by voice; determining the voice output time of each sub-dictation content in the set; screening out a target voice output time that is earlier than the recognition time of the facial feature information and separated from the recognition time by the shortest duration; and determining the sub-dictation content output by voice at the target voice output time as the target sub-dictation content matched with the recognition time of the facial feature information.
10. The dictation control device based on facial feature information as claimed in any one of claims 6-9, wherein the judging module is further configured to judge, after the collection module collects the facial feature information of the dictating person, whether the facial feature information comprises feature information of a target feature among all the facial features of the dictating person, wherein the target feature comprises at least one of a mouth feature, an eye feature and an eyebrow feature;
the specific manner in which the judging module judges whether the collected facial feature information matches the preset facial feature information is:
when the facial feature information is judged to comprise the feature information of the target feature, judging whether the collected facial feature information is matched with preset facial feature information;
the collection module is further configured to re-collect the facial feature information of the dictating person when the judging module judges that the facial feature information does not include the feature information of the target feature.
CN201910387691.8A 2019-05-10 2019-05-10 Dictation control method and device based on facial feature information Active CN111081089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910387691.8A CN111081089B (en) 2019-05-10 2019-05-10 Dictation control method and device based on facial feature information

Publications (2)

Publication Number Publication Date
CN111081089A 2020-04-28
CN111081089B CN111081089B (en) 2022-07-08

Family

ID=70310334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910387691.8A Active CN111081089B (en) 2019-05-10 2019-05-10 Dictation control method and device based on facial feature information

Country Status (1)

Country Link
CN (1) CN111081089B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377274A (en) * 2021-07-01 2021-09-10 读书郎教育科技有限公司 Intelligent desk lamp capable of realizing dictation new words input and method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945624A (en) * 2012-11-14 2013-02-27 南京航空航天大学 Intelligent video teaching system based on cloud calculation model and expression information feedback
CN105005431A (en) * 2015-07-22 2015-10-28 王玉娇 Dictation device, data processing method thereof and related devices
CN107329760A (en) * 2017-06-30 2017-11-07 珠海市魅族科技有限公司 Information cuing method, device, terminal and storage medium
CN107978189A (en) * 2017-12-21 2018-05-01 广东小天才科技有限公司 Intelligently pushing method, system and the terminal device of a kind of exercise
CN107992195A (en) * 2017-12-07 2018-05-04 百度在线网络技术(北京)有限公司 A kind of processing method of the content of courses, device, server and storage medium
US20180261114A1 (en) * 2017-03-07 2018-09-13 Toyota Motor Engineering & Manufacturing North America, Inc. Smart necklace for social awareness
CN109599108A (en) * 2018-12-17 2019-04-09 广东小天才科技有限公司 A kind of dictation householder method and dictation auxiliary device
CN109635196A (en) * 2018-12-17 2019-04-16 广东小天才科技有限公司 A kind of intelligent search method and private tutor's equipment based on polysemant
CN109635096A (en) * 2018-12-20 2019-04-16 广东小天才科技有限公司 A kind of dictation reminding method and electronic equipment

Similar Documents

Publication Publication Date Title
CN109635772B (en) Dictation content correcting method and electronic equipment
CN109597943B (en) Learning content recommendation method based on scene and learning equipment
CN107977394A (en) Paint this recognition methods and electronic equipment
CN113393347B (en) Method and device for preventing cheating in online examination
CN109410984B (en) Reading scoring method and electronic equipment
CN109086431B (en) Knowledge point consolidation learning method and electronic equipment
CN111026949A (en) Question searching method and system based on electronic equipment
CN112837687A (en) Answering method, answering device, computer equipment and storage medium
CN109783613A (en) One kind searching topic method and system
CN111081089B (en) Dictation control method and device based on facial feature information
CN111739534A (en) Processing method and device for assisting speech recognition, electronic equipment and storage medium
CN111081103A (en) Dictation answer obtaining method, family education equipment and storage medium
CN111081079A (en) Dictation control method and device based on dictation condition
CN111078179A (en) Control method for dictation and reading progress and electronic equipment
CN111078098B (en) Dictation control method and device
CN110287460B (en) Electronic book display method, computing equipment and computer storage medium
CN111079726B (en) Image processing method and electronic equipment
CN111079504A (en) Character recognition method and electronic equipment
CN111079498B (en) Learning function switching method based on mouth shape recognition and electronic equipment
CN111091821B (en) Control method based on voice recognition and terminal equipment
CN111028590B (en) Method for guiding user to write in dictation process and learning device
CN111077989A (en) Screen control method based on electronic equipment and electronic equipment
CN111079500B (en) Method and system for correcting dictation content
CN111079486A (en) Method for starting dictation detection and electronic equipment
CN111079414A (en) Dictation detection method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant