CN113311987B - Control method and device of dictation equipment, dictation equipment and storage medium - Google Patents

Control method and device of dictation equipment, dictation equipment and storage medium

Info

Publication number
CN113311987B
CN113311987B (application CN202110853790.8A)
Authority
CN
China
Prior art keywords
audio data
dictation
playing
content
touch control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110853790.8A
Other languages
Chinese (zh)
Other versions
CN113311987A (en)
Inventor
张恒志
李泽桐
姬传国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ape Power Future Technology Co Ltd
Original Assignee
Beijing Ape Power Future Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ape Power Future Technology Co Ltd
Priority to CN202110853790.8A
Publication of CN113311987A
Application granted
Publication of CN113311987B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides a control method and apparatus for a dictation device, the dictation device itself, and a storage medium. The scheme is as follows: first audio data in an audio data set corresponding to target dictation content is played at a preset time interval; when the number of times the first audio data has been played reaches a first set value, a first touch component and a second touch component are displayed on a display interface; in response to detecting that the first touch component has been triggered, the target dictation content and a third touch component are displayed in a prompt box; and in response to detecting that the third touch component has been triggered, the prompt box is closed and playback of the first audio data resumes, until the second touch component is triggered and second audio data in the audio data set is played, the second audio data being different from the first audio data. The scheme thus enables a user to take dictation automatically with the dictation device, saving the user's time and improving dictation efficiency.

Description

Control method and device of dictation equipment, dictation equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to artificial intelligence fields such as speech technology and natural language processing, and specifically to a control method and apparatus for a dictation device, the dictation device, and a storage medium.
Background
For primary and middle school students, content such as Chinese words and English words is of great importance in the learning process. Dictation is generally used to test whether students have mastered this content.
Typically, a teacher dictates in class and the students write down what they hear; alternatively, parents may dictate as part of homework. In either case, dictation by a teacher or a parent usually takes a great deal of time.
Disclosure of Invention
The present disclosure provides a control method and apparatus for a dictation device, the dictation device, and a storage medium.
In one aspect of the present disclosure, a method for controlling a dictation apparatus is provided, including:
playing first audio data in an audio data set corresponding to target dictation content according to a preset time interval;
under the condition that the playing times of the first audio data reach a first set value, displaying a first touch control assembly and a second touch control assembly on a display interface;
in response to monitoring that the first touch control assembly is triggered, displaying the target dictation content and a third touch control assembly in a prompt box;
and in response to the fact that the third touch control assembly is triggered, closing the prompt box, and returning to execute the operation of playing the first audio data until the second touch control assembly is triggered, and playing second audio data in the audio data set, wherein the second audio data is different from the first audio data.
In another aspect of the present disclosure, there is provided a control apparatus of a dictation device, including:
the first playing module is used for playing first audio data in an audio data set corresponding to target dictation content according to a preset time interval;
the first display module is used for displaying the first touch component and the second touch component on a display interface under the condition that the number of times the first audio data has been played reaches a first set value;
the second display module is used for responding to the monitoring that the first touch control assembly is triggered, and displaying the target dictation content and a third touch control assembly in a prompt box;
and the second playing module is used for closing the prompt box in response to the fact that the third touch control assembly is triggered, returning to execute the operation of playing the first audio data until the second touch control assembly is triggered, and playing second audio data in the audio data set, wherein the second audio data is different from the first audio data.
In another aspect of the present disclosure, there is provided a dictation apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the control method of the dictation device described in the above aspect.
In another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing thereon a computer program for causing a computer to execute a method of controlling a dictation apparatus described in an embodiment of the above-described aspect.
In another aspect of the present disclosure, a computer program product is provided, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the control method of the dictation apparatus described in the embodiment of the above aspect.
With the control method and apparatus of the dictation device, the dictation device, and the storage medium described above, first audio data in an audio data set corresponding to target dictation content can be played at a preset time interval. When the number of times the first audio data has been played reaches a first set value, a first touch component and a second touch component are displayed on the display interface. When the first touch component is triggered, the target dictation content and a third touch component are displayed in a prompt box; when the third touch component is triggered, the prompt box is closed and playback of the first audio data resumes, until the second touch component is triggered and second audio data in the audio data set is played. This not only enables a user to take dictation automatically with the dictation device, saving the user's time and improving dictation efficiency, but also allows the target dictation content to be redisplayed in real time at the user's request during dictation, deepening the user's mastery of the content.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flowchart illustrating a control method of a dictation apparatus according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a control method of a dictation apparatus according to another embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a control method for a dictation apparatus according to another embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a control device of a dictation apparatus according to yet another embodiment of the present disclosure;
FIG. 5 is a block diagram of a dictation apparatus used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial intelligence is the discipline of making computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), covering both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning, deep learning, big data processing, knowledge graph technology, and the like.
Natural language processing is the use of computers to process, understand, and use human languages (such as Chinese and English). It is an interdisciplinary field spanning computer science and linguistics, often called computational linguistics. Natural language is a fundamental mark distinguishing humans from other animals, and human thought is inseparable from language; natural language processing therefore embodies one of the highest goals of artificial intelligence: only when a computer can process natural language can a machine be said to achieve real intelligence.
The key speech technologies in the computer field are automatic speech recognition (ASR) and text-to-speech synthesis (TTS). Enabling computers to listen, see, speak, and feel is the development direction of future human-computer interaction, and speech is expected to become one of the most promising interaction modes, with advantages over other modalities.
A method and apparatus for controlling a dictation device, and a storage medium according to embodiments of the present disclosure are described below with reference to the accompanying drawings.
The control method of the dictation device in the embodiment of the disclosure can be executed by the control device of the dictation device provided in the embodiment of the disclosure, and the device can be configured in the dictation device.
For convenience of description, in the embodiments of the present disclosure, the control device of the dictation apparatus may be simply referred to as "control device".
Fig. 1 is a schematic flowchart of a method for controlling a dictation apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, the control method of the dictation apparatus may include the steps of:
step 101, playing first audio data in an audio data set corresponding to a target dictation content according to a preset time interval.
The time interval may be set in advance, may be a uniform numerical value, or may be a value that matches the difficulty level of the target dictation content. The present disclosure is not limited thereto.
For example, if the preset time interval is 6 seconds, the control device may play the first audio data in the audio data set corresponding to the target dictation content once every 6 seconds.
Alternatively, if the target dictation content is more complex, the preset time interval may be longer, for example, 10 seconds or 13 seconds; if the target dictation content is relatively simple, the preset time interval may be shorter, for example, 5 seconds or 6 seconds.
Or, each target dictation content has a preset and specific time interval, so that the first audio data in the audio data set corresponding to the target dictation content can be played according to the respective corresponding time interval during playing.
It should be noted that the above examples are only illustrative, and should not be taken as a limitation on the preset time intervals and the like in the embodiments of the present disclosure.
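As an illustration only, not part of the disclosed embodiments, the interval selection described above might be sketched as follows in Python; the difficulty field, its values, and the per-item interval field are all assumptions:

# Hypothetical sketch of choosing the preset time interval for one
# dictation item. Field names and the difficulty-to-interval mapping
# are illustrative assumptions, not values from the disclosure.
DEFAULT_INTERVAL_S = 6.0

def interval_for(item: dict) -> float:
    """Return the repeat interval, in seconds, for one dictation item."""
    # A per-item preset interval, if present, takes precedence.
    if "interval_s" in item:
        return item["interval_s"]
    # Otherwise scale with an assumed 1-5 difficulty rating.
    table = {1: 5.0, 2: 6.0, 3: 8.0, 4: 10.0, 5: 13.0}
    return table.get(item.get("difficulty", 3), DEFAULT_INTERVAL_S)

print(interval_for({"text": "中", "difficulty": 5}))  # -> 13.0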
In addition, the dictation content may be the content of a particular lesson, or the words of a particular unit, and so on. For example, the target dictation content may be the new words of lesson two, the new words and phrases of lesson three, or the words of unit one, which is not limited in this disclosure.
In addition, there may be multiple audio data in an audio data set, where the multiple audio data may correspond to the same target dictation content.
For example, if the target dictation content is the character "中", the corresponding audio data set may include a plurality of audio data, for example: "中", "the 中 in 中国 (China)", and "the 中 in 中午 (noon)". All of these audio data correspond to "中", which is not limited in the present disclosure.
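For illustration only, the audio data set for the "中" example might be represented as follows; all field and file names are hypothetical:

# Hypothetical layout of one dictation item and its audio data set,
# mirroring the "中" example above. Field names are assumptions.
item = {
    "text": "中",                 # target dictation content
    "audio_set": [
        "zhong.mp3",              # "中"
        "zhong_zhongguo.mp3",     # "the 中 in 中国 (China)"
        "zhong_zhongwu.mp3",      # "the 中 in 中午 (noon)"
    ],
}
first_audio = item["audio_set"][0]  # first audio data (step 101)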
Step 102, displaying the first touch component and the second touch component on a display interface when the number of times the first audio data has been played reaches a first set value.
The first set value may be a preset value, such as 3, 4, 6, etc.; or it may be adjusted as needed, etc., and the disclosure is not limited thereto.
In addition, the first touch component may be any component that guides the user to trigger it in order to obtain prompt information. It can be understood that triggering the first touch component causes the target dictation content to be displayed, thereby giving the user a prompt.
The second touch component may be any component that can guide a user to trigger the component to play the next dictation content, and it can be understood that the second audio data corresponding to the next dictation content can be played by triggering the second touch component.
It is to be understood that the style or presentation form of the first touch component and the second touch component may be any form, which is not limited in the present disclosure.
For example, the first touch component may be a red circular "hint" touch component, a blue square "tip" touch component, or the like; the second touch component may be a yellow square "play next" touch component, a green oval "next" touch component, and so on, which is not limited in this disclosure.
For example, the first set value is 4. Suppose the first audio data is the audio for "中国" (China) and has been played 4 times, reaching the first set value; by this time the user has usually finished writing "中国". The first touch component and the second touch component can therefore be displayed on the display interface to guide the user's next operation according to how well the user has actually mastered the content, rather than endlessly repeating the first audio data while the user's dictation progress stalls. This saves the user's time and improves dictation efficiency.
It should be noted that the above examples are only examples, and should not be taken as limitations on the first setting value, the first touch component, the second touch component, and the like in the embodiments of the present disclosure.
Optionally, the first touch component and the second touch component may also provide a voice broadcast function, which further saves the user's time and helps the user dictate better.
Step 103, in response to the monitoring that the first touch component is triggered, displaying the target dictation content and a third touch component in a prompt box.
The third touch component may be any component that triggers a return to dictation; by triggering the third touch component, the dictation operation can continue.
In addition, the style or presentation form of the third touch component may be any form set in advance, which is not limited in this disclosure.
For example, the third touch component may be a "continue dictation" touch component, which, when triggered, resumes dictation; or it may be a "write again" touch component, which, when triggered, lets the user continue writing the dictation content corresponding to the first audio data, and so on, which is not limited in this disclosure.
It can be understood that after the first touch component is triggered, it no longer appears in the current interface; correspondingly, a prompt box appears. The user can deepen their mastery of the target dictation content from the content displayed in the prompt box, and can continue dictation by triggering the third touch component, which facilitates operation and saves time.
Step 104, in response to detecting that the third touch component has been triggered, closing the prompt box, and returning to execute the operation of playing the first audio data until the second touch component is triggered, and playing second audio data in the audio data set, wherein the second audio data is different from the first audio data.
It is understood that the control device may determine the corresponding second audio data according to the dictation mode selected by the user.
Optionally, in a case that the current dictation mode is random dictation, it may be determined that any audio data that is not played in the audio data set is the second audio data.
For example, the current dictation mode is random dictation, and the total number of the audio data sets is 5, which are respectively: audio data 1, audio data 2, audio data 3, audio data 4, audio data 5. If the audio data 1, the audio data 3, and the audio data 4 are not played, the control device may use any one of the audio data 1, the audio data 3, and the audio data 4 as the second audio data, for example, use the audio data 3 as the second audio data, and so on, which is not limited in this disclosure.
Alternatively, when the current dictation mode is sequential dictation, it may be determined that audio data that is adjacent to the first audio data and is located after the first audio data in the audio sequence included in the audio data set is the second audio data.
For example, the current dictation mode is sequential dictation, and the audio data set comprises audio sequences in sequence as follows: audio data 1, audio data 2, audio data 3, audio data 4, audio data 5. If the first audio data is the audio data 3, the audio data 4 may be the second audio data, and so on, which is not limited in this disclosure.
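A minimal sketch of the fig. 1 flow (steps 101 to 104), including the mode-dependent choice of the second audio data just described, is given below; the play and ask callbacks are stand-ins, since the disclosure does not specify a programming interface:

# Hypothetical sketch of steps 101-104. The play/ask callbacks are
# stand-ins for the dictation device's player and touch components.
import random

def next_audio(audio_set, played, current, mode):
    # Random dictation: any not-yet-played audio in the set.
    if mode == "random":
        remaining = [a for a in audio_set if a not in played]
        return random.choice(remaining) if remaining else None
    # Sequential dictation: the audio immediately after the current one.
    i = audio_set.index(current)
    return audio_set[i + 1] if i + 1 < len(audio_set) else None

def dictate(audio_set, target_text, first_set_value, play, ask,
            mode="sequential"):
    current, played = audio_set[0], set()
    while current is not None:
        for _ in range(first_set_value):   # step 101: repeated playback
            play(current)
        played.add(current)
        # Step 102: show the first ("hint") and second ("next") components.
        while ask("hint or next?") == "hint":
            print(f"prompt box: {target_text}")   # step 103
            ask("continue dictation?")            # third component
            play(current)                         # step 104: replay
        current = next_audio(audio_set, played, current, mode)

# Example run with print/input as stand-ins:
# dictate(["a1.mp3", "a2.mp3"], "中", 4, play=print, ask=input)

A real implementation would of course be event-driven rather than a blocking loop; the sketch only illustrates the control flow.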
According to the embodiment of the present disclosure, the first audio data in the audio data set corresponding to the target dictation content can be played at a preset time interval, and the first touch component and the second touch component are displayed on the display interface when the number of times the first audio data has been played reaches a first set value. When the first touch component is triggered, the target dictation content and the third touch component are displayed in the prompt box; when the third touch component is triggered, the prompt box is closed and playback of the first audio data resumes, until the second touch component is triggered and the second audio data in the audio data set is played. This not only enables a user to take dictation automatically with the dictation device, saving time and improving dictation efficiency, but also allows the target dictation content to be redisplayed in real time at the user's request during dictation, deepening the user's mastery of the content.
Fig. 2 is a schematic flowchart of a method for controlling a dictation apparatus according to an embodiment of the present disclosure.
As shown in fig. 2, the control method of the dictation apparatus may include the steps of:
step 201, in response to monitoring that the target dictation content in the dictation service interface is triggered, acquiring an audio data set corresponding to the target dictation content.
The user may trigger the target dictation content by touch; for example, the user may tap or select the target dictation content with a finger, tap it with a capacitive pen or stylus, or select it with a mouse, and so on, which is not limited in this disclosure.
For example, the dictation service interface displays a textbook edition, text content, and one or more dictation contents included in the text content. Suppose the dictation content of the current text in the dictation service interface is the new words of lesson one. After detecting that the lesson-one new words have been triggered, the control device may acquire the audio data set corresponding to those words, and so on, which is not limited in this disclosure.
Step 202, playing first audio data in an audio data set corresponding to the target dictation content at preset time intervals.
In step 203, a fourth touch component and the second touch component are displayed on the display interface when the number of times the first audio data has been played reaches a second set value, where the second set value is smaller than the first set value.
The fourth touch component may be any component that guides the user to trigger it in order to continue playing the first audio data; triggering this touch component resumes playback of the first audio data.
In addition, by triggering the second touch control assembly, second audio data corresponding to the next dictation content can be played.
It is understood that the style or presentation form of the fourth touch component may be any form. For example, the fourth touch component may be a red circular "replay" touch component, a blue square "continue playing" touch component, and so on, which is not limited by the present disclosure.
For example, the second set value is 2. If the first audio data has currently been played twice, the control device may display a "continue playing" touch component and a "play next" touch component on the display interface, so that the user can choose whether to keep playing the first audio data corresponding to the current target dictation content or to move on to the next dictation content. This facilitates operation, saves the user's time, and better satisfies the user's dictation needs.
It should be noted that the above examples are only illustrative, and cannot be taken as limitations on the second setting value, each trigger control, and the like in the embodiments of the present disclosure.
In step 204, in response to the fourth touch component being triggered, the operation of playing the first audio data is executed again until the number of plays reaches the first set value.
It can be understood that, upon determining that the fourth touch component has been triggered, the control device may continue playing the first audio data corresponding to the current target dictation content, or may play other audio or video data corresponding to that content, prompting the user from multiple angles and thus helping the user dictate better.
Alternatively, multimedia data associated with the first audio data may be acquired first, and then the first audio data may be replaced with the multimedia data.
The first audio data and the multimedia data indicate the same dictation content.
It will be appreciated that the multimedia data may be audio data, or may also be video data or the like. In addition, the video data may be a picture in a lesson where the target dictation content is located, or may also be a picture or a video obtained from another database according to the target dictation content, and the like, which is not limited in this disclosure.
For example, if the target dictation content is "中" and the first audio data is "中, the 中 in 中国 (China)", and the control device determines that the multimedia data associated with the first audio data includes "中, the 中 in 中午 (noon)" and the like, the first audio data may be replaced with that multimedia data; that is, "中, the 中 in 中午 (noon)" may be played instead, which is not limited in this disclosure.
In the embodiment of the present disclosure, if the number of times the first audio data has been played reaches the second set value, playing other multimedia data can help the user recall the target dictation content; that is, the user is prompted from multiple angles and guided to write the target dictation content, so that dictation is completed and dictation efficiency is improved.
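A short sketch of steps 203 and 204 with the multimedia replacement described above might look as follows; the field names and the play callback are assumptions:

# Hypothetical sketch: once the second set value is reached and the
# fourth ("replay") component is triggered, associated multimedia that
# indicates the same content may replace the first audio data.
def resume_playback(item, play_count, first_set_value, play):
    media = item.get("multimedia") or item["audio_set"][0]  # assumed fields
    while play_count < first_set_value:
        play(media)                     # step 204: keep playing
        play_count += 1
    return play_count

count = resume_playback(
    {"audio_set": ["zhong_zhongguo.mp3"],
     "multimedia": "zhong_zhongwu.mp3"},
    play_count=2, first_set_value=4, play=print)
# prints zhong_zhongwu.mp3 twice; count == 4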
Step 205, displaying the first touch component and the second touch component in a touch layer on the display interface.
The first touch component and the second touch component may be located at any position of the touch layer, such as at an upper left corner, an upper right corner, a center position, and the like of the touch layer, which is not limited in this disclosure.
In addition, the position of the touch layer on the display interface can be set as needed; for example, it can be placed in the upper-right or upper-left corner of the display interface, or the position can be determined according to the user's operation habits.
Specifically, the operation habit of the user to which the dictation device belongs may be determined according to the historical operation data, then the display position of the touch layer may be determined according to the operation habit of the user, and the touch layer may be displayed on the display interface based on the display position.
The historical operation data can be acquired first, then the historical operation data is analyzed, and the operation habits of the user to which the dictation device belongs are determined according to the analysis result.
For example, by analyzing the historical operation data, it is determined that the user a to which the dictation device belongs is used to operate at the upper left corner of the display interface, it may be determined that the display position of the touch layer is located at the upper left corner of the display interface, and then the touch layer may be displayed at the upper left corner of the display interface.
It should be noted that the above example is only an example, and cannot be used as a limitation on a manner of determining a position of a touch layer of a display interface in the embodiment of the present disclosure.
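As a rough illustration of the habit-based placement just described (the corner names and the counting heuristic are assumptions):

# Hypothetical sketch of step 205's habit-based placement: pick the
# display-interface corner the user has historically operated most.
from collections import Counter

def layer_position(history):
    """history: list of corner names from past operations."""
    if not history:
        return "top_right"             # assumed default position
    return Counter(history).most_common(1)[0][0]

print(layer_position(["top_left", "top_left", "top_right"]))  # top_left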
Step 206, in response to the detection that the first touch component is triggered, displaying the target dictation content and a third touch component in a prompt box.
Step 207, in response to the third touch component being triggered, closing the prompt box, and returning to execute the operation of playing the first audio data until the second touch component is triggered, playing second audio data in the audio data set, wherein the second audio data is different from the first audio data.
In the embodiment of the present disclosure, when it is detected that the target dictation content in the dictation service interface has been triggered, the audio data set corresponding to the target dictation content is first acquired, and the first audio data in that set is played at a preset time interval. When the number of times the first audio data has been played reaches the second set value, the fourth touch component and the second touch component are displayed on the display interface; when the fourth touch component is triggered, playback of the first audio data resumes until the number of plays reaches the first set value. Then, when the first touch component is triggered, the target dictation content and the third touch component are displayed in the prompt box; when the third touch component is triggered, the prompt box is closed and playback of the first audio data resumes, until the second touch component is triggered and the second audio data in the audio data set is played. The method thus enables a user to take dictation automatically with the dictation device, saving the user's time and improving dictation efficiency.
It is understood that after the user completes dictation, the content written by the user can be detected and the detection result can be given, and the above process is described in detail with reference to fig. 3.
Fig. 3 is a schematic flowchart of a method for controlling a dictation apparatus according to an embodiment of the present disclosure.
As shown in fig. 3, the control method of the dictation apparatus may include the steps of:
step 301, in response to receiving a dictation data detection instruction, determining sequence data to be detected and target dictation content.
It is understood that, in a case where the user clicks the "submit" touch component after the playing of all the audio data in the audio data set corresponding to the target dictation content is finished, the control device may determine that the dictation data detection instruction is received.
In addition, the sequence data to be detected may be one or more arrays in a sequential order, which is not limited in this disclosure.
It will be appreciated that each operation instruction of the user may correspond to a set of arrays. Therefore, the sequence data to be detected can comprise the start position coordinate, the end position coordinate and the corresponding operation type corresponding to each operation instruction.
The operation type may be an input operation, or may also be an erase operation, etc., which is not limited in this disclosure.
It is understood that the user can write through the input operation, and if the user has an error in the writing process, the user can erase the error to eliminate the error part.
For convenience of description, the input operation may be denoted "1" and the erase operation "0"; other notations may also be used, which is not limited by the present disclosure.
For example, if the sequence data to be detected is (x1, y1, 1; x2, y2, 1), it can be determined from the sequence data that the user's final written region is (x2 - x1, y2 - y1), so that the reference character input by the user can be determined.
Alternatively, if the sequence data to be detected is (x1, y1, 1; x2, y2, 1), (x3, y3, 1; x4, y4, 1), (x3, y3, 0; x4, y4, 0), it can be determined from the sequence data that the user first performed input operations, with input regions (x2 - x1, y2 - y1) and (x4 - x3, y4 - y3), and then performed an erase operation on the region (x4 - x3, y4 - y3). The reference character input by the user can thus be determined from the user's final written region.
It should be noted that the above examples are only illustrative, and should not be taken as limitations on the sequence data to be detected and the reference characters input by the user in the embodiments of the present disclosure.
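A sketch of recovering the final written regions from sequence data in the format above follows; the tuple representation and the region-matching rule for erasure are assumptions:

# Hypothetical sketch: each operation is a pair of (x, y, op) triples
# giving start and end coordinates; op 1 = input, op 0 = erase.
def final_regions(sequence):
    """Return the written regions that survive all erase operations."""
    written = []
    for (x1, y1, op), (x2, y2, _) in sequence:
        region = (x2 - x1, y2 - y1)
        if op == 1:
            written.append(region)
        elif region in written:        # erase removes a matching region
            written.remove(region)
    return written

seq = [((0, 0, 1), (3, 5, 1)),         # input: region (3, 5)
       ((5, 0, 1), (9, 6, 1)),         # input: region (4, 6)
       ((5, 0, 0), (9, 6, 0))]         # erase: removes region (4, 6)
print(final_regions(seq))              # -> [(3, 5)]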
Step 302, determining and displaying a detection result corresponding to the sequence data to be detected according to the target dictation content.
The reference character input by the user can be determined from the sequence data to be detected, and its matching degree with the target dictation content can then be computed. If the matching degree is greater than a set first threshold, the reference character can be deemed consistent with the target dictation content; that is, the detection result of the sequence data to be detected is determined to be: correct. Otherwise, the detection result of the sequence data to be detected is determined to be: error.
For example, the first threshold is 0.95. If the matching degree between the reference character input by the user and the target dictation content is 0.98, which is greater than the first threshold, the detection result of the sequence data to be detected can be determined to be: correct. The detection result is displayed, and the target dictation content may also be displayed prominently to remind the user to write more neatly.
It should be noted that the above examples are only illustrative, and cannot be taken as limitations on the sequence data to be detected and the manner of determining the detection result corresponding to the sequence data to be detected in the embodiments of the present disclosure.
It is understood that, in the embodiments of the present disclosure, the matching degree between the reference character and the target dictation content may be determined in any suitable manner, which is not limited by the present disclosure.
Optionally, reference content associated with the target dictation content may be obtained first, and then a detection result corresponding to the sequence data to be detected is determined and displayed according to the target dictation content and the reference content.
And the similarity between the first character in the reference content and the second character in the target dictation content is greater than a threshold value.
The threshold may be a preset value, such as 0.95, 0.9, 0.8, and the like, which is not limited in this disclosure.
In addition, there may be one or more first characters in the reference content, and the disclosure is not limited thereto.
For example, the second character in the target dictation content is "辨" (to distinguish), and the first characters in the reference content are, respectively, "辫" (plait), "辩" (debate), and "瓣" (flap); the similarity between each first character and the second character is greater than the threshold. Then, according to the sequence data to be detected, the reference character input by the user can be determined; the reference character is compared with the second character of the target dictation content and with each first character in the reference content, and the character with the highest matching degree is taken as the character input by the user.
If the character with the highest matching degree is the second character in the target dictation content, the sequence data to be detected can be determined to be correct, and the second character in the target dictation content can be displayed. If the character with the highest matching degree is any first character in the reference content, the sequence data error to be detected can be determined, and the any first character is displayed.
It should be noted that the above examples are only illustrative, and should not be taken as limitations on the manner and the like of detecting the sequence data to be detected and determining the detection result corresponding to the sequence data to be detected in the embodiments of the present disclosure.
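For illustration, the matching step against the target and its confusable reference characters might be sketched as follows; the similarity function here is a placeholder, not the recognizer the disclosure assumes:

# Hypothetical sketch of the detection in steps 301-302: compare the
# recognized character with the target and with confusable reference
# characters, and report the best match.
def detect(recognized, target, references, similarity, threshold=0.95):
    candidates = [target] + references
    best = max(candidates, key=lambda c: similarity(recognized, c))
    correct = best == target and similarity(recognized, target) > threshold
    return correct, best   # display `best` so the user sees the mismatch

# Toy similarity: 1.0 on exact match, 0.0 otherwise.
def sim(a, b):
    return 1.0 if a == b else 0.0

print(detect("辫", "辨", ["辫", "辩", "瓣"], sim))  # -> (False, '辫')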
In the embodiment of the disclosure, the detection result corresponding to the sequence data to be detected can be displayed, so that a user can visually see the error part, the user can analyze the error reason, the user time is saved, and the efficiency of the user for correcting the target dictation content is improved.
In the embodiment of the disclosure, when a dictation data detection instruction is received, sequence data to be detected and target dictation content may be determined first, and then a detection result corresponding to the sequence data may be determined and displayed according to the target dictation content. Therefore, sequence data to be detected can be automatically detected, a detection result is displayed, manual correction of dictation content is not needed, time is saved, and dictation efficiency is improved.
In order to implement the above embodiments, the present disclosure further provides a control device of a dictation apparatus.
Fig. 4 is a schematic structural diagram of a control device of a dictation apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the control device 400 of the dictation apparatus includes: a first playing module 410, a first display module 420, a second display module 430 and a second playing module 440.
The first playing module 410 is configured to play first audio data in an audio data set corresponding to a target dictation content at preset time intervals.
The first display module 420 is configured to display the first touch component and the second touch component on a display interface when the number of times the first audio data has been played reaches a first set value.
The second display module 430 is configured to display the target dictation content and a third touch component in a prompt box in response to the detection that the first touch component is triggered.
The second playing module 440 is configured to close the prompt box in response to the third touch component being monitored to be triggered, and return to execute the operation of playing the first audio data until the second touch component is monitored to be triggered, and play second audio data in the audio data set, where the second audio data is different from the first audio data.
Optionally, the first display module 420 is further configured to display a fourth touch component and the second touch component on the display interface when the number of times of playing the first audio data reaches a second set value, where the second set value is smaller than the first set value.
The first playing module 410 is further configured to, in response to detecting that the fourth touch component has been triggered, return to the operation of playing the first audio data until the number of plays reaches the first set value.
Optionally, the first playing module 410 is specifically configured to:
acquiring multimedia data associated with the first audio data, wherein the first audio data is the same as dictation content indicated by the multimedia data;
replacing the first audio data with the multimedia data.
Optionally, the first playing module 410 is further specifically configured to:
and displaying the first touch control assembly and the second touch control assembly in a touch control layer on the display interface.
Optionally, the apparatus further includes:
and the first determining module is used for determining the operation habits of the user to which the dictation equipment belongs according to historical operation data.
And the second determining module is used for determining the display position of the touch layer according to the operation habit of the user.
And the third display module is used for displaying the touch control layer on the display interface based on the display position.
Optionally, the apparatus further includes:
and the acquisition module is used for responding to the monitoring that the target dictation content in the dictation service interface is triggered and acquiring an audio data set corresponding to the target dictation content.
Optionally, the second playing module 440 is specifically configured to:
under the condition that the current dictation mode is random dictation, determining any audio data which is not played in the audio data set as the second audio data;
or,
and when the current dictation mode is sequential dictation, determining that audio data which is adjacent to the first audio data and is positioned behind the first audio data in an audio sequence contained in the audio data set is the second audio data.
Optionally, the first determining module is further configured to:
in response to receiving a dictation data detection instruction, determining sequence data to be detected and the target dictation content;
and determining and displaying a detection result corresponding to the sequence data according to the target dictation content.
Optionally, the first determining module is specifically configured to:
acquiring reference content associated with the target dictation content, wherein the similarity between a first character in the reference content and a second character in the target dictation content is greater than a threshold value;
and determining and displaying a detection result corresponding to the sequence data according to the target dictation content and the reference content.
The functions and specific implementation principles of the modules in the embodiments of the present disclosure may refer to the embodiments of the methods, and are not described herein again.
The control apparatus of the dictation device in the embodiment of the present disclosure may play first audio data in an audio data set corresponding to target dictation content at a preset time interval, and display the first touch component and the second touch component on the display interface when the number of times the first audio data has been played reaches a first set value. When the first touch component is triggered, the target dictation content and the third touch component are displayed in the prompt box; when the third touch component is triggered, the prompt box is closed and playback of the first audio data resumes, until it is detected that the second touch component has been triggered and the second audio data in the audio data set is played. This not only enables a user to take dictation automatically with the dictation device, saving the user's time and improving dictation efficiency, but also allows the target dictation content to be redisplayed in real time at the user's request during dictation, deepening the user's mastery of the content.
FIG. 5 shows a schematic block diagram of an example dictation device 500 that may be used to implement embodiments of the present disclosure. The dictation device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The dictation device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to one another by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 501 executes the methods and processes described above, such as the control method of the dictation device. For example, in some embodiments, the control method of the dictation device may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the control method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the control method of the dictation device by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or cloud host), a host product in the cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
According to the technical solution of the embodiments of the present disclosure, the first audio data in the audio data set corresponding to the target dictation content can be played at a preset time interval, and the first touch component and the second touch component are displayed on the display interface when the number of times the first audio data has been played reaches the first set value. When the first touch component is triggered, the target dictation content and the third touch component can be displayed in the prompt box; when the third touch component is triggered, the prompt box is closed and playback of the first audio data resumes, until the second touch component is triggered and the second audio data in the audio data set is played. This not only enables a user to take dictation automatically with the dictation device, saving the user's time and improving dictation efficiency, but also allows the target dictation content to be redisplayed in real time at the user's request during dictation, deepening the user's mastery of the content.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (12)

1. A method of controlling a dictation device, comprising:
playing first audio data in an audio data set corresponding to target dictation content at a preset time interval;
when the number of times the first audio data has been played reaches a first set value, displaying a first touch component and a second touch component in a touch layer of a display interface;
in response to detecting that the first touch component is triggered, displaying the target dictation content and a third touch component in a prompt box;
and in response to detecting that the third touch component is triggered, closing the prompt box and returning to the operation of playing the first audio data, until the second touch component is detected to be triggered and second audio data in the audio data set is played, wherein the second audio data is determined according to a dictation mode selected by a user and is different from the first audio data.
2. The method of claim 1, further comprising, before displaying the first touch component and the second touch component on the display interface:
when the number of times the first audio data has been played reaches a second set value, displaying a fourth touch component and the second touch component in the touch layer of the display interface, wherein the second set value is smaller than the first set value;
and in response to detecting that the fourth touch component is triggered, returning to the operation of playing the first audio data until the play count reaches the first set value.
3. The method of claim 2, further comprising, before the returning to the operation of playing the first audio data:
acquiring multimedia data associated with the first audio data, wherein the multimedia data indicates the same dictation content as the first audio data;
and replacing the first audio data with the multimedia data.
4. The method of claim 1, further comprising:
determining, according to historical operation data, the operation habits of the user to whom the dictation device belongs;
determining a display position of the touch layer according to the user's operation habits;
and displaying the touch layer on the display interface based on the display position.
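As an illustration of this claim's idea, the sketch below chooses a display position for the touch layer from historical touch events; the event format, the 0.5 midline split, and the position labels are assumptions of the sketch, not the claimed implementation.

def choose_layer_position(history):
    # history: past touch events as dicts with a normalized 'x' in [0, 1]
    left = sum(1 for event in history if event["x"] < 0.5)
    right = len(history) - left
    # place the touch layer on the side the user touches more often
    return "bottom-left" if left >= right else "bottom-right"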
5. The method according to any one of claims 1-4, further comprising, before playing the first audio data in the audio data set corresponding to the target dictation content at the preset time interval:
in response to detecting that the target dictation content in a dictation service interface is triggered, acquiring the audio data set corresponding to the target dictation content.
6. The method of any of claims 1-4, wherein the playing the second audio data in the audio data set comprises:
when the current dictation mode is random dictation, determining any audio data in the audio data set that has not been played as the second audio data;
or,
when the current dictation mode is sequential dictation, determining, as the second audio data, the audio data that is adjacent to and follows the first audio data in the audio sequence contained in the audio data set.
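One plausible reading of claim 6 in code, with every function and parameter name assumed for illustration rather than taken from the patent:

import random

def select_second_audio(audio_set, first_audio, played, mode):
    if mode == "random":
        # any audio data in the set that has not been played yet
        remaining = [a for a in audio_set if a not in played]
        return random.choice(remaining) if remaining else None
    # sequential: the entry adjacent to and after the first audio data
    i = audio_set.index(first_audio)
    return audio_set[i + 1] if i + 1 < len(audio_set) else None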
7. The method of any of claims 1-4, further comprising:
in response to receiving a dictation data detection instruction, determining sequence data to be detected and the target dictation content;
and determining and displaying a detection result corresponding to the sequence data according to the target dictation content.
8. The method of claim 7, wherein the determining and displaying of the detection result corresponding to the sequence data according to the target dictation content comprises:
acquiring reference content associated with the target dictation content, wherein the similarity between a first character in the reference content and a second character in the target dictation content is greater than a threshold value;
and determining and displaying a detection result corresponding to the sequence data according to the target dictation content and the reference content.
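As a hedged sketch of claims 7 and 8, the function below grades a written sequence against the target content while treating characters that merely resemble the expected ones as a separate category; the similarity callback and the 0.8 threshold are placeholders, since the claims specify only that a threshold exists.

def grade_sequence(sequence, target, similarity, threshold=0.8):
    results = []
    for written, expected in zip(sequence, target):
        if written == expected:
            results.append((written, "correct"))
        elif similarity(written, expected) > threshold:
            # the written character matches the reference content:
            # near-identical to the expected character, flagged separately
            results.append((written, "confusable"))
        else:
            results.append((written, "wrong"))
    return results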
9. A control apparatus for a dictation device, comprising:
a first playing module, configured to play first audio data in an audio data set corresponding to target dictation content at a preset time interval;
a first display module, configured to display a first touch component and a second touch component in a touch layer of a display interface when the number of times the first audio data has been played reaches a first set value;
a second display module, configured to display the target dictation content and a third touch component in a prompt box in response to detecting that the first touch component is triggered;
and a second playing module, configured to close the prompt box in response to detecting that the third touch component is triggered, and to return to the operation of playing the first audio data until the second touch component is detected to be triggered and second audio data in the audio data set is played, wherein the second audio data is determined according to a dictation mode selected by a user and is different from the first audio data.
10. A dictation device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
11. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202110853790.8A 2021-07-28 2021-07-28 Control method and device of dictation equipment, dictation equipment and storage medium Active CN113311987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110853790.8A CN113311987B (en) 2021-07-28 2021-07-28 Control method and device of dictation equipment, dictation equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113311987A (en) 2021-08-27
CN113311987B (en) 2021-11-16

Family

ID=77381566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110853790.8A Active CN113311987B (en) 2021-07-28 2021-07-28 Control method and device of dictation equipment, dictation equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113311987B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005431A (en) * 2015-07-22 2015-10-28 王玉娇 Dictation device, data processing method thereof and related devices
CN106125905A (en) * 2016-06-13 2016-11-16 广东小天才科技有限公司 A dictation method, device and system
CN111078179A (en) * 2019-05-10 2020-04-28 广东小天才科技有限公司 Control method for dictation and reading progress and electronic equipment
US20210074276A1 (en) * 2015-06-01 2021-03-11 Sinclair Broadcast Group, Inc. Content Segmentation and Time Reconciliation
CN112817558A (en) * 2021-02-19 2021-05-18 北京大米科技有限公司 Method and device for processing dictation data, readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113311987A (en) 2021-08-27

Similar Documents

Publication Publication Date Title
KR101802876B1 (en) Multi-character continuous handwriting input method
US20220004714A1 (en) Event extraction method and apparatus, and storage medium
KR101825154B1 (en) Overlapped handwriting input method
CN109634552A An input control method applied to dictation, and a terminal device
CN108091328A (en) Speech recognition error correction method, device and readable medium based on artificial intelligence
US11157699B2 (en) Interactive method and apparatus based on test-type application
CN108090043B (en) Error correction report processing method and device based on artificial intelligence and readable medium
CN110276023A POI change event discovery method, apparatus, computing device and medium
CN108986564A An input control method based on intelligent interaction, and an electronic device
EP2869219A1 (en) Text processing apparatus, text processing method, and computer program product
CN106776926A Method and system for improving response capability during robot dialogue
CN108766431A An automatic wake-up method based on speech recognition, and an electronic device
US20200211545A1 (en) Voice interaction method, apparatus and device, and storage medium
CN110850982B (en) AR-based man-machine interaction learning method, system, equipment and storage medium
CN113360001A (en) Input text processing method and device, electronic equipment and storage medium
CN110148413B (en) Voice evaluation method and related device
CN111881683A (en) Method and device for generating relation triples, storage medium and electronic equipment
CN110020429A Semantic recognition method and device
Chowdhury et al. MIVA: Multimodal interactions for facilitating visual analysis with multiple coordinated views
CN113311987B (en) Control method and device of dictation equipment, dictation equipment and storage medium
RU2344492C2 Dynamic support of pronunciation for training in recognition of Japanese and Chinese speech
WO2023078197A1 (en) Classroom activity courseware producing method and apparatus, and storage medium and electronic device
CN114490986B (en) Computer-implemented data mining method, device, electronic equipment and storage medium
CN112988012B (en) Image display method, device, equipment and storage medium
CN113535916A (en) Question and answer method and device based on table and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant