CN115394282A

CN115394282A - Information interaction method and device, teaching platform, electronic equipment and storage medium

Info

Publication number: CN115394282A
Application number: CN202210615936.XA
Authority: CN
Inventors: 马鸿图; 王佳静; 金庆文; 张震; 陈健
Original assignee: Beijing Whaty Technology Development Co ltd
Current assignee: Beijing Whaty Technology Development Co ltd
Priority date: 2022-06-01
Filing date: 2022-06-01
Publication date: 2022-11-25

Abstract

The disclosure provides an information interaction method and device, a teaching platform, electronic equipment and a storage medium. The method comprises the following steps: acquiring image-text content, wherein the image-text content is rich text in which text and pictures are mixed; extracting text information from the image-text content, and performing voice conversion on the extracted text information; and playing the voice of target text information selected by a user. Because the text information in the mixed image-text content is extracted and converted into voice content, the user can have the voice of the target text information played automatically by clicking that text, which helps visually impaired people complete online learning of image-text content.

Description

Information interaction method and device, teaching platform, electronic equipment and storage medium

Technical Field

The disclosure belongs to the technical field of teaching interaction, and particularly relates to an information interaction method and device, a teaching platform, electronic equipment and a storage medium.

Background

Online teaching has developed rapidly and is becoming more and more widespread, and people enjoy the considerable convenience it brings. Online teaching platforms provide various types of teaching resources, such as pictures, text, audio and video, so that people can acquire knowledge from them at any time and in any place. However, assistive teaching technology for visually impaired people is still lacking, which makes it difficult for them to directly access the text teaching content provided on a teaching platform. A voice reading technology that intelligently recognizes the text in mixed image-text content is therefore proposed, providing an effective way to address the difficulty visually impaired people face in online learning.

Disclosure of Invention

The present disclosure aims to solve at least one of the technical problems in the prior art, and provides an information interaction method and apparatus, a teaching platform, an electronic device, and a storage medium.

In one aspect of the present disclosure, an information interaction method is provided for a teaching platform, where the method includes:

acquiring image-text content, wherein the image-text content is rich text formed by mixing characters and pictures;

extracting text information in the image-text content, and performing voice conversion on the extracted text information;

and playing the voice of the target text information based on the target text information selected by the user.

In some embodiments, performing voice conversion on the extracted text information includes:

performing clause processing on the extracted text information, and performing voice conversion on each clause respectively.

In some embodiments, performing voice conversion on each clause respectively includes:

judging whether each clause exists in a preset voice library;

and in response to the current clause not existing in the voice library, performing voice conversion on the clause, and storing the converted voice content in the voice library.

In some embodiments, before playing the voice of the target text information based on the target text information selected by the user, the method further comprises:

configuring playing parameter information, wherein the playing parameter information comprises at least one of language, timbre and tone.

In some embodiments, the playing the voice of the target text information based on the target text information selected by the user includes:

playing the voices of the target text information in sequence based on the target text information selected by the user; during playback, the target text being played is highlighted.

In some embodiments, the method further comprises:

in response to the user clicking target text information that has not yet been played, jumping to and playing the voice of that target text information.

In another aspect of the present disclosure, an information interaction apparatus is provided for a teaching platform, the apparatus includes:

the acquisition module is used for acquiring image-text contents, wherein the image-text contents are rich texts in which characters and pictures are arranged in a mixed manner;

the conversion module is used for extracting text information in the image-text content and carrying out voice conversion on the extracted text information;

and the playing module is used for playing the voice of the target text information based on the target text information selected by the user.

In some embodiments, the conversion module is further specifically configured to: and performing clause processing on the extracted text information, and performing voice conversion on each clause respectively.

In some embodiments, the conversion module is further configured to: judging whether each clause exists in a preset voice library or not; and responding to the situation that the current clause does not exist in the voice library, performing voice conversion on the clause, and storing the converted voice content into the voice library.

In some embodiments, the apparatus further includes a configuration module configured to configure the playing parameter information, where the playing parameter information includes at least one of a language, a timbre, and a tone.

In some embodiments, the playing module is further specifically configured to: playing voices of the target text information in sequence based on the target text information selected by a user; during playback, the target text being played is highlighted.

In some embodiments, the playing module is further specifically configured to: and responding to the click of the currently unplayed target text information by the user, and skipping to play the voice of the unplayed text information.

In another aspect of the present disclosure, a teaching platform is provided, the teaching platform comprising:

the teacher end is used for acquiring image-text contents, extracting text information in the image-text contents and performing voice conversion on the extracted text information;

the management end is used for configuring playing parameter information, and the playing parameter information comprises at least one of language, timbre and tone;

and the user terminal is used for playing the voice of the target text information according to the configured playing parameter information based on the target text information selected by the user.

In some embodiments, the teacher end is further configured to: and performing clause processing on the extracted text information, and performing voice conversion on each clause respectively.

In some embodiments, the teacher end is further configured to: judge whether each clause exists in a preset voice library; and in response to the current clause not existing in the voice library, perform voice conversion on the clause and store the converted voice content in the voice library.

In some embodiments, the user terminal is further specifically configured to: play the voices of the target text information in sequence based on the target text information selected by the user; during playback, the target text being played is highlighted.

In some embodiments, the user terminal is further specifically configured to: and responding to the click of the currently unplayed target text information by the user, and skipping to play the voice of the unplayed target text information.

In another aspect of the present disclosure, an electronic device is provided, including:

one or more processors;

a storage unit for storing one or more programs which, when executed by the one or more processors, enable the one or more processors to implement the method according to the preceding description.

In another aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the method described above.

According to the information interaction method and device and the teaching platform, the text information in the image-text content in which the images and the texts are arranged in a mixed mode is extracted and converted into the voice content, and the user can automatically play the voice of the target text information by clicking the target text information, so that the visually impaired people can be helped to complete online learning of the image-text content.

Drawings

FIG. 1 is a block diagram illustrating an electronic device according to an embodiment of the disclosure;

FIG. 2 is a flow chart of an information interaction method according to another embodiment of the present disclosure;

FIG. 3 is a flow chart of an information interaction method according to another embodiment of the present disclosure;

FIG. 4 is a flow chart of an information interaction method according to another embodiment of the present disclosure;

FIG. 5 is a flow chart of an information interaction method according to another embodiment of the disclosure;

FIG. 6 is a schematic structural diagram of a teaching platform according to another embodiment of the present disclosure.

Detailed Description

For a better understanding of the technical aspects of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

First, an example electronic device for implementing the apparatus and method of the embodiments of the present disclosure is described with reference to fig. 1.

As shown in FIG. 1, electronic device 100 includes one or more processors 110, one or more memory devices 120, one or more input devices 130, one or more output devices 140, and the like, interconnected by a bus system 150 and/or other form of connection mechanism. It should be noted that the components and structures of the electronic device shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.

Processor 110 may be a Central Processing Unit (CPU), or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in electronic device 100 to perform desired functions.

Storage 120 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and a processor may execute them to implement the client functionality (implemented by the processor) in the embodiments of the disclosure described below and/or other desired functionality. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.

The input device 130 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.

The output device 140 may output various information (e.g., images or sounds) to the outside (e.g., to a user), and may include one or more of a display, a speaker, and the like.

Next, an information interaction method S200 according to an embodiment of the present disclosure will be described with reference to fig. 2.

Specifically, as shown in fig. 2, an information interaction method S200 is used for a teaching platform, where the method S200 includes:

S210, acquiring image-text content, wherein the image-text content is rich text formed by mixing characters and pictures.

Specifically, in this step, the image-text content may be content generated by a teacher during teaching or from teaching materials; it is rich text in which text and pictures are mixed.

S220, extracting text information in the image-text content, and performing voice conversion on the extracted text information.

Specifically, in this step, the text information in the image-text content may be extracted first. Clause processing is then performed on the extracted text information, and voice conversion is performed on each clause respectively.
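As an illustration of the clause processing in this step, a minimal Python sketch follows. The function name `split_clauses` and the chosen set of sentence-ending punctuation marks are assumptions made for the example; the disclosure does not specify a particular clause algorithm.

```python
import re
from typing import List

# Assumed set of clause-ending marks (Chinese and English); not taken from the disclosure.
_CLAUSE = re.compile(r"[^。！？；!?;.]+[。！？；!?;.]?")


def split_clauses(text: str) -> List[str]:
    """Split plain text into clauses, keeping the closing punctuation mark."""
    clauses = (match.strip() for match in _CLAUSE.findall(text))
    # Drop fragments that contain only whitespace.
    return [clause for clause in clauses if clause]
```

A splitter this simple also breaks on periods inside numbers or abbreviations, so a practical clause algorithm would likely need additional rules.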

In another embodiment, voice conversion need not be performed on every clause. Whether each clause exists in a preset voice library may be determined first; in response to the current clause not existing in the voice library, the clause is converted to voice and the converted voice content is stored in the voice library, which saves conversion cost.
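The check-then-convert behaviour of this embodiment might be sketched as follows. `VoiceLibrary`, `convert_clauses` and `fake_tts` are hypothetical names introduced for illustration, the library is held in memory, and the TTS call is a stand-in that returns bytes rather than real audio.

```python
from typing import Callable, Dict, Iterable


class VoiceLibrary:
    """In-memory stand-in for the preset voice library (clause -> audio bytes)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def __contains__(self, clause: str) -> bool:
        return clause in self._store

    def get(self, clause: str) -> bytes:
        return self._store[clause]

    def put(self, clause: str, audio: bytes) -> None:
        self._store[clause] = audio


def convert_clauses(clauses: Iterable[str], library: VoiceLibrary,
                    tts: Callable[[str], bytes]) -> int:
    """Convert only the clauses missing from the library; return the number of TTS calls."""
    calls = 0
    for clause in clauses:
        if clause not in library:
            library.put(clause, tts(clause))
            calls += 1
    return calls


def fake_tts(clause: str) -> bytes:
    """Placeholder for a real text-to-speech call (assumption for the example)."""
    return clause.encode("utf-8")
```

Returning the number of TTS calls makes the cost saving visible: a clause that repeats within one document, or across documents, is converted only once.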

S230, playing the voice of the target text information based on the target text information selected by the user.

Specifically, in this step, after the user selects target text information to be played, and provided that the target text information has undergone the voice conversion of step S220, the voice of the target text information is played automatically.

In one embodiment, before playing the voice of the target text information, the method further comprises: configuring playing parameter information, wherein the playing parameter information comprises at least one of language, timbre and tone.

In another embodiment, the playing the voice of the target text information based on the target text information selected by the user includes: playing voices of the target text information in sequence based on the target text information selected by a user; during playback, the target text being played is highlighted.

In another embodiment, the method further comprises: in response to the user clicking target text information that has not yet been played, jumping to and playing the voice of that target text information.
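The sequential playback, highlighting and click-to-jump behaviour of these embodiments could be modelled roughly as below. `SequentialPlayer` and its fields are hypothetical; the `log` list stands in for actual audio output, and `highlighted` for whatever the user interface would use to mark the text being read.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SequentialPlayer:
    clauses: List[str]
    position: int = 0                       # index of the clause to play next
    highlighted: Optional[int] = None       # index of the clause currently highlighted
    log: List[str] = field(default_factory=list)  # stand-in for audio output

    def play_next(self) -> bool:
        """Play the next clause and highlight it; return False when nothing is left."""
        if self.position >= len(self.clauses):
            self.highlighted = None
            return False
        self.highlighted = self.position
        self.log.append(self.clauses[self.position])
        self.position += 1
        return True

    def jump_to(self, index: int) -> None:
        """Handle a click on a clause: playback continues from that clause."""
        if not 0 <= index < len(self.clauses):
            raise IndexError(index)
        self.position = index
```

For example, after playing the first clause, `jump_to(2)` followed by `play_next()` plays and highlights the third clause, skipping the second.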

The information interaction method of this embodiment is used for a teaching platform. Text information in the mixed image-text content is extracted and converted into voice content, and a user can have the voice of the target text information played automatically by clicking the target text information, so that visually impaired people can be helped to complete online learning of the image-text content.

The information interaction method of the present disclosure is explained below with reference to fig. 3 to 5.

As shown in fig. 3, the voice conversion of uploaded image-text content specifically includes the following steps:

1-1) editing the image-text content used in the teaching process;

1-2) saving and submitting the image-text content used in the teaching process;

1-3) extracting the text content from the image-text content;

1-4) performing clause processing on the extracted text content;

1-5) checking whether each clause exists in the voice library;

1-6) if a clause is not in the voice library, performing voice conversion on the clause; if a clause is already in the voice library, leaving it unprocessed;

1-7) checking whether all clauses produced by the clause processing have completed voice conversion;

1-8) if all of the clauses have completed voice conversion, marking the voice conversion state of the image-text content as completed; if not, manually triggering the voice conversion again until all clauses have completed voice conversion;

1-9) voice conversion of the image-text content is complete.
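Steps 1-5) to 1-8) can be sketched as a single conversion pass that reports the resulting state. This is only an illustration: `ConversionState` and `run_conversion` are hypothetical names, the voice library is a plain dictionary, and a failed TTS call is assumed to raise an exception.

```python
from enum import Enum
from typing import Callable, Dict, List


class ConversionState(Enum):
    CONVERTING = "converting"
    COMPLETED = "completed"


def run_conversion(clauses: List[str], library: Dict[str, bytes],
                   tts: Callable[[str], bytes]) -> ConversionState:
    """One conversion pass over the clauses of an image-text document."""
    for clause in clauses:
        if clause in library:
            continue  # step 1-6: already in the voice library, leave unprocessed
        try:
            library[clause] = tts(clause)
        except Exception:
            # Assumed failure mode: leave the clause unconverted so that a
            # later, manually triggered pass can retry it (step 1-8).
            pass
    # Steps 1-7 and 1-8: completed only if every clause now has voice content.
    if all(clause in library for clause in clauses):
        return ConversionState.COMPLETED
    return ConversionState.CONVERTING
```

Calling `run_conversion` again with the same library retries only the clauses that are still missing, which corresponds to the manual re-trigger in step 1-8).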

As shown in fig. 4, configuring the language, timbre, tone, etc. of the speech reading specifically includes the following steps:

1-1) selecting a voice intelligent reading component;

1-2) starting an intelligent reading voice function;

1-3) selecting proper language type, reading role and language style according to teaching requirements;

1-4) storing configuration information;

1-5) Intelligent Voice configuration takes effect.
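A configuration record for the steps above might look like the following sketch. The class `ReadingConfig`, its field names and its default values are assumptions for illustration; the disclosure only states that a language type, a reading role and a language style are selected and saved.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReadingConfig:
    enabled: bool = True        # step 1-2: intelligent reading switched on
    language: str = "zh-CN"     # step 1-3: language type (hypothetical value)
    role: str = "female-1"      # step 1-3: reading role (hypothetical value)
    style: str = "neutral"      # step 1-3: language style (hypothetical value)


def dump_config(config: ReadingConfig) -> str:
    """Step 1-4: serialize the configuration so that it can be stored."""
    return json.dumps(asdict(config), ensure_ascii=False)


def load_config(text: str) -> ReadingConfig:
    """Step 1-5: restore the stored configuration so that it takes effect."""
    return ReadingConfig(**json.loads(text))
```

Serializing to JSON is one possible storage choice; the saved text can be read back with `load_config` wherever the reading function needs the current settings.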

As shown in fig. 5, reading the selected image-text content aloud includes the following steps:

1-1) opening image-text content whose voice conversion state is marked as completed;

1-2) pressing the space key on the keyboard to start voice reading of the image-text content;

1-3) while the image-text content is read aloud, the system locates the content currently being read;

1-4) pressing the space key on the keyboard again to stop voice reading of the image-text content, with resumption from the breakpoint supported;

1-5) voice reading of the image-text content is complete.
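The space-key toggle with resumption from the breakpoint could be modelled as a small state holder, as in this sketch. `ReadingSession`, `on_key` and `tick` are hypothetical names; `tick` stands in for "read one clause aloud" and simply returns the clause text.

```python
from typing import List, Optional


class ReadingSession:
    def __init__(self, clauses: List[str]) -> None:
        self.clauses = list(clauses)
        self.playing = False
        self.position = 0  # breakpoint: index of the next clause to read

    def on_key(self, key: str) -> None:
        """Steps 1-2 and 1-4: the space key starts or stops voice reading."""
        if key == " ":
            self.playing = not self.playing

    def tick(self) -> Optional[str]:
        """Read one clause if playing; return None when paused or finished."""
        if not self.playing:
            return None
        if self.position >= len(self.clauses):
            self.playing = False  # step 1-5: reading is complete
            return None
        clause = self.clauses[self.position]
        self.position += 1  # step 1-3: track the content currently being read
        return clause
```

Because `position` is kept while paused, pressing the space key again continues from the clause after the last one read rather than from the beginning.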

The following is a detailed description of the voice conversion of image-text content.

Image-text content used in teaching is edited by the teacher on the teaching platform. Newly added image-text content generates an image-text-to-be-converted message in the message queue, and the teacher end displays the voice conversion state of the image-text as converting. When edited image-text is saved again, a digest algorithm is run over its entire content to obtain a digest value, which is compared with the digest value of the original image-text. If the two digest values are the same, no conversion message is generated and the teacher end displays the voice conversion state of the image-text as completed; if they differ, an image-text-to-be-converted message is generated and the teacher end displays the voice conversion state of the image-text as converting.
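The save-time comparison described above might be sketched as follows. SHA-256 is used here only as an example digest algorithm, since the disclosure does not name one; `TeacherEnd`, its attributes and the in-process `deque` used as a message queue are likewise assumptions.

```python
import hashlib
from collections import deque
from typing import Deque, Dict


def digest(content: str) -> str:
    """Digest value of the full image-text content (SHA-256 chosen as an example)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class TeacherEnd:
    def __init__(self) -> None:
        self.queue: Deque[str] = deque()   # stand-in for the message queue
        self.digests: Dict[str, str] = {}  # image-text ID -> digest of last saved content
        self.state: Dict[str, str] = {}    # image-text ID -> displayed conversion state

    def save(self, doc_id: str, html: str) -> bool:
        """Save image-text content; return True if a conversion message was queued."""
        new_digest = digest(html)
        if self.digests.get(doc_id) == new_digest:
            # Unchanged content: no message, state shown as completed (as described above).
            self.state[doc_id] = "completed"
            return False
        self.digests[doc_id] = new_digest
        self.queue.append(doc_id)
        self.state[doc_id] = "converting"
        return True
```

Comparing digests rather than full documents means only one short value per image-text needs to be kept to detect whether a re-save actually changed the content.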

The image-text-to-voice service consumes messages in the message queue asynchronously. When it consumes a message, it obtains the image-text content according to the ID of the image-text and parses the HTML rich text to extract the plain text content. It then calls a clause algorithm to split the plain text into clauses and runs the digest algorithm on each clause to obtain a digest value, which is used to search the locally cached voice library. If the same digest value exists in the voice library, the clause is not converted again; if it does not exist, the TTS service is called to convert the clause to voice according to the voice conversion configuration of the management end. This mechanism improves the overall conversion efficiency for image-text content. After all the content of the image-text has been converted, the current message is consumed successfully, and the teacher end displays the voice conversion state of the image-text as completed.
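A rough sketch of what handling one message might involve is given below, using Python's standard `html.parser` to pull plain text out of the rich text and a per-clause digest as the voice library key. The names `extract_plain_text`, `clause_key` and `consume`, the choice of SHA-256, and the list of block-level tags are assumptions; the clause splitter and the TTS call are passed in as parameters because the disclosure does not specify them.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict, List

_BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "tr"}  # assumed block-level tags


class _TextExtractor(HTMLParser):
    """Collects text nodes; pictures and other tags contribute no text."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in _BLOCK_TAGS:
            self.chunks.append(" ")  # keep paragraphs from running together

    def handle_data(self, data):
        self.chunks.append(data)


def extract_plain_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())  # normalize whitespace


def clause_key(clause: str) -> str:
    return hashlib.sha256(clause.encode("utf-8")).hexdigest()


def consume(html: str, library: Dict[str, bytes], tts: Callable[[str], bytes],
            split: Callable[[str], List[str]]) -> List[str]:
    """Convert clauses whose digest is not yet in the library; return all clause keys."""
    keys = []
    for clause in split(extract_plain_text(html)):
        key = clause_key(clause)
        if key not in library:
            library[key] = tts(clause)
        keys.append(key)
    return keys
```

The returned keys give the order in which cached audio would be fetched for playback. Note that this extractor would also collect text inside `script` or `style` elements, which a real service would probably filter out.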

In another aspect of the present disclosure, an information interaction apparatus is provided for a teaching platform. The device may be adapted to the information interaction method described above, and reference may be made to the related description above, which is not described herein again. The device comprises an acquisition module, a conversion module and a playing module.

The acquisition module is used for acquiring the image-text content, wherein the image-text content is rich text formed by mixing characters and pictures. The conversion module is used for extracting text information in the image-text content and carrying out voice conversion on the extracted text information. The playing module is used for playing the voice of the target text information based on the target text information selected by the user.

The information interaction device of this embodiment is used for a teaching platform. Text information in the mixed image-text content is extracted and converted into voice content, and a user can have the voice of the target text information played automatically by clicking the target text information, so that visually impaired people can be helped to complete online learning of the image-text content.

In some embodiments, the conversion module is further specifically configured to: perform clause processing on the extracted text information, and perform voice conversion on each clause respectively.

In some embodiments, the conversion module is further specifically configured to: judging whether each clause exists in a preset voice library or not; and responding to the situation that the current clause does not exist in the voice library, performing voice conversion on the clause, and storing the converted voice content into the voice library.

In some embodiments, the playing module is further specifically configured to: playing the voices of the target text information in sequence based on the target text information selected by the user; during playback, the target text being played is highlighted.

In some embodiments, the playing module is further specifically configured to: in response to the user clicking target text information that has not yet been played, jump to and play the voice of that target text information.

In another aspect of the disclosure, as shown in fig. 6, a teaching platform 400 is provided, where the teaching platform 400 is suitable for the information interaction method described above, and reference may be made to the related description, which is not repeated herein. The teaching platform 400 includes a teacher end 410, a management end 420 and a user terminal 430.

The teacher end 410 is configured to acquire image-text content, where the image-text content is rich text in which text and pictures are mixed, extract text information from the image-text content, and perform voice conversion on the extracted text information. The management end 420 is configured to configure playing parameter information, where the playing parameter information includes at least one of a language, a timbre, and a tone. The user terminal 430 is configured to play the voice of the target text information according to the configured playing parameter information based on the target text information selected by the user.

According to the teaching platform, the text information in the image-text content in which the images and the texts are arranged in a mixed mode is extracted and converted into the voice content, and a user can automatically play the voice of the target text information by clicking the target text information, so that visually impaired people can be helped to complete online learning of the image-text content.

In some embodiments, as shown in fig. 6, the teacher end 410 is further configured to: and performing clause processing on the extracted text information, and performing voice conversion on each clause respectively.

In some embodiments, as shown in fig. 6, the teacher end 410 is further configured to: judging whether each clause exists in a preset voice library or not; and responding to the fact that the current clause does not exist in the voice base, performing voice conversion on the clause, and storing the converted voice content into the voice base.

In some embodiments, as shown in fig. 6, the user terminal 430 is further specifically configured to: playing the voices of the target text information in sequence based on the target text information selected by the user; during playback, the target text being played is highlighted.

In some embodiments, as shown in fig. 6, the user terminal 430 is further specifically configured to: in response to the user clicking target text information that has not yet been played, jump to and play the voice of that target text information.

In another aspect of the disclosure, an electronic device is provided that includes one or more processors; a storage unit for storing one or more programs which, when executed by the one or more processors, enable the one or more processors to implement the method according to the preceding description.

In another aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, is adapted to carry out the method according to the above.

The computer-readable storage medium may be any tangible medium that can contain or store a program, and may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples include, but are not limited to: a portable computer diskette, a hard disk, an optical fiber, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.

The computer-readable storage medium may also include a propagated data signal with computer-readable program code embodied therein, for example, in a non-transitory form such as in a carrier wave, where the carrier wave is any suitable carrier for carrying the program code.

It will be understood that the above embodiments are merely exemplary embodiments employed to illustrate the principles of the present disclosure, and the present disclosure is not limited thereto. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the disclosure, and such changes and modifications are also considered to fall within the scope of the disclosure.

Claims

1. An information interaction method is used for a teaching platform, and is characterized by comprising the following steps:

2. The method of claim 1, wherein the converting the extracted text information into speech comprises:

3. The method of claim 2, wherein said separately voice converting each of said clauses comprises:

judging whether each clause exists in a preset voice library or not;

and responding to the situation that the current clause does not exist in the voice library, performing voice conversion on the clause, and storing the converted voice content into the voice library.

4. The method of any of claims 1 to 3, wherein prior to playing the speech of the target textual information based on the target textual information selected by the user, the method further comprises:

5. The method according to any one of claims 1 to 3, wherein playing the voice of the target text information based on the target text information selected by the user comprises:

playing voices of the target text information in sequence based on the target text information selected by a user; during playback, the target text being played is highlighted.

6. The method of claim 5, further comprising:

and responding to the click of the currently unplayed target text information by the user, and skipping to play the voice of the unplayed target text information.

7. An information interaction device for a teaching platform, the device comprising:

8. A teaching platform, said teaching platform comprising:

9. An electronic device, comprising:

one or more processors;

a storage unit to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 6.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is able to carry out a method according to any one of claims 1 to 6.