CN116778498A - Playing method and device of open format document, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116778498A
Authority
CN
China
Prior art keywords
information
text information
open
playing
format document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310497832.8A
Other languages
Chinese (zh)
Inventor
宋敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuxin Kunpeng Beijing Information Technology Co ltd
Original Assignee
Fuxin Kunpeng Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuxin Kunpeng Beijing Information Technology Co ltd
Priority to CN202310497832.8A
Publication of CN116778498A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a method and a device for playing an open format document, an electronic device and a storage medium, and relates to the technical field of open format document processing. The method comprises the following steps: performing character recognition on the open format document, and extracting initial text information of the open format document; performing word segmentation processing on the initial text information to generate a word segmentation result; carrying out semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information; generating target text information based on the semantic information and the grammar structure information, wherein the target text information accords with natural language description rules; generating voice information corresponding to the target text information based on the target text information; and playing the voice information. By combining character recognition, natural language processing and voice synthesis, the method plays the content of the open format document efficiently, accurately and naturally, so that the accessibility and the use convenience of the open format document are improved.

Description

Playing method and device of open format document, electronic equipment and storage medium
Technical Field
The present application relates to the field of open format document processing technologies, and in particular, to a method and apparatus for playing an open format document, an electronic device, and a storage medium.
Background
With the widespread use of electronic documents, more and more people need to play document contents through electronic devices. Among electronic document formats, the Open Fixed-layout Document (OFD) is an open format that supports rich layout and embedded multimedia, and it is therefore receiving more and more attention.
In the related art, when a user plays the content of an OFD document by using an electronic device, the accuracy of playing the document content is low, so that the experience of the user is poor.
Therefore, how to improve the accuracy of OFD document playing is a current urgent problem to be solved.
Disclosure of Invention
Aiming at the problems existing in the prior art, the embodiment of the application provides a playing method, a device, electronic equipment and a storage medium of an open format document.
The application provides a playing method of an open format document, which comprises the following steps:
performing character recognition on an open format document, and extracting initial text information of the open format document;
performing word segmentation processing on the initial text information to generate a word segmentation result corresponding to the initial text information;
carrying out semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result;
generating target text information based on the semantic information and the grammar structure information; the target text information accords with natural language description rules;
generating voice information corresponding to the target text information based on the target text information;
and playing the voice information.
Optionally, the performing text recognition on the open-format document, extracting initial text information of the open-format document includes:
converting the open format document into an extensible markup language file;
and carrying out optical character recognition on the extensible markup language file, and extracting the initial text information.
Optionally, the generating, based on the target text information, voice information corresponding to the target text information includes:
inputting the target text information into a target voice synthesis model to obtain the voice information output by the target voice synthesis model; the target speech synthesis model is obtained by training an initial speech synthesis model based on a text sample and a speech sample corresponding to the text sample.
Optionally, after the generating the voice information corresponding to the target text information, the method further includes:
and carrying out enhancement processing on the voice information to generate optimized voice information.
Optionally, the method further comprises:
marking the open format document in response to a first instruction input by a user.
Optionally, the method further comprises:
controlling the play attribute of the voice information in response to a second instruction input by a user, wherein the play attribute comprises the play rate of the voice information and the play mode of the voice information.
The application also provides a playing device of the open format document, which comprises:
the extraction module is used for carrying out character recognition on the open format document and extracting initial text information of the open format document;
the first generation module is used for carrying out word segmentation processing on the initial text information and generating word segmentation results corresponding to the initial text information;
the second generation module is used for carrying out semantic analysis and grammar analysis on the word segmentation result and generating semantic information and grammar structure information corresponding to the word segmentation result;
the third generation module is used for generating target text information based on the semantic information and the grammar structure information; the target text information accords with natural language description rules;
a fourth generation module, configured to generate, based on the target text information, voice information corresponding to the target text information;
and the playing module is used for playing the voice information.
The application also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the playing method of the open format document when executing the program.
The present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for playing an open format document as described in any of the above.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the method for playing an open format document as described in any of the above.
According to the method, the device, the electronic equipment and the storage medium for playing the open format document, provided by the application, the initial text information of the open format document is extracted, word segmentation processing is carried out on the initial text information, and a word segmentation result corresponding to the initial text information is generated; then, carrying out semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result, and generating target text information conforming to natural language description rules based on the semantic information and the grammar structure information; then converting the target text information into corresponding voice information by utilizing a voice synthesis technology, thereby realizing the playing of the voice information; that is, by combining various means such as character recognition, natural language processing, voice synthesis and the like on the open-format document, the content of the open-format document is efficiently, accurately and naturally played, the accuracy of playing the content of the open-format document is improved, and the accessibility and the use convenience of the open-format document are further improved.
Drawings
In order to more clearly illustrate the application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for playing an open format document according to the present application;
FIG. 2 is a second flowchart of a method for playing an open format document according to the present application;
FIG. 3 is a schematic structural diagram of a playing device for an open format document provided by the application;
FIG. 4 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application aims to provide a playing method of an OFD format document, so that people can listen to the content in the OFD document through voice, and the accessibility and the use convenience of the document are improved.
The method for playing the open format document provided by the application is specifically described below with reference to fig. 1 to 2. Fig. 1 is a schematic flow chart of a method for playing an open format document according to the present application. As shown in fig. 1, the method includes steps 101 to 106, where:
Step 101, performing character recognition on the open format document, and extracting initial text information of the open format document.
Firstly, it should be noted that the execution body of the present application may be any electronic device capable of playing an open format document, for example, any one of a smart phone, a smart watch, a desktop computer, a laptop computer, and the like.
In the related art, when a user uses an electronic device to play the content of an OFD document, the accuracy of playing the content of the document is low, so that the experience of the user is poor.
Therefore, in order to improve the accuracy of the OFD document playing, in this embodiment, first, text recognition needs to be performed on the OFD document, and initial text information in the OFD document is extracted, where the initial text information includes text content of the OFD document and layout typesetting information.
It should be noted that, as a layout file, an OFD document is not easily tampered with in the process of uploading; generally only a small amount of information, such as notes or signatures, is added to it. In addition, when a layout file is opened with different software or on different terminals such as computers, its typesetting and content remain highly consistent.
In one possible implementation manner of the embodiment of the application, after the initial text information of the OFD document is extracted, the content of the extracted initial text information needs to be optimized, for example, the content of the initial text information is subjected to processing such as de-duplication, de-noising, error correction and the like, so that the accuracy of the playing of the OFD document can be ensured.
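The de-noising and de-duplication mentioned above can be sketched as follows. This is a minimal illustration rather than the method of the application: the function name `clean_extracted_text` is hypothetical, de-duplication here simply drops every repeated line (which suits repeated page headers but would also drop legitimate repeats), and error correction is not covered.

```python
import re
import unicodedata


def clean_extracted_text(lines):
    """De-noise and de-duplicate text lines extracted from a document."""
    cleaned = []
    seen = set()
    for line in lines:
        # Normalise compatibility forms, e.g. full-width digits become ASCII digits.
        line = unicodedata.normalize("NFKC", line)
        # De-noise: drop control/format characters but keep whitespace for now.
        line = "".join(
            ch for ch in line
            if ch.isspace() or unicodedata.category(ch)[0] != "C"
        )
        # Collapse runs of whitespace into a single space.
        line = re.sub(r"\s+", " ", line).strip()
        if not line:
            continue
        # De-duplicate: keep only the first occurrence of each line (assumption).
        if line in seen:
            continue
        seen.add(line)
        cleaned.append(line)
    return cleaned
```

The cleaned lines would then be passed on to word segmentation in step 102.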
Step 102, performing word segmentation processing on the initial text information to generate a word segmentation result corresponding to the initial text information.
In this embodiment, after the initial text information of the OFD document is extracted, word segmentation processing is first required to be performed on the initial text information, so as to generate a word segmentation result corresponding to the initial text information, where the word segmentation result may be a plurality of words, phrases or a combination of words and phrases.
There are various methods for word segmentation of the initial text information. For example, the initial text information can be matched against a preset keyword database, so as to realize word segmentation; for another example, keywords in the initial text information may be extracted using term frequency-inverse document frequency (Term Frequency-Inverse Document Frequency, TF-IDF), thereby realizing word segmentation; for another example, the initial text information may be input to a neural network model, so that the neural network model performs word segmentation on the initial text information. The method for word segmentation of the initial text information is not limited in the present application.
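As an illustration of the dictionary-matching approach, the following sketch uses forward maximum matching, one common way to segment text against a preset word list. The application does not name a specific matching algorithm, so this choice, the function name `forward_max_match`, and the tiny vocabulary are all assumptions made for the example.

```python
def forward_max_match(text, vocabulary, max_word_len=4):
    """Segment text by greedily matching the longest dictionary word first."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback token.
            if length == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical keyword database: "today", "afternoon", "weather", "very good".
vocabulary = {"今天", "下午", "天气", "很好"}
print(forward_max_match("今天下午天气很好", vocabulary))
```

Characters that are not covered by the vocabulary fall out as single-character tokens, which is why practical systems usually combine a dictionary with a statistical or neural model.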
Step 103, carrying out semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result.
In this embodiment, after the initial text information is segmented, it is possible that the extracted initial text information does not conform to the daily language description rules of people.
Therefore, in order to improve the natural smoothness of the OFD document playing, semantic analysis and grammar analysis are required to be performed on the word segmentation result, semantic information and grammar structure information corresponding to the word segmentation result are generated, and then target text information conforming to the natural language description rule of people is generated based on the semantic information and the grammar structure information.
There are various methods for semantic analysis and grammar analysis of word segmentation results. Specifically, the methods include lexical analysis, syntactic analysis, pragmatic analysis, context analysis, and the like.
a) Lexical analysis includes two aspects, namely "morphological analysis" and "vocabulary analysis". Morphological analysis mainly analyzes the prefixes, suffixes and the like of words, while vocabulary analysis covers the vocabulary system as a whole, so that the characteristics of the word segmentation result can be accurately analyzed.
b) Syntactic analysis analyzes words and phrases to identify the syntactic structure of sentences, thereby implementing automatic syntactic analysis.
c) Pragmatic analysis, relative to semantic analysis, adds analysis of the context, the language background and the like; that is, it extracts additional information such as images and interpersonal relations from the structure of the article, and is a higher-level linguistic analysis. It associates the content of a sentence with the details of real life to form a dynamic ideographic structure.
d) Context analysis mainly refers to analyzing the large number of "gaps" that lie outside the original query language, in order to interpret the intended query more accurately. These "gaps" include general knowledge, domain-specific knowledge, the needs of the querying user, and the like.
Step 104, generating target text information based on the semantic information and the grammar structure information; the target text information accords with natural language description rules.
In this embodiment, after extracting the initial text information in the OFD document, it is possible that the extracted initial text information does not conform to the daily language description rule of people.
In order to improve the natural smoothness of the OFD document playing, target text information conforming to the natural language description rule of people needs to be generated based on semantic information and grammar structure information.
For example, suppose the text content of the extracted initial text information is "good weather today in afternoon". Because this text content does not conform to people's daily language description rules, natural language processing is performed on it based on the semantic information and the grammar structure information in order to improve the natural smoothness of OFD document playing, so that it becomes target text information conforming to the language description rules, namely "the weather in the afternoon is good".
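A toy sketch of this step is given below. It assumes that the preceding analysis has already attached a role label to each word; the labels (`subject`, `time`, `predicate`), the function name `generate_target_text` and the fixed English template are hypothetical and only illustrate how labelled words could be reordered into a natural sentence.

```python
def generate_target_text(labelled_tokens):
    """Reorder role-labelled words into a simple natural-language sentence."""
    # Map each (hypothetical) role label to its word.
    by_role = {role: word for word, role in labelled_tokens}
    parts = ["the " + by_role["subject"]]
    if "time" in by_role:
        parts.append("in the " + by_role["time"])
    parts.append("is " + by_role["predicate"])
    return " ".join(parts)


# Words arrive in extraction order, not in natural reading order.
tokens = [("good", "predicate"), ("weather", "subject"), ("afternoon", "time")]
print(generate_target_text(tokens))
```

A real implementation would rely on the semantic information and grammar structure information generated in step 103 rather than on a single hard-coded template.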
Step 105, generating voice information corresponding to the target text information based on the target text information.
In this embodiment, after the target text information is acquired, the target text information needs to be converted into corresponding voice information by using a voice synthesis technology.
Step 106, playing the voice information.
After converting the target text information into the corresponding voice information, the voice information is played.
According to the method for playing the open format document, provided by the application, the initial text information of the open format document is extracted, word segmentation processing is carried out on the initial text information, and a word segmentation result corresponding to the initial text information is generated; then, carrying out semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result, and generating target text information conforming to natural language description rules based on the semantic information and the grammar structure information; then converting the target text information into corresponding voice information by utilizing a voice synthesis technology, thereby realizing the playing of the voice information; that is, by combining various means such as character recognition, natural language processing, voice synthesis and the like on the open-format document, the content of the open-format document is efficiently, accurately and naturally played, the accuracy of playing the content of the open-format document is improved, and the accessibility and the use convenience of the open-format document are further improved.
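The data flow of steps 101 to 106 can be summarised in a short skeleton. This is only a sketch: the function `play_document` and all of its callable parameters are hypothetical placeholders for whatever recognition, segmentation, analysis, generation, synthesis and playback components an implementation chooses.

```python
def play_document(path, recognise, segment, analyse, generate, synthesise, play):
    """Chain the six steps; each argument after path is a caller-supplied callable."""
    initial_text = recognise(path)              # step 101: character recognition
    tokens = segment(initial_text)              # step 102: word segmentation
    semantics, grammar = analyse(tokens)        # step 103: semantic and grammar analysis
    target_text = generate(semantics, grammar)  # step 104: target text generation
    audio = synthesise(target_text)             # step 105: voice synthesis
    play(audio)                                 # step 106: playback
    return target_text
```

Passing the components in as callables keeps each step replaceable, which matches the statement that the specific segmentation and synthesis methods are not limited.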
The method for playing the open format document can realize high-efficiency, accurate and natural reading of the content of the OFD document, thereby facilitating people to acquire the content of the document and improving the use efficiency of the document. The method can be applied to various occasions such as education, office work, reading and the like, and has wide application prospect and market demand.
Optionally, in one possible implementation manner of the embodiment of the present application, the text recognition is performed on the open format document, and the extracting of the initial text information of the open format document may be specifically implemented through the following steps 1) to 2):
Step 1), converting the open format document into an extensible markup language file;
Step 2), carrying out optical character recognition on the extensible markup language file, and extracting the initial text information.
In this embodiment, the extensible markup language (Extensible Markup Language, XML) is a metalanguage that allows users to define their own markup languages. An XML file has the advantages of good extensibility, separation of content and form, strict grammar requirements, good security and the like.
The character recognition is carried out on the XML file by utilizing the optical character recognition (Optical Character Recognition, OCR) technology, so that the initial text information of the OFD document can be extracted.
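As a rough illustration of obtaining text from the XML form of an OFD document, the sketch below reads the XML parts directly instead of running OCR, which is a deliberately simpler stand-in for the step described above. It assumes that an OFD package is a ZIP archive of XML files and that page text is stored in elements named `TextCode`; both points, and the function name `extract_text_from_ofd`, should be checked against the OFD specification before use.

```python
import xml.etree.ElementTree as ET
import zipfile


def extract_text_from_ofd(ofd_file):
    """Collect the text of every TextCode element found in the package's XML parts."""
    texts = []
    with zipfile.ZipFile(ofd_file) as package:
        # Lexicographic order is a simplification; real page order comes from
        # the document's own page list.
        for name in sorted(package.namelist()):
            if not name.lower().endswith(".xml"):
                continue
            root = ET.fromstring(package.read(name))
            for element in root.iter():
                # Strip the XML namespace, e.g. "{ns}TextCode" -> "TextCode".
                local_name = element.tag.rsplit("}", 1)[-1]
                if local_name == "TextCode" and element.text:
                    texts.append(element.text.strip())
    return texts
```

Pages that contain only scanned images carry no such text elements, which is where an OCR step such as the one described above would still be needed.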
After the initial text information is extracted, the processing such as de-duplication, de-noising, error correction and the like is needed to be carried out on the initial text information so as to improve the accuracy of subsequent voice playing.
Optionally, the generating, based on the target text information, the voice information corresponding to the target text information may be specifically implemented by:
inputting the target text information into a target voice synthesis model to obtain the voice information output by the target voice synthesis model; the target speech synthesis model is obtained by training an initial speech synthesis model based on a text sample and a speech sample corresponding to the text sample.
In this embodiment, the target speech synthesis model may be of various types, such as Tacotron 2, Transformer TTS, Deep Voice 3, etc. The type of the target speech synthesis model is not particularly limited in the present application.
Specifically, the target speech synthesis model is trained by: inputting a text sample and a voice sample corresponding to the text sample into an initial voice synthesis model for iterative training until the initial voice synthesis model converges, for example, a loss function corresponding to the initial voice synthesis model reaches a preset threshold value, so as to generate a target voice synthesis model.
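The stopping rule described above (iterate until the loss reaches a preset threshold) can be illustrated with a deliberately tiny stand-in. A one-parameter linear model replaces the speech synthesis network here, and the function name, learning rate and threshold are assumptions chosen for the example; only the control flow carries over to a real model.

```python
def train_until_converged(samples, learning_rate=0.05, loss_threshold=1e-6,
                          max_epochs=10_000):
    """Fit y = weight * x by gradient descent, stopping at the loss threshold."""
    weight = 0.0
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        # Mean squared error over all (input, target) pairs.
        loss = sum((weight * x - y) ** 2 for x, y in samples) / len(samples)
        if loss <= loss_threshold:
            break  # converged: the loss has reached the preset threshold
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in samples) / len(samples)
        weight -= learning_rate * gradient
    return weight, loss, epoch


weight, loss, epochs = train_until_converged([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

The `max_epochs` cap is a safeguard so that training also terminates if the threshold is never reached.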
In addition to generating speech information using a target speech synthesis model, there are a variety of other ways to convert target text information into speech information. For example, model-based text-to-speech, which maps text to speech using a statistical pronunciation model built on a speech feature library; for another example, phoneme synthesis, which divides the text into units, generates speech for the units according to rules, and assembles the units into a complete sentence; for another example, text-to-speech based on a hidden Markov model (Hidden Markov Model, HMM), which converts the text into an HMM-based parametric representation and then parses it in context.
In the above embodiment, the target text information is input to the target speech synthesis model, so that the target text information can be converted into speech information, and the OFD document content can be played.
Optionally, after the generating the voice information corresponding to the target text information, the method further includes:
and carrying out enhancement processing on the voice information to generate optimized voice information.
Specifically, there are various ways to enhance the voice information, for example, denoising and enhancing the voice information, so as to improve the quality of the voice information.
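A minimal sketch of such enhancement is shown below, assuming the voice information is available as a list of floating-point samples in the range -1.0 to 1.0. A moving average serves as a crude noise-smoothing filter and peak normalisation as the enhancement; the function name and parameters are hypothetical, and production systems would typically use dedicated signal-processing methods.

```python
def enhance_speech(samples, window=3, target_peak=0.9):
    """Smooth the samples with a moving average, then normalise the peak level."""
    if not samples:
        return []
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        # Average over a window centred on i, truncated at the signal edges.
        start = max(0, i - half)
        stop = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[start:stop]) / (stop - start))
    peak = max(abs(value) for value in smoothed)
    if peak == 0:
        return smoothed  # silent input: nothing to normalise
    gain = target_peak / peak
    return [value * gain for value in smoothed]
```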
Optionally, the embodiment of the application further provides an interaction mode with a user, which specifically includes at least one implementation mode of the following:
Mode 1, marking the open format document in response to a first instruction input by a user.
For example, after responding to the first instruction, the content of the open format document is marked by drawing lines, adding a background color, changing the font color, and the like.
Mode 2, in response to a second instruction input by a user, controlling the play attribute of the voice information, wherein the play attribute comprises the play rate of the voice information and the play mode of the voice information.
For example, there are various ways to control the playing rate of the voice information, such as pausing, resuming, fast-forwarding or rewinding the voice information being played, so that the user can conveniently control the playing progress.
The playing mode of the voice information may be, for example, loop playing, random playing, and the like.
In the embodiment, the interaction function with the user is increased by responding to the first instruction and/or the second instruction input by the user, so that the user can conveniently mark the open format document, control the playing speed of the open format document and the playing mode of the voice information, and the accessibility and the use convenience of the open format document are improved.
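The play attributes controlled by the second instruction can be modelled as a small state object. The sketch below is one possible design, not the one used by the application: the class names, the segment-based position, and the behaviour at the end of the document are all assumptions.

```python
import random
from dataclasses import dataclass
from enum import Enum


class PlayMode(Enum):
    SEQUENTIAL = "sequential"
    LOOP = "loop"
    RANDOM = "random"


@dataclass
class PlaybackController:
    segment_count: int                   # number of audio segments in the document
    position: int = 0                    # index of the current segment
    rate: float = 1.0                    # playback rate multiplier
    mode: PlayMode = PlayMode.SEQUENTIAL
    paused: bool = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def set_rate(self, rate):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = rate

    def seek(self, offset):
        """Fast-forward (positive offset) or rewind (negative), clamped to range."""
        self.position = max(0, min(self.segment_count - 1, self.position + offset))

    def advance(self):
        """Move to the next segment according to the play mode; return its index."""
        if self.paused:
            return self.position
        if self.mode is PlayMode.RANDOM:
            self.position = random.randrange(self.segment_count)
        elif self.position + 1 < self.segment_count:
            self.position += 1
        elif self.mode is PlayMode.LOOP:
            self.position = 0  # wrap around at the end of the document
        return self.position
```

In sequential mode the controller simply stays on the last segment once the end is reached, while loop mode wraps back to the first segment.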
Fig. 2 is a second flowchart of a method for playing an open format document according to the present application, as shown in fig. 2, the method includes steps 201 to 210, in which:
Step 201, converting the open format document into an extensible markup language file.
Step 202, performing optical character recognition on the extensible markup language file, and extracting initial text information.
Step 203, optimizing the initial text information to generate optimized initial text information.
Specifically, there are various ways to optimize the initial text information, for example, performing processes such as de-duplication, de-noising, and error correction on the content of the initial text information, so that the accuracy of recognizing the text of the open format document can be improved.
Step 204, performing word segmentation processing on the initial text information to generate a word segmentation result corresponding to the initial text information.
Step 205, carrying out semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result.
It should be noted that the semantic analysis and the grammar analysis need not be executed in any particular order; that is, the semantic analysis may be performed first, the grammar analysis may be performed first, or the semantic analysis and the grammar analysis may be performed in parallel. The present application is not particularly limited thereto.
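The parallel option can be sketched with a thread pool. The two analysis functions below are hypothetical stand-ins that return trivial results; only the pattern of submitting both analyses and collecting both results is the point of the example.

```python
from concurrent.futures import ThreadPoolExecutor


def semantic_analysis(tokens):
    # Hypothetical stand-in: a real implementation would return semantic information.
    return {"keywords": sorted(set(tokens))}


def grammar_analysis(tokens):
    # Hypothetical stand-in: a real implementation would return grammar structure.
    return {"token_count": len(tokens)}


def analyse_in_parallel(tokens):
    """Run both analyses concurrently and return (semantics, grammar)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        semantic_future = pool.submit(semantic_analysis, tokens)
        grammar_future = pool.submit(grammar_analysis, tokens)
        return semantic_future.result(), grammar_future.result()
```

Because neither analysis depends on the other's output, the results are the same regardless of which one finishes first.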
Step 206, generating target text information based on the semantic information and the grammar structure information.
Step 207, inputting the target text information into a target voice synthesis model to obtain voice information output by the target voice synthesis model; the target speech synthesis model is obtained by training an initial speech synthesis model based on a text sample and a speech sample corresponding to the text sample.
Step 208, enhancing the voice information to generate optimized voice information.
Specifically, there are various ways to enhance the voice information, for example, denoising and enhancing the voice information, so as to improve the quality of the voice information.
Step 209, marking the open format document in response to a first instruction input by a user.
For example, after responding to the first instruction, the content of the open format document is marked by drawing lines, adding a background color, changing the font color, and the like.
Step 210, in response to a second instruction input by the user, controlling the play attribute of the voice information, wherein the play attribute comprises the play rate of the voice information and the play mode of the voice information.
For example, there are various ways to control the playing rate of the voice information, such as pausing, resuming, fast-forwarding or rewinding the voice information being played, so that the user can conveniently control the playing progress. The playing mode of the voice information may be, for example, loop playing, random playing, and the like.
It should be noted that step 209 and step 210 need not be executed in any particular order; that is, step 209 may be executed first, step 210 may be executed first, or step 209 and step 210 may be executed in parallel. The present application is not particularly limited thereto.
According to the method for playing the open format document, provided by the application, the initial text information of the open format document is extracted, word segmentation processing is carried out on the initial text information, and a word segmentation result corresponding to the initial text information is generated; then, carrying out semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result, and generating target text information conforming to natural language description rules based on the semantic information and the grammar structure information; then converting the target text information into corresponding voice information by utilizing a voice synthesis technology, thereby realizing the playing of the voice information; that is, by combining various means such as character recognition, natural language processing, voice synthesis and the like on the open-format document, the content of the open-format document is efficiently, accurately and naturally played, and the accuracy of playing the content of the open-format document is improved; meanwhile, by responding to the first instruction and/or the second instruction input by the user, the interactive function with the user is increased, the user can conveniently label the open format document, control the playing speed of the open format document and the playing mode of the voice information, and accessibility and use convenience of the open format document are improved.
The method for playing an open format document enables efficient, accurate, and natural reading aloud of the content of an OFD document, which makes it easier for people to access the content of the document and improves the efficiency with which the document is used. The method can be applied in settings such as education, office work, and reading, and has broad application prospects and market demand.
The device for playing an open format document provided by the present application is described below; the device described below and the method described above may be referred to in correspondence with each other. Fig. 3 is a schematic structural diagram of the playing device for an open format document provided by the present application. As shown in Fig. 3, the playing device 300 for an open format document includes: an extraction module 301, a first generation module 302, a second generation module 303, a third generation module 304, a fourth generation module 305, and a play module 306, wherein:
the extraction module 301 is configured to perform text recognition on an open format document, and extract initial text information of the open format document;
the first generating module 302 is configured to perform word segmentation processing on the initial text information, and generate a word segmentation result corresponding to the initial text information;
the second generating module 303 is configured to perform semantic analysis and syntax analysis on the word segmentation result, and generate semantic information and syntax structure information corresponding to the word segmentation result;
a third generating module 304, configured to generate target text information based on the semantic information and the grammar structure information; the target text information accords with natural language description rules;
a fourth generating module 305, configured to generate, based on the target text information, voice information corresponding to the target text information;
and the playing module 306 is used for playing the voice information.
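As a non-limiting illustration, the module structure listed above can be sketched as a class that chains six pluggable callables. The class name PlayingDevice, the callable signatures, and the use of plain dictionaries for the semantic and grammar structure information are assumptions of this sketch, not details given in the application.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PlayingDevice:
    extract: Callable[[bytes], str]                    # extraction module 301
    segment: Callable[[str], List[str]]                # first generation module 302
    analyze: Callable[[List[str]], Tuple[dict, dict]]  # second generation module 303
    compose: Callable[[dict, dict], str]               # third generation module 304
    synthesize: Callable[[str], bytes]                 # fourth generation module 305
    play: Callable[[bytes], None]                      # playing module 306

    def run(self, document: bytes) -> None:
        """Pass the document through the six modules in order."""
        initial_text = self.extract(document)
        words = self.segment(initial_text)
        semantics, grammar = self.analyze(words)
        target_text = self.compose(semantics, grammar)
        voice = self.synthesize(target_text)
        self.play(voice)


# Trivial stand-in modules, used only to show how the pieces connect.
played = []
device = PlayingDevice(
    extract=lambda doc: doc.decode("utf-8"),
    segment=str.split,
    analyze=lambda words: ({"words": words}, {"order": list(range(len(words)))}),
    compose=lambda sem, gram: " ".join(sem["words"][i] for i in gram["order"]),
    synthesize=lambda text: text.encode("utf-8"),
    play=played.append,
)
device.run(b"hello open format")
```

Keeping each module behind a callable makes it possible to replace, for example, the speech synthesis stage without touching the others, which matches the statement that the modules may be selected according to actual needs.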
According to the playing device for an open format document provided by the present application, the initial text information of the open format document is extracted, and word segmentation is performed on the initial text information to generate a corresponding word segmentation result. Semantic analysis and grammar analysis are then performed on the word segmentation result to generate the corresponding semantic information and grammar structure information, and target text information conforming to natural language description rules is generated based on the semantic information and the grammar structure information. The target text information is then converted into corresponding voice information by speech synthesis, and the voice information is played. That is, by combining text recognition, natural language processing, and speech synthesis, the content of the open format document is played efficiently, accurately, and naturally, the accuracy of playing that content is improved, and the accessibility and ease of use of the open format document are further improved.
Optionally, the extraction module 301 is further configured to:
convert the open format document into an extensible markup language file;
and perform optical character recognition on the extensible markup language file to extract the initial text information.
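As a hedged illustration of working with the extensible markup language form of a page, the sketch below collects the text carried by text elements. It assumes that the page XML stores text in elements whose local name is TextCode under the namespace http://www.ofdspec.org/2016, which is how OFD page content is commonly laid out; the function name extract_text and the sample page are made up for the example. Content that exists only as an embedded image would still need an optical character recognition step, which is not shown.

```python
import xml.etree.ElementTree as ET


def extract_text(page_xml: str, text_tag: str = "TextCode") -> str:
    """Collect the text of every element whose local name equals text_tag."""
    root = ET.fromstring(page_xml)
    pieces = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if local_name == text_tag and element.text:
            pieces.append(element.text.strip())
    return "\n".join(pieces)


# Hypothetical, heavily simplified page content used only for demonstration.
SAMPLE_PAGE = """<ofd:Page xmlns:ofd="http://www.ofdspec.org/2016">
  <ofd:Content>
    <ofd:Layer>
      <ofd:TextObject ID="1"><ofd:TextCode X="0" Y="10">Open format</ofd:TextCode></ofd:TextObject>
      <ofd:TextObject ID="2"><ofd:TextCode X="0" Y="20">document</ofd:TextCode></ofd:TextObject>
    </ofd:Layer>
  </ofd:Content>
</ofd:Page>"""

print(extract_text(SAMPLE_PAGE))
```

Matching on the local name rather than the fully qualified tag keeps the sketch tolerant of files that declare the namespace with a different prefix.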
Optionally, the fourth generation module 305 is further configured to:
input the target text information into a target speech synthesis model to obtain the voice information output by the target speech synthesis model, wherein the target speech synthesis model is obtained by training an initial speech synthesis model based on text samples and the speech samples corresponding to the text samples.
Optionally, the apparatus further comprises:
an enhancement module, configured to perform enhancement processing on the voice information to generate optimized voice information.
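The kind of enhancement processing is not specified in this passage. Purely as an assumed example, the sketch below applies peak normalization to a list of floating-point audio samples in the range -1.0 to 1.0; the function name normalize_peak and the target_peak parameter are illustrative choices, and other techniques such as noise reduction would fit the description equally well.

```python
def normalize_peak(samples: list, target_peak: float = 0.9) -> list:
    """Scale the samples so that the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence or empty input: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet_voice = [0.1, -0.2, 0.05]
optimized_voice = normalize_peak(quiet_voice, target_peak=1.0)
```

After the call, the largest absolute sample value in optimized_voice is approximately 1.0, while the relative shape of the waveform is unchanged.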
Optionally, the apparatus further comprises:
a first response module, configured to mark the open format document in response to a first instruction input by a user.
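One way to picture the marking operation is as an annotation attached to a character range on a page. The sketch below is a minimal model under that assumption; the class names Annotation and AnnotatedDocument, and the choice of character offsets as the marked unit, are hypothetical and not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    page: int
    start: int   # inclusive character offset within the page text
    end: int     # exclusive character offset within the page text
    note: str = ""


@dataclass
class AnnotatedDocument:
    pages: List[str]
    annotations: List[Annotation] = field(default_factory=list)

    def mark(self, page: int, start: int, end: int, note: str = "") -> Annotation:
        """Record a mark on a character range, as a first instruction might request."""
        text = self.pages[page]
        if not 0 <= start < end <= len(text):
            raise ValueError("mark range is outside the page text")
        annotation = Annotation(page, start, end, note)
        self.annotations.append(annotation)
        return annotation

    def marked_text(self, annotation: Annotation) -> str:
        """Return the text covered by an annotation."""
        return self.pages[annotation.page][annotation.start:annotation.end]


doc = AnnotatedDocument(pages=["Open format document", "Second page"])
mark = doc.mark(page=0, start=5, end=11, note="key term")
```

Here doc.marked_text(mark) returns the marked span, and the stored annotations could later be used, for example, to resume playback from a marked position.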
Optionally, the apparatus further comprises:
a second response module, configured to control a play attribute of the voice information in response to a second instruction input by a user, wherein the play attribute comprises the playing rate of the voice information and the playing mode of the voice information.
Fig. 4 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include: a processor 410, a communications interface 420, a memory 430, and a communication bus 440, wherein the processor 410, the communications interface 420, and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a method for playing an open format document, the method comprising: performing text recognition on an open format document, and extracting initial text information of the open format document; performing word segmentation processing on the initial text information to generate a word segmentation result corresponding to the initial text information; performing semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result; generating target text information based on the semantic information and the grammar structure information, the target text information conforming to natural language description rules; generating voice information corresponding to the target text information based on the target text information; and playing the voice information.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present application also provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium, and when the computer program is executed by a processor, the computer can execute the method for playing an open format document provided by the methods described above, the method comprising: performing text recognition on an open format document, and extracting initial text information of the open format document; performing word segmentation processing on the initial text information to generate a word segmentation result corresponding to the initial text information; performing semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result; generating target text information based on the semantic information and the grammar structure information, the target text information conforming to natural language description rules; generating voice information corresponding to the target text information based on the target text information; and playing the voice information.
In still another aspect, the present application further provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method for playing an open format document provided by the methods described above, the method comprising: performing text recognition on an open format document, and extracting initial text information of the open format document; performing word segmentation processing on the initial text information to generate a word segmentation result corresponding to the initial text information; performing semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result; generating target text information based on the semantic information and the grammar structure information, the target text information conforming to natural language description rules; generating voice information corresponding to the target text information based on the target text information; and playing the voice information.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. The playing method of the open format document is characterized by comprising the following steps:
performing text recognition on an open format document, and extracting initial text information of the open format document;
performing word segmentation processing on the initial text information to generate a word segmentation result corresponding to the initial text information;
carrying out semantic analysis and grammar analysis on the word segmentation result to generate semantic information and grammar structure information corresponding to the word segmentation result;
generating target text information based on the semantic information and the grammar structure information; the target text information accords with natural language description rules;
generating voice information corresponding to the target text information based on the target text information;
and playing the voice information.
2. The method for playing an open-format document according to claim 1, wherein the performing text recognition on the open-format document, extracting initial text information of the open-format document, includes:
converting the open format document into an extensible markup language file;
and carrying out optical character recognition on the extensible markup language file, and extracting the initial text information.
3. The method for playing an open-format document according to claim 1, wherein generating, based on the target text information, voice information corresponding to the target text information includes:
inputting the target text information into a target voice synthesis model to obtain the voice information output by the target voice synthesis model; the target speech synthesis model is obtained by training an initial speech synthesis model based on a text sample and a speech sample corresponding to the text sample.
4. The method for playing an open-format document according to claim 1, wherein after the generating the voice information corresponding to the target text information, the method further comprises:
and carrying out enhancement processing on the voice information to generate optimized voice information.
5. The method for playing an open format document according to any one of claims 1 to 4, further comprising:
marking the open format document in response to a first instruction input by a user.
6. The method for playing an open format document according to any one of claims 1 to 4, further comprising:
controlling a play attribute of the voice information in response to a second instruction input by a user, wherein the play attribute comprises the playing rate of the voice information and the playing mode of the voice information.
7. A playback device for an open-format document, comprising:
the extraction module is used for carrying out text recognition on the open format document and extracting initial text information of the open format document;
the first generation module is used for carrying out word segmentation processing on the initial text information and generating word segmentation results corresponding to the initial text information;
the second generation module is used for carrying out semantic analysis and grammar analysis on the word segmentation result and generating semantic information and grammar structure information corresponding to the word segmentation result;
the third generation module is used for generating target text information based on the semantic information and the grammar structure information; the target text information accords with natural language description rules;
a fourth generation module, configured to generate, based on the target text information, voice information corresponding to the target text information;
and the playing module is used for playing the voice information.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method for playing an open format document according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method for playing an open format document according to any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method for playing an open format document according to any one of claims 1 to 6.
CN202310497832.8A 2023-05-05 2023-05-05 Playing method and device of open format document, electronic equipment and storage medium Pending CN116778498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310497832.8A CN116778498A (en) 2023-05-05 2023-05-05 Playing method and device of open format document, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116778498A true CN116778498A (en) 2023-09-19

Family

ID=88012347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310497832.8A Pending CN116778498A (en) 2023-05-05 2023-05-05 Playing method and device of open format document, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116778498A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination