CN111353281B - Text conversion method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111353281B
CN111353281B (application CN202010112628.6A)
Authority
CN
China
Prior art keywords
short
short sentence
sentence
sentences
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010112628.6A
Other languages
Chinese (zh)
Other versions
CN111353281A (en)
Inventor
刘占一
王海峰
吴华
赵亮
徐新超
刘智彬
郭振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010112628.6A
Publication of CN111353281A
Application granted
Publication of CN111353281B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a text conversion method and apparatus, an electronic device, and a storage medium, relating to the field of artificial intelligence. The method includes: acquiring a first text in written expression form; performing long-sentence segmentation on the first text to obtain segmented short sentences; ordering the segmented short sentences according to a semantic coherence requirement; and generating a second text in spoken expression form from the ordered short sentences. Applying this scheme improves the accuracy of the generated result.

Description

Text conversion method and device, electronic equipment and storage medium
Technical Field
The present application relates to computer application technologies, and in particular, to a text conversion method and apparatus, an electronic device, and a storage medium in the field of artificial intelligence.
Background
Many systems, such as dialog systems, need text in spoken expression form, but most current knowledge data is text in written expression form. The two differ greatly in form, so a text conversion technology is needed to convert text in written expression form into text in spoken expression form.
At present, conversion is usually realized with a neural network model trained on a large corpus, that is, an end-to-end deep-learning conversion approach is adopted. However, the results generated this way suffer from weak semantic association, poor fluency, and the like; that is, their accuracy is poor.
Disclosure of Invention
In view of this, the present application provides a text conversion method, an apparatus, an electronic device, and a storage medium.
A method of text conversion, comprising:
acquiring a first text in a written expression form;
carrying out long sentence segmentation on the first text to obtain segmented short sentences;
sequencing the divided short sentences according to the semantic consistency requirement;
and generating a second text in a spoken expression form according to the sorted short sentences.
According to a preferred embodiment of the present application, the long-sentence segmentation of the first text includes: generating a syntax tree by parsing the first text; and determining each required short sentence according to the syntax tree.
According to a preferred embodiment of the present application, before sorting the divided phrases according to the semantic consistency requirement, the method further includes: and adjusting and optimizing each divided short sentence according to a preset rule.
According to a preferred embodiment of the present application, ordering the segmented short sentences according to the semantic coherence requirement includes:
taking a start identifier as the starting short sentence, and pairing each segmented short sentence with the starting short sentence to obtain the short sentence pairs corresponding to the starting short sentence, the starting short sentence coming first in each pair;
for each segmented short sentence, pairing every other segmented short sentence with that short sentence to obtain the short sentence pairs corresponding to that short sentence, the short sentence in question coming first in each pair;
determining a semantic coherence score for each short sentence pair; and
ordering the segmented short sentences according to the semantic coherence scores.
According to a preferred embodiment of the present application, determining the semantic coherence score of each short sentence pair includes: for each pair, inputting the pair into a first network model obtained by pre-training, and obtaining the semantic coherence score output for that pair.
According to a preferred embodiment of the present application, ordering the segmented short sentences according to the semantic coherence scores includes:
selecting, from the short sentence pairs corresponding to the starting short sentence, the pair with the highest semantic coherence score, and taking the short sentence other than the starting short sentence in the selected pair as the first short sentence in the ordering;
taking the short sentence ranked first as the short sentence to be processed;
performing the following predetermined processing on the short sentence to be processed:
determining whether any segmented short sentences remain unordered, and if not, ending the ordering;
if so, selecting, from the short sentence pairs corresponding to the short sentence to be processed, the pair with the highest semantic coherence score among those whose other member is still unordered, taking the short sentence other than the short sentence to be processed in the selected pair as the next short sentence in the ordering, taking that short sentence as the new short sentence to be processed, and repeating the predetermined processing.
According to a preferred embodiment of the present application, generating the second text in spoken expression form according to the ordered short sentences includes: inputting the ordered short sentences into a second network model obtained by pre-training to obtain the output second text.
A text conversion apparatus comprising: the device comprises a segmentation unit, a sorting unit and a generation unit;
the segmentation unit is used for acquiring a first text in a written expression form, and performing long sentence segmentation on the first text to obtain each segmented short sentence;
the sequencing unit is used for sequencing each divided short sentence according to the semantic consistency requirement;
and the generating unit is used for generating a second text in a spoken language expression form according to the sorted short sentences.
According to a preferred embodiment of the present application, the segmentation unit generates a syntax tree by parsing the first text, and determines each required short sentence according to the syntax tree.
According to a preferred embodiment of the present application, the segmenting unit is further configured to adjust and optimize each segmented short sentence according to a predetermined rule.
According to a preferred embodiment of the present application, the ordering unit takes a start identifier as the starting short sentence and pairs each segmented short sentence with the starting short sentence to obtain the short sentence pairs corresponding to the starting short sentence, the starting short sentence coming first in each pair; for each segmented short sentence, pairs every other segmented short sentence with that short sentence to obtain the short sentence pairs corresponding to that short sentence, the short sentence in question coming first in each pair; determines a semantic coherence score for each short sentence pair; and orders the segmented short sentences according to the semantic coherence scores.
According to a preferred embodiment of the present application, the ordering unit inputs each short sentence pair into a first network model obtained by pre-training to obtain the semantic coherence score output for that pair.
According to a preferred embodiment of the present application, the ordering unit selects, from the short sentence pairs corresponding to the starting short sentence, the pair with the highest semantic coherence score, and takes the short sentence other than the starting short sentence in the selected pair as the first short sentence in the ordering; takes the short sentence ranked first as the short sentence to be processed; and performs the following predetermined processing on the short sentence to be processed: determining whether any segmented short sentences remain unordered, and if not, ending the ordering; if so, selecting, from the short sentence pairs corresponding to the short sentence to be processed, the pair with the highest semantic coherence score among those whose other member is still unordered, taking the short sentence other than the short sentence to be processed in the selected pair as the next short sentence in the ordering, taking that short sentence as the new short sentence to be processed, and repeating the predetermined processing.
According to a preferred embodiment of the present application, the generating unit inputs the sorted phrases into a second network model obtained by pre-training, so as to obtain the output second text.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
One embodiment of the above application has the following advantages or benefits. By exploiting the syntactic relations of long sentences, a compound long sentence is segmented by long-sentence segmentation into independently ideographic short sentences rather than nested sentences, and ordering the short sentences yields a fluent expression, so that generating the text in spoken expression form from the ordered short sentences improves the accuracy of the generated result. The segmented short sentences can be obtained from the syntax tree and adjusted and optimized according to predetermined rules, improving the accuracy of the obtained short sentences. Semantic coherence scores can be computed for pairs of short sentences and the short sentences ordered according to those scores, improving the accuracy of the ordering result. Other effects of the above alternatives will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow chart of an embodiment of a text conversion method described herein;
FIG. 2 is a diagram of a syntax tree according to the present application;
FIG. 3 is a schematic diagram illustrating semantic consistency scores for pairs of phrases identified in the present application;
FIG. 4 is a schematic diagram of a process for obtaining a second text by conversion according to the present application;
FIG. 5 is a schematic diagram illustrating a structure of an embodiment 500 of a text conversion apparatus according to the present application;
fig. 6 is a block diagram of an electronic device according to the method of the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are likewise omitted for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Fig. 1 is a flowchart of an embodiment of a text conversion method according to the present application. As shown in fig. 1, the following detailed implementation is included.
In 101, a first text in the form of a written expression is obtained.
At 102, the first text is subjected to long sentence segmentation to obtain segmented short sentences.
At 103, the divided phrases are sorted according to semantic consistency requirements.
At 104, a second text in spoken form is generated from the sorted phrases.
For convenience of description, the text in written expression form to be converted is referred to in this embodiment as the first text, which may include only one sentence or several sentences, and the finally required text in spoken expression form is referred to as the second text.
Generally speaking, the written expression form is strict and attends closely to grammar and syntax, while the spoken expression form uses shorter sentence patterns: each short sentence expresses a complete idea on its own, nested sentence patterns are avoided, and the result is easier to understand and better suited to dialog systems and the like.
For the first text to be converted, the long sentence can be segmented first, so as to obtain the segmented short sentences. Preferably, a syntax tree may be generated by parsing the first text, and the desired phrases may be determined from the syntax tree.
Assume that the first text is: the bane was mainly played by Liu De Hua and Liangchaowei, telling that two men with disordered identities are the alert and the bedridden of the black society, and through a fierce fighting, they decided to retrieve their own story, and the two bedridden people were played by two leading roles.
Fig. 2 is a diagram of a syntax tree according to the present application. As shown in fig. 2, large-granularity phrases in the syntactic relations can be segmented, with the number of words covered by each phrase kept within a suitable range so that it forms a simple sentence or short sentence; the short sentences are extracted according to the result of the syntactic analysis, for which existing techniques may be used. The node labels in fig. 2 denote: root (ROOT), verb phrase (VP), noun phrase (NP), preposition phrase (PP), simple clause (IP), punctuation (PU), common noun (NN), verb (VV), proper noun (NR), aspect marker (AS), adverb phrase (ADVP), associative phrase (DNP), coordinating conjunction (CC), complementizer clause (CP), complementizer (DEC), associative marker (DEG), and the like.
As shown in fig. 2, based on the syntax tree, the short sentences shown in the rectangular boxes can be obtained, including: "Infernal Affairs is starred mainly by Liu Dehua and Liang Chaowei", "telling of two men with confused identities", "who are respectively undercover agents for the police and for a triad", "went through a fierce struggle", "they resolve to seek", "reclaim their true selves", and "the two undercover agents are played by the two leading actors".
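The extraction step above can be sketched in code. This is a minimal illustration rather than the patent's implementation: the toy tree, its labels, and the `max_words` threshold are assumptions for demonstration, and a real system would obtain the tree from a constituency parser.

```python
# Minimal sketch of syntax-tree-based segmentation (toy data).
# A node is a tuple (label, children...); a leaf is a plain string.
TREE = ("ROOT",
        ("IP",
         ("IP", ("NP", ("NN", "text")), ("VP", ("VV", "segmentation"))),
         ("PU", ","),
         ("IP", ("NP", ("NN", "phrase")), ("VP", ("VV", "ordering")))))

def leaves(node):
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def extract_short_sentences(node, max_words=8):
    """Top-down traversal: emit a simple clause (IP) once its word count
    falls within range; otherwise keep descending into its children."""
    if isinstance(node, str):
        return []
    if node[0] == "IP" and len(leaves(node)) <= max_words:
        return [" ".join(leaves(node))]
    out = []
    for child in node[1:]:
        out.extend(extract_short_sentences(child, max_words))
    return out

print(extract_short_sentences(TREE, max_words=3))
# prints ['text segmentation', 'phrase ordering']
```

With `max_words=3` the outer clause (five words) exceeds the threshold, so the traversal descends and returns the two inner clauses instead, mirroring the idea of keeping each segment's word count within a suitable range.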
Preferably, each segmented short sentence can be further adjusted and optimized according to predetermined rules; the adjustment and optimization may include, but is not limited to, merging short sentences, adding a subject, and the like.
For example, for the short sentences shown in fig. 2, the adjustment and optimization may include the following. Because "telling of two men with confused identities" and "who are respectively undercover agents for the police and for a triad" have an obvious overlapping part, they can be merged to make the semantic structure more complete, giving "two men with confused identities are respectively undercover agents for the police and for a triad"; similarly, "they resolve to seek" and "reclaim their true selves" can be merged into "they resolve to reclaim their true selves". In addition, "went through a fierce struggle" lacks a subject, so the subject "the two men" can be added, giving the short sentence "the two men went through a fierce struggle".
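The merging rule can be sketched as follows; the word-overlap test and the `min_overlap` threshold are assumptions for illustration rather than the patent's actual predetermined rules.

```python
def merge_on_overlap(a, b, min_overlap=3):
    """If the tail of short sentence a repeats the head of short sentence b
    (at least min_overlap words), merge them into a single short sentence."""
    wa, wb = a.split(), b.split()
    # Try the longest possible overlap first.
    for k in range(min(len(wa), len(wb)), min_overlap - 1, -1):
        if wa[-k:] == wb[:k]:
            return " ".join(wa + wb[k:])
    return None  # insufficient overlap: leave the pair unmerged

merged = merge_on_overlap(
    "telling of two men with confused identities",
    "two men with confused identities are undercover agents")
```

Here the five-word overlap "two men with confused identities" is detected, so the two fragments collapse into one short sentence with a more complete semantic structure.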
The segmented short sentences are then ordered according to the semantic coherence requirement. A specific implementation may include: taking a START identifier (START) as the starting short sentence, and pairing each segmented short sentence with START to obtain the short sentence pairs corresponding to START, START coming first in each pair; for each segmented short sentence, pairing every other segmented short sentence with that short sentence to obtain the short sentence pairs corresponding to that short sentence, the short sentence in question coming first in each pair; determining a semantic coherence score for each pair; and ordering the segmented short sentences according to the scores.
Determining the semantic coherence scores may include: for each short sentence pair, inputting the pair into a first network model obtained by pre-training, thereby obtaining the semantic coherence score output for that pair. The specific type of the first network model is not limited and may be chosen according to actual needs.
For a pair consisting of any two short sentences A and B (with A first), a positive semantic coherence score indicates that B can follow A, and the higher the score, the better the match; a negative score indicates that B should not follow A.
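The pair construction described above (START paired with every short sentence, and every short sentence paired with every other) can be sketched as follows; the `<START>` spelling of the start identifier is an assumption.

```python
START = "<START>"  # assumed spelling of the start identifier

def build_pairs(short_sentences):
    """Pair START, and then each short sentence, as the left member with
    every other short sentence; the left member is the one that comes first."""
    pairs = []
    for left in [START] + short_sentences:
        for right in short_sentences:
            if right != left:
                pairs.append((left, right))
    return pairs

# n short sentences yield n pairs for START plus n*(n-1) ordinary pairs,
# each of which would then be scored by the first network model.
```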
Assume that the following five short sentences are obtained after the above processing: "Infernal Affairs is starred mainly by Liu Dehua and Liang Chaowei", "two men with confused identities are respectively undercover agents for the police and for a triad", "the two men went through a fierce struggle", "they resolve to reclaim their true selves", and "the two undercover agents are played by the two leading actors".
Fig. 3 is a schematic diagram of the determined semantic coherence scores of the short sentence pairs according to the present application. As shown in fig. 3, the pairs corresponding to START are (START, "Infernal Affairs is starred mainly by Liu Dehua and Liang Chaowei"), (START, "two men with confused identities are respectively undercover agents for the police and for a triad"), (START, "the two men went through a fierce struggle"), (START, "they resolve to reclaim their true selves"), and (START, "the two undercover agents are played by the two leading actors"), with semantic coherence scores of 0.9, 0.3, 0.1, 0.2, and 0.6 respectively. For the short sentence "Infernal Affairs is starred mainly by Liu Dehua and Liang Chaowei", the corresponding pairs place it first and pair it with each of the other four short sentences, with semantic coherence scores of 0.6, 0.4, 0.3, and 0.7 respectively. The remaining semantic coherence scores are shown in fig. 3.
After the semantic coherence scores of the pairs are obtained, the highest-scoring sequence can be found starting from START and taken as the ordering result of the short sentences.
Specifically, the pair with the highest semantic coherence score may be selected from the pairs corresponding to START, and the short sentence other than START in the selected pair taken as the first short sentence in the ordering; that short sentence then becomes the short sentence to be processed, for which the following predetermined processing is performed: determine whether any segmented short sentences remain unordered, and if not, end the ordering; if so, select, from the pairs corresponding to the short sentence to be processed, the pair with the highest semantic coherence score among those whose other member is still unordered, take the short sentence other than the short sentence to be processed in the selected pair as the next short sentence in the ordering, take that short sentence as the new short sentence to be processed, and repeat the predetermined processing.
As shown in fig. 3, the semantic coherence scores of the pairs corresponding to START are 0.9, 0.3, 0.1, 0.2, and 0.6. The highest is 0.9, so the short sentence in the corresponding pair, "Infernal Affairs is starred mainly by Liu Dehua and Liang Chaowei", becomes the first short sentence in the ordering. For that short sentence, the semantic coherence scores of its corresponding pairs are 0.6, 0.4, 0.3, and 0.7; the highest is 0.7, so the short sentence in the corresponding pair, "the two undercover agents are played by the two leading actors", becomes the second short sentence in the ordering. The positions of the remaining short sentences are determined in turn in a similar manner.
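The greedy selection walked through above can be sketched as follows. The score table is a made-up stand-in for the first network model (its START row loosely mirrors the figure's numbers), and the names `s1`..`s3` are placeholder short sentences:

```python
START = "<START>"

# Toy coherence scores, (left, right) -> score; a real system would obtain
# these from the pre-trained first network model.
SCORES = {
    (START, "s1"): 0.9, (START, "s2"): 0.3, (START, "s3"): 0.6,
    ("s1", "s2"): 0.4, ("s1", "s3"): 0.7,
    ("s2", "s1"): 0.2, ("s2", "s3"): 0.5,
    ("s3", "s1"): 0.1, ("s3", "s2"): 0.8,
}

def greedy_order(short_sentences, score):
    """Starting from START, repeatedly append the still-unordered short
    sentence whose pair (current, candidate) has the highest score."""
    ordered, current = [], START
    remaining = list(short_sentences)
    while remaining:
        best = max(remaining, key=lambda s: score(current, s))
        ordered.append(best)
        remaining.remove(best)
        current = best
    return ordered

result = greedy_order(["s1", "s2", "s3"],
                      lambda left, right: SCORES.get((left, right), 0.0))
# result is ["s1", "s3", "s2"]: START picks s1 (0.9), s1 picks s3 (0.7),
# and the last remaining short sentence s2 closes the ordering.
```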
After the ordering is completed, the second text in spoken expression form may be generated from the ordered short sentences. Preferably, the ordered short sentences are input into a second network model obtained by pre-training to obtain the output second text. The specific type of the second network model is not limited and may be chosen according to actual needs.
Preferably, the second network model can be obtained by pre-training plus fine-tuning. For example, in the pre-training stage the initial model may be trained with a large number of synonymous short-text sentence pairs, and in the fine-tuning stage the initial model may be fine-tuned with short sentence pairs consisting of a written expression form in a specific field and the corresponding spoken expression form, to obtain the finally required second network model.
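The two-stage pre-training plus fine-tuning flow can be illustrated with a deliberately simplified stand-in, in which a lookup table replaces the neural model so that fine-tuning on domain pairs overrides what pre-training learned from the generic corpus; all class names and data here are made up.

```python
class ToySecondModel:
    """Lookup-table stand-in for the second network model: pre-training
    fills the table from generic synonymous pairs, and fine-tuning
    overwrites entries with domain-specific written->spoken pairs."""

    def __init__(self):
        self.table = {}

    def pretrain(self, generic_pairs):
        self.table.update(generic_pairs)

    def finetune(self, domain_pairs):
        self.table.update(domain_pairs)  # domain data takes precedence

    def convert(self, written):
        # Fall back to the input when nothing was learned for it.
        return self.table.get(written, written)

model = ToySecondModel()
model.pretrain({"hereinafter referred to as": "from now on called"})
model.finetune({"hereinafter referred to as": "which we will just call"})
```

After fine-tuning, `convert` prefers the domain-specific spoken form, while an unseen input falls through unchanged, standing in for the generalization gap a real model would face.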
The sentences in the second text are all short, colloquial, easy to understand, and fluent spoken sentences.
Combining the above, take the first text to be "Infernal Affairs is starred mainly by Liu Dehua and Liang Chaowei, telling of two men with confused identities who are respectively undercover agents for the police and for a triad, and of how, after a fierce struggle, they resolve to reclaim their true selves; the two undercover agents are played by the two leading actors". Fig. 4 is a schematic diagram of the process of obtaining the second text by conversion in the present application.
As shown in fig. 4, the first text is first subjected to long-sentence segmentation, and the segmented short sentences are adjusted and optimized according to the predetermined rules to obtain the following short sentences: "Infernal Affairs is starred mainly by Liu Dehua and Liang Chaowei", "two men with confused identities are respectively undercover agents for the police and for a triad", "the two men went through a fierce struggle", "they resolve to reclaim their true selves", and "the two undercover agents are played by the two leading actors". The short sentences are then ordered, giving: "Infernal Affairs is starred mainly by Liu Dehua and Liang Chaowei", "the two undercover agents are played by the two leading actors", "two men with confused identities are respectively undercover agents for the police and for a triad", "the two men went through a fierce struggle", "they resolve to reclaim their true selves". Finally, the second text is generated from the ordered short sentences: "Infernal Affairs is a film starring Liu Dehua and Liang Chaowei, who respectively play an undercover agent for the police and one for a triad in the film; it tells of the two undercover men with confused identities reclaiming their true selves after a fierce struggle."
It should be noted that, for simplicity of explanation, the foregoing method embodiments are described as a series of actions, but those skilled in the art will appreciate that the present application is not limited by the order of actions described, as some steps may be performed in other orders or concurrently. Further, the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In a word, with the scheme of this method embodiment, the syntactic relations of long sentences can be exploited so that a compound long sentence is segmented into independently ideographic short sentences rather than nested sentences; ordering the short sentences yields a fluent expression, and generating the text in spoken expression form from the ordered short sentences therefore improves the accuracy of the generated result. The segmented short sentences can be obtained from the syntax tree and adjusted and optimized according to predetermined rules, improving their accuracy; and semantic coherence scores can be computed for pairs of short sentences and the short sentences ordered accordingly, improving the accuracy of the ordering result.
The above is a description of method embodiments, and the embodiments of the present application are further described below by way of apparatus embodiments.
Fig. 5 is a schematic structural diagram of a text conversion apparatus 500 according to an embodiment of the present application. As shown in fig. 5, includes: a slicing unit 501, a sorting unit 502, and a generating unit 503.
The segmentation unit 501 is configured to obtain a first text in a written expression form, and perform long sentence segmentation on the first text to obtain each segmented short sentence.
And the sorting unit 502 is configured to sort the divided short sentences according to the semantic consistency requirement.
A generating unit 503, configured to generate a second text in a spoken language expression form according to the sorted short sentences.
For the first text to be converted, the segmentation unit 501 may perform long-sentence segmentation on the first text to obtain segmented short sentences. Preferably, a syntax tree may be generated by parsing the first text, and the desired phrases may be determined from the syntax tree.
Preferably, the segmentation unit 501 may further perform adjustment optimization on each segmented phrase according to a predetermined rule, where the adjustment optimization may include, but is not limited to, merging, adding subject, and the like.
For the segmented short sentences, the ordering unit 502 may order them according to the semantic coherence requirement. A specific implementation may include: taking START as the starting short sentence and pairing each segmented short sentence with START to obtain the short sentence pairs corresponding to START, START coming first in each pair; for each segmented short sentence, pairing every other segmented short sentence with that short sentence to obtain the short sentence pairs corresponding to that short sentence, the short sentence in question coming first in each pair; determining a semantic coherence score for each pair; and ordering the segmented short sentences according to the scores.
The semantic coherence scores of the short sentence pairs may be determined as follows: each short sentence pair is input into a first network model obtained by pre-training, which outputs the semantic coherence score of that pair. The specific type of the first network model is not limited and may be chosen according to actual needs.
For a short sentence pair consisting of any two short sentences A and B (with A preceding), a positive semantic coherence score indicates that B may follow A, and the higher the score, the better the match; a negative score indicates that B should not follow A.
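The patent leaves the first network model's architecture open. As a crude stand-in for illustration only (plain word overlap, not a trained model), the scorer below at least respects the sign convention just described: lexically related pairs score positive and unrelated pairs score negative.

```python
def overlap_score(a, b):
    """Toy pairwise coherence scorer (stand-in for the first network model):
    count shared words between the two short sentences, minus a penalty so
    that unrelated pairs come out negative."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) - 1
```

Any real implementation would replace this with the pre-trained model's output; the ordering code only requires a callable `score(a, b)` returning a real number.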
After obtaining the semantic coherence scores of the short sentence pairs, the sorting unit 502 may, starting from START, search for the ordering whose sequence score is highest and use it as the sorting result. Specifically, the pair with the highest semantic coherence score may be selected from the pairs corresponding to START, and the short sentence other than START in that pair becomes the first short sentence in the ordering; this first short sentence then becomes the short sentence to be processed, for which the following preset processing is performed: determine whether any segmented short sentences remain unsorted; if not, the sorting ends; if so, select the pair with the highest semantic coherence score from the pairs corresponding to the short sentence to be processed whose other member is still unsorted, place that other short sentence immediately after the short sentence to be processed in the ordering, take it as the new short sentence to be processed, and repeat the preset processing.
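The greedy procedure above can be sketched compactly: starting from START, repeatedly append the not-yet-placed short sentence whose pair with the current one scores highest. Here `score(a, b)` is any pairwise coherence scorer (e.g. the first network model); the `"<START>"` marker string is an illustrative choice.

```python
START = "<START>"

def order_phrases(phrases, score):
    """Greedy ordering by pairwise semantic coherence: at each step,
    append the remaining phrase best matching the current phrase."""
    ordered, current = [], START
    remaining = list(phrases)
    while remaining:
        best = max(remaining, key=lambda p: score(current, p))
        ordered.append(best)
        remaining.remove(best)
        current = best
    return ordered
```

Note this is a greedy search, which is one way to realize "the ordering with the highest sequence score"; an exhaustive or beam search over orderings would be a straightforward variant.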
After the sorting is completed, the generating unit 503 may generate the second text in a spoken expression form from the sorted short sentences. Preferably, the sorted short sentences may be input into a second network model obtained by pre-training to obtain the output second text. The specific type of the second network model is likewise not limited and may be chosen according to actual needs.
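A hypothetical wrapper for this last step is sketched below. The patent feeds the ordered short sentences to a pre-trained generation model; the plain-join fallback here is only an illustration of the interface, not the patented generation method.

```python
def to_spoken(ordered, generator=None):
    """Produce the spoken-form second text from the ordered short sentences.
    When a pre-trained generation model is supplied, delegate to it;
    otherwise fall back to a naive join (illustration only)."""
    if generator is not None:
        return generator(ordered)
    return ", ".join(ordered) + "."
```

In practice `generator` would wrap the second network model's inference call.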
For a specific work flow of the apparatus embodiment shown in fig. 5, reference is made to the related description in the foregoing method embodiment, and details are not repeated.
In summary, with the scheme of this apparatus embodiment, syntactic relations within long sentences can be exploited: long-sentence segmentation splits a long sentence into independently meaningful short sentences rather than nested clauses; short sentence sorting yields a fluent ordering; and the text in spoken expression form is then generated from the sorted short sentences, improving the accuracy of the result. The segmented short sentences can be obtained from the syntax tree and adjusted according to predetermined rules, improving their accuracy; and scoring the semantic coherence of every short sentence pair and sorting by those scores improves the accuracy of the ordering.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device according to the method of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in Fig. 6, the electronic apparatus includes: one or more processors Y01, a memory Y02, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In Fig. 6, one processor Y01 is taken as an example.
The memory Y02 is a non-transitory computer readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided herein.
The memory Y02, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present application. The processor Y01 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory Y02, that is, implements the method in the above-described method embodiment.
The memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory Y02 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory Y02 may optionally include a memory remotely located from the processor Y01, and these remote memories may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, blockchain networks, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03, and the output device Y04 may be connected by a bus or other means, and are exemplified by being connected by a bus in fig. 6.
The input device Y03 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output device Y04 may include a display apparatus, an auxiliary lighting device, a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display, a light emitting diode display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific integrated circuits, computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a cathode ray tube or a liquid crystal display monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks, wide area networks, blockchain networks, and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments are not intended to limit the scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method of text conversion, comprising:
acquiring a first text in a written expression form;
carrying out long sentence segmentation on the first text to obtain segmented short sentences;
sorting the segmented short sentences according to the semantic coherence requirement, comprising: taking a starting identifier as the starting short sentence, and pairing each segmented short sentence with the starting short sentence to obtain the short sentence pairs corresponding to the starting short sentence, with the starting short sentence preceding in each pair; for each segmented short sentence, pairing every other segmented short sentence with it to obtain the short sentence pairs corresponding to that short sentence, with that short sentence preceding in each pair; determining a semantic coherence score for each short sentence pair; and sorting the segmented short sentences according to the semantic coherence scores;
and generating a second text in a spoken expression form according to the sorted short sentences.
2. The method of claim 1,
the long sentence segmentation of the first text comprises: generating a syntax tree by parsing the first text, and determining each required short sentence according to the syntax tree.
3. The method of claim 1,
before the sorting of the segmented short sentences according to the semantic coherence requirement, the method further comprises: adjusting and optimizing each segmented short sentence according to a predetermined rule.
4. The method of claim 1,
the determining of the semantic coherence scores of the short sentence pairs comprises: inputting each short sentence pair into a first network model obtained by pre-training, to obtain the semantic coherence score of that pair output by the model.
5. The method of claim 1,
the sorting of the divided short sentences according to the semantic consistency scores comprises:
selecting, from the short sentence pairs corresponding to the starting short sentence, the pair with the highest semantic coherence score, and taking the short sentence other than the starting short sentence in the selected pair as the first short sentence in the ordering;
taking the first short sentence in the ordering as the short sentence to be processed;
performing the following preset processing for the short sentence to be processed:
determining whether any of the segmented short sentences remain unsorted, and if not, ending the sorting;
and if so, selecting, from the short sentence pairs corresponding to the short sentence to be processed, the pair with the highest semantic coherence score, taking the short sentence other than the one being processed as the short sentence immediately following it in the ordering, taking that short sentence as the new short sentence to be processed, and repeatedly executing the preset processing.
6. The method of claim 1,
the generating of the second text in the spoken language expression form according to the sorted short sentences comprises: and inputting each sequenced short sentence into a second network model obtained by pre-training to obtain the output second text.
7. A text conversion apparatus, comprising: the device comprises a segmentation unit, a sorting unit and a generation unit;
the segmentation unit is used for acquiring a first text in a written expression form, and performing long sentence segmentation on the first text to obtain segmented short sentences;
the sorting unit is configured to sort the segmented short sentences according to the semantic coherence requirement, including: taking a starting identifier as the starting short sentence, and pairing each segmented short sentence with the starting short sentence to obtain the short sentence pairs corresponding to the starting short sentence, with the starting short sentence preceding in each pair; for each segmented short sentence, pairing every other segmented short sentence with it to obtain the short sentence pairs corresponding to that short sentence, with that short sentence preceding in each pair; determining a semantic coherence score for each short sentence pair; and sorting the segmented short sentences according to the semantic coherence scores;
and the generating unit is used for generating a second text in a spoken language expression form according to the sorted short sentences.
8. The apparatus of claim 7,
and the segmentation unit generates a syntax tree by carrying out syntax analysis on the first text, and determines required short sentences according to the syntax tree.
9. The apparatus of claim 7,
the segmentation unit is further used for adjusting and optimizing each segmented short sentence according to a preset rule.
10. The apparatus of claim 7,
and the sorting unit inputs each short sentence pair into a first network model obtained by pre-training, to obtain the semantic coherence score of that pair output by the model.
11. The apparatus of claim 7,
the sorting unit selects, from the short sentence pairs corresponding to the starting short sentence, the pair with the highest semantic coherence score, and takes the short sentence other than the starting short sentence in the selected pair as the first short sentence in the ordering; takes the first short sentence in the ordering as the short sentence to be processed; and performs the following preset processing for the short sentence to be processed: determining whether any of the segmented short sentences remain unsorted, and if not, ending the sorting; and if so, selecting, from the short sentence pairs corresponding to the short sentence to be processed, the pair with the highest semantic coherence score, taking the short sentence other than the one being processed as the short sentence immediately following it in the ordering, taking that short sentence as the new short sentence to be processed, and repeatedly executing the preset processing.
12. The apparatus of claim 7,
and the generating unit inputs the sequenced short sentences into a second network model obtained by pre-training to obtain the output second text.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202010112628.6A 2020-02-24 2020-02-24 Text conversion method and device, electronic equipment and storage medium Active CN111353281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010112628.6A CN111353281B (en) 2020-02-24 2020-02-24 Text conversion method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010112628.6A CN111353281B (en) 2020-02-24 2020-02-24 Text conversion method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111353281A CN111353281A (en) 2020-06-30
CN111353281B true CN111353281B (en) 2023-04-07

Family

ID=71197175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010112628.6A Active CN111353281B (en) 2020-02-24 2020-02-24 Text conversion method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111353281B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200919210A (en) * 2007-07-18 2009-05-01 Steven Kays Adaptive electronic design

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7873640B2 (en) * 2007-03-27 2011-01-18 Adobe Systems Incorporated Semantic analysis documents to rank terms
CN101071421A (en) * 2007-05-14 2007-11-14 腾讯科技(深圳)有限公司 Chinese word cutting method and device
US8706644B1 (en) * 2009-01-13 2014-04-22 Amazon Technologies, Inc. Mining phrases for association with a user
US8856004B2 (en) * 2011-05-13 2014-10-07 Nuance Communications, Inc. Text processing using natural language understanding
US9594744B2 (en) * 2012-11-28 2017-03-14 Google Inc. Speech transcription including written text
US11354508B2 (en) * 2014-11-25 2022-06-07 Truthful Speakimg, Inc. Written word refinement system and method for truthful transformation of spoken and written communications
CN105843811B (en) * 2015-01-13 2019-12-06 华为技术有限公司 method and apparatus for converting text
US10503738B2 (en) * 2016-03-18 2019-12-10 Adobe Inc. Generating recommendations for media assets to be displayed with related text content
CN106484768B (en) * 2016-09-09 2019-12-31 天津海量信息技术股份有限公司 Local feature extraction method and system for text content saliency region
US10176889B2 (en) * 2017-02-09 2019-01-08 International Business Machines Corporation Segmenting and interpreting a document, and relocating document fragments to corresponding sections
CN107038160A (en) * 2017-03-30 2017-08-11 唐亮 The pretreatment module of multilingual intelligence pretreatment real-time statistics machine translation system
CN107844480B (en) * 2017-10-21 2021-04-30 科大讯飞股份有限公司 Method and system for converting written text into spoken text
CN109657244B (en) * 2018-12-18 2023-04-18 语联网(武汉)信息技术有限公司 English long sentence automatic segmentation method and system
CN110287461B (en) * 2019-05-24 2023-04-18 北京百度网讯科技有限公司 Text conversion method, device and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200919210A (en) * 2007-07-18 2009-05-01 Steven Kays Adaptive electronic design

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭群; 李剑锋; 陈小平; 胡国平. A natural spoken-language task understanding method for mobile terminals. 计算机系统应用 (Computer Systems &amp; Applications), 2013, No. 08, full text. *

Also Published As

Publication number Publication date
CN111353281A (en) 2020-06-30

Similar Documents

Publication Publication Date Title
CN111125335B (en) Question and answer processing method and device, electronic equipment and storage medium
US10311146B2 (en) Machine translation method for performing translation between languages
CN111368046B (en) Man-machine conversation method, device, electronic equipment and storage medium
JP7091430B2 (en) Interaction information recommendation method and equipment
EP3971761A1 (en) Method and apparatus for generating summary, electronic device and storage medium thereof
KR102521765B1 (en) Method and apparatus for determining causality, electronic device and storage medium
CN111079945B (en) End-to-end model training method and device
CN111859997A (en) Model training method and device in machine translation, electronic equipment and storage medium
JP7230304B2 (en) DIALOGUE GENERATION METHOD, DEVICE, ELECTRONIC DEVICE, PROGRAM AND STORAGE MEDIA
CN111144108A (en) Emotion tendency analysis model modeling method and device and electronic equipment
CN112506949B (en) Method, device and storage medium for generating structured query language query statement
JP2021111334A (en) Method of human-computer interactive interaction based on retrieval data, device, and electronic apparatus
US11321370B2 (en) Method for generating question answering robot and computer device
JP7093825B2 (en) Man-machine dialogue methods, devices, and equipment
CN111126061B (en) Antithetical couplet information generation method and device
CN111177339A (en) Dialog generation method and device, electronic equipment and storage medium
US20220005461A1 (en) Method for recognizing a slot, and electronic device
CN111160013A (en) Text error correction method and device
CN111539222A (en) Training method and device for semantic similarity task model, electronic equipment and storage medium
CN113516491A (en) Promotion information display method and device, electronic equipment and storage medium
CN114444462A (en) Model training method and man-machine interaction method and device
CN111428489B (en) Comment generation method and device, electronic equipment and storage medium
JP2019200756A (en) Artificial intelligence programming server and program for the same
CN111353281B (en) Text conversion method and device, electronic equipment and storage medium
US11893977B2 (en) Method for recognizing Chinese-English mixed speech, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant