CN117829101A - Method, apparatus, device and medium for converting text style - Google Patents

Method, apparatus, device and medium for converting text style

Info

Publication number
CN117829101A
Authority
CN
China
Prior art keywords
text
style
prompt
machine learning
learning model
Prior art date
Legal status
Pending
Application number
CN202311861817.3A
Other languages
Chinese (zh)
Inventor
吴昊
邱鑫
李辰
周高景
郭雨
易沐阳
杨成
Current Assignee
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date
Filing date
Publication date
Application filed by Douyin Vision Co Ltd
Priority to CN202311861817.3A
Publication of CN117829101A
Legal status: Pending


Abstract

Methods, apparatuses, devices, and media for converting text styles are provided. In one method, a first text having a first style is extracted from a reference video. Based on the first text and a reference prompt, the first text is converted to a second text using a machine learning model, the second text having a second style different from the first style. Based on the first text and the second text, a conversion model is determined, the conversion model describing an association between text having the second style and text having the first style. With the exemplary implementations of the present disclosure, more accurate conversion models can be built with higher efficiency, and the conversion models can be utilized to perform text style conversion tasks.

Description

Method, apparatus, device and medium for converting text style
Technical Field
Example implementations of the present disclosure relate generally to text processing and, more particularly, to methods, apparatuses, devices, and computer-readable storage media for converting text styles.
Background
Machine learning techniques have been widely used for text processing. For example, text having a certain style may be input and converted to other styles using a machine learning model. However, a large amount of manual labeling work is required when constructing a conversion model using machine learning techniques, which makes it difficult to generate a conversion model with high accuracy in a short time. Therefore, it is desirable to construct a more accurate conversion model with greater efficiency and then use the conversion model to perform text style conversion tasks.
Disclosure of Invention
In a first aspect of the present disclosure, a method for converting text styles is provided. In the method, a first text having a first style is extracted from a reference video. Based on the first text and a reference prompt, the first text is converted to a second text using a machine learning model, the second text having a second style different from the first style. Based on the first text and the second text, a conversion model is determined, the conversion model describing an association between text having the second style and text having the first style.
In a second aspect of the present disclosure, an apparatus for converting text styles is provided. The apparatus comprises: an extraction module configured to extract a first text having a first style from a reference video; a conversion module configured to convert the first text to a second text using a machine learning model based on the first text and a reference prompt, the second text having a second style different from the first style; and a determining module configured to determine a conversion model describing an association between text having the second style and text having the first style based on the first text and the second text.
In a third aspect of the present disclosure, an electronic device is provided. The electronic device includes: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the electronic device to perform the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to implement a method according to the first aspect of the present disclosure.
It should be understood that what is described in this section of this disclosure is not intended to limit key features or essential features of the implementations of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages, and aspects of various implementations of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numerals designate like or similar elements, in which:
FIG. 1 illustrates a block diagram of an application environment for converting text styles according to one exemplary implementation of the present disclosure;
FIG. 2 illustrates a block diagram for converting text styles, according to some implementations of the present disclosure;
FIG. 3 illustrates a block diagram of a process for processing a first text and a second text, according to some implementations of the present disclosure;
FIG. 4 illustrates a block diagram of a process for correcting a first text, in accordance with some implementations of the present disclosure;
FIG. 5 illustrates a block diagram of text before and after correction, in accordance with some implementations of the present disclosure;
FIG. 6 illustrates a block diagram of text before and after correction, in accordance with some implementations of the present disclosure;
FIG. 7 illustrates a block diagram of a process for validating a second text, in accordance with some implementations of the present disclosure;
FIG. 8 illustrates a block diagram of various text in a verification process, in accordance with some implementations of the present disclosure;
FIG. 9 illustrates a flow chart of a method for converting text styles according to some implementations of the present disclosure;
FIG. 10 illustrates a block diagram of an apparatus for converting text styles, according to some implementations of the present disclosure; and
FIG. 11 illustrates a block diagram of a device capable of implementing various implementations of the present disclosure.
Detailed Description
Implementations of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain implementations of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the implementations set forth herein, but rather, these implementations are provided so that this disclosure will be more thorough and complete. It should be understood that the drawings and implementations of the present disclosure are for illustrative purposes only and are not intended to limit the scope of the present disclosure.
In the description of implementations of the present disclosure, the term "include" and similar terms should be understood as open-ended, i.e., "including, but not limited to". The term "based on" should be understood as "based at least in part on". The term "one implementation" or "the implementation" should be understood as "at least one implementation". The term "some implementations" should be understood as "at least some implementations". Other explicit and implicit definitions may also be included below. As used herein, the term "model" may represent an association relationship between items of data. For example, the association relationship may be obtained based on various technical solutions that are currently known and/or will be developed in the future.
It will be appreciated that the data (including but not limited to the data itself, the acquisition or use of the data) involved in the present technical solution should comply with the corresponding legal regulations and the requirements of the relevant regulations.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly inform the user that the operation the user requests to perform will require the user's personal information to be obtained and used. Thus, the user can autonomously choose, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, an application program, a server, or a storage medium, that performs the operations of the technical solution of the present disclosure.
As an alternative but non-limiting implementation, in response to receiving an active request from a user, the prompt information may be sent to the user, for example, in a pop-up window, where the prompt information may be presented in text. In addition, a selection control for the user to select "agree" or "disagree" to provide personal information to the electronic device may also be carried in the pop-up window.
It will be appreciated that the above-described notification and user authorization process is merely illustrative, and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
The term "responsive to" as used herein means a state in which a corresponding event occurs or a condition is satisfied. It will be appreciated that the execution timing of a subsequent action that is executed in response to the event or condition is not necessarily strongly correlated with the time at which the event occurs or the condition is established. For example, in some cases, the follow-up actions may be performed immediately upon occurrence of an event or establishment of a condition; in other cases, the subsequent action may be performed after a period of time has elapsed after the event occurred or the condition was established.
Example Environment
Text having different styles may be used in different application environments. With the development of machine learning technology, machine learning techniques have been applied to the technical field of text processing. For example, in the field of video generation, it is desirable to add narration with a more spoken style to a video. In this case, written-style text may be obtained, spoken elements may be added to it, and text better suited to spoken video narration may be generated.
For ease of description, a video sharing application will be used in the context of this disclosure as a specific example. Here, the video sharing application may provide videos generated by a large number of users, where the videos may introduce, for example, a physical entity such as a restaurant, shopping mall, movie theater, or attraction, or a virtual entity such as an online store or online theater. Users of video sharing applications typically expect the narration in a video to have a spoken style that is easy to understand and as close as possible to everyday conversation.
An application environment according to one example implementation of the present disclosure is described with reference to FIG. 1. FIG. 1 illustrates a block diagram 100 of an application environment for converting text styles according to one exemplary implementation of the present disclosure. As shown in FIG. 1, text 110 having a certain style (e.g., a written style) may be entered, and a conversion model 120 may be utilized to convert the text 110 to text 112 of another style (e.g., a spoken style). However, a large amount of manual labeling work is required when constructing the conversion model 120 using machine learning techniques, which makes it difficult to generate the conversion model 120 with high accuracy in a short time. Therefore, it is desirable to construct a more accurate conversion model with greater efficiency and then use the conversion model to perform text style conversion tasks.
Overview of style conversion
To address, at least in part, the deficiencies in the prior art, a method for converting text styles is presented in accordance with one exemplary implementation of the present disclosure. An overview of one exemplary implementation according to the present disclosure is described with reference to fig. 2, which fig. 2 illustrates a block diagram 200 for converting text styles according to some implementations of the present disclosure. As shown in fig. 2, a reference video 230 may be acquired, and the reference video 230 may be, for example, an already published video acquired from a video sharing application under user permission. These videos are typically manually made or edited and include popular and easily understood narrative text.
The first text 210 may be extracted from the reference video 230; the first text 210 may have a spoken style, and is hereinafter also referred to as the first text having a first style (e.g., a spoken style). In turn, a reference prompt 232 may be constructed for invoking the machine learning model 240 to perform style conversion, and based on the first text 210 and the reference prompt 232, the machine learning model 240 may convert the first text 210 into a second text 220 having a second style.
It should be appreciated that the machine learning model 240 herein may be a language model that is currently known and/or will be developed in the future. The language model can perform corresponding tasks as required by a user-specified prompt. Here, the reference prompt 232 may specify that the machine learning model 240 perform a style conversion task to convert the first text 210 having the first style into the second text 220 having the second style.
Further, the first text 210 and the second text 220 may be used as training data 242 to determine a conversion model 250. Here, the conversion model 250 may describe an association relationship between text having the second style and text having the first style. In other words, the conversion model 250 may convert text having the second style to text having the first style.
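As an informal illustration of this data-construction loop, the sketch below shows how (second text, first text) pairs might be assembled. The call_llm helper, the prompt wording, and the field names are assumptions made for illustration only; the disclosure does not prescribe a particular language-model API or data format.

```python
# Sketch of the training-data construction loop: spoken-style first text is
# converted to written-style second text by a language model, and the pair is
# stored so that the conversion model can later learn the reverse direction.

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for the machine learning model 240.
    raise NotImplementedError("plug in a real language-model call here")

REFERENCE_PROMPT = (
    "Rewrite the following spoken-style narration as concise written-style text. "
    "Keep all key information (prices, names, quantities) unchanged.\n\n{first_text}"
)

def build_training_pair(first_text: str) -> dict:
    second_text = call_llm(REFERENCE_PROMPT.format(first_text=first_text))
    # The conversion model 250 maps written-style input to spoken-style output.
    return {"input": second_text, "target": first_text}

def build_training_data(first_texts: list[str]) -> list[dict]:
    return [build_training_pair(t) for t in first_texts]
```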
With the exemplary implementations of the present disclosure, the machine learning model can be caused to perform style conversion tasks by constructing appropriate prompts. The ultimate task of interest, namely converting written-style text into spoken-style text better suited to oral narration, can thereby be supported. Text having different styles can be determined in a simpler and more efficient manner, and the training process can then be performed with such text as training data. In this way, the powerful processing capability of the machine learning model is fully utilized, the large amount of manual labeling otherwise required to acquire training data is reduced, and the efficiency of obtaining the conversion model is further improved.
Detailed process of style conversion
Having described an overview of one example implementation according to the present disclosure, further details of the style conversion process will be described below with reference to the accompanying drawings. It should be appreciated that a correction process may need to be performed on the first text, because the narration in the reference video may suffer from problems such as overly fast speech or unclear pronunciation, so that the first text extracted by means of speech recognition may contain errors. As another example, the semantics of the second text generated by the machine learning model may deviate for various reasons, in which case it is necessary to verify whether the second text includes the key information of the first text, and so on.
More information is described with reference to FIG. 3, which illustrates a block diagram 300 of a process for processing the first text and the second text in accordance with some implementations of the present disclosure. As shown in FIG. 3, correction 310 may be performed on the first text 210 extracted from the reference video in order to correct potential errors in the first text. According to one example implementation of the present disclosure, suitable correction prompts may be constructed to further utilize the capabilities of the machine learning model to perform the correction process.
Specifically, a correction prompt for correcting the first text may be acquired, and the first text may be corrected using the machine learning model based on the correction prompt. Further, the corrected first text and the reference prompt may be used to convert the first text to the second text using the machine learning model. In this way, it is not necessary to develop a dedicated language correction tool; instead, the powerful language processing capability of the machine learning model can be invoked directly by constructing suitable prompts, thereby correcting potential errors in text obtained by techniques such as speech recognition.
As shown in FIG. 3, correction 310 may include the following: first, subtitle-based correction 312 may be performed using the subtitles in the reference video; second, number conversion 314 may be performed to convert numbers expressed in words in the first text into Arabic-numeral format. According to one example implementation of the present disclosure, the above-described correction processes may be performed individually and/or in combination to obtain more accurate text data.
It should be appreciated that some misrecognized words or other recognition errors may exist in text obtained using speech recognition techniques. These errors can degrade text quality and cause problems such as grammatical errors and/or confused sentences, which may make the input text difficult for the language model to understand and thus lead to information loss. To address this problem, the recognized text may be corrected using the subtitles in the reference video.
The subtitle-based correction process is described with reference to FIG. 4, which illustrates a block diagram 400 of a process for correcting the first text in accordance with some implementations of the present disclosure. As shown in FIG. 4, the first text 210 may be determined based on a variety of technical solutions currently known and/or to be developed in the future (e.g., based on Automatic Speech Recognition (ASR) techniques). Further, the subtitles 410 in the reference video 230 may be determined based on a variety of technical solutions currently known and/or to be developed in the future (e.g., Optical Character Recognition (OCR) or image-based text recognition models).
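As one possible concrete instantiation of this extraction step, the sketch below uses the open-source whisper package for speech recognition and leaves subtitle recognition as a hypothetical `ocr_frames` placeholder. Both choices are assumptions for illustration; the disclosure does not prescribe particular ASR or OCR tools.

```python
# Illustrative extraction of the first text (via ASR) and the subtitles (via
# OCR) from a reference video. Assumes the openai-whisper package is installed;
# ocr_frames is a hypothetical placeholder for any OCR / text-recognition model.
import whisper

def extract_first_text(video_path: str) -> str:
    model = whisper.load_model("base")      # small model chosen for illustration
    result = model.transcribe(video_path)   # whisper decodes the audio track itself
    return result["text"]

def ocr_frames(video_path: str) -> list[str]:
    # Placeholder: sample frames and run an OCR engine over the subtitle region.
    raise NotImplementedError("plug in an OCR or image-based text-recognition model")

def extract_subtitles(video_path: str) -> str:
    lines = ocr_frames(video_path)
    # Drop consecutive duplicate subtitle lines coming from adjacent frames.
    deduped = [line for i, line in enumerate(lines) if i == 0 or line != lines[i - 1]]
    return "\n".join(deduped)
```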
In general, the first text 210 and the subtitle 410 should correspond to each other. The first text 210 and the subtitle 410 may therefore be used to construct the correction prompt 420, which requires the machine learning model 240 to refer to the subtitle 410 in order to fix errors in the first text 210. For convenience of description, a prompt for correcting the first text based on the subtitles of the reference video may be referred to as a first correction prompt. The first text is then corrected using the subtitles, based on the first correction prompt and the machine learning model. According to one example implementation of the present disclosure, the first correction prompt may be constructed, for example, as shown in Table 1 below.
Table 1 Example of a prompt
In the example of Table 1, the prompt describes the task requirements and gives a set of examples. Section 1 describes the goal of the correction task, namely instructing the machine learning model to modify errors in the ASR text based on the OCR information. Section 2 shows an example of OCR information, Section 3 shows an example of ASR text, Section 4 shows the output for Example 1, and Section 5 shows the corrected text. Further, more examples may be given in Section 6 so that the machine learning model learns more details about the task.
According to one example implementation of the present disclosure, the first correction prompt may further be used to instruct the machine learning model to provide the differences between the first text and the corrected first text. Further, the differences may be provided based on the first correction prompt and the machine learning model. With the example implementations of the present disclosure, the machine learning model can be instructed to provide the basis for its corrections in an explicit manner, thereby helping a user decide whether to accept the correction result.
Specifically, Section 4 in Table 1 shows a format for presenting the differences: for example, the text before correction may be presented first, followed by the corrected text. Further, Section 5 instructs the machine learning model to present the corrected document. In this way, the source of the text output by the machine learning model can be tracked, thereby reducing the likelihood of hallucination.
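A rough sketch of how such a first correction prompt might be assembled programmatically is given below. The template text, the example fields, and the call_llm helper are illustrative assumptions and do not reproduce the actual wording of Table 1.

```python
# Sketch of assembling the first correction prompt: a task description, one
# worked example (OCR info, ASR text, differences, corrected text), and then
# the real OCR subtitles and ASR text to be corrected.

def call_llm(prompt: str) -> str:
    # Hypothetical language-model call.
    raise NotImplementedError

FIRST_CORRECTION_TEMPLATE = """\
Task: using the OCR subtitle information, correct recognition errors in the ASR text.
For each change, first show the text before correction, then the text after correction,
and finally output the fully corrected document.

Example OCR information: {example_ocr}
Example ASR text: {example_asr}
Example differences: {example_diff}
Example corrected text: {example_corrected}

OCR information: {ocr_text}
ASR text: {asr_text}
"""

def correct_with_subtitles(asr_text: str, ocr_text: str, example: dict) -> str:
    prompt = FIRST_CORRECTION_TEMPLATE.format(
        example_ocr=example["ocr"],
        example_asr=example["asr"],
        example_diff=example["diff"],
        example_corrected=example["corrected"],
        ocr_text=ocr_text,
        asr_text=asr_text,
    )
    return call_llm(prompt)
```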
According to one example implementation of the present disclosure, the prompt may be provided in one or more rounds. For example, the prompt shown in Table 1 may be provided in a first round, and once the machine learning model confirms that it has understood the task to be performed (e.g., the machine learning model may answer: "Yes, I understand"), the text to be corrected is entered in a second round. In one example, the text to be corrected may be: "Good news, good news! The coffee-and-dessert afternoon tea set at Package Cafe is only nineteen ninety-nine. The discount is unprecedented, hurry up and grab it!" At this time, the machine learning model may perform the correction task as required by the prompt.
At this time, similar to the format specified in Section 4 of Table 1, the machine learning model may output the sources of the modifications in the format shown in Table 2:
Table 2 Sources of the modifications
Further, more information about the processing result is described with reference to FIG. 5, which shows a block diagram 500 of text before and after correction in accordance with some implementations of the present disclosure. As shown in FIG. 5, text 510 represents the directly extracted ASR text before correction, and text 520 represents the corrected text. Assume that the OCR-recognized subtitle reads "Bob's Cafe"; in this case, the word "Package" shown in block 512 would be corrected to the word "Bob" as shown in block 522.
It should be appreciated that FIG. 5 is merely exemplary, and that the text shown at blocks 512 and 522 may be highlighted using different fonts, colors, sizes, background colors, and the like. It should also be appreciated that the prompt in Table 1 is merely exemplary and can be modified based on the requirements of a particular application environment; for example, different correction tasks can be specified, and additions, deletions, and/or modifications can be made to the examples given in Table 1. Further, the text in FIG. 5 is also illustrative, and other text to be processed may be entered into the machine learning model to obtain corresponding results.
It should be appreciated that the numbers in the ASR text are expressed in Chinese words, whereas machine learning models typically represent numbers with Arabic numerals, and processing numbers expressed in words can introduce potential errors. For example, when a conversion model is trained with training data that includes numbers expressed in words, the conversion model may tend to convert Arabic numerals into Chinese numerals during the inference phase, thereby increasing the complexity of the model and the difficulty of training. Therefore, numbers may be expressed with Arabic numerals when generating the training data, thereby reducing conversion complexity.
According to one example implementation of the present disclosure, the correction prompt may further include a second correction prompt for converting numbers expressed in words in the first text into Arabic numerals. In this case, the numbers expressed in words in the first text may be converted into Arabic numerals based on the second correction prompt and the machine learning model. According to one example implementation of the present disclosure, the second correction prompt may be constructed, for example, as shown in Table 3 below.
Table 3 Example of a prompt
According to one example implementation of the present disclosure, the subtitle-corrected text 520 may be input to the machine learning model to replace the numbers. FIG. 6 illustrates a block diagram 600 of text before and after correction in accordance with some implementations of the present disclosure. As shown in block 612 of FIG. 6, the text 520 includes the price "nineteen ninety-nine" expressed in words; after number replacement, the text 610 may be generated. At this point, the price information has been replaced with the Arabic numerals "19.99", as indicated at block 622.
It should be appreciated that FIG. 6 is merely exemplary, and that the text shown at blocks 612 and 622 may be highlighted using different fonts, colors, sizes, background colors, and the like. It should also be appreciated that the prompt in Table 3 is merely exemplary and can be modified based on the requirements of a particular application environment. Further, the text in FIG. 6 is also illustrative, and other text to be processed may be entered into the machine learning model to obtain corresponding results.
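The number-normalization step can be sketched in the same prompt-driven fashion, as shown below. The prompt wording and the call_llm helper are again assumptions for illustration.

```python
# Sketch of the digit-normalization step driven by a second correction prompt.

def call_llm(prompt: str) -> str:
    # Hypothetical language-model call.
    raise NotImplementedError

SECOND_CORRECTION_TEMPLATE = (
    "Rewrite the following text so that every number written out in words "
    "(for example, a price such as \"nineteen ninety-nine\") is replaced by "
    "Arabic numerals (for example, 19.99). Change nothing else.\n\nText: {text}"
)

def normalize_numbers(subtitle_corrected_text: str) -> str:
    return call_llm(SECOND_CORRECTION_TEMPLATE.format(text=subtitle_corrected_text))
```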
According to one example implementation of the present disclosure, a reference prompt may be constructed to convert the first text described above (or the corrected first text) into the second text. Herein, the reference prompt includes a task portion for describing the task and an example portion for describing how to perform the task. In this way, the machine learning model can be informed of the specifics of the task to be performed in a clearer manner. For example, the reference prompt may be constructed as shown in Table 4 below to convert the first text having the first style into the second text having the second style.
Table 4 Example of a prompt
As shown in Table 4, Section 1 describes the task that the machine learning model is instructed to perform. Further, the reference prompt may include multiple constraints on the output text, such as at least any one of the following: a Chinese-word constraint, a person-name constraint, a template constraint, and a semantic constraint. Specifically, Section 2 shows constraints on the output text, such as constraints prohibiting the use of certain spoken-language words and constraints prohibiting the use of first-person names. Section 3 shows constraints on the template used for the output text, and these constraints define the parts that the output should include. Section 4 gives an example of the output text, and Section 5 specifies the key information that needs to be retained in the output text, and so on.
With example implementations of the present disclosure, the machine learning model can be instructed to perform different tasks by constructing different prompts, and to output the processing results in a predetermined format. In this way, spoken-style text can be converted into a more compact written style in a simpler and more efficient manner, thereby simplifying the process of obtaining training data.
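The sketch below shows one way a reference prompt with a task portion, constraint portion, template portion, and example portion might be assembled, loosely mirroring the structure described for Table 4. All literal strings and helper names are illustrative assumptions.

```python
# Sketch of a reference prompt built from a task portion, constraints, an
# output template, few-shot examples, and the text to be converted.

def call_llm(prompt: str) -> str:
    # Hypothetical language-model call.
    raise NotImplementedError

def build_reference_prompt(first_text: str, examples: list[dict]) -> str:
    task = "Task: rewrite the spoken-style narration below as concise written-style text."
    constraints = (
        "Constraints: do not use filler or modal words; do not use first-person names; "
        "preserve all prices, product names, and quantities exactly."
    )
    template = "Output format: a single paragraph of written-style text, nothing else."
    shots = "\n".join(
        f"Example input: {ex['input']}\nExample output: {ex['output']}" for ex in examples
    )
    return f"{task}\n{constraints}\n{template}\n{shots}\n\nInput: {first_text}\nOutput:"

def convert_style(first_text: str, examples: list[dict]) -> str:
    return call_llm(build_reference_prompt(first_text, examples))
```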
According to one example implementation of the present disclosure, the prompt may be entered in one or more rounds. For example, the task to be performed by the machine learning model may be communicated in a first round; once the machine learning model has acknowledged the task, the text to be processed is entered and the processed text (i.e., the second text) is obtained from the machine learning model.
According to one example implementation of the present disclosure, there may be situations where the first text input to the machine learning model is not aligned with the second text output, e.g., some key information included in the first text is missing from the second text. This may introduce errors into the training data. When a conversion model is determined using such erroneous training data, the conversion model may hallucinate, i.e., irrelevant or erroneous information may appear in the output of the conversion model.
More details are described with reference to FIG. 7, which shows a block diagram 700 of a process for verifying the second text in accordance with some implementations of the present disclosure. In this case, the alignment between the first text and the second text may be further verified. In particular, the extraction prompt 710 can be utilized to invoke the processing power of the machine learning model to extract the key information 720 from the first text 210. For example, the machine learning model may be instructed to extract key information from the text before conversion based on a prompt such as the one shown in Table 5 below.
Table 5 Example of a prompt
As shown in Table 5, Section 1 provides a task description, i.e., it instructs the machine learning model to extract "meaningful" key information. Section 2 gives one example of input and output, and Sections 3 and 4 show further examples. Finally, Section 5 may include the data to be processed. According to one example implementation of the present disclosure, the text to be processed (e.g., the first text 210) may be added after the "Input" field of Section 5 in order to obtain the key information 720.
It should be appreciated that while FIG. 7 illustrates an example of obtaining the key information 720 based on the extraction prompt 710, the key information 720 may alternatively and/or additionally be extracted in other ways, for example, based on text semantic analysis.
In turn, a verification prompt 730 may be constructed, and the processing power of the machine learning model may be invoked with the verification prompt 730 to determine whether the second text 220 includes the key information 720. In particular, the verification prompt 730 may specify the task to be performed by the machine learning model, provide one or more examples, and specify the format of the output result. For example, the machine learning model may be required to output whether the second text includes the key information and, in the event that the key information is not included, to present the differences between the second text and the key information. Further, the machine learning model may be required to optimize the second text such that the optimized text includes the key information.
More details are described with reference to FIG. 8, which illustrates a block diagram 800 of various texts in the verification process in accordance with some implementations of the present disclosure. As shown in FIG. 8, the key information 810 may be extracted from the text 620. As shown in block 812, the key information is: "coffee and dessert 19.99 yuan". Block 822 shows "coffee for only 19.99" in the text directly output by the machine learning model after conversion to the second style (i.e., the text 820 before verification). It can be seen that the content at block 822 conflicts with the content of block 812, i.e., the text 820 has lost some key information. At this point, the lost information may be supplemented using the key information 810, resulting in "coffee and dessert for only 19.99" as indicated by block 832 in the verified text 830.
With example implementations of the present disclosure, it can be ensured that the second text of the second style output by the machine learning model includes all of the key information in the original first text. In this way, no information is lost during the acquisition of the training data, thereby providing more accurate training data. This in turn helps ensure that the conversion model obtained using the training data has higher accuracy.
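The verification stage can likewise be sketched as two prompt-driven calls, one extracting key information and one checking and, if necessary, repairing the second text. The prompts and the call_llm helper are assumptions for illustration.

```python
# Sketch of the verification stage: extract key information from the first
# text, then ask the model to confirm that the second text contains it and to
# repair the second text if anything is missing.

def call_llm(prompt: str) -> str:
    # Hypothetical language-model call.
    raise NotImplementedError

EXTRACTION_PROMPT = (
    "Extract the meaningful key information (prices, product names, quantities, "
    "store names) from the following text, one item per line:\n{text}"
)
VERIFICATION_PROMPT = (
    "Key information:\n{keys}\n\nCandidate text:\n{candidate}\n\n"
    "Does the candidate text contain all of the key information? "
    "If not, rewrite the candidate text so that it does, keeping its style."
)

def verify_and_repair(first_text: str, second_text: str) -> str:
    keys = call_llm(EXTRACTION_PROMPT.format(text=first_text))
    return call_llm(VERIFICATION_PROMPT.format(keys=keys, candidate=second_text))
```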
It should be appreciated that the above illustrates, by way of example only, the process of extracting training data from one reference video. Alternatively and/or additionally, a large number of reference videos may be processed in a similar manner to obtain more training data. Further, a customized machine learning model may be updated iteratively with a large amount of training data to obtain a conversion model capable of converting text having the second style into text having the first style.
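One possible, assumed way to accumulate the resulting (written, spoken) pairs from many reference videos into a file that a fine-tuning pipeline can consume is sketched below; the JSONL layout and field names are not specified by the disclosure.

```python
# One possible export format for the accumulated training pairs: one JSON
# object per line, with written-style text as input and spoken-style text as
# target. Field names and layout are assumptions.
import json

def export_training_data(pairs: list[dict], path: str) -> None:
    """pairs: [{"input": written_text, "target": spoken_text}, ...]"""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```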
It should be appreciated that the customized machine learning model may be implemented based on a variety of architectures. For example, the conversion model may be defined as an end-to-end model. In this case, the conversion model may directly convert input written-style text into text having a spoken style.
Alternatively and/or additionally, the conversion model may be defined using the architecture of a language model. In this case, the language model may have broader capabilities and may perform corresponding tasks according to a specified prompt. Specifically, a conversion prompt for converting the input text may be obtained, and the input text may be converted to the output text using the conversion model based on the conversion prompt. The conversion model may then perform the corresponding conversion task based on the entered prompt. For example, one prompt may specify converting the input text to spoken-style output text; alternatively and/or additionally, another prompt may specify converting the input text to written-style output text. The conversion model may perform the conversion as required by the prompt. With example implementations of the present disclosure, the conversion model may be generated in a variety of ways according to user requirements in different application scenarios.
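At inference time, such a prompt-driven conversion model might be invoked as sketched below; the conversion prompt wording and the call_conversion_model helper are illustrative assumptions.

```python
# Sketch of prompt-driven inference with the trained conversion model.

def call_conversion_model(prompt: str) -> str:
    # Hypothetical stand-in for the deployed conversion model 250.
    raise NotImplementedError("plug in the trained conversion model here")

CONVERSION_PROMPT = (
    "Convert the following written-style text into lively spoken-style narration, "
    "keeping all prices and names unchanged:\n{text}"
)

def to_spoken_style(input_text: str) -> str:
    return call_conversion_model(CONVERSION_PROMPT.format(text=input_text))
```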
According to one example implementation of the present disclosure, input text having the second style may be received and converted to output text having the first style using the conversion model. The input text may be obtained in a variety of ways; for example, in the case where it is desired to generate a promotional video for a restaurant, the input text may be produced by a manual editing operation: "XXX restaurant, hamburger single-person package for 19 yuan. The package includes: beef hamburger, french fries, beverage. Valid at all 16 stores." Alternatively and/or additionally, the input text may be generated in an automated manner based on a machine learning model. The text may be entered into the trained conversion model, which will then generate the output text: "Hamburger single-person package for only 19 yuan at XXX restaurant, come and taste the delicacies! The package includes: a tasty beef hamburger, french fries, and a beverage. The environment is comfortable and the offer is valid at all 16 stores, so hurry up and give it a try!"
According to one example implementation of the present disclosure, the specific processes for converting text styles have been described above using Chinese-language examples. Alternatively and/or additionally, the above-described processes may be performed in a multilingual environment. For example, where the machine learning model has multilingual capabilities, style conversion may be performed in an English environment; in this case, the first text and the second text may be in English, and the prompt may be written in English. Alternatively and/or additionally, the text being processed may be in the same language as the prompt, or the two may be in different languages; for example, the prompt may be written in Chinese while specifying that the machine learning model process text in an English-language environment.
With example implementations of the present disclosure, more accurate conversion models may be built with higher efficiency, and the conversion models may be utilized to perform text style conversion tasks.
Example procedure
FIG. 9 illustrates a flow chart of a method 900 for converting text styles according to some implementations of the present disclosure. At block 910, a first text having a first style is extracted from a reference video. At block 920, the first text is converted to a second text using a machine learning model based on the first text and a reference prompt, the second text having a second style different from the first style. At block 930, based on the first text and the second text, a conversion model is determined, the conversion model describing an association between text having the second style and text having the first style.
According to one example implementation of the present disclosure, converting the first text to the second text includes: acquiring a correction prompt for correcting the first text; correcting the first text using the machine learning model based on the correction prompt; and converting the first text to the second text using the machine learning model based on the corrected first text and the reference prompt.
According to one example implementation of the present disclosure, the correction prompt includes a first correction prompt for correcting the first text based on the subtitles of the reference video, and the method includes: extracting the subtitles of the reference video; and correcting the first text using the subtitles based on the first correction prompt and the machine learning model.
According to one example implementation of the present disclosure, the first correction prompt is further for instructing the machine learning model to provide a difference between the first text and the corrected first text, and the method further comprises: providing the difference based on the first correction prompt and the machine learning model.
According to one example implementation of the present disclosure, the correction prompt includes a second correction prompt for converting numbers expressed in words in the first text into Arabic numerals, and the method includes: converting the numbers expressed in words in the first text into Arabic numerals based on the second correction prompt and the machine learning model.
According to one example implementation of the present disclosure, the method further comprises: extracting key information from the first text; acquiring a verification prompt, the verification prompt being used to verify whether the second text includes the key information; and verifying whether the second text includes the key information based on the verification prompt and the machine learning model.
According to one example implementation of the present disclosure, the reference prompt includes a task portion for describing a task and an example portion for describing how to perform the task.
According to one example implementation of the present disclosure, the reference prompt further includes at least any one of the following constraints: a Chinese-word constraint, a person-name constraint, a template constraint, and a semantic constraint.
According to one example implementation of the present disclosure, the method further comprises: in response to receiving the input text having the second style, the input text is converted to output text having the first style using the conversion model.
According to one example implementation of the present disclosure, the conversion model is a language model, and converting the input text to the output text includes: acquiring a conversion prompt for converting the input text; and converting the input text to the output text using the conversion model based on the conversion prompt.
According to one example implementation of the present disclosure, the first style is a spoken style and the second style is a written style.
Example apparatus and apparatus
Fig. 10 illustrates a block diagram of an apparatus 1000 for converting text styles according to some implementations of the present disclosure. The apparatus 1000 comprises: an extraction module 1010 configured to extract a first text having a first style from a reference video; a conversion module 1020 configured to utilize a machine learning model to convert the first text to a second text based on the first text and the reference prompt, the second text having a second style different from the first style; and a determining module 1030 configured to determine a conversion model describing an association between the text having the second style and the text having the first style based on the first text and the second text.
According to one example implementation of the present disclosure, the conversion module includes: an acquisition module configured to acquire a correction prompt for correcting the first text; a correction module configured to correct the first text using the machine learning model based on the correction prompt; and a correction-based conversion module configured to convert the first text to the second text using the machine learning model based on the corrected first text and the reference prompt.
According to one example implementation of the present disclosure, the correction prompt includes a first correction prompt for correcting the first text based on the subtitles of the reference video, and the apparatus includes: a subtitle extraction module configured to extract the subtitles of the reference video; and a subtitle-based correction module configured to correct the first text using the subtitles based on the first correction prompt and the machine learning model.
According to one example implementation of the present disclosure, the first correction prompt is further for instructing the machine learning model to provide a difference between the first text and the corrected first text, and the apparatus further comprises: a difference providing module configured to provide the difference based on the first correction prompt and the machine learning model.
According to one example implementation of the present disclosure, the correction prompt includes a second correction prompt for converting numbers expressed in words in the first text into Arabic numerals, and the apparatus includes: a number conversion module configured to convert the numbers expressed in words in the first text into Arabic numerals based on the second correction prompt and the machine learning model.
According to one example implementation of the present disclosure, the apparatus further comprises: a key information extraction module configured to extract key information from the first text; a verification prompt acquisition module configured to acquire a verification prompt, the verification prompt being used to verify whether the second text includes the key information; and a verification module configured to verify whether the second text includes the key information based on the verification prompt and the machine learning model.
According to one example implementation of the present disclosure, the reference prompt includes a task portion for describing a task and an example portion for describing how to perform the task.
According to one example implementation of the present disclosure, the reference prompt further includes at least any one of the following constraints: a Chinese-word constraint, a person-name constraint, a template constraint, and a semantic constraint.
According to one example implementation of the present disclosure, the apparatus further comprises: and a style conversion module configured to convert the input text to the output text having the first style using the conversion model in response to receiving the input text having the second style.
According to one example implementation of the present disclosure, the conversion model is a language model, and the style conversion module includes: a prompt acquisition module configured to acquire a conversion prompt for converting the input text; and a prompt-based conversion module configured to convert the input text to the output text using the conversion model based on the conversion prompt.
According to one example implementation of the present disclosure, the first style is a spoken style and the second style is a written style.
FIG. 11 illustrates a block diagram of a device 1100 capable of implementing various implementations of the present disclosure. It should be appreciated that the computing device 1100 illustrated in FIG. 11 is merely exemplary and should not be construed as limiting the functionality and scope of the implementations described herein. The computing device 1100 illustrated in FIG. 11 may be used to implement the methods described above.
As shown in fig. 11, computing device 1100 is in the form of a general purpose computing device. Components of computing device 1100 may include, but are not limited to, one or more processors or processing units 1110, memory 1120, storage 1130, one or more communication units 1140, one or more input devices 1150, and one or more output devices 1160. The processing unit 1110 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 1120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capabilities of computing device 1100.
Computing device 1100 typically includes a number of computer storage media. Such media can be any available media that is accessible by computing device 1100 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. The memory 1120 may be volatile memory (e.g., registers, cache, random Access Memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 1130 may be a removable or non-removable medium and may include a machine readable medium such as a flash drive, a magnetic disk, or any other medium that may be capable of storing information and/or data (e.g., training data for training) and may be accessed within the computing device 1100.
Computing device 1100 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in fig. 11, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data medium interfaces. Memory 1120 may include a computer program product 1125 having one or more program modules configured to perform the various methods or acts of the various implementations of the present disclosure.
The communication unit 1140 enables communication with other computing devices via a communication medium. Additionally, the functionality of the components of computing device 1100 may be implemented in a single computing cluster or in multiple computing machines capable of communicating over a communications connection. Accordingly, computing device 1100 may operate in a networked environment using logical connections to one or more other servers, a network Personal Computer (PC), or another network node.
The input device 1150 may be one or more input devices such as a mouse, keyboard, trackball, etc. The output device 1160 may be one or more output devices such as a display, speakers, printer, etc. Computing device 1100 can also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., with one or more devices that enable a user to interact with computing device 1100, or with any device (e.g., network card, modem, etc.) that enables computing device 1100 to communicate with one or more other computing devices, as desired, via communication unit 1140. Such communication may be performed via an input/output (I/O) interface (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided, having stored thereon computer-executable instructions that are executed by a processor to implement the method described above. According to an exemplary implementation of the present disclosure, there is also provided a computer program product tangibly stored on a non-transitory computer-readable medium and comprising computer-executable instructions that are executed by a processor to implement the method described above. According to an exemplary implementation of the present disclosure, a computer program product is provided, on which a computer program is stored which, when executed by a processor, implements the method described above.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus, devices, and computer program products implemented according to the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of implementations of the present disclosure has been provided for illustrative purposes, is not exhaustive, and is not limited to the implementations disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various implementations described. The terminology used herein was chosen in order to best explain the principles of each implementation, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand each implementation disclosed herein.

Claims (14)

1. A method for converting text styles, comprising:
extracting a first text having a first style from a reference video;
converting the first text to a second text using a machine learning model based on the first text and a reference prompt, the second text having a second style different from the first style; and
based on the first text and the second text, a conversion model is determined, the conversion model describing an association between text having the second style and text having the first style.
2. The method of claim 1, wherein converting the first text to the second text comprises:
acquiring a correction prompt for correcting the first text;
correcting the first text using the machine learning model based on the correction prompt; and
converting the first text to the second text using the machine learning model based on the corrected first text and the reference prompt.
3. The method of claim 2, wherein the correction prompt includes a first correction prompt for correcting the first text based on subtitles of the reference video, and the method includes:
extracting the subtitles of the reference video; and
correcting the first text using the subtitles based on the first correction prompt and the machine learning model.
4. The method of claim 3, wherein the first correction prompt is further for instructing the machine learning model to provide a difference between the first text and the corrected first text, and the method further comprises: providing the difference based on the first correction prompt and the machine learning model.
5. The method of claim 2, wherein the correction prompt includes a second correction prompt for converting numbers expressed in words in the first text into Arabic numerals, and the method comprises: converting the numbers expressed in words in the first text into Arabic numerals based on the second correction prompt and the machine learning model.
6. The method of claim 1, further comprising:
extracting key information from the first text;
acquiring a verification prompt, wherein the verification prompt is used for verifying whether the second text comprises the key information; and
verifying whether the second text comprises the key information based on the verification prompt and the machine learning model.
7. The method of claim 1, wherein the reference prompt includes a task portion for describing a task and an example portion for describing how to perform the task.
8. The method of claim 1, wherein the reference prompt further comprises at least any one of the following constraints: a Chinese-word constraint, a person-name constraint, a template constraint, and a semantic constraint.
9. The method of claim 1, further comprising: in response to receiving input text having the second style, the input text is converted to output text having the first style using the conversion model.
10. The method of claim 9, wherein the conversion model is a language model, and converting the input text to the output text comprises:
acquiring a conversion prompt for converting the input text; and
converting the input text to the output text using the conversion model based on the conversion prompt.
11. The method of claim 1, wherein the first style is a spoken style and the second style is a written style.
12. An apparatus for converting text styles, comprising:
an extraction module configured to extract a first text having a first style from a reference video;
a conversion module configured to convert the first text to a second text using a machine learning model based on the first text and a reference prompt, the second text having a second style different from the first style; and
a determination module configured to determine a conversion model describing an association between text having the second style and text having the first style based on the first text and the second text.
13. An electronic device, comprising:
at least one processing unit; and
at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, which when executed by the at least one processing unit, cause the electronic device to perform the method of any one of claims 1 to 11.
14. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to implement the method of any of claims 1 to 11.
CN202311861817.3A, filed 2023-12-29: Method, apparatus, device and medium for converting text style, published as CN117829101A (Pending)

Priority Applications (1)

CN202311861817.3A, priority date 2023-12-29, filing date 2023-12-29: Method, apparatus, device and medium for converting text style

Applications Claiming Priority (1)

CN202311861817.3A, priority date 2023-12-29, filing date 2023-12-29: Method, apparatus, device and medium for converting text style

Publications (1)

CN117829101A, published 2024-04-05

Family ID: 90511043

Family Applications (1)

CN202311861817.3A: Method, apparatus, device and medium for converting text style (Pending)

Country Status (1)

CN: CN117829101A


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination