CN117574860A - Method and equipment for text color rendering - Google Patents
- Publication number
- CN117574860A (application CN202410057743.6A)
- Authority
- CN
- China
- Prior art keywords
- color rendering
- text
- data set
- data
- constructing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/55—Rule-based translation
- G06F40/56—Natural language generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The application aims to provide a method and equipment for text color rendering (i.e., text polishing). Compared with the prior art, the method constructs a plurality of basic data sets based on a preset number of color rendering types; performs data construction on each basic data set according to its corresponding color rendering type to determine a corresponding training data set; trains a neural network model on the training data set, the model comprising a classification layer that outputs a probability distribution over the color rendering types; and performs text color rendering based on the output of the trained model to determine a color rendering result. In this way, resource and performance overhead at the model level is reduced, and the accuracy of the color rendering result is improved.
Description
Technical Field
The application relates to the technical field of computers, in particular to a text color rendering technology.
Background
Text rendering (polishing) refers to revising and improving text to raise its quality and readability, covering modifications to grammar, spelling, punctuation, sentence structure, and expression so as to ensure accuracy, fluency, and clarity. Text collation (proofreading), by contrast, refers to carefully reviewing a completed text to find and correct any spelling, punctuation, grammar, or typesetting errors; collation is primarily concerned with error correction rather than substantial modification or improvement. In short, text rendering focuses on the overall improvement and optimization of text, including modifications to grammar, structure, and expression, while text collation focuses on finding and correcting surface errors to ensure the accuracy and conformance of the text. Text rendering can therefore improve text quality more substantially, but is also harder to implement.
In the prior art, patent publication No. CN114298031A discloses a text processing method, a computer device and a storage medium, in which an elegant-expression generating task, a modifier supplementing task and a vocabulary upgrading task are defined. The elegant-expression generating task fine-tunes a GPT (Generative Pre-trained Transformer) model using keyword-information-guided fine-tuning; the modifier supplementing task fine-tunes a pointer-generator network based on BART (Bidirectional and Auto-Regressive Transformers); the vocabulary upgrading task acquires synonyms from a synonym graph built from word paraphrases, then scores and ranks them using the GPT model and an elegant-expression recognition model. Because this processing polishes text by fusing multiple methods, it requires, on the one hand, a large amount of computing resources and time for training and inference, limiting the model's usability and real-time performance, especially in resource-constrained environments; on the other hand, it may not fully consider context during color rendering, since it mainly corrects errors or replaces words based on local, sentence-level information, ignoring the semantic and logical relationships of the whole text. This may result in corrected text that is incoherent or inconsistent with the overall semantics.
In addition, patent publication No. CN114492463A discloses a unified-semantics Chinese text color rendering method based on adversarial multitask learning, in which the color rendering range of the task is determined by a rendering-range division model, a traversal search is performed over the number of characters to insert within the rendering range, a series of new sentences is generated using a mask language model, the generated sentences are scored by a position scoring model, and the best sentence is selected according to the scores. Disadvantages of this approach include: the rendering-range division model may err, dividing the color rendering range inaccurately; the traversal search over the number of inserted characters can lead to high computational complexity and consume considerable time and resources; sentences generated by the mask language model may be ungrammatical or inconsistent with the context semantics; and the position scoring model may not evaluate the generated sentences accurately, yielding inaccurate selection of the best sentence.
In summary, existing color rendering methods suffer from slow training and inference, high resource and performance costs at the model level, and low result accuracy.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for text rendering.
According to one aspect of the present application, there is provided a method for text rendering, wherein the method comprises:
constructing a plurality of basic data sets based on a preset number of the plurality of color rendering types;
constructing data of each basic data set based on the corresponding color rendering type, and determining the corresponding training data set;
training a neural network model based on the training data set to determine a trained neural network model, wherein the neural network model comprises a classification layer so as to output probability distribution of a corresponding color rendering type through the classification layer;
and performing text color rendering based on the output result of the trained neural network model, and determining a color rendering result.
Optionally, when there are a plurality of color rendering results, the method further includes:
and scoring the plurality of color rendering results based on a sentence fluency model, and taking the color rendering result with the highest score as the final color rendering result.
Optionally, the method further includes:
acquiring text data of a plurality of categories;
preprocessing the text data of the multiple categories, and determining the preprocessed text data;
wherein the constructing a plurality of base data sets based on a preset number of the plurality of color rendering types includes:
and constructing a plurality of basic data sets based on the preprocessed text data and a preset number of a plurality of color rendering types.
Optionally, the constructing a plurality of basic data sets based on the preprocessed text data and a preset number of a plurality of rendering types includes:
dividing the preprocessed text data into parts corresponding to the preset number based on the preset number, wherein each part of text data corresponds to one basic data set, and each basic data set corresponds to one color rendering type.
Optionally, when the color rendering type includes word errors, the constructing each basic data set based on the corresponding color rendering type includes:
carrying out word error construction on the data in the corresponding basic data set based on a preset word error construction rule; or generating wrong text data through the generator and judging whether the text data is correct or not through the discriminator, so that the generator can adjust according to feedback of the discriminator to construct data containing word errors.
Optionally, when the rendering type includes a grammar error, the constructing each base data set based on the corresponding rendering type includes:
when the grammar errors comprise any one of improper collocation, incomplete components, redundant components and improper language sequence, carrying out grammar analysis on the text of the corresponding basic data set so as to count collocation relation of correct sentences;
carrying out grammar error construction based on a preset grammar error construction rule and the collocation relation;
and when the grammar errors comprise sentence ambiguity, carrying out word order exchange of the modifier words on the texts with a plurality of modifier words in the basic data set so as to carry out grammar error construction.
Optionally, when the color rendering type includes modifier errors, the constructing each basic data set based on the corresponding color rendering type includes:
and replacing modifier words in sentences with modifier words in the corresponding basic data set with words with similar semantics and/or opposite semantics to perform modifier error construction.
Optionally, when the color rendering type includes a translation error, the constructing each basic data set based on the corresponding color rendering type includes:
generating text data corresponding to other languages from the text data in the corresponding basic data set;
and comparing the generated text data corresponding to other languages with the original text data, and taking sentences whose similarity falls between preset thresholds as construction data for translation errors.
Optionally, the method further includes:
and determining the color rendering result reaching a frequency threshold based on the historical use frequency of the color rendering result, and caching the color rendering result.
According to yet another aspect of the present application, there is also provided a computer readable medium having stored thereon computer readable instructions executable by a processor to implement the operations of the aforementioned method.
According to yet another aspect of the present application, there is also provided an apparatus for text rendering, wherein the apparatus includes:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform operations of the method as described above.
Compared with the prior art, the method and equipment construct a plurality of basic data sets based on a preset number of color rendering types; perform data construction on each basic data set according to its corresponding color rendering type to determine a corresponding training data set; train a neural network model on the training data set, the model comprising a classification layer that outputs a probability distribution over the color rendering types; and perform text color rendering based on the output of the trained model to determine a color rendering result. In this way, resource and performance overhead at the model level is reduced, and the accuracy of the color rendering result is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 illustrates a flow chart of a method for text rendering according to an aspect of the subject application.
The same or similar reference numbers in the drawings refer to the same or similar parts.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
In one typical configuration of the present application, the terminal, the devices of the service network, and the trusted party each include one or more processors (e.g., central processing units (Central Processing Unit, CPU)), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory in a computer readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change RAM (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer readable media, as defined herein, does not include transitory computer readable media (transmission media), such as modulated data signals and carrier waves.
FIG. 1 shows a flow chart of a method for text rendering provided in one aspect of the present application, the method comprising the steps of:
s11, constructing a plurality of basic data sets based on a preset number of a plurality of color rendering types;
s12, constructing data of each basic data set based on the corresponding color rendering type, and determining the corresponding training data set;
s13, training a neural network model based on the training data set to determine a trained neural network model, wherein the neural network model comprises a classification layer so as to output probability distribution of a corresponding color rendering type through the classification layer;
s14, performing text color rendering based on the output result of the trained neural network model, and determining a color rendering result.
In this embodiment, in step S11, a plurality of basic data sets are constructed based on a preset number of color rendering types, where the color rendering types include, but are not limited to, word errors, grammar errors, modifier errors, and translation errors; text exhibiting these error types requires corresponding modification during color rendering. The data in the basic data sets can be collected by a preset program, and the number of basic data sets equals the number of color rendering types.
Preferably, the method further comprises: acquiring text data of a plurality of categories; preprocessing the text data of the multiple categories, and determining the preprocessed text data;
wherein, the step S11 includes:
and constructing a plurality of basic data sets based on the preprocessed text data and a preset number of a plurality of color rendering types.
In this embodiment, the categories include, but are not limited to, science and technology, novels, social media, legal, and medical, and as much text data as possible is collected covering these different categories. The text data is then preprocessed, including but not limited to sentence segmentation, cleaning, de-duplication, and randomization, in order to improve the quality and consistency of the data. Further, a plurality of basic data sets is constructed from the preprocessed text data.
Preferably, the constructing a plurality of basic data sets based on the preprocessed text data and a preset number of a plurality of rendering types includes: dividing the preprocessed text data into parts corresponding to the preset number based on the preset number, wherein each part of text data corresponds to one basic data set, and each basic data set corresponds to one color rendering type. Here, the text data after the preprocessing may be equally or randomly divided into the number of copies corresponding to the preset number.
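The preprocessing and division described above can be sketched as follows. This is a minimal illustration only; the helper name and the strided-split strategy are assumptions, not the patent's own implementation:

```python
import random

def build_base_datasets(texts, num_types, seed=0):
    """Clean, de-duplicate, shuffle, and split text data into one base
    dataset per color rendering type (illustrative sketch)."""
    cleaned = [t.strip() for t in texts if t.strip()]   # trim, drop empties
    seen, unique = set(), []
    for t in cleaned:                                   # de-duplicate, keep order
        if t not in seen:
            seen.add(t)
            unique.append(t)
    random.Random(seed).shuffle(unique)                 # randomize order
    # one roughly equal part per color rendering type
    return [unique[i::num_types] for i in range(num_types)]

parts = build_base_datasets(["a ", "b", "a", "c", "d", ""], num_types=2)
```

Shuffling before splitting ensures each basic data set covers all text categories rather than inheriting the collection order.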
Continuing in this embodiment, in step S12, data construction is performed on each basic data set based on its corresponding color rendering type to determine a corresponding training data set. The training data set comprises both the original data in the basic data set and the modified versions of that data. Specifically, data construction is performed separately on the basic data sets corresponding to the different color rendering types.
Preferably, when the color rendering type includes word errors, the constructing each basic data set based on the corresponding color rendering type includes:
carrying out word error construction on the data in the corresponding basic data set based on a preset word error construction rule; or generating wrong text data through the generator and judging whether the text data is correct or not through the discriminator, so that the generator can adjust according to feedback of the discriminator to construct data containing word errors.
In this embodiment, on the one hand, error data may be constructed according to word-error construction rules, which include, but are not limited to, inserting, deleting, replacing, or exchanging characters, words, or phrases; applying these rules generates text data containing word errors.
On the other hand, a generative adversarial network (GAN) may be employed to generate high-quality text-correction data. First, generator and discriminator networks are built: the generator is responsible for generating erroneous text data, while the discriminator judges whether text data is correct or erroneous. The two networks are then trained using prepared correct text data. During training, the generator produces erroneous text data, and the discriminator judges the generated data and provides feedback; the generator adjusts according to this feedback so as to generate text data that more closely resembles genuine errors.
Preferably, when the rendering type includes a grammar error, the constructing each basic data set based on the corresponding rendering type includes:
when the grammar errors comprise any one of improper collocation, incomplete components, redundant components and improper language sequence, carrying out grammar analysis on the text of the corresponding basic data set so as to count collocation relation of correct sentences;
carrying out grammar error construction based on a preset grammar error construction rule and the collocation relation;
and when the grammar errors comprise sentence ambiguity, carrying out word order exchange of the modifier words on the texts with a plurality of modifier words in the basic data set so as to carry out grammar error construction.
In this embodiment, common grammar errors include, but are not limited to, improper collocation, incomplete components, redundant components, improper word order, sentence ambiguity, and sentence-pattern blending. For errors of improper collocation, incomplete components, redundant components, and improper word order, words with long-distance collocation relations are prepared in advance, and the syntactic analysis module of the hanlp (Han Language Processing, a Chinese language processing package) model is used to count the collocation relations of correct sentences, for example: improve–quality, improve–vigilance, increase–quantity, increase–manpower, and so on. Then, for the basic data set, the invention uses the hanlp model to perform syntactic analysis on correct sentences, obtaining words in subject-predicate, adverbial, preposition-object, head, coordination, verb-object and similar relations. These words are then corrupted using designed substitution, deletion, insertion, or exchange rules. For example, for the sentence "Through our efforts, our circulation has doubled.", with subject "circulation" and predicate "increase", the collocation relations already collected allow a collocating word unrelated to the "circulation–increase" pair to be selected at random, producing an erroneous sentence containing an improper collocation.
When the grammar error includes sentence ambiguity, sentence ambiguity means that a sentence admits two or more different readings. Common cases are "ambiguous sentences formed by different word orders" and "ambiguous sentences formed by different word meanings". Ambiguity from word order is mostly caused by reversed modifier order; during data construction, when a word has several modifiers, the order of the modifiers is exchanged, which changes the meaning, for example by moving "temporarily" in "finding some sweetness in a temporarily bitter life" so that it no longer modifies "bitter" as in the original sentence. For "ambiguous sentences formed by different word meanings", a portion of such data can be collected as training corpus. For example: "The person notified has not yet arrived." Here, "the person notified" may be the person who received the notification or the person who gave it. Correct and erroneous sentence pairs may then be: correct sentence: "The person whom he notified has not yet arrived."; erroneous sentence: "The person notified has not yet arrived."; correct sentence: "The person who notified him has not yet arrived."; erroneous sentence: "The person notified has not yet arrived."
In addition, grammar errors also include sentence-pattern blending, i.e., applying two expression patterns in one sentence at the same time, mashing two different constructions together. For example, "the main reason is due to" blends the patterns "the main reason is" and "is due to". Error data may be generated from the basic data set by substituting such blended constructions into similar sentences.
Preferably, when the color rendering type includes modifier errors, the constructing each basic data set based on the corresponding color rendering type includes:
and replacing modifier words in sentences with modifier words in the corresponding basic data set with words with similar semantics and/or opposite semantics to perform modifier error construction.
Specifically, correct sentences containing modifiers are extracted from the basic data set, and the modifiers in those sentences are replaced with semantically similar or opposite words. For example, in "If you have seen the movie Slumdog Millionaire, the Dharavi slum in India is no strange place to you.", changing "strange" to its opposite "familiar" makes the sentence express the wrong meaning. Similarly, replacing a modifier with a merely similar word can weaken the expression so that it reads worse than the original sentence.
In this embodiment, after data construction is completed, a classification model scores the erroneous sentence and the original sentence, and the erroneous/correct sentence pair is retained only if the original sentence scores higher than the erroneous one. Preparation for this includes collecting and manually curating a modifier data set and labeling and classifying the modifier data: each modifier is classified as positive, negative, or neutral and assigned a score.
Preferably, when the color rendering type includes a translation error, the constructing each basic data set based on the corresponding color rendering type includes:
generating text data corresponding to other languages from the text data in the corresponding basic data set;
and comparing the generated text data corresponding to other languages with the original text data, and taking sentences with similarity between preset percentages as construction data of translation errors.
In this embodiment, a portion of short sentences can be extracted from the basic data set as correct data, and sentences are then generated by round-trip translation (e.g., Chinese→English→Chinese, Chinese→Korean→Chinese). The generated sentences are compared with the original sentences for similarity, and sentences with similarity between 0.5 and 0.9 are retained as erroneous sentences. This filters out severely distorted post-translation outliers on the one hand and, on the other, ensures that the translated sentence remains distinguishable from the original.
In this embodiment, in step S13, the neural network model is trained based on the training data set to determine a trained neural network model, where the neural network model includes a classification layer to output a probability distribution of the corresponding color rendering type through the classification layer.
Specifically, the present invention employs BART-Large as the pre-trained neural network model. BART (Bidirectional and Auto-Regressive Transformers) is a sequence-to-sequence model that combines the advantages of a denoising autoencoder with the Transformer architecture: the encoder processes the input sequence and learns a contextual representation, and the decoder then generates the output sequence from the encoder's representation.
The invention adds a classification layer on top of the BART model, which takes the contextual representation of the encoder as input and outputs a probability distribution over the different color rendering types. The classification layer is designed as a linear layer followed by a softmax activation function. Let the contextual representation of the encoder for the input sequence be denoted h; the linear layer transforms the input h using weights W and bias b: z = Wh + b. The softmax activation function then converts the output z into a probability distribution over the color rendering types: y = softmax(z).
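The linear-plus-softmax head can be illustrated numerically as follows. This is a shape-level sketch only, with toy dimensions chosen for the example; it is not the actual BART classification layer.

```python
import numpy as np

# Classification head: z = W h + b, then y = softmax(z).
def classify(h: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    z = W @ h + b                # linear transform of the encoder representation
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()           # probability distribution over rendering types

h = np.array([0.2, -0.1, 0.4])   # toy 3-dim context representation
W = np.zeros((4, 3))             # 4 color rendering types, zero weights for the demo
b = np.zeros(4)
y = classify(h, W, b)            # uniform distribution: [0.25 0.25 0.25 0.25]
```

With zero weights and bias the logits are all equal, so softmax yields a uniform distribution; any trained W and b would shift probability mass toward the predicted color rendering type.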
To train the model, the invention defines a cross-entropy loss function that measures the difference between the predicted probability distribution and the true label. The true label, denoted y_true, represents the true color rendering type. The cross-entropy loss is calculated as L = -Σ(y_true ⊙ log(y)), where ⊙ denotes element-wise multiplication.
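The loss can be computed directly from its definition; the following is a minimal numeric check of L = -Σ(y_true ⊙ log(y)) with a one-hot true label.

```python
import numpy as np

# Cross-entropy between a one-hot true label and a predicted distribution.
def cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    eps = 1e-12                  # avoid log(0) for zero-probability entries
    return float(-np.sum(y_true * np.log(y_pred + eps)))

y_true = np.array([0.0, 1.0, 0.0])   # true color rendering type is class 1
y_pred = np.array([0.1, 0.8, 0.1])
loss = cross_entropy(y_true, y_pred)  # = -log(0.8) ≈ 0.223
```

Because y_true is one-hot, the sum collapses to -log of the probability the model assigned to the correct class, so a confident correct prediction drives the loss toward zero.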
In this embodiment, in step S14, text color rendering is performed based on the output of the trained neural network model, and a color rendering result is determined. Preferably, when there are a plurality of color rendering results, the method further comprises: scoring the plurality of color rendering results with a sentence fluency model, and taking the color rendering result with the highest score as the final color rendering result.
Specifically, after model training is completed, in the inference stage, the beam size is set to 5, so that each input sentence yields 5 candidate answers. Each of the 5 answers is then scored by the sentence fluency model; the fluency score is combined with the model's own score for the answer, and the answer with the highest combined score is taken as the final color rendering result, thereby improving the accuracy of the color rendering. Using 5 answers here is merely an example, not a particular limitation.
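The reranking step can be sketched as below. `fluency_fn` is a hypothetical stand-in for the sentence fluency model, and the equal 0.5/0.5 weighting of the two scores is an assumption of this sketch; the source does not specify how the scores are combined.

```python
# Combine each beam candidate's model score with a fluency score and pick the best.
def rerank(candidates, fluency_fn, alpha=0.5):
    """candidates: list of (sentence, model_score) pairs; returns the best sentence."""
    def combined(item):
        sentence, model_score = item
        return alpha * model_score + (1 - alpha) * fluency_fn(sentence)
    return max(candidates, key=combined)[0]

# With zero fluency scores, the highest model score wins.
best = rerank([("cand a", 0.9), ("cand b", 0.7)], fluency_fn=lambda s: 0.0)
```

A strong fluency score can override a slightly higher model score, which is exactly how a fluent but lower-ranked beam candidate can become the final color rendering result.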
Preferably, the method further comprises: and determining the color rendering result reaching a frequency threshold based on the historical use frequency of the color rendering result, and caching the color rendering result.
In this embodiment, to increase efficiency, the least recently used (LRU, Least Recently Used) algorithm is used to cache the color rendering results. The LRU algorithm decides which data should be kept in the cache based on how recently each entry was accessed, evicting the least recently used entries so that frequently needed results can be served quickly. By using the LRU cache, repeated computation of the same color rendering result can be avoided, improving the response speed of the system.
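The caching step can be sketched with Python's built-in LRU cache. `render` here is a hypothetical placeholder for the full model inference pipeline; only the caching behavior is illustrative.

```python
from functools import lru_cache

# Cache up to 1024 distinct rendering results; least recently used entries are
# evicted first when the cache is full.
@lru_cache(maxsize=1024)
def render(sentence: str) -> str:
    return sentence.upper()      # placeholder for the real color rendering call

render("hello")                  # computed and cached (cache miss)
render("hello")                  # served from the cache, no recomputation (cache hit)
print(render.cache_info().hits)  # prints 1
```

`cache_info()` exposes hit/miss counts, which makes it easy to verify that repeated inputs are not recomputed.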
Compared with the prior art, the present application constructs a plurality of basic data sets based on a preset number of color rendering types, performs data construction on each basic data set according to its corresponding color rendering type to determine a corresponding training data set, trains a neural network model on the training data set, the model including a classification layer that outputs a probability distribution over the color rendering types, and performs text color rendering based on the output of the trained model to determine a color rendering result. In this way, the resource and performance cost at the model layer can be reduced, and the accuracy of the color rendering result can be improved.
According to yet another aspect of the present application, there is also provided a computer readable medium storing computer readable instructions executable by a processor to implement the foregoing method.
According to yet another aspect of the present application, there is also provided an apparatus for text rendering, wherein the apparatus includes:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform operations of the method as described above.
For example, computer-readable instructions, when executed, cause the one or more processors to: constructing a plurality of basic data sets based on a preset number of the plurality of color rendering types; constructing data of each basic data set based on the corresponding color rendering type, and determining the corresponding training data set; training a neural network model based on the training data set to determine a trained neural network model, wherein the neural network model comprises a classification layer so as to output probability distribution of a corresponding color rendering type through the classification layer; and performing text color rendering based on the output result of the trained neural network model, and determining a color rendering result.
According to the above scheme, the color rendering types are subdivided and different data construction modes are adopted for different color rendering types, which saves resource and performance cost at the model layer, better optimizes the performance and resource utilization of the model, and improves the overall color rendering effect. Fine-tuning the BART model on the different types of color rendering data alleviates the problems of slow training and inference and of incoherent corrected sentences. Further, a sentence fluency model is adopted to measure the fluency of the BART model's predictions: the plurality of color rendering results are scored and ranked, and sentences scoring lower than the original sentence are screened out. The aim is to ensure that the generated sentence is closer to natural language in grammar and coherence, thereby providing a higher-quality text color rendering result and enhancing the user experience.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. The terms first, second, etc. are used to denote a name, but not any particular order.
Claims (11)
1. A method for text rendering, wherein the method comprises:
constructing a plurality of basic data sets based on a preset number of the plurality of color rendering types;
constructing data of each basic data set based on the corresponding color rendering type, and determining the corresponding training data set;
training a neural network model based on the training data set to determine a trained neural network model, wherein the neural network model comprises a classification layer so as to output probability distribution of a corresponding color rendering type through the classification layer;
and performing text color rendering based on the output result of the trained neural network model, and determining a color rendering result.
2. The method of claim 1, wherein when the color rendering result is a plurality, the method further comprises:
and scoring the plurality of color rendering results based on a sentence fluency model, and taking the color rendering result with the highest score as the final color rendering result.
3. The method according to claim 1 or 2, wherein the method further comprises:
acquiring text data of a plurality of categories;
preprocessing the text data of the multiple categories, and determining the preprocessed text data;
wherein constructing a plurality of base data sets based on a preset number of the plurality of color rendering types comprises:
and constructing a plurality of basic data sets based on the preprocessed text data and a preset number of a plurality of color rendering types.
4. The method of claim 3, wherein the constructing a plurality of base data sets based on the pre-processed text data and a preset number of a plurality of rendering types comprises:
dividing the preprocessed text data into parts corresponding to the preset number based on the preset number, wherein each part of text data corresponds to one basic data set, and each basic data set corresponds to one color rendering type.
5. The method of claim 1, wherein when the color rendering type includes a word error, said constructing each basic data set based on the corresponding color rendering type includes:
carrying out word error construction on the data in the corresponding basic data set based on a preset word error construction rule; or generating wrong text data through the generator and judging whether the text data is correct or not through the discriminator, so that the generator can adjust according to feedback of the discriminator to construct data containing word errors.
6. The method of claim 1, wherein when the color rendering type includes a grammar error, said constructing each basic data set based on the corresponding color rendering type comprises:
when the grammar errors comprise any one of improper collocation, incomplete components, redundant components and improper language sequence, carrying out grammar analysis on the text of the corresponding basic data set so as to count collocation relation of correct sentences;
carrying out grammar error construction based on a preset grammar error construction rule and the collocation relation;
and when the grammar errors comprise sentence ambiguity, carrying out word order exchange of the modifier words on the texts with a plurality of modifier words in the basic data set so as to carry out grammar error construction.
7. The method of claim 1, wherein when the color rendering type includes a modifier error, said constructing each basic data set based on the corresponding color rendering type includes:
and replacing modifier words in sentences with modifier words in the corresponding basic data set with words with similar semantics and/or opposite semantics to perform modifier error construction.
8. The method of claim 1, wherein when the color rendering type includes a translation error, said constructing each basic data set based on the corresponding color rendering type includes:
generating text data corresponding to other languages from the text data in the corresponding basic data set;
and comparing the generated text data in the other languages with the original text data, and taking sentences whose similarity falls within a preset percentage range as construction data for translation errors.
9. The method of claim 1, wherein the method further comprises:
and determining the color rendering result reaching a frequency threshold based on the historical use frequency of the color rendering result, and caching the color rendering result.
10. A computer readable medium having stored thereon computer readable instructions executable by a processor to implement the method of any of claims 1 to 9.
11. An apparatus for text rendering, wherein the apparatus comprises:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410057743.6A CN117574860A (en) | 2024-01-16 | 2024-01-16 | Method and equipment for text color rendering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117574860A (en) | 2024-02-20
Family
ID=89862815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410057743.6A Pending CN117574860A (en) | 2024-01-16 | 2024-01-16 | Method and equipment for text color rendering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117574860A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019024050A1 (en) * | 2017-08-03 | 2019-02-07 | Lingochamp Information Technology (Shanghai) Co., Ltd. | Deep context-based grammatical error correction using artificial neural networks |
CN113590761A (en) * | 2021-08-13 | 2021-11-02 | 网易有道信息技术(北京)有限公司 | Training method of text processing model, text processing method and related equipment |
CN114330251A (en) * | 2022-03-04 | 2022-04-12 | 阿里巴巴达摩院(杭州)科技有限公司 | Text generation method, model training method, device and storage medium |
CN115358217A (en) * | 2022-09-02 | 2022-11-18 | 美的集团(上海)有限公司 | Method and device for correcting words and sentences, readable storage medium and computer program product |
CN115757788A (en) * | 2022-11-25 | 2023-03-07 | 上海墨百意信息科技有限公司 | Text retouching method and device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||