CN117973326A - Text optimization method, device, equipment and storage medium - Google Patents

Publication number
CN117973326A
Authority
CN
China
Prior art keywords
text
processed
vocabulary
semantic analysis
character information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311303646.2A
Other languages
Chinese (zh)
Inventor
颜宇辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN202311303646.2A
Publication of CN117973326A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, in particular to a text optimizing method, a device, equipment and a storage medium.

Description

Text optimization method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a text optimization method, apparatus, device, and storage medium.
Background
In the conventional technology, when a user writes a report text, a story journal or a professional document, the large amount of text makes the document difficult to check. If the text contains wrong words, erroneous sentences or logical defects, the user's reading will be seriously affected.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide a text optimization method, a text optimization device, text optimization equipment and a storage medium, and aims to solve the technical problem that defects in texts in the prior art can influence reading of users.
To achieve the above object, the present invention provides a text optimization method, including the following steps:
responding to a text optimization instruction triggered by a text editing application, and extracting the text to be processed corresponding to the text optimization instruction;
performing semantic analysis on the text to be processed to obtain a semantic analysis result;
generating an optimization mode corresponding to the text to be processed according to the current editing state of the text to be processed and the semantic analysis result; and
optimizing the text to be processed through a program interface based on the optimization mode, wherein the program interface is a program interface for calling the text editing application.
Optionally, performing semantic analysis on the text to be processed to obtain a semantic analysis result includes:
acquiring character information and independent text segments of the text to be processed; and
performing semantic analysis on the text to be processed according to the character information and/or the independent text segments to obtain a semantic analysis result.
Optionally, the semantic analysis includes lexical analysis, and the semantic analysis result includes text miswords;
performing semantic analysis on the text to be processed to obtain a semantic analysis result includes:
determining the text category and character information of the text to be processed;
determining a corresponding vocabulary database according to the text category; and
performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords.
Optionally, performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords includes:
acquiring context information corresponding to the character information;
performing part-of-speech segmentation on the character information to obtain a plurality of words; and
performing lexical analysis on each word in the text to be processed according to the context information and the vocabulary database to obtain text miswords.
Optionally, performing part-of-speech segmentation on the character information includes at least one of the following:
acquiring a preset number of consecutive characters to obtain a target character set, and performing part-of-speech segmentation on the character information according to the result of matching the target character set against a dictionary database;
acquiring semantic rules from annotated semantic analysis results in a corpus, computing the combination probability of a target character set and the context information according to the semantic rules, and performing part-of-speech segmentation on the character information according to the combination probability;
performing part-of-speech segmentation on the character information through a trained word segmentation model.
Optionally, performing lexical analysis on each word in the text to be processed according to the context information and the vocabulary database to obtain text miswords includes:
matching each word in the text to be processed against the vocabulary database;
determining the words to be processed, namely the words for which matching failed;
determining a first association probability between each word to be processed and the context information corresponding to that word; and
obtaining text miswords according to the first association probability, wherein a text misword is a word to be processed whose first association probability is smaller than a preset probability threshold.
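The misword check described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the first association probability is computed, so the sketch estimates it from adjacent-word counts in a small reference corpus. The function names (`count_corpus`, `association_probability`, `find_miswords`), the toy corpus and the threshold value are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def count_corpus(corpus: Sequence[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Count single words and adjacent word pairs in a reference corpus."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in corpus:
        word_counts.update(sentence)
        pair_counts.update(zip(sentence, sentence[1:]))
    return word_counts, pair_counts


def association_probability(words: Sequence[str], i: int,
                            word_counts: Counter, pair_counts: Counter) -> float:
    """Assumed estimator: how often words[i] was seen next to its neighbours,
    relative to how often each neighbour occurs, averaged over both sides."""
    scores = []
    if i > 0:
        left = words[i - 1]
        scores.append(pair_counts[(left, words[i])] / word_counts[left]
                      if word_counts[left] else 0.0)
    if i < len(words) - 1:
        right = words[i + 1]
        scores.append(pair_counts[(words[i], right)] / word_counts[right]
                      if word_counts[right] else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def find_miswords(words: Sequence[str], vocabulary: Set[str],
                  word_counts: Counter, pair_counts: Counter,
                  threshold: float = 0.1) -> List[str]:
    """Flag words that fail the vocabulary match AND fit their context poorly."""
    miswords = []
    for i, word in enumerate(words):
        if word in vocabulary:
            continue  # matching succeeded, so this is not a word to be processed
        if association_probability(words, i, word_counts, pair_counts) < threshold:
            miswords.append(word)
    return miswords


# Toy data: "porch" is missing from the vocabulary but fits its context.
corpus = [["the", "front", "porch", "interval"],
          ["the", "front", "porch", "timing"]]
word_counts, pair_counts = count_corpus(corpus)
vocabulary = {"the", "front", "interval", "timing"}
print(find_miswords(["the", "front", "pooch", "interval"],
                    vocabulary, word_counts, pair_counts))  # prints ['pooch']
```

In this sketch a word that is absent from the vocabulary database is not flagged automatically; it is flagged only when its estimated association with the surrounding words also falls below the threshold, which mirrors the two-stage check in the steps above.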
Optionally, after performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain the text misword, the method further includes:
acquiring spelling information of the text misword;
querying the vocabulary database for at least one target word corresponding to the spelling information;
determining a target association probability between the target word and the context information corresponding to the target word; and
when the target association probability is greater than or equal to a preset probability threshold, adjusting the word to be processed according to the target word.
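The correction step above can be sketched as follows. The sketch assumes that the spelling information is a pinyin-like key and that the vocabulary database can be queried by that key; `spelling_index`, `context_probability` and `suggest_replacement` are hypothetical names, and the probabilities are made-up values used only to exercise the logic.

```python
from typing import Callable, Dict, List, Optional


def suggest_replacement(spelling: str,
                        spelling_index: Dict[str, List[str]],
                        context_probability: Callable[[str], float],
                        threshold: float = 0.5) -> Optional[str]:
    """Return the candidate word with the same spelling that best fits the
    context, provided its association probability reaches the threshold."""
    candidates = spelling_index.get(spelling, [])
    scored = [(context_probability(word), word) for word in candidates]
    eligible = [(p, word) for p, word in scored if p >= threshold]
    if not eligible:
        return None  # no candidate fits the context well enough
    return max(eligible)[1]


# Hypothetical index from spelling (pinyin) to words sharing that spelling.
spelling_index = {"huanying": ["欢迎", "幻影"]}
# Hypothetical context probabilities for a sentence such as "北京__你".
scores = {"欢迎": 0.9, "幻影": 0.05}

replacement = suggest_replacement("huanying", spelling_index,
                                  lambda word: scores.get(word, 0.0))
print(replacement)  # prints 欢迎
```

Returning `None` when no candidate reaches the threshold leaves the misword untouched, so it can still be marked for the user rather than silently replaced.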
Optionally, the semantic analysis includes syntactic analysis, and the semantic analysis result includes text missentences;
performing semantic analysis on the text to be processed to obtain a semantic analysis result includes:
classifying the character information of the text to be processed to determine special characters and common characters;
determining the common character set between adjacent special characters; and
performing syntactic analysis on the common character set to obtain text missentences.
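The grouping of common characters between adjacent special characters can be sketched as follows. The sketch assumes that "special characters" means punctuation, identified through Unicode general categories; the source does not define the classification in this passage, and the function names are hypothetical.

```python
import unicodedata
from typing import List


def is_special(ch: str) -> bool:
    """Assumption: a special character is any Unicode punctuation character."""
    return unicodedata.category(ch).startswith("P")


def common_character_sets(text: str) -> List[str]:
    """Collect the runs of common characters that lie between special characters."""
    groups: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_special(ch):
            groups.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    groups.append("".join(current).strip())
    return [group for group in groups if group]  # drop empty runs


print(common_character_sets("Hello, world! How are you?"))
# prints ['Hello', 'world', 'How are you']
```

Each returned run would then be passed to the syntactic analysis as one common character set.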
Optionally, performing syntactic analysis on the common character set includes:
determining the contextual common character set corresponding to the common character set in the text to be processed;
performing a part-of-speech nesting test on the common character set and the contextual common character set respectively;
when the test succeeds, calculating a second association probability between the common character set and the contextual common character set; and
determining a text missentence according to the second association probability, wherein a text missentence is a common character set whose second association probability is smaller than a preset probability threshold.
Optionally, the semantic analysis includes connectivity analysis, and the semantic analysis result includes a text summary;
performing semantic analysis on the text to be processed to obtain a semantic analysis result includes:
acquiring a target independent text segment of the text to be processed; and
performing connectivity analysis on the target independent text segment to obtain the text summary.
Optionally, performing connectivity analysis on the target independent text segment includes:
determining a context independent text segment of the target independent text segment;
determining a target text intent of the target independent text segment and a context text intent of the context independent text segment, respectively; and
calculating the association probability between the target independent text segment and the context independent text segment according to the target text intent and the context text intent.
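The association calculation between a target segment and its context segment can be illustrated with a deliberately simple stand-in. The source does not specify how a text intent is represented or how the association probability is computed; the sketch below assumes a bag-of-words vector as the "intent" and cosine similarity as the association score. Both choices, and the names `text_intent` and `association`, are assumptions made for illustration only; a trained model would normally take their place.

```python
import math
from collections import Counter


def text_intent(segment: str) -> Counter:
    """Stand-in for an intent representation: lower-cased word counts."""
    return Counter(segment.lower().split())


def association(target: Counter, context: Counter) -> float:
    """Cosine similarity between two word-count vectors, in the range 0 to 1."""
    dot = sum(target[token] * context[token] for token in target)
    norm = (math.sqrt(sum(v * v for v in target.values()))
            * math.sqrt(sum(v * v for v in context.values())))
    return dot / norm if norm else 0.0


target = text_intent("the panel timing uses a front porch")
related = text_intent("the front porch precedes the active video")
unrelated = text_intent("my cat likes fish")

# A segment that continues the topic scores higher than one that does not.
print(association(target, related) > association(target, unrelated))
```

A low score between adjacent independent text segments would suggest that the target segment does not connect well with its context.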
Optionally, optimizing the text to be processed through a program interface based on the optimization mode includes:
when the current editing state is a saved state, changing the current editing state of the text to be processed to an editing state through the program interface; and
marking the text to be processed through the program interface according to the text miswords, the text missentences and the text summaries, so as to remind the user to modify the content of the text to be processed.
In addition, to achieve the above object, the present invention also proposes a text optimizing apparatus including:
an extraction module, configured to respond to a text optimization instruction triggered by a text editing application and extract the text to be processed corresponding to the text optimization instruction;
an analysis module, configured to perform semantic analysis on the text to be processed to obtain a semantic analysis result;
a generation module, configured to generate an optimization mode corresponding to the text to be processed according to the current editing state of the text to be processed and the semantic analysis result; and
an optimization module, configured to optimize the text to be processed through a program interface based on the optimization mode, wherein the program interface is a program interface for calling the text editing application.
Optionally, the analysis module is further configured to acquire character information and independent text segments of the text to be processed;
and perform semantic analysis on the text to be processed according to the character information and/or the independent text segments to obtain a semantic analysis result.
Optionally, the analysis module is further configured to determine the text category and character information of the text to be processed;
determine a corresponding vocabulary database according to the text category;
and perform lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords.
Optionally, the analysis module is further configured to acquire context information corresponding to the character information;
perform part-of-speech segmentation on the character information to obtain a plurality of words;
and perform lexical analysis on each word in the text to be processed according to the context information and the vocabulary database to obtain text miswords.
Optionally, the analysis module is further configured to acquire a preset number of consecutive characters to obtain a target character set, and perform part-of-speech segmentation on the character information according to the result of matching the target character set against a dictionary database;
acquire semantic rules from annotated semantic analysis results in a corpus, compute the combination probability of a target character set and the context information according to the semantic rules, and perform part-of-speech segmentation on the character information according to the combination probability;
and perform part-of-speech segmentation on the character information through a trained word segmentation model.
Optionally, the analysis module is further configured to match each word in the text to be processed against the vocabulary database;
determine the words to be processed, namely the words for which matching failed;
determine a first association probability between each word to be processed and the context information corresponding to that word;
and obtain text miswords according to the first association probability, wherein a text misword is a word to be processed whose first association probability is smaller than a preset probability threshold.
Optionally, the analysis module is further configured to acquire spelling information of the text misword;
query the vocabulary database for at least one target word corresponding to the spelling information;
determine a target association probability between the target word and the context information corresponding to the target word;
and, when the target association probability is greater than or equal to a preset probability threshold, adjust the word to be processed according to the target word.
In addition, to achieve the above object, the present invention also proposes a text optimization device including: a memory, a processor, and a text optimization program stored on the memory and executable on the processor, the text optimization program being configured to implement the steps of the text optimization method described above.
In addition, to achieve the above object, the present invention also proposes a storage medium having stored thereon a text optimizing program which, when executed by a processor, implements the steps of the text optimizing method as described above.
According to the invention, in response to a text optimization instruction triggered by a text editing application, the text to be processed corresponding to the instruction is extracted from the text editing application, and semantic analysis is performed on it. Whether the text to be processed contains defects that affect reading is judged from the semantic analysis result, and an optimization mode for the text is generated according to its current editing state and the semantic analysis result so as to resolve those defects. Finally, the text editing application is called through a program interface to optimize the text to be processed in the optimization mode. This avoids the technical problem in the prior art that defects in a text affect the user's reading, and improves the user's reading experience.
Drawings
FIG. 1 is a schematic diagram of a text optimization device of a hardware runtime environment according to an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of the text optimization method of the present invention;
FIG. 3 is a flow chart of a second embodiment of the text optimization method of the present invention;
FIG. 4 is a flow chart of an embodiment of a text optimization method according to the present invention;
FIG. 5 is a flow chart of a third embodiment of the text optimization method of the present invention;
FIG. 6 is a block diagram showing the structure of a first embodiment of the text optimizing apparatus of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a text optimizing device of a hardware running environment according to an embodiment of the present invention.
As shown in FIG. 1, the text optimization device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the structure shown in FIG. 1 does not limit the text optimization device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and a text optimization program may be included in the memory 1005 as one type of storage medium.
In the text optimization device shown in FIG. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be provided in the text optimization device, which calls the text optimization program stored in the memory 1005 through the processor 1001 and executes the text optimization method provided by the embodiments of the present invention.
An embodiment of the present invention provides a text optimization method, referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the text optimization method of the present invention.
In this embodiment, the text optimization method includes the following steps:
Step S10: in response to a text optimization instruction triggered by a text editing application, extract the text to be processed corresponding to the text optimization instruction.
It should be noted that the execution body of the method of this embodiment may be a device with data processing, program running and data transmission functions, for example a computer or a mobile phone; this embodiment is not particularly limited in this respect. A computer or mobile phone equipped with an intelligent assistant is used as an example in this embodiment and the following embodiments.
It should be noted that, in order to enhance information processing capability, in some embodiments an intelligent assistant may be installed on the terminal device. The intelligent assistant may be a program built on natural language processing technology based on a Generative Pre-trained Transformer (GPT) model. It may receive input text from a user, parse the input text into a computer-understandable data format, and then generate response text using the pre-trained neural network model.
In a specific implementation, the intelligent assistant uses a neural network in the deep learning technology to train through a large amount of corpus data, so that the structure, grammar rules and semantic information of the language can be learned. After the user enters the text, the intelligent assistant first performs word segmentation and parsing on the text, converts it into a computer-readable vector form, and then enters the pre-trained neural network model for inference. In the inference process, the intelligent assistant predicts the most likely next response based on the text entered by the user and the previous context. The prediction process is based on the learning of a model from a large amount of corpus data, so that the manner of human natural language expression can be restored to a great extent, and highly coherent and natural response text can be generated. And finally, the intelligent assistant returns the generated response text to the user to complete one dialogue interaction. Throughout the process, the intelligent assistant also continuously learns and optimizes to provide more accurate and user-appropriate answers.
It should be appreciated that the text editing application may perform editing operations on the text such as adding, deleting, optimizing and annotating, for example application software such as Office, Kingsoft Docs or WPS; this embodiment is not particularly limited in this respect. For convenience of explanation, Office Word is used as an example in this embodiment and the following embodiments.
In order to respond to the text content in time, this embodiment may request permission from the text editing application in which the text is located before any of the steps are executed, so that the intelligent assistant can monitor the text content or text state in the text editing application in real time.
It can be appreciated that the text optimization instruction is a request instruction sent by the text editing application to the intelligent assistant when the application is in a specific state. In this embodiment, new characters are continuously input while the text is being written, so the text may contain miswords as well as content that is inconsistent with the preceding logic or data.
The specific state may be that the text to be processed in the text editing application is in a saved state. Detecting and optimizing the text in real time while it is still being edited involves a heavy workload and makes text analysis difficult. This embodiment may therefore optimize the saved text after the user chooses to save the edited text, which improves the efficiency of text optimization.
Step S20: perform semantic analysis on the text to be processed to obtain a semantic analysis result.
It should be noted that, in this embodiment, the semantic analysis of the text to be processed may consist of lexical analysis, syntactic analysis and connectivity analysis, which yield respectively the text miswords, the text missentences and the text summary content, so as to reduce defects such as miswords and missentences in the text.
In a specific implementation, the lexical analysis, syntactic analysis and connectivity analysis may be implemented by an analysis model trained in advance on natural language samples, with a different algorithm selected for each analysis process. For example, lexical analysis may combine a recurrent neural network (RNN) with a conditional random field (CRF), or use another model that implements the same or a similar function; this embodiment is not particularly limited in this respect.
Step S30: generate an optimization mode corresponding to the text to be processed according to the current editing state of the text to be processed and the semantic analysis result.
It should be noted that the current editing state of the text to be processed includes an editing state and a saved state. When the text is in the editing state, it is not optimized; when it is in the saved state, the corresponding optimization mode, for example annotating, marking, adding or deleting, is selected according to the defects detected in the text. This embodiment is not particularly limited in this respect.
Step S40: optimize the text to be processed through a program interface based on the optimization mode, wherein the program interface is a program interface for calling the text editing application.
It can be understood that, in this embodiment, the intelligent assistant assists the text editing application in optimizing the text to be processed. When marking, annotating, adding or deleting is required, the intelligent assistant may call Word through the program interface to apply the corresponding optimization to the text; the intelligent assistant itself does not take part in the actual text processing.
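Steps S10 to S40 can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the implementation described by the source: `EditingState`, `AnalysisResult`, `choose_mode` and `handle_instruction` are hypothetical names, the analyzer is passed in as a plain function, and `editor_interface` is a hypothetical callback standing in for the program interface of the text editing application.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class EditingState(Enum):
    EDITING = "editing"
    SAVED = "saved"


@dataclass
class AnalysisResult:
    miswords: List[str] = field(default_factory=list)
    missentences: List[str] = field(default_factory=list)

    def has_defects(self) -> bool:
        return bool(self.miswords or self.missentences)


def choose_mode(state: EditingState, result: AnalysisResult) -> Optional[str]:
    """S30: no optimization while editing; annotate defects once saved."""
    if state is EditingState.EDITING:
        return None
    return "annotate" if result.has_defects() else None


def handle_instruction(text: str, state: EditingState,
                       analyze: Callable[[str], AnalysisResult],
                       editor_interface: Callable[[str, AnalysisResult], None]) -> bool:
    """S10 to S40: analyze the extracted text, pick a mode, call the editor."""
    result = analyze(text)                 # S20: semantic analysis
    mode = choose_mode(state, result)      # S30: optimization mode
    if mode is None:
        return False
    editor_interface(mode, result)         # S40: the editor applies the change
    return True


# Toy usage: the "editor" only records the calls it receives.
applied = []
handled = handle_instruction(
    "the front pooch interval",
    EditingState.SAVED,
    analyze=lambda text: AnalysisResult(miswords=["pooch"]),
    editor_interface=lambda mode, result: applied.append((mode, result.miswords)),
)
print(handled, applied)
```

Passing the editor call in as a callback reflects the division of labour described above: the assistant decides what should be done, and the text editing application carries it out.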
In this embodiment, in response to a text optimization instruction triggered by a text editing application, the text to be processed corresponding to the instruction is extracted from the text editing application, and semantic analysis is performed on it. Whether the text to be processed contains defects that affect reading is judged from the semantic analysis result, and an optimization mode for the text is generated according to its current editing state and the semantic analysis result so as to resolve those defects. Finally, the text editing application is called through a program interface to optimize the text to be processed in the optimization mode. This avoids the technical problem in the prior art that defects in a text affect the user's reading, and improves the user's reading experience.
Referring to fig. 3, fig. 3 is a schematic flow chart of a second embodiment of a text optimization method according to the present invention.
Based on the first embodiment, in this embodiment, the step S20 includes:
Step S201: acquire the character information and independent text segments of the text to be processed.
It is understood that the character information may be punctuation marks, Chinese characters, numerals, special characters, or letter-related characters, where the letter-related characters include English characters, pinyin characters and letter abbreviations; this embodiment is not particularly limited in this respect.
In addition, the independent text segments can be determined from specific punctuation marks or paragraph settings in the text to be processed, and are mainly used for summarizing the core content of the text or analyzing the logical relationship between a segment and its context.
Further, in this embodiment, when Word is called through the program interface to optimize the text to be processed and the text is in the "editing" state, the independent text segments already present in the text may be analyzed to determine whether they contain miswords, missentences or similar problems. Semantic analysis is not performed for the time being on portions that have not yet formed an independent text segment, which improves the real-time performance of text optimization and the user experience.
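Splitting the text into independent text segments can be sketched as follows. The sketch assumes that paragraph breaks and sentence-ending punctuation delimit a segment; the terminator set, the function name `independent_segments` and the `complete_only` flag are hypothetical choices that the source does not specify.

```python
import re
from typing import List

# Assumed set of segment-ending punctuation marks (Chinese and English).
TERMINATORS = "。！？.!?"


def independent_segments(text: str, complete_only: bool = False) -> List[str]:
    """Split text into segments at paragraph breaks and terminators.

    With complete_only=True, trailing text that does not yet end in a
    terminator (for example a sentence still being typed) is skipped.
    """
    cls = re.escape(TERMINATORS)
    pattern = re.compile(f"[^{cls}]+[{cls}]?")
    segments: List[str] = []
    for paragraph in text.splitlines():
        for piece in pattern.findall(paragraph):
            piece = piece.strip()
            if piece:
                segments.append(piece)
    if complete_only:
        segments = [s for s in segments if s[-1] in TERMINATORS]
    return segments


print(independent_segments("First part. Second part!\nNew paragraph"))
# prints ['First part.', 'Second part!', 'New paragraph']
print(independent_segments("Done. Still typ", complete_only=True))
# prints ['Done.']
```

The `complete_only` option corresponds to the editing-state behaviour described above, where portions that have not yet formed an independent text segment are left out of the analysis. A rule this simple also splits at the periods in abbreviations and decimals, so it is only a starting point.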
Step S202: perform semantic analysis on the text to be processed according to the character information and/or the independent text segments to obtain a semantic analysis result.
In a specific implementation, when analyzing whether the text to be processed contains wrong characters or wrong words, lexical analysis may be performed on the character information in the text to determine whether there are wrong, repeated or missing characters.
Further, when lexical analysis is performed to obtain the miswords in the text, performing semantic analysis on the text to be processed according to the character information and/or the independent text segments to obtain a semantic analysis result includes:
determining the text category and character information of the text to be processed;
determining a corresponding vocabulary database according to the text category; and
performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords.
It should be noted that the text categories of the text to be processed include, but are not limited to, professional documents, report texts and literary works. Professional documents refer to legal documents, academic papers, professional guidance documents and the like; report texts refer to lectures, project plans and the like; literary works refer to narrative stories, prose and the like. This embodiment is not particularly limited in this respect.
In a specific implementation, the text category of the text to be processed directly affects the accuracy of the subsequent analysis result. For example, in professional documents in the field of display panels, the "front porch" refers to a part of the timing region used in timing control of the display panel, while the same term has other uses in other fields. If the vocabulary database used for the text to be processed is not restricted during semantic analysis, the accuracy of the semantic analysis may therefore be seriously affected.
In this embodiment, the vocabulary database may be obtained according to published materials in different fields on the internet, or may be a dictionary built in advance by the user, which is not limited in this embodiment.
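Selecting a vocabulary database by text category can be illustrated with a small lookup. The category names, the word sets and the fallback to a general vocabulary are assumptions made for illustration; the source only states that a corresponding vocabulary database is determined from the text category.

```python
from typing import Dict, Set


def select_vocabulary(category: str,
                      databases: Dict[str, Set[str]],
                      default: str = "general") -> Set[str]:
    """Return the vocabulary for the category, or the default one if unknown."""
    return databases.get(category, databases[default])


# Hypothetical vocabulary databases keyed by text category.
databases = {
    "general": {"front", "porch", "timing"},
    "display_panel": {"front porch", "back porch", "blanking interval"},
}

vocabulary = select_vocabulary("display_panel", databases)
print("front porch" in vocabulary)  # prints True
```

With the domain vocabulary selected, a term such as "front porch" is matched as a single known word instead of being judged against its general-language meaning.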
Further, in order to confirm each misword in the text to be processed, the lexical analysis is performed on the text to be processed according to the vocabulary database and the character information to obtain a text misword, which includes:
Acquiring context information corresponding to the character information;
performing part-of-speech segmentation on the character information to obtain a plurality of vocabularies;
And performing lexical analysis on each vocabulary in the text to be processed according to the context information and the vocabulary database to obtain text miswords.
It should be noted that the context information includes the context characters adjacent to a target character or character set. The number of context characters on each side may equal the number of target characters, which facilitates lexical analysis. For example, in the text "北京欢迎你" ("Beijing welcomes you"), the context information of the character "京" is "北" and "欢", and the context information of the character set "欢迎" ("welcome") may be "北京" ("Beijing") and "你" ("you"). Since only one character follows "欢迎", the following context may be "你" alone, or it may be padded with spaces to make up the required number of characters, which is not limited in this embodiment.
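A minimal sketch of extracting such context information is given below. It assumes that the example text is the Chinese string "北京欢迎你" and that missing context is padded with spaces; the function name and parameters are hypothetical.

```python
def context_window(text, start, length, pad=" "):
    """Return (preceding, following) context for text[start:start + length].

    Each side holds `length` characters, padded with `pad` when the text
    runs out, so the context is as long as the target itself.
    """
    end = start + length
    before = text[max(0, start - length):start].rjust(length, pad)
    after = text[end:end + length].ljust(length, pad)
    return before, after
```

For the single character "京" this yields "北" and "欢"; for the two-character set "欢迎" it yields "北京" and "你" followed by one padding space.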
It can be understood that the process of segmenting the character information in the text to be processed refers to selecting an adjacent character set with high combination probability in the text to be processed, and performing subsequent lexical analysis on the character set as a vocabulary to obtain text miswords.
In a specific implementation, the part-of-speech segmentation of the character information may consist of taking a preset number of consecutive characters to obtain a target character set, and segmenting the character information according to the result of matching the target character set against a dictionary database.
The preset number of consecutive characters may be user-defined, for example 2 or 3, which is not particularly limited in this embodiment. The dictionary database contains related information such as definitions, paraphrases, and example sentences for various words, and is commonly used in language processing and natural language understanding applications. Such dictionary databases may be written manually or automatically extracted and collated by a computer program to form a structured data set. They are typically stored as text files or in XML or JSON format, and may also be stored in SQL databases for query and retrieval.
Common dictionary databases include the Oxford English Dictionary, the Collins English Dictionary, and the Merriam-Webster dictionary, as well as some open-source lexical databases such as WordNet and Open Multilingual Wordnet.
In a specific implementation, the target character set is matched against the dictionary database; a successful match indicates that the target character set forms a word.
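The dictionary-matching step can be sketched with forward maximum matching, a common segmentation technique chosen here for illustration; the embodiment itself does not name a specific matching strategy. The dictionary contents and the `max_len` parameter (the preset number of consecutive characters) are assumptions.

```python
def segment_by_dictionary(text, dictionary, max_len=3):
    """Segment text by forward maximum matching against a dictionary.

    At each position the longest candidate of up to `max_len` characters
    is tried first; a single character is always accepted as a fallback.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words
```

Characters that cannot be combined into any dictionary entry remain as single-character words, which later steps can examine as potential miswords.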
In addition, this embodiment can also obtain the semantic rules of the labeled semantic analysis results in a corpus, count the combination probability of the target character set and the context information according to the semantic rules, and perform part-of-speech segmentation on the character information according to the combination probability.
For example, for the text "study biology", the combination probability is calculated for each ordering of contextual character sets corresponding to the target character set. Taking "study" as an example, three kinds of character sets, such as "biology" and "life", can be obtained from the text "study biology", and the corresponding combination probability of each is calculated with the following formula:
$$P = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_2, w_1) \cdots p(w_n \mid w_{n-1}, \ldots, w_2, w_1)$$
where $w_n$ denotes the $n$-th character and $p(w_n \mid w_{n-1}, \ldots, w_2, w_1)$ denotes the probability that the $n$-th character follows the preceding $n-1$ characters, which is determined by the labeled semantic rules in the corpus.
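The chain-rule product above can be illustrated with a small sketch. It assumes that the labeled corpus is simply a list of known words and that each conditional probability is estimated from prefix counts; both the estimator and the function names are assumptions made for illustration, not the embodiment's stated procedure.

```python
from collections import Counter


def train_counts(corpus):
    """Count every character prefix of each labeled word in the corpus."""
    counts = Counter()
    for word in corpus:
        for k in range(1, len(word) + 1):
            counts[word[:k]] += 1
    return counts, len(corpus)


def combination_probability(chars, counts, total):
    """Multiply p(w1) * p(w2 | w1) * ... * p(wn | w1..wn-1).

    Each factor is estimated as count(prefix of length k) divided by
    count(prefix of length k - 1); the first factor is divided by the
    corpus size instead.
    """
    prob = 1.0
    for k in range(1, len(chars) + 1):
        denom = total if k == 1 else counts[chars[:k - 1]]
        if denom == 0:
            return 0.0
        prob *= counts[chars[:k]] / denom
    return prob
```

A character sequence never observed in the corpus receives probability zero under this estimator; a practical system would likely apply smoothing.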
In this embodiment, the part-of-speech segmentation of the character information may also be performed by a trained word segmentation model. The trained word segmentation model may be obtained by training based on neural network algorithms such as a recurrent neural network (RNN) or a convolutional neural network (CNN), which is not specifically limited in this embodiment.
Further, the performing lexical analysis on each vocabulary in the text to be processed according to the context information and the vocabulary database to obtain text miswords includes:
matching each vocabulary in the text to be processed with the vocabulary database respectively;
Determining a word to be processed with failed matching;
Determining a first association probability between the vocabulary to be processed and context information corresponding to the vocabulary to be processed;
And obtaining text miswords according to the first association probability, wherein the text miswords are words to be processed, and the first association probability is smaller than a preset probability threshold.
In a specific implementation, part-of-speech segmentation makes it possible to effectively screen the text to be processed for individual miswords. After the words contained in the text to be processed are obtained, each word is matched against the vocabulary database of the corresponding category, so that words from a professional field can also be verified and it can be judged whether miswords exist.
A word to be processed that fails to match the vocabulary database is not necessarily erroneous; it may simply not be recorded in the vocabulary database. This embodiment therefore calculates the association probability between the word to be processed and its corresponding context information to judge whether it is a text misword, which improves the readability of the text.
In a specific implementation, the association probability may be calculated with the same formula as the combination probability above, with the per-character calculation adapted to operate on character sets.
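The two-stage check described above (vocabulary matching followed by an association-probability threshold) can be sketched as follows. The `association_probability` callable, the one-word context on each side, and the default threshold are assumptions; the embodiment leaves these choices open.

```python
def find_miswords(words, vocabulary, association_probability, threshold=0.1):
    """Return words that fail vocabulary matching AND fit their context poorly.

    `association_probability(word, context)` is a hypothetical scoring
    function supplied by the caller; `context` is the pair of neighboring
    words (empty strings at the text boundaries).
    """
    miswords = []
    for index, word in enumerate(words):
        if word in vocabulary:
            continue  # matched: not a candidate misword
        context = (
            words[index - 1] if index > 0 else "",
            words[index + 1] if index + 1 < len(words) else "",
        )
        if association_probability(word, context) < threshold:
            miswords.append(word)
    return miswords
```

An unrecorded but plausible word (for example a new proper noun) passes the second stage and is not flagged.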
Further, in order to improve the user experience of reading the text, the lexical analysis is performed on the text to be processed according to the vocabulary database and the character information, and after obtaining the text misword, the method further includes:
acquiring spelling information of the text misword;
inquiring at least one target vocabulary corresponding to the spelling information through a vocabulary database;
determining a target association probability between the target vocabulary and the context information corresponding to the target vocabulary;
and when the target association probability is greater than or equal to a preset probability threshold, adjusting the vocabulary to be processed according to the target vocabulary.
This embodiment also provides a method for adjusting text miswords: after a text misword is determined, at least one target vocabulary with the same spelling information is looked up in the vocabulary database according to the spelling information of the text misword. For example, for the text "draft severity!", the text misword is "draft" and its corresponding spelling information is "nihao", so the target vocabulary found may be "hello", "your number", and the like, which is not particularly limited in this embodiment.
After the target vocabulary is determined, its association probability with the context information is calculated again. A larger association probability indicates a smaller probability that the word is wrong, in which case the vocabulary to be processed that failed to match can be adjusted according to the target vocabulary. Taking the text "draft severity!" as an example and referring to FIG. 4, adjusting the text according to the target vocabulary "hello" yields "hello severity!". The modification is thereby completed, the wrong characters or words are removed, and the fluency of reading the text is improved.
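The spelling-based adjustment can be sketched as a lookup in a spelling index followed by a threshold test. The index structure, the scoring callable, and the choice to return the highest-scoring candidate are assumptions for illustration; the Chinese candidates shown in the usage note are assumed renderings of "hello" and "your number".

```python
def correct_misword(spelling, context, spelling_index, association_probability,
                    threshold=0.5):
    """Return the best same-spelling candidate, or None if none qualifies.

    `spelling_index` maps spelling information (e.g. pinyin) to candidate
    words; `association_probability(word, context)` is a hypothetical
    scoring function. A candidate qualifies when its score is greater than
    or equal to the threshold.
    """
    best_word, best_prob = None, threshold
    for candidate in spelling_index.get(spelling, []):
        prob = association_probability(candidate, context)
        if prob >= best_prob:
            best_word, best_prob = candidate, prob
    return best_word
```

For example, with an index entry `{"nihao": ["你好", "你号"]}` and a scorer that rates "你好" above the threshold, the misword would be replaced by "你好"; when no candidate reaches the threshold, the text is left unchanged.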
Further, the embodiment also provides a method for identifying a missentence for text optimization, where in this embodiment, performing semantic analysis on the text to be processed according to the character information and/or the independent text segment to obtain a semantic analysis result includes:
classifying the character information of the text to be processed, and determining special characters and common characters;
Determining a common character set between adjacent special characters;
and carrying out syntactic analysis on the common character set to obtain text missentences.
It should be noted that the special characters are certain specific punctuation marks, including but not limited to periods, semicolons, ellipses, exclamation marks, and question marks. In conventional natural language, sentences are separated by punctuation marks, but the text between any two adjacent punctuation marks is not necessarily a sentence. For example, in "Illustration: Xiaoming, Xiaohong and Xiaolan are all senior citizens.", "Xiaoming" lies between two adjacent punctuation marks, but it is not a sentence on its own and has no independent meaning.
Therefore, this embodiment extracts the common character sets between adjacent special characters to obtain each sentence in the text to be processed, and then performs syntactic analysis on the extracted common character sets to identify the text missentences in the text to be processed.
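A minimal sketch of splitting text into common character sets at special characters is shown below. The exact set of special characters is an assumption (sentence-ending marks and semicolons in both Chinese and Western forms); commas and colons are deliberately excluded so that fragments such as "Xiaoming" are not treated as sentences.

```python
import re

# Assumed set of sentence-delimiting special characters.
SPECIAL_CHARACTERS = "。；！？.;!?…"


def common_character_sets(text, special=SPECIAL_CHARACTERS):
    """Return the non-empty runs of common characters between special characters."""
    pattern = "[" + re.escape(special) + "]+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

This simple split would also break on periods inside abbreviations or decimal numbers, which a fuller implementation would need to handle.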
Further, the performing syntactic analysis on the common character set includes:
determining a context common character set corresponding to the common character set in the text to be processed;
Respectively performing part-of-speech nesting test on the common character set and the contextual common character set;
When the test is successful, calculating a second association probability between the common character set and the contextual common character set;
And determining a text missentence according to the second association probability, wherein the text missentence is a common character set whose second association probability is smaller than a preset probability threshold.
In a specific implementation, a complete sentence is typically composed of words serving several syntactic roles, including but not limited to subjects, predicates, and objects. Each word can generally be assigned a part of speech, such as verb, noun, adjective, or adverb, according to its usage, and parts of speech do not necessarily correspond one-to-one with syntactic roles.
In this embodiment, the part-of-speech nesting test of the common character set and the contextual common character set refers to performing complete syntactic analysis, local syntactic analysis, and dependency analysis on the syntactic constituents of each common character set. The complete syntactic analysis may be implemented with an analysis model built on a recursive neural network (RvNN) algorithm, and the local syntactic analysis and dependency analysis may be implemented with an analysis model based on a stacked long short-term memory (LSTM) network, or with other network models capable of the same or similar functions, which is not limited in this embodiment.
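The control flow of the syntactic analysis (nesting test first, then the second association probability) can be sketched as below. Both `passes_nesting_test` and `association_probability` are hypothetical callables standing in for the parsing and scoring models; skipping a sentence when the nesting test fails is an assumption, since the embodiment only specifies the successful case.

```python
def find_missentences(sentences, passes_nesting_test, association_probability,
                      threshold=0.1):
    """Flag sentences whose association with their neighbors is below threshold."""
    flagged = []
    for i, sentence in enumerate(sentences):
        # Contextual common character sets: the previous and next sentence.
        context = sentences[max(0, i - 1):i] + sentences[i + 1:i + 2]
        if not all(passes_nesting_test(s) for s in [sentence] + context):
            continue  # assumption: no probability is computed if the test fails
        if association_probability(sentence, context) < threshold:
            flagged.append(sentence)
    return flagged
```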
Further, in this embodiment, in order to determine the logical sequence of the text as a whole, the semantic analysis is performed on the text to be processed according to the character information and/or the independent text segment to obtain a semantic analysis result, which includes:
acquiring a target independent text segment of the text to be processed;
And carrying out connectivity analysis on the target independent text segment to obtain a text summary.
It should be noted that the target independent text segment of the text to be processed may be determined by specific punctuation marks, such as periods or exclamation marks, or by the paragraph layout of the text to be processed; it may also be determined by text indentation in combination with specific punctuation marks, which is not particularly limited in this embodiment.
In a specific implementation, the connectivity analysis of text segments includes connectivity analysis within a text segment and connectivity analysis between adjacent text segments. Connectivity here refers to syntactic and lexical cohesion: when one part of a text segment depends on another part for its interpretation or description, the two parts form a connection relationship. Such a relationship also exists between adjacent text segments; however, a single text segment usually keeps the same subject throughout, whereas the subject may change between adjacent text segments.
Further, the performing connectivity analysis on the target independent text segment includes:
determining a context independent section of the target independent section;
Determining a target text intent of a target independent text segment and a context text intent of the context independent text segment, respectively;
And carrying out association probability calculation on the target independent text segment and the context independent text segment according to the target text intention and the context text intention.
It should be noted that the text intent represents, to a certain extent, the central content expressed by an entire independent text segment. Generally, a text segment has only one central content, and the rest of the segment interprets or expands on it. A large difference between the central contents of adjacent text segments indicates a logical conflict; conversely, if the central contents of several text segments are similar, a text summary can be generated from them to help the user quickly understand the content of the text.
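As a rough illustration of comparing the intents of adjacent text segments, the sketch below substitutes a bag-of-words vector for the text intent and cosine similarity for the association probability. Both substitutions are assumptions chosen for simplicity; the embodiment does not specify how intent is represented or scored.

```python
import math
from collections import Counter


def intent_vector(segment):
    """Stand-in for text intent: lowercase word counts of the segment."""
    return Counter(segment.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def adjacent_segment_scores(segments):
    """Score each pair of adjacent segments; low scores suggest a topic break."""
    vectors = [intent_vector(s) for s in segments]
    return [cosine_similarity(vectors[i], vectors[i + 1])
            for i in range(len(vectors) - 1)]
```

Pairs with high scores share central content and could be summarized together, while a very low score between neighbors may point to a logical conflict or an abrupt change of subject.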
In this embodiment, semantic analysis is performed on the text to be processed according to the character information and/or the independent text segment, so that the various defects in the text to be processed are identified and corresponding optimization modes are provided, which improves the readability of the text and the reading efficiency of the user.
Referring to fig. 5, fig. 5 is a schematic flow chart of a third embodiment of a text optimization method according to the present invention.
Based on the above second embodiment, in this embodiment, the step S40 includes:
Step S401: when the current editing state is the save state, modifying the current editing state of the text to be processed to the editing state through a program interface.
It should be noted that the current editing state includes an editing state and a save state. When the text to be processed is in the editing state, it is not optimized, which reduces the workload. When the text to be processed is in the save state, a corresponding optimization mode, for example annotating, labeling, adding, or deleting, is selected according to the defects detected in the text, which is not particularly limited in this embodiment.
In addition, in the editing state, this embodiment can optimize the independent text segments that have already been fully edited, detecting text defects in real time and improving the quality of the user's writing; the two modes can operate together.
Step S402: marking the text to be processed through a program interface according to the text miswords, the text missentences and the text summaries so as to remind a user to modify the content of the text to be processed.
In a specific implementation, this embodiment can address text miswords, text missentences, text summaries, and the like by calling the text editing application: the text to be processed is optimized by annotating, marking, or directly modifying it, the user is reminded of the optimized content, and the optimized text is saved after the user confirms.
Meanwhile, the text summary can be displayed as an abstract at the beginning of the text to be processed for the user's convenience.
In this embodiment, the text editing application corresponding to the text to be processed is called through the program interface, so that the text to be processed is optimized according to the identified text miswords, text missentences, and text summaries, which improves the readability of the text and reduces defects.
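Steps S401 and S402 can be sketched against a stand-in for the text editing application's program interface. The `FakeEditor` class, its method names, and the state strings are all hypothetical; a real text editing application would expose its own interface.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEditor:
    """Hypothetical stand-in for the editing application's program interface."""
    state: str = "save"
    annotations: list = field(default_factory=list)

    def set_state(self, state):
        self.state = state

    def annotate(self, kind, content):
        self.annotations.append((kind, content))


def optimize(editor, miswords, missentences, summary):
    # Step S401: a saved document is first switched to the editing state.
    if editor.state == "save":
        editor.set_state("editing")
    # Step S402: mark each detected defect so the user can review it.
    for word in miswords:
        editor.annotate("misword", word)
    for sentence in missentences:
        editor.annotate("missentence", sentence)
    if summary:
        editor.annotate("summary", summary)
    return editor.annotations
```

In this sketch the marks are only recorded; saving the optimized text after user confirmation would be a further call on the real interface.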
In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium is stored with a text optimizing program, and the text optimizing program realizes the steps of the text optimizing method when being executed by a processor.
Because the storage medium adopts all the technical schemes of all the embodiments, the storage medium has at least all the beneficial effects brought by the technical schemes of the embodiments, and the description is omitted here.
Referring to fig. 6, fig. 6 is a block diagram showing the structure of a first embodiment of the text optimizing apparatus of the present invention.
As shown in fig. 6, the text optimizing device provided by the embodiment of the invention includes:
An extracting module 10, configured to respond to a text optimizing instruction triggered by a text editing application and extract the text to be processed corresponding to the text optimizing instruction.
An analysis module 20, configured to perform semantic analysis on the text to be processed to obtain a semantic analysis result.
A generating module 30, configured to generate an optimization mode corresponding to the text to be processed according to the current editing state of the text to be processed and the semantic analysis result.
An optimizing module 40, configured to optimize the text to be processed through a program interface based on the optimization mode, where the program interface is a program interface for calling the text editing application.
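The cooperation of the four modules can be sketched as a simple pipeline. The function signature and the shape of the instruction dictionary are assumptions; each module is passed in as a callable so that the sketch stays independent of any particular implementation.

```python
def run_text_optimization(instruction, extract, analyze, generate, optimize):
    """Chain the four modules in order and return the optimizing module's result."""
    text = extract(instruction)                     # extracting module 10
    result = analyze(text)                          # analysis module 20
    mode = generate(instruction["state"], result)   # generating module 30
    return optimize(text, mode)                     # optimizing module 40
```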
In an embodiment, the analysis module 20 is further configured to determine a text category and character information of the text to be processed; determining a corresponding vocabulary database according to the text category; and performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords.
In an embodiment, the analysis module 20 is further configured to obtain context information corresponding to the character information; performing part-of-speech segmentation on the character information to obtain a plurality of vocabularies; and performing lexical analysis on each vocabulary in the text to be processed according to the context information and the vocabulary database to obtain text miswords.
In an embodiment, the analysis module 20 is further configured to obtain a continuous preset number of character information, obtain a target character set, and perform part-of-speech segmentation on the character information according to a matching result of the target character set and a dictionary database; and/or obtaining semantic rules of the labeled semantic analysis results in the corpus, counting the combination probability of the target character set and the context information according to the semantic rules, and performing part-of-speech segmentation on the character information according to the combination probability; and/or performing part-of-speech segmentation on the character information through the trained word segmentation model.
In an embodiment, the analysis module 20 is further configured to match each vocabulary in the text to be processed with the vocabulary database respectively; determining a word to be processed with failed matching; determining a first association probability between the vocabulary to be processed and context information corresponding to the vocabulary to be processed; and obtaining text miswords according to the first association probability, wherein the text miswords are words to be processed, and the first association probability is smaller than a preset probability threshold.
In an embodiment, the analysis module 20 is further configured to obtain spelling information of the text misword; inquiring at least one target vocabulary corresponding to the spelling information through a vocabulary database; determining a target association probability between the target vocabulary and the context information corresponding to the target vocabulary; and when the target association probability is greater than or equal to a preset probability threshold, adjusting the vocabulary to be processed according to the target vocabulary.
In an embodiment, the analysis module 20 is further configured to classify character information of the text to be processed, and determine special characters and common characters; determine a common character set between adjacent special characters; and carry out syntactic analysis on the common character set to obtain text missentences.
In an embodiment, the analysis module 20 is further configured to determine a contextual common character set corresponding to the common character set in the text to be processed; respectively perform a part-of-speech nesting test on the common character set and the contextual common character set; when the test is successful, calculate a second association probability between the common character set and the contextual common character set; and determine a text missentence according to the second association probability, wherein the text missentence is a common character set whose second association probability is smaller than a preset probability threshold.
In an embodiment, the analysis module 20 is further configured to obtain a target independent text segment of the text to be processed; and carry out connectivity analysis on the target independent text segment to obtain a text summary.
In an embodiment, the analysis module 20 is further configured to determine a context independent segment of the target independent segment; determining a target text intent of a target independent text segment and a context text intent of the context independent text segment, respectively; and carrying out association probability calculation on the target independent text segment and the context independent text segment according to the target text intention and the context text intention.
In an embodiment, the optimizing module 40 is further configured to modify, through a program interface, the current editing state of the text to be processed to be editing when the current editing state is a save state; marking the text to be processed through a program interface according to the text miswords, the text missentences and the text summaries so as to remind a user to modify the content of the text to be processed.
In this embodiment, in response to a text optimizing instruction triggered by a text editing application, the text to be processed corresponding to the instruction is extracted from the text editing application, and semantic analysis is performed on it. Whether the text to be processed has defects that affect reading is judged from the semantic analysis result, and an optimization mode for the text is generated according to its current editing state and the semantic analysis result so as to resolve the defects. Finally, the text editing application is called through a program interface to optimize the text to be processed in the optimization mode. This avoids the technical problem in the prior art that defects in a text affect the user's reading, and improves the user's reading experience.
It should be understood that the foregoing is illustrative only and is not limiting, and that in specific applications, those skilled in the art may set the invention as desired, and the invention is not limited thereto.
It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.
In addition, technical details that are not described in detail in this embodiment may refer to the text optimization method provided in any embodiment of the present invention, and are not described herein again.
Furthermore, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., read-only memory (ROM)/RAM, magnetic disk, or optical disk) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
The invention discloses A1, a text optimization method, which comprises the following steps:
Responding to a text optimizing instruction triggered by a text editing application, and extracting a text to be processed corresponding to the text optimizing instruction;
carrying out semantic analysis on the text to be processed to obtain a semantic analysis result;
Generating an optimization mode corresponding to the text to be processed according to the current editing state of the text to be processed and the semantic analysis result;
And optimizing the text to be processed through a program interface based on the optimizing mode, wherein the program interface is a program interface for calling the text editing application.
A2, the text optimization method according to A1, wherein the carrying out semantic analysis on the text to be processed to obtain a semantic analysis result comprises the following steps:
Acquiring character information and independent text segments of the text to be processed;
And carrying out semantic analysis on the text to be processed according to the character information and/or the independent text segment to obtain a semantic analysis result.
A3, the text optimization method according to A2, wherein the semantic analysis comprises lexical analysis, and the semantic analysis result comprises text miswords;
Carrying out semantic analysis on the text to be processed according to the character information to obtain a semantic analysis result, wherein the semantic analysis result comprises the following steps:
Determining the text category and character information of the text to be processed;
Determining a corresponding vocabulary database according to the text category;
and performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords.
A4, the text optimization method according to A3, wherein the performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords comprises the following steps:
Acquiring context information corresponding to the character information;
performing part-of-speech segmentation on the character information to obtain a plurality of vocabularies;
And performing lexical analysis on each vocabulary in the text to be processed according to the context information and the vocabulary database to obtain text miswords.
A5, the text optimization method according to A4, wherein the part-of-speech segmentation of the character information comprises at least one of the following steps:
Obtaining character information of continuous preset quantity, obtaining a target character group, and performing part-of-speech segmentation on the character information according to a matching result of the target character group and a dictionary database;
Acquiring semantic rules of labeled semantic analysis results in a corpus, counting the combination probability of a target character set and the context information according to the semantic rules, and performing part-of-speech segmentation on the character information according to the combination probability;
and performing part-of-speech segmentation on the character information through the trained word segmentation model.
A6, the text optimization method according to A4, wherein the performing lexical analysis on each vocabulary in the text to be processed according to the context information and the vocabulary database to obtain text miswords comprises:
matching each vocabulary in the text to be processed with the vocabulary database respectively;
Determining a word to be processed with failed matching;
Determining a first association probability between the vocabulary to be processed and context information corresponding to the vocabulary to be processed;
And obtaining text miswords according to the first association probability, wherein the text miswords are words to be processed, and the first association probability is smaller than a preset probability threshold.
A7, the text optimization method according to A3, wherein the lexical analysis is performed on the text to be processed according to the lexical database and the character information, and after obtaining the text misword, the text optimization method further comprises the following steps:
acquiring spelling information of the text misword;
inquiring at least one target vocabulary corresponding to the spelling information through a vocabulary database;
determining a target association probability between the target vocabulary and the context information corresponding to the target vocabulary;
and when the target association probability is greater than or equal to a preset probability threshold, adjusting the vocabulary to be processed according to the target vocabulary.
A8, the text optimization method according to A2, wherein the semantic analysis comprises syntactic analysis, and the semantic analysis result comprises text missentences;
the semantic analysis is carried out on the text to be processed according to the character information to obtain a semantic analysis result, and the method comprises the following steps:
classifying the character information of the text to be processed, and determining special characters and common characters;
Determining a common character set between adjacent special characters;
and carrying out syntactic analysis on the common character set to obtain text mispronounced sentences.
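The character classification and grouping steps of A8 can be sketched in Python as follows. Treating Unicode punctuation as the "special characters" is an assumption made for this example; the text does not define which characters count as special. The function names are likewise hypothetical.

```python
import unicodedata


def is_special(ch):
    """Assumption: a special character is any Unicode punctuation character."""
    return unicodedata.category(ch).startswith("P")


def common_character_sets(text):
    """Split text into runs of common characters delimited by special characters."""
    sets, current = [], []
    for ch in text:
        if is_special(ch):
            if current:
                sets.append("".join(current).strip())
                current = []
        else:
            current.append(ch)
    if current:
        # Trailing text with no closing special character is kept as its own set.
        sets.append("".join(current).strip())
    return [s for s in sets if s]
```

Each returned run is a candidate unit for the later syntactic analysis; the sketch stops at the grouping step and does not attempt the analysis itself.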
A9, the text optimization method according to A8, wherein the syntactic analysis of the common character set comprises:
determining a contextual common character set corresponding to the common character set in the text to be processed;
performing a part-of-speech nesting test on the common character set and the contextual common character set respectively;
when the test is successful, calculating a second association probability between the common character set and the contextual common character set;
and determining a text missentence according to the second association probability, wherein the text missentence is a common character set whose second association probability is smaller than a preset probability threshold.
A10, the text optimization method according to A2, wherein the semantic analysis comprises connectivity analysis, and the semantic analysis result comprises a text summary;
performing semantic analysis on the text to be processed according to the independent text segment to obtain a semantic analysis result comprises:
acquiring a target independent text segment of the text to be processed;
and carrying out connectivity analysis on the target independent text segment to obtain the text summary.
A11, the text optimization method according to A10, wherein performing the connectivity analysis on the target independent text segment comprises:
determining a context independent text segment of the target independent text segment;
determining a target text intent of the target independent text segment and a context text intent of the context independent text segment respectively;
and carrying out association probability calculation on the target independent text segment and the context independent text segment according to the target text intent and the context text intent.
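One way to picture the A11 calculation is the Python sketch below. It is only a stand-in: representing a text intent as a bag of lowercase, whitespace-separated tokens and using cosine similarity as the association score are assumptions of this example, not the method disclosed in the text, which does not say how intents are derived or compared.

```python
import math
from collections import Counter


def text_intent(segment):
    """Assumed intent representation: token counts of the segment."""
    return Counter(segment.lower().split())


def intent_association(intent_a, intent_b):
    """Cosine similarity between two intent vectors, in the range [0, 1]."""
    shared = intent_a.keys() & intent_b.keys()
    dot = sum(intent_a[token] * intent_b[token] for token in shared)
    norm_a = math.sqrt(sum(count * count for count in intent_a.values()))
    norm_b = math.sqrt(sum(count * count for count in intent_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty segment has no measurable association
    return dot / (norm_a * norm_b)
```

A target segment whose score against its context segment is low would be a candidate for marking; a production system would likely replace the token counts with a learned intent model and a tokenizer suited to the language of the text.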
A12, the text optimization method according to any one of A1-A11, wherein the optimizing processing of the text to be processed through a program interface based on the optimizing mode comprises the following steps:
when the current editing state is a storage state, modifying the current editing state of the text to be processed into the editing state through a program interface;
and marking the text to be processed through the program interface according to the text miswords, the text missentences and the text summaries, so as to prompt the user to modify the content of the text to be processed.
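The state change and marking steps of A12 can be sketched in Python as follows. `FakeEditor` is a hypothetical stand-in for the program interface of a text editing application; its methods `set_state` and `mark` are invented for this illustration and do not correspond to any real editor API.

```python
from dataclasses import dataclass, field
from enum import Enum


class EditState(Enum):
    STORED = "stored"
    EDITING = "editing"


@dataclass
class FakeEditor:
    """Hypothetical program interface exposed by a text editing application."""
    text: str
    state: EditState
    marks: list = field(default_factory=list)

    def set_state(self, state):
        self.state = state

    def mark(self, start, end, kind):
        self.marks.append((start, end, kind))


def optimize(editor, findings):
    """Switch a stored document to editing, then mark each finding for the user."""
    if editor.state is EditState.STORED:
        editor.set_state(EditState.EDITING)
    for kind, fragment in findings:
        start = editor.text.find(fragment)  # first occurrence only, for simplicity
        if start != -1:
            editor.mark(start, start + len(fragment), kind)
```

The sketch only marks the findings and leaves the text unchanged, matching the clause's aim of reminding the user to make the modification.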
The invention also discloses B13, a text optimizing device, which comprises:
an extraction module, used for responding to a text optimizing instruction triggered by a text editing application and extracting a text to be processed corresponding to the text optimizing instruction;
an analysis module, used for carrying out semantic analysis on the text to be processed to obtain a semantic analysis result;
a generation module, used for generating an optimization mode corresponding to the text to be processed according to the current editing state of the text to be processed and the semantic analysis result;
and an optimizing module, used for optimizing the text to be processed through a program interface based on the optimizing mode, wherein the program interface is a program interface for calling the text editing application.
B14, the text optimizing device, wherein the analysis module is further configured to determine the text category and character information of the text to be processed;
determine a corresponding vocabulary database according to the text category;
and perform lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords.
B15, the text optimizing device as described in B14, wherein the analysis module is further configured to obtain context information corresponding to the character information;
perform part-of-speech segmentation on the character information to obtain a plurality of vocabularies;
and perform lexical analysis on each vocabulary in the text to be processed according to the context information and the vocabulary database to obtain text miswords.
B16, the text optimizing device, wherein the analysis module is further configured to obtain a preset number of consecutive pieces of character information to obtain a target character set, and perform part-of-speech segmentation on the character information according to a matching result of the target character set and a dictionary database;
acquire semantic rules of labeled semantic analysis results in a corpus, count the combination probability of a target character set and the context information according to the semantic rules, and perform part-of-speech segmentation on the character information according to the combination probability;
and perform part-of-speech segmentation on the character information through the trained word segmentation model.
B17, the text optimizing device as described in B15, wherein the analysis module is further configured to match each vocabulary in the text to be processed with the vocabulary database;
determine a word to be processed for which the matching failed;
determine a first association probability between the word to be processed and the context information corresponding to the word to be processed;
and obtain text miswords according to the first association probability, wherein the text miswords are the words to be processed whose first association probability is smaller than a preset probability threshold.
B18, the text optimizing device as described in B14, wherein the analysis module is further configured to obtain spelling information of the text misword;
query at least one target vocabulary corresponding to the spelling information through the vocabulary database;
determine a target association probability between the target vocabulary and the context information corresponding to the target vocabulary;
and when the target association probability is greater than or equal to a preset probability threshold, adjust the word to be processed according to the target vocabulary.
The invention also discloses C19, a text optimizing device, the text optimizing device includes: a memory, a processor, and a text optimization program stored on the memory and executable on the processor, the text optimization program configured to implement the text optimization method as described above.
The invention also discloses D20, a storage medium, wherein the storage medium stores a text optimizing program, and the text optimizing program realizes the text optimizing method when being executed by a processor.

Claims (10)

1. A text optimization method, characterized in that the text optimization method comprises:
responding to a text optimizing instruction triggered by a text editing application, and extracting a text to be processed corresponding to the text optimizing instruction;
carrying out semantic analysis on the text to be processed to obtain a semantic analysis result;
generating an optimization mode corresponding to the text to be processed according to the current editing state of the text to be processed and the semantic analysis result;
and optimizing the text to be processed through a program interface based on the optimizing mode, wherein the program interface is a program interface for calling the text editing application.
2. The text optimization method according to claim 1, wherein the performing semantic analysis on the text to be processed to obtain a semantic analysis result includes:
acquiring character information and independent text segments of the text to be processed;
and carrying out semantic analysis on the text to be processed according to the character information and/or the independent text segment to obtain a semantic analysis result.
3. The text optimization method of claim 2, wherein the semantic analysis comprises lexical analysis, the semantic analysis results comprising text miswords;
performing semantic analysis on the text to be processed according to the character information to obtain a semantic analysis result comprises:
determining the text category and character information of the text to be processed;
determining a corresponding vocabulary database according to the text category;
and performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain text miswords.
4. The text optimization method according to claim 3, wherein the lexical analysis is performed on the text to be processed according to the vocabulary database and the character information to obtain text miswords, including:
acquiring context information corresponding to the character information;
performing part-of-speech segmentation on the character information to obtain a plurality of vocabularies;
and performing lexical analysis on each vocabulary in the text to be processed according to the context information and the vocabulary database to obtain text miswords.
5. The text optimization method of claim 4, wherein the manner of part-of-speech segmentation of the character information includes at least one of:
obtaining a preset number of consecutive pieces of character information to obtain a target character set, and performing part-of-speech segmentation on the character information according to a matching result of the target character set and a dictionary database;
acquiring semantic rules of labeled semantic analysis results in a corpus, counting the combination probability of a target character set and the context information according to the semantic rules, and performing part-of-speech segmentation on the character information according to the combination probability;
and performing part-of-speech segmentation on the character information through the trained word segmentation model.
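The first segmentation manner in claim 5, matching a fixed-length group of consecutive characters against a dictionary, resembles forward maximum matching. The Python sketch below illustrates that technique under stated assumptions: the function name, the `max_len` parameter (standing in for the preset number of characters), and the fallback to single characters are choices made for this example and are not taken from the claim.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation using the longest dictionary match."""
    words, i = [], 0
    while i < len(text):
        # Try the longest window first, shrinking until a dictionary entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                # A single character is always accepted so the loop advances.
                words.append(candidate)
                i += size
                break
    return words
```

Because the match is greedy, `forward_max_match("研究生命起源", {"研究", "研究生", "生命", "起源"}, 3)` yields `["研究生", "命", "起源"]` rather than the intended reading, which is one reason the claim also lists probability-based and model-based segmentation as alternatives.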
6. The text optimization method according to claim 4, wherein the lexical analysis of each vocabulary in the text to be processed according to the context information and the vocabulary database to obtain text miswords comprises:
matching each vocabulary in the text to be processed with the vocabulary database respectively;
determining a word to be processed for which the matching failed;
determining a first association probability between the word to be processed and the context information corresponding to the word to be processed;
and obtaining text miswords according to the first association probability, wherein the text miswords are the words to be processed whose first association probability is smaller than a preset probability threshold.
7. The text optimization method according to claim 3, wherein after performing lexical analysis on the text to be processed according to the vocabulary database and the character information to obtain the text miswords, the method further comprises:
acquiring spelling information of the text misword;
querying at least one target vocabulary corresponding to the spelling information through the vocabulary database;
determining a target association probability between the target vocabulary and the context information corresponding to the target vocabulary;
and when the target association probability is greater than or equal to a preset probability threshold, adjusting the word to be processed according to the target vocabulary.
8. A text optimizing device, characterized in that the text optimizing device comprises:
an extraction module, used for responding to a text optimizing instruction triggered by a text editing application and extracting a text to be processed corresponding to the text optimizing instruction;
an analysis module, used for carrying out semantic analysis on the text to be processed to obtain a semantic analysis result;
a generation module, used for generating an optimization mode corresponding to the text to be processed according to the current editing state of the text to be processed and the semantic analysis result;
and an optimizing module, used for optimizing the text to be processed through a program interface based on the optimizing mode, wherein the program interface is a program interface for calling the text editing application.
9. A text optimization device, the text optimization device comprising: a memory, a processor, and a text optimization program stored on the memory and executable on the processor, the text optimization program configured to implement the text optimization method of any one of claims 1-7.
10. A storage medium having stored thereon a text optimization program which, when executed by a processor, implements the text optimization method of any one of claims 1 to 7.
CN202311303646.2A 2023-10-09 2023-10-09 Text optimization method, device, equipment and storage medium Pending CN117973326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311303646.2A CN117973326A (en) 2023-10-09 2023-10-09 Text optimization method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311303646.2A CN117973326A (en) 2023-10-09 2023-10-09 Text optimization method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117973326A (en) 2024-05-03

Family

ID=90861967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311303646.2A Pending CN117973326A (en) 2023-10-09 2023-10-09 Text optimization method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117973326A (en)

Similar Documents

Publication Publication Date Title
US10599767B1 (en) System for providing intelligent part of speech processing of complex natural language
US11210468B2 (en) System and method for comparing plurality of documents
CN112016310A (en) Text error correction method, system, device and readable storage medium
Thompson et al. A generative model for semantic role labeling
US6684201B1 (en) Linguistic disambiguation system and method using string-based pattern training to learn to resolve ambiguity sites
CN103154936B (en) For the method and system of robotization text correction
KR20190133931A (en) Method to response based on sentence paraphrase recognition for a dialog system
Suleman et al. Extending latent semantic analysis to manage its syntactic blindness
US20230069935A1 (en) Dialog system answering method based on sentence paraphrase recognition
US9141601B2 (en) Learning device, determination device, learning method, determination method, and computer program product
CN112084769B (en) Dependency syntax model optimization method, apparatus, device and readable storage medium
Lee Natural Language Processing: A Textbook with Python Implementation
CN118278543A (en) Answer evaluation model training method, evaluation method, device, equipment and medium
Aliero et al. Systematic review on text normalization techniques and its approach to non-standard words
CN113935331A (en) Abnormal semantic truncation detection method, device, equipment and medium
CN113705207A (en) Grammar error recognition method and device
Mekki et al. COTA 2.0: An automatic corrector of Tunisian Arabic social media texts
Potter A survey of knowledge acquisition from natural language
Schneider et al. Statistics for Linguists: A patient, slow-paced introduction to statistics and to the programming language R
CN113157932B (en) Metaphor calculation and device based on knowledge graph representation learning
CN116483314A (en) Automatic intelligent activity diagram generation method
CN117973326A (en) Text optimization method, device, equipment and storage medium
Aparna et al. A review on different approaches of pos tagging in NLP
CN116737913B (en) Reply text generation method, device, equipment and readable storage medium
Liu et al. Automatic Chinese knowledge-based question answering by the MGBA-LSTM-CNN model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination