CN113705207A - Grammar error recognition method and device

Grammar error recognition method and device

Info

Publication number
CN113705207A
Authority
CN
China
Prior art keywords
sentence
recognized
grammar
error
sample
Prior art date
Legal status
Pending
Application number
CN202110282569.1A
Other languages
Chinese (zh)
Inventor
吴嫒博
刘萌
蔡晓凤
叶礼伟
滕达
覃伟枫
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110282569.1A
Publication of CN113705207A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/253: Grammatical analysis; Style critique
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

The embodiment of the application provides a grammar error recognition method and a grammar error recognition device, relating to the technical field of artificial intelligence. The grammar error recognition method in the embodiment of the application comprises: obtaining a sentence to be recognized; generating a sentence feature vector corresponding to the sentence to be recognized based on the sentence to be recognized, wherein the sentence feature vector comprises a word vector corresponding to each vocabulary contained in the sentence to be recognized and a position feature vector of the position of each vocabulary in the sentence to be recognized; predicting a target transformation category label based on the sentence feature vector corresponding to the sentence to be recognized, wherein the target transformation category label is the transformation category label of each vocabulary contained in the sentence to be recognized when the sentence is transformed into the grammatically correct sentence corresponding to it; and determining the grammar error type of the sentence to be recognized based on the sentence to be recognized and the target transformation category label. According to the technical scheme, the accuracy of grammar error recognition is improved.

Description

Grammar error recognition method and device
Technical Field
The application relates to the technical field of computers, in particular to a grammar error recognition method and device.
Background
With the development of internet technology, grammar error recognition has become an important branch of natural language processing. Its main task is to detect whether grammar errors exist in a piece of text and to automatically correct the detected errors.
The grammar error recognition approach proposed in the related art mainly relies on manually extracted features and on machine learning models trained according to expert experience; the trained machine learning model is then used for grammar error recognition. In this approach, different machine learning models have to be trained separately to recognize different error types, and there is no general unified model. In addition, the features used by such machine learning models are shallow features, so the accuracy of grammar error recognition is low.
Disclosure of Invention
The embodiment of the application provides a grammar error recognition method and device, which can improve the accuracy of grammar error recognition.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a syntax error recognition method, including: obtaining a sentence to be identified; generating a sentence characteristic vector corresponding to the sentence to be recognized based on the sentence to be recognized, wherein the sentence characteristic vector comprises word vectors corresponding to all vocabularies contained in the sentence to be recognized and position characteristic vectors of all vocabularies contained in the sentence to be recognized at the position of the sentence to be recognized; predicting a target transformation category label based on the sentence characteristic vector corresponding to the sentence to be recognized, wherein the target transformation category label is a transformation category label of each vocabulary contained in the sentence to be recognized when the vocabulary is transformed into a grammar correct sentence corresponding to the sentence to be recognized; and determining the grammar error type of the sentence to be recognized based on the sentence to be recognized and the target transformation class label.
According to an aspect of an embodiment of the present application, there is provided a syntax error recognition apparatus including: the first acquisition unit is used for acquiring the sentence to be identified; a first generating unit, configured to generate, based on the to-be-recognized sentence, a sentence feature vector corresponding to the to-be-recognized sentence, where the sentence feature vector includes a word vector corresponding to each vocabulary included in the to-be-recognized sentence and a position feature vector of each vocabulary included in the to-be-recognized sentence at a position of the to-be-recognized sentence; the prediction unit is used for predicting a target transformation category label based on the sentence characteristic vector corresponding to the sentence to be recognized, wherein the target transformation category label is a transformation category label of each vocabulary contained in the sentence to be recognized when the vocabulary is transformed into a grammar correct sentence corresponding to the sentence to be recognized; and the grammar error determining unit is used for determining the grammar error type of the sentence to be recognized based on the sentence to be recognized and the target transformation class label.
In some embodiments of the present application, based on the foregoing scheme, the syntax error determination unit includes: a grammar correct sentence determining subunit, configured to determine, based on the target transformation class label and the sentence to be recognized, a grammar correct sentence corresponding to the sentence to be recognized; and the grammar error recognition subunit is used for carrying out grammar error classification recognition on the sentences to be recognized based on the grammar correct sentences and preset grammar error classification rules and determining grammar error types of the sentences to be recognized.
In some embodiments of the present application, based on the foregoing scheme, the prediction unit includes: the fusion subunit is used for performing fusion processing on the sentence characteristic vector corresponding to the sentence to be identified to generate a semantic characteristic vector containing context semantic information; the full-connection subunit is used for performing full-connection processing on the semantic feature vector to obtain a processed semantic feature vector; and the predicting subunit is used for predicting the target transformation class label based on the processed semantic feature vector.
In some embodiments of the present application, based on the foregoing scheme, the prediction subunit includes: a generating module, configured to perform normalization processing on the processed semantic feature vector, and generate a probability corresponding to a candidate transformation category label when each vocabulary included in the sentence to be recognized is transformed into a grammatical correct sentence corresponding to the sentence to be recognized; and the selecting module is used for selecting the candidate transformation category label with the maximum probability as the target transformation category label.
In some embodiments of the application, based on the foregoing scheme, the syntax error identification subunit is specifically configured to, if the probability corresponding to the target transformation class label is higher than a predetermined probability threshold, perform syntax error classification identification on the sentence to be identified based on the syntax correct sentence and a preset syntax error classification rule, and determine the syntax error type of the sentence to be identified.
In some embodiments of the present application, based on the foregoing scheme, the syntax error recognition apparatus further includes: and the error correction suggestion generation unit is used for generating error correction suggestion information corresponding to the sentence to be recognized based on the grammar error type of the sentence to be recognized and the sentence to be recognized.
In some embodiments of the present application, based on the foregoing scheme, the target transformation class labels and the sentence feature vectors corresponding to the sentences to be recognized are generated through a pre-trained machine learning model.
In some embodiments of the present application, based on the foregoing scheme, the syntax error recognition apparatus further includes: the second obtaining unit is used for obtaining training set sample data used for training a machine learning model to be trained, wherein each piece of sample data in the training set sample data comprises a sample statement pair, and the sample statement pair comprises a grammar error sample statement and a grammar correct sample statement; the word segmentation unit is used for performing word segmentation processing on the grammar error sample sentences and the grammar correct sample sentences respectively to obtain first word segmentation results corresponding to the grammar error sample sentences and second word segmentation results corresponding to the grammar correct sample sentences; a second generating unit, configured to generate a statement feature vector corresponding to the syntax error sample statement based on the first segmentation result; a third generating unit, configured to generate a sample transformation type tag based on the first and second segmentation results, where the sample transformation type tag is a transformation type tag of each vocabulary included in the sample sentence with the wrong syntax when transforming into the sample sentence with the correct syntax; and the training unit is used for training the machine learning model to be trained through the sample transformation class labels and the sentence characteristic vectors corresponding to the grammar error sample sentences to obtain the pre-trained machine learning model.
In some embodiments of the present application, based on the foregoing scheme, the third generating unit includes: an editing subunit, configured to, for each vocabulary included in the first segmentation result, perform editing processing on the vocabulary so that the vocabulary is converted into a vocabulary included in the second segmentation result; an edit distance determining subunit, configured to determine an edit distance between the first segmentation result and the second segmentation result based on an edit processing category in which the vocabulary is edited; and the selection subunit is used for selecting the editing processing category of the vocabulary under the premise of the minimum editing distance for editing processing as the transformation category label.
According to an aspect of embodiments of the present application, there is provided a computer-readable medium on which a computer program is stored, the computer program, when executed by a processor, implementing a syntax error recognition method as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the syntax error recognition method as described in the above embodiments.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the syntax error recognition method provided in the various alternative embodiments described above.
In some embodiments of the present application, a sentence feature vector corresponding to a sentence to be recognized is generated based on the sentence to be recognized, a target transformation category label is predicted based on that sentence feature vector (the target transformation category label being the transformation category label of each vocabulary contained in the sentence to be recognized when the sentence is transformed into the grammatically correct sentence corresponding to it), and the grammar error type of the sentence to be recognized is determined based on the sentence to be recognized and the target transformation category label. Compared with directly recognizing the grammar errors contained in the sentence to be recognized from the features of the sentence itself, first determining the transformation category label of each vocabulary contained in the sentence to be recognized when the sentence is transformed into the grammatically correct sentence corresponding to it is advantageous, because the transformation category labels reflect the grammar errors in the sentence to be recognized more distinctly; the accuracy of grammar error recognition of the sentence to be recognized can therefore be effectively improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
FIG. 2 shows a flow diagram of a grammar error recognition method according to an embodiment of the present application.
FIG. 3 shows a flow diagram of a grammar error recognition method according to an embodiment of the present application.
Fig. 4 shows a detailed flowchart of step S340 of the syntax error recognition method according to an embodiment of the present application.
Fig. 5 shows a detailed flowchart of step S230 of the syntax error recognition method according to an embodiment of the present application.
FIG. 6 illustrates a structural schematic of a pre-trained machine learning model according to one embodiment of the present application.
Fig. 7 shows a detailed flowchart of step S530 of the syntax error recognition method according to an embodiment of the present application.
Fig. 8 shows a detailed flowchart of step S240 of the syntax error recognition method according to an embodiment of the present application.
Fig. 9 shows an overall flow diagram for composition correction according to an embodiment of the present application.
FIG. 10 illustrates an interface diagram for entering text or pictures containing composition according to one embodiment of the present application.
FIG. 11 is a schematic overall flow diagram of training a pre-trained machine learning model according to one embodiment of the present application.
Fig. 12 is a schematic interface diagram illustrating a syntax error recognition result obtained by performing syntax error recognition on a composition according to an embodiment of the present application.
FIG. 13 shows a block diagram of a syntax error recognition apparatus according to an embodiment of the present application.
FIG. 14 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive subject covering a wide range of fields, including both hardware-level and software-level technology. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, and the like.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning. For example, in the embodiment of the present application, the sentence feature vector corresponding to the sentence to be recognized is input into a pre-trained machine learning model, the machine learning model determines the target transformation category label of each vocabulary contained in the sentence to be recognized when the sentence is transformed into the grammatically correct sentence corresponding to it, and the grammar error type of the sentence to be recognized is then determined according to the determined target transformation category label.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture may include a client 101, a network 102, and a server 103. The client 101 and the server 103 are connected via a network 102, and perform data interaction based on the network 102, and the network may include various connection types, such as a wired communication link, a wireless communication link, and the like.
It should be understood that the number of clients 101, networks 102, and servers 103 in fig. 1 is merely illustrative. There may be any number of clients 101, networks 102, and servers 103, as desired for an implementation. For example, the server 103 may be a server providing a syntax error recognition service, or may be a server cluster formed by servers, and is not limited herein. The client 101 is a client corresponding to the server 103, and the client may be one or more of a mobile phone, a tablet, a portable computer, and a desktop computer, but is not limited thereto.
The client 101 acquires a sentence to be identified; generating a sentence characteristic vector corresponding to the sentence to be recognized based on the sentence to be recognized, wherein the sentence characteristic vector comprises word vectors corresponding to all vocabularies contained in the sentence to be recognized and position characteristic vectors of all vocabularies contained in the sentence to be recognized at the position of the sentence to be recognized; predicting a target transformation category label based on the sentence characteristic vector corresponding to the sentence to be recognized, wherein the target transformation category label is a transformation category label of each vocabulary contained in the sentence to be recognized when the vocabulary is transformed into a grammar correct sentence corresponding to the sentence to be recognized; and determining the grammar error type of the sentence to be recognized based on the sentence to be recognized and the target transformation class label.
Compared with directly recognizing the grammar errors contained in the sentence to be recognized from the features of the sentence itself, first determining the transformation category label of each vocabulary contained in the sentence to be recognized when the sentence is transformed into the grammatically correct sentence corresponding to it has the advantage that the transformation category labels reflect the grammar errors of the sentence more distinctly, so the accuracy of grammar error recognition of the sentence to be recognized can be effectively improved.
It should be noted that the syntax error recognition method provided in the embodiment of the present application is generally executed by the client 101, and accordingly, the syntax error recognition apparatus is generally disposed in the client 101. However, in other embodiments of the present application, the server 103 may also have similar functions as the client 101, so as to execute the scheme of the syntax error recognition method provided in the embodiments of the present application. The details of implementation of the technical solution of the embodiments of the present application are set forth in the following.
Fig. 2 shows a flow diagram of a syntax error recognition method according to an embodiment of the present application, which may be performed by a client, which may be the client 101 shown in fig. 1. Referring to fig. 2, the syntax error recognition method at least includes steps S210 to S240, which are described in detail below.
In step S210, a sentence to be recognized is acquired.
In one embodiment of the present application, the sentence to be recognized refers to a single piece of sentence information on which grammar error recognition needs to be performed, such as "I eat apples yesterday." The sentence to be recognized can be obtained from a text or a picture that is input to the client and contains the sentence information. If the input is a picture, the input picture can be processed through optical character recognition (OCR), and the character information contained in the picture is determined as the sentence to be recognized.
It can be understood that when the text or picture input to the client contains a plurality of sentences to be recognized, the text can be divided into sentences directly by specific characters, for example by punctuation marks, to obtain a plurality of sentences to be recognized. For a picture, OCR first needs to be performed on the input picture to obtain a recognition result, and the recognition result is then divided into sentences to obtain a plurality of sentences to be recognized.
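As an illustration of the sentence-division step, the following minimal Python sketch splits recognized text on sentence-final punctuation. The delimiter set and the helper name are assumptions for illustration and are not prescribed by this application:

```python
import re

def split_sentences(text):
    """Split recognized text into candidate sentences on sentence-final punctuation."""
    # The delimiter set (English and Chinese end-of-sentence marks) is an illustrative assumption.
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [part.strip() for part in parts if part.strip()]

print(split_sentences("I eat apples yesterday. She like music!"))
# ['I eat apples yesterday.', 'She like music!']
```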
In step S220, a sentence feature vector corresponding to the sentence to be recognized is generated based on the sentence to be recognized, where the sentence feature vector includes a word vector corresponding to each vocabulary contained in the sentence to be recognized and a position feature vector of the position of each vocabulary in the sentence to be recognized.
In an embodiment of the present application, the sentence feature vector corresponding to the sentence to be recognized is feature information generated from the sentence to be recognized, and it can reflect the characteristics of that sentence. The sentence feature vector includes a word vector corresponding to each vocabulary contained in the sentence to be recognized and a position feature vector of the position of each vocabulary in the sentence to be recognized.
When generating word vectors corresponding to words contained in the sentence to be recognized, the sentence to be recognized may be subjected to word segmentation to obtain word segmentation results corresponding to the sentence to be recognized, the word segmentation results include the words obtained by performing word segmentation on the sentence to be recognized, and further, the word vectors corresponding to the words are generated according to the words in the word segmentation results. When generating Word vectors corresponding to each vocabulary, the generation may be implemented by using a pre-trained machine learning model, which may be a Word2vec Word vector calculation model or a GloVe Word vector model, and the like, and is not limited herein.
The position feature vector of the position of each vocabulary in the sentence to be recognized can be obtained by performing position encoding on the position of each vocabulary in the sentence to be recognized. The position encoding may generate a corresponding position vector directly from the position information of each vocabulary contained in the sentence to be recognized, or generate the position feature vector using a sinusoidal position encoding algorithm, which is not limited herein.
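The following minimal sketch illustrates one way the sentence feature vector could be assembled from word vectors and sinusoidal position encodings; the embedding values, dimensions, and function name are illustrative assumptions:

```python
import numpy as np

def sinusoidal_position_encoding(seq_len, dim):
    """Sinusoidal position encoding: even dimensions use sin, odd dimensions use cos."""
    positions = np.arange(seq_len)[:, None]               # shape (seq_len, 1)
    div = np.power(10000.0, np.arange(0, dim, 2) / dim)   # shape (dim / 2,)
    encoding = np.zeros((seq_len, dim))
    encoding[:, 0::2] = np.sin(positions / div)
    encoding[:, 1::2] = np.cos(positions / div)
    return encoding

# Hypothetical word vectors for a 4-vocabulary sentence with feature dimension 8.
word_vectors = np.random.randn(4, 8)
sentence_feature_vector = word_vectors + sinusoidal_position_encoding(4, 8)
print(sentence_feature_vector.shape)  # (4, 8)
```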
In step S230, a target transformation category label is predicted based on the sentence feature vector corresponding to the sentence to be recognized, where the target transformation category label is the transformation category label of each vocabulary contained in the sentence to be recognized when the sentence is transformed into the grammatically correct sentence corresponding to the sentence to be recognized.
In an embodiment of the present application, the transformation class label refers to a class label corresponding to a transformation operation performed on a vocabulary contained in an error sentence during a process of transforming the error sentence into a correct sentence, and may include a basic operation transformation label and a special operation transformation label.
Specifically, the basic operation transformation labels include the keep operation label, the delete operation label, the replace operation label, and the insert (add) operation label. Taking the error sentence "I has a apple" and the correct sentence "I have apple" as an example, the transformation category label corresponding to the transformation operation performed on the word "I" in the error sentence is the keep operation label, and the transformation category label corresponding to the transformation operation performed on the word "a" in the error sentence is the delete operation label.
The special operation transformation labels include verb tense transformation operation labels, noun singular/plural transformation operation labels, and the like. Taking the error sentence "I eat apples yesterday" and the correct sentence "I ate apples yesterday" as an example, the transformation category label corresponding to the transformation operation performed on the word "eat" in the error sentence is a verb tense transformation operation label.
It will be appreciated that the verb tense transformation operation labels may be further subdivided into finer-grained operation transformation labels such as present tense, verb infinitive, verb past tense, verb past perfect form, and so on.
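For illustration, the transformation category labels discussed above can be represented as a small label vocabulary. The concrete label strings below are assumptions made for the examples in this description, not a fixed nomenclature of this application:

```python
# Basic operation labels and a few finer-grained special operation labels (names are illustrative).
BASIC_LABELS = ["KEEP", "DELETE", "REPLACE", "ADD"]
SPECIAL_LABELS = [
    "VERB_PRESENT",          # present tense
    "VERB_INFINITIVE",       # verb infinitive
    "VERB_PAST",             # verb past tense
    "NOUN_SINGULAR_PLURAL",  # noun singular/plural transformation
]
LABEL_SET = BASIC_LABELS + SPECIAL_LABELS
LABEL_TO_ID = {label: index for index, label in enumerate(LABEL_SET)}
print(LABEL_TO_ID["VERB_PAST"])  # 6
```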
In one embodiment, the prediction of the target transformation class label may be implemented by a pre-trained machine learning model based on the sentence feature vector corresponding to the sentence to be recognized. The pre-trained machine learning model is obtained by training sample data of a sample sentence pair comprising a grammar error sample sentence and a grammar correct sample sentence.
It should be noted that training the machine learning model requires a data preprocessing process in which sentence feature vectors are generated from the grammar error sample sentences and sample transformation category labels are generated from the grammar error sample sentences and the corresponding grammatically correct sample sentences; the machine learning model is then trained on the data obtained in this preprocessing process. The pre-trained machine learning model can be a BERT model, a RoBERTa model, an XLNet model, or the like.
Referring to fig. 3, fig. 3 shows a flowchart of a syntax error recognition method according to an embodiment of the present application, and the syntax error recognition method in this embodiment may further include step S310 to step S350, which is described in detail as follows.
In step S310, training set sample data for training a machine learning model to be trained is obtained, where each sample data in the training set sample data includes a sample sentence pair, and the sample sentence pair includes a grammar error sample sentence and a grammar correct sample sentence.
In one embodiment of the present application, the training set sample data is data for training a machine learning model to be trained, the training set sample data comprising a plurality of sample sentence pairs. The sample sentence pair is a sentence pair composed of a grammar error sample sentence and a grammar correct sample sentence corresponding to the grammar error sample sentence. The sample sentence pairs may be obtained from a topic database of an existing grammar practice question, or may be obtained from another database containing a large number of sample sentence pairs, which is not limited herein.
In step S320, the syntax error sample sentence and the syntax correct sample sentence are respectively subjected to word segmentation processing, so as to obtain a first word segmentation result corresponding to the syntax error sample sentence and a second word segmentation result corresponding to the syntax correct sample sentence.
In an embodiment of the application, for each sample sentence, the syntax error sample sentence and the syntax correct sample sentence included in the sample sentence may be subjected to word segmentation processing respectively, so as to obtain a first word segmentation result corresponding to the syntax error sample sentence and a second word segmentation result corresponding to the syntax correct sample sentence.
In step S330, a sentence feature vector corresponding to the syntax error sample sentence is generated based on the first word segmentation result.
In an embodiment of the present application, when generating a sentence feature vector corresponding to a sample sentence with a grammatical error based on a first Word segmentation result, the generation may be specifically implemented by using a pre-trained machine learning model, where the machine learning model may be a Word2vec Word vector calculation model, or a GloVe Word vector model, and the like, and is not limited herein.
In step S340, a sample transformation type label is generated based on the first and second word segmentation results, where the sample transformation type label is a transformation type label of each word included in the sample sentence with the wrong syntax when the word is transformed into the sample sentence with the correct syntax.
In an embodiment of the present application, the sample transformation category label refers to the transformation category label of each vocabulary contained in the grammar error sample sentence when that sentence is transformed into the grammatically correct sample sentence. For example, if the grammar error sample sentence is "I has apples." and the grammatically correct sample sentence is "I have apples.", the preset sample transformation category labels are: "I" - keep operation label, "has" - verb tense transformation operation label, "apples" - keep operation label, "." - keep operation label.
In one embodiment, the sample transformation category labels may be specified manually in advance.
In one embodiment, when generating the sample transformation type tag, the first segmentation result and the second segmentation result may also be processed by an alignment algorithm based on the edit distance, so as to generate the transformation type tag of each vocabulary included in the sample sentence with the wrong syntax when transforming into the sample sentence with the correct syntax.
Referring to fig. 4, fig. 4 shows a detailed flowchart of step S340 of the syntax error recognition method according to an embodiment of the present application, and the step S340 may include steps S410 to S430, which are described in detail as follows.
In step S410, for each vocabulary included in the first segmentation result, the vocabulary is edited so that the vocabulary is converted into a vocabulary included in the second segmentation result.
In one embodiment of the present application, the objects handled when determining the sample transformation category labels through an edit-distance-based alignment algorithm are vocabularies. Specifically, each vocabulary contained in the first segmentation result is aligned with a vocabulary contained in the second segmentation result, and the vocabulary in the first segmentation result is edited so as to be converted into the vocabulary it is aligned with in the second segmentation result. The editing applied to a vocabulary may include a keep operation, a delete operation, a replace operation, an insert (add) operation, a verb tense transformation operation, a noun singular/plural transformation operation, and the like.
In step S420, the edit distance between the first segmentation result and the second segmentation result is determined based on the editing categories applied to the vocabularies.
In one embodiment of the present application, the edit distance between the first segmentation result and the second segmentation result refers to the sum of the edit distance values corresponding to the editing categories applied to each vocabulary in the first segmentation result.
It can be understood that for a vocabulary assigned the keep operation, the edit distance value corresponding to that editing category may be taken as 0; for other editing categories such as the delete operation, replace operation, insert operation, verb tense transformation operation, and noun singular/plural transformation operation, the edit distance value corresponding to the editing category may be taken as a predetermined positive constant.
When the editing processing category for performing editing processing on each vocabulary in the first segmentation result is determined, the editing distance between the first segmentation result and the second segmentation result can be determined according to the sum of the editing distance values corresponding to the editing processing categories.
In step S430, the editing category applied to each vocabulary under the minimum edit distance is selected as the transformation category label.
In one embodiment of the present application, among the candidate editing categories for each vocabulary in the first segmentation result, the editing category applied to that vocabulary on the premise of the minimum edit distance is used as its transformation category label.
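A minimal sketch of one way to carry out the edit-distance-based alignment of steps S410 to S430 is given below. It uses only the basic keep/delete/replace/insert operations, attaches replacements and insertions to the target vocabulary, and breaks ties in a fixed order; the function name and label strings are illustrative assumptions rather than the exact procedure of this application:

```python
def align_labels(src, tgt):
    """Word-level minimum-edit-distance alignment between a grammar error sample sentence (src)
    and its grammatically correct sample sentence (tgt); KEEP costs 0, other operations cost 1."""
    n, m = len(src), len(tgt)
    # dp[i][j] = minimum edit distance between src[:i] and tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + substitution,  # keep or replace
                           dp[i - 1][j] + 1,                 # delete a src vocabulary
                           dp[i][j - 1] + 1)                 # insert a tgt vocabulary
    # Backtrack to recover one operation sequence that realizes the minimum edit distance.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            ops.append((src[i - 1], "KEEP"))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append((src[i - 1], "DELETE"))
            i -= 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(("", "ADD_" + tgt[j - 1]))
            j -= 1
        else:
            ops.append((src[i - 1], "REPLACE_" + tgt[j - 1]))
            i, j = i - 1, j - 1
    return list(reversed(ops))

print(align_labels("I has a apple .".split(), "I have apple .".split()))
# [('I', 'KEEP'), ('has', 'REPLACE_have'), ('a', 'DELETE'), ('apple', 'KEEP'), ('.', 'KEEP')]
```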
Still referring to fig. 3, in step S350, the machine learning model to be trained is trained through the sample transformation class labels and the sentence feature vectors corresponding to the sample sentences with syntax errors, so as to obtain a pre-trained machine learning model.
In an embodiment of the application, after the sentence feature vector and the sample transformation class label corresponding to the grammatical error sample sentence are obtained, the machine learning model to be trained is trained according to the sentence feature vector and the sample transformation class label corresponding to the grammatical error sample sentence, so as to obtain a pre-trained machine learning model. The process of training the machine learning model is to adjust each coefficient in the network layer corresponding to the machine learning model, so that the sentence characteristic vector corresponding to the input sentence to be recognized is operated by each coefficient in the network layer corresponding to the machine learning model, and the target transformation category label of each vocabulary contained in the sentence to be recognized when the vocabulary is transformed into the grammar correct sentence corresponding to the sentence to be recognized is output.
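A minimal sketch of the token-level training step described above is given below. PyTorch is assumed, and the stand-in model, tensor sizes, and optimizer are illustrative; they do not represent the actual model configuration of this application:

```python
import torch
import torch.nn as nn

vocab_size, num_labels, dim = 1000, 16, 32  # illustrative sizes

# Stand-in model: an embedding followed by a per-vocabulary classifier over transformation labels.
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, num_labels))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One hypothetical batch: token ids of grammar error sample sentences and the per-vocabulary
# sample transformation category labels obtained from the edit-distance alignment step.
token_ids = torch.randint(0, vocab_size, (2, 5))   # (batch, sequence length)
label_ids = torch.randint(0, num_labels, (2, 5))   # (batch, sequence length)

logits = model(token_ids)                          # (batch, sequence length, num_labels)
loss = loss_fn(logits.view(-1, num_labels), label_ids.view(-1))
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(loss))
```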
Referring to fig. 5, fig. 5 is a detailed flowchart illustrating step S230 of a syntax error recognition method according to an embodiment of the present application, where step S230 may specifically include step S510 to step S530.
Referring to fig. 6, fig. 6 illustrates a schematic structural diagram of a pre-trained machine learning model according to an embodiment of the present application. The pre-trained machine learning model illustrated in fig. 6 may specifically include a transformer network layer 602, a fully connected layer 603, and a Softmax network layer 604. Step S510 to step S530 are described in detail below with reference to fig. 5 and fig. 6.
In step S510, a sentence feature vector corresponding to the sentence to be recognized is subjected to fusion processing, and a semantic feature vector including context semantic information is generated.
In an embodiment of the present application, as shown in fig. 6, the input data 601 is a sentence feature vector corresponding to a sentence to be recognized, where the sentence feature vector specifically includes a word vector corresponding to each vocabulary included in the sentence to be recognized and a position feature vector of a position of each vocabulary included in the sentence to be recognized. After the input data 601 is input to the pre-trained machine learning model, the machine learning model performs fusion processing on the sentence feature vectors corresponding to the sentences to be recognized through the transformer network layer 602 to generate semantic feature vectors containing context semantic information.
The transformer network layer 602 performs fusion processing on the sentence feature vector corresponding to the sentence to be recognized to generate a semantic feature vector, that is, a feature vector containing context semantic information. The transformer network layer 602 includes a self-attention network layer, which can fully extract the relationships between the vocabularies, thereby fully extracting the contextual semantic relationships between the vocabularies contained in the sentence to be recognized and generating a semantic feature vector containing context semantic information.
In step S520, the semantic feature vector is subjected to full concatenation processing to obtain a processed semantic feature vector.
In an embodiment of the present application, for the semantic feature vector obtained after processing by the transformer network layer 602, the machine learning model performs full connection processing on the semantic feature vector through the fully connected layer 603 to obtain the processed semantic feature vector. Fully connecting the semantic feature vector is in effect a convolution calculation on the semantic feature vector and is used to reduce its dimensionality; the dimension reduction can further mine the semantic relationships within the sentence to be recognized, so that the target transformation category label of the sentence to be recognized when it is transformed into a grammatically correct sentence can be predicted from the semantic feature vector.
In step S530, a target transform class label is predicted based on the processed semantic feature vector.
In an embodiment of the present application, for the processed semantic feature vector obtained through the fully connected layer 603, the machine learning model performs classification prediction on the processed semantic feature vector through the Softmax network layer 604, and the target transformation category label is obtained through prediction, that is, the transformation category label 605 of each vocabulary contained in the sentence to be recognized when the sentence is transformed into the grammatically correct sentence corresponding to the sentence to be recognized.
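A minimal sketch of the three-stage structure shown in fig. 6 (transformer network layer, fully connected layer, Softmax) is given below. PyTorch is assumed, and the class name and hyperparameters are illustrative assumptions rather than the configuration of this application:

```python
import torch
import torch.nn as nn

class GrammarTagger(nn.Module):
    """Transformer network layer -> fully connected layer -> per-vocabulary Softmax over labels."""

    def __init__(self, dim=64, num_heads=4, num_layers=2, num_labels=16):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)  # self-attention fusion
        self.fc = nn.Linear(dim, num_labels)                                        # fully connected layer

    def forward(self, sentence_feature_vectors):
        semantic = self.encoder(sentence_feature_vectors)  # semantic feature vectors with context information
        logits = self.fc(semantic)                         # (batch, sequence length, num_labels)
        return torch.softmax(logits, dim=-1)               # per-vocabulary label probabilities

# Hypothetical input: one sentence of 5 vocabularies with feature dimension 64.
probabilities = GrammarTagger()(torch.randn(1, 5, 64))
print(probabilities.shape)  # torch.Size([1, 5, 16])
```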
Referring to fig. 7, fig. 7 is a specific flowchart illustrating step S530 of the syntax error recognition method according to an embodiment of the present application, where step S530 in this embodiment may specifically include step S710 to step S720, which is described in detail as follows.
In step S710, the semantic feature vectors after the processing are normalized to generate probabilities corresponding to candidate transformation class labels when each vocabulary included in the sentence to be recognized is transformed into a grammatically correct sentence corresponding to the sentence to be recognized.
In an embodiment of the present application, when the machine learning model performs classification prediction on the processed semantic feature vector through the Softmax network layer 604, the processed semantic feature vector may specifically be normalized through a preset Softmax function. The result of the normalization includes, for each vocabulary contained in the sentence to be recognized, the probabilities of the candidate transformation category labels corresponding to the transformation operations performed on that vocabulary when the sentence is transformed into a grammatically correct sentence. For each vocabulary, the probability of each candidate transformation category label is a real number between 0 and 1, and the probabilities of all candidate transformation category labels sum to 1.
In step S720, the candidate transformation category label with the highest probability is selected as the target transformation category label when each vocabulary included in the sentence to be recognized is transformed into the grammatical correct sentence corresponding to the sentence to be recognized.
In an embodiment of the present application, the candidate transformation category label with the highest probability is selected as a target transformation category label when each vocabulary contained in the sentence to be recognized is transformed into the grammar correct sentence corresponding to the sentence to be recognized, so that the target transformation category label when each vocabulary contained in the sentence to be recognized is transformed into the grammar correct sentence corresponding to the sentence to be recognized can be predicted.
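Continuing the illustrative tensor shapes above, selecting the target transformation category label amounts to a per-vocabulary argmax over the Softmax output, as in the following sketch:

```python
import torch

# Hypothetical Softmax output for one sentence: 5 vocabularies, 16 candidate labels.
probabilities = torch.softmax(torch.randn(5, 16), dim=-1)

best_probability, best_label = probabilities.max(dim=-1)  # highest probability and its label per vocabulary
for position, (probability, label_id) in enumerate(zip(best_probability.tolist(), best_label.tolist())):
    print(f"vocabulary {position}: target label {label_id} with probability {probability:.3f}")
```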
In the technical solutions of the embodiments shown in fig. 5 and fig. 7, the pre-trained machine learning model can accurately determine the target transformation category labels corresponding to the transformation operations performed on the vocabularies contained in the sentence to be recognized when the sentence is transformed into a grammatically correct sentence, which is beneficial to improving the accuracy of grammar error recognition.
Still referring to fig. 2, in step S240, a syntax error type of the sentence to be recognized is determined based on the sentence to be recognized and the target transformation class label.
In an embodiment of the application, after the target transformation category tag is obtained, the grammar error of the sentence to be recognized may be analyzed according to the sentence to be recognized and the determined target transformation category tag, so as to determine the grammar error type of the sentence to be recognized.
Optionally, when the syntax error type of the sentence to be recognized is determined based on the sentence to be recognized and the target transformation category tag, the syntax error type of the sentence to be recognized may be determined according to the sentence to be recognized and a preset corresponding relationship between the target transformation category tag and the syntax error type.
Referring to fig. 8, fig. 8 shows a detailed flowchart of step S240 of the syntax error recognition method according to an embodiment of the present application, and the step S240 may include steps S810 to S820, which are described in detail as follows.
In step S810, a grammatical correct sentence corresponding to the sentence to be recognized is determined based on the target transformation category tag and the sentence to be recognized.
In an embodiment of the application, when the grammar error type of the sentence to be recognized is determined based on the target transformation category label and the sentence to be recognized, the grammatically correct sentence corresponding to the sentence to be recognized is generated first based on the target transformation category label and the sentence to be recognized. Specifically, the transformation operation corresponding to the target transformation category label may be performed on each vocabulary contained in the sentence to be recognized, and the transformation result obtained after performing these operations is used as the grammatically correct sentence corresponding to the sentence to be recognized.
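A minimal sketch of applying the predicted per-vocabulary labels to reconstruct the grammatically correct sentence is given below; the label strings follow the illustrative keep/delete/replace/add convention used earlier rather than the full label set of this application:

```python
def apply_labels(tokens, labels):
    """Apply per-vocabulary transformation labels to rebuild the grammatically correct sentence."""
    corrected = []
    for token, label in zip(tokens, labels):
        if label == "KEEP":
            corrected.append(token)
        elif label == "DELETE":
            continue
        elif label.startswith("REPLACE_"):
            corrected.append(label[len("REPLACE_"):])
        elif label.startswith("ADD_"):
            corrected.extend([token, label[len("ADD_"):]])
    return " ".join(corrected)

print(apply_labels("I has a apple .".split(),
                   ["KEEP", "REPLACE_have", "DELETE", "KEEP", "KEEP"]))
# I have apple .
```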
In step S820, based on the grammar correct sentence and the preset grammar error classification rule, performing grammar error classification and recognition on the sentence to be recognized, and determining a grammar error type of the sentence to be recognized.
In one embodiment of the present application, the grammar error type refers to the category of grammar error existing in the sentence to be recognized, and may include article or qualifier errors, verb tense errors, number agreement errors, preposition collocation errors, noun errors, and the like.
The grammar error classification rules include a plurality of rules for performing grammar error classification and recognition, and may specifically include rules that judge grammar errors based on the part of speech of a vocabulary, rules that judge grammar errors based on the target transformation category label of a vocabulary, and rules that judge grammar errors based on vocabularies exchanging positions. For example, if the sentence to be recognized is "I eat apples yesterday" and the grammatically correct sentence corresponding to it is "I ate apples yesterday", the rule that judges grammar errors based on the part of speech of a vocabulary can determine that the sentence to be recognized has written the verb past tense "ate" as the simple present tense "eat", so the grammar error type of the sentence to be recognized includes a verb tense error.
In an embodiment of the present application, step S820 may specifically include: and if the probability corresponding to the target transformation category label is higher than a preset probability threshold, carrying out grammar error classification recognition on the sentence to be recognized based on the grammar correct sentence and a preset grammar error classification rule, and determining the grammar error type of the sentence to be recognized.
In an embodiment of the application, the pre-trained machine learning model may further output a probability corresponding to the target transformation class tag, and when it is determined whether the sentence to be recognized has a syntax error, the probability corresponding to the target transformation class tag may further be compared with a predetermined probability threshold, where the predetermined probability threshold is a preset error threshold for determining whether the sentence to be recognized has the syntax error. When the probability corresponding to the target transformation category label is higher than the preset probability threshold, the sentence to be recognized is considered to have a grammar error, and by setting the error threshold, the situation that the sentence without the grammar error is recognized to have the grammar error can be effectively avoided, and the accuracy of performing grammar error recognition on the sentence to be recognized is improved.
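The threshold gate and rules that map labels to error types can be sketched as follows; the threshold value, label names, and rule table are illustrative assumptions and do not represent the complete preset grammar error classification rules:

```python
PROBABILITY_THRESHOLD = 0.8  # illustrative predetermined probability threshold

# Illustrative mapping from transformation labels to grammar error types (not the full rule set).
LABEL_TO_ERROR_TYPE = {
    "VERB_PAST": "verb tense error",
    "NOUN_SINGULAR_PLURAL": "noun error",
    "DELETE": "article or qualifier error",  # e.g. a superfluous article was deleted
}

def classify_errors(labels, probabilities):
    """Return grammar error types for vocabularies whose predicted label clears the threshold."""
    errors = []
    for label, probability in zip(labels, probabilities):
        if label != "KEEP" and probability > PROBABILITY_THRESHOLD and label in LABEL_TO_ERROR_TYPE:
            errors.append(LABEL_TO_ERROR_TYPE[label])
    return errors

print(classify_errors(["KEEP", "VERB_PAST", "KEEP"], [0.99, 0.93, 0.97]))
# ['verb tense error']
```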
As can be seen from the above, a sentence feature vector corresponding to the sentence to be recognized is generated based on the sentence to be recognized, a target transformation category label is predicted based on that sentence feature vector (the target transformation category label being the transformation category label of each vocabulary contained in the sentence to be recognized when the sentence is transformed into the grammatically correct sentence corresponding to it), and the grammar error type of the sentence to be recognized is determined based on the sentence to be recognized and the target transformation category label. Compared with directly recognizing the grammar errors contained in the sentence to be recognized from the features of the sentence itself, first determining these transformation category labels reflects the grammar errors of the sentence to be recognized more distinctly, so the accuracy of grammar error recognition of the sentence to be recognized can be effectively improved.
In an embodiment of the present application, after the step S240 of determining the syntax error type of the sentence to be recognized based on the target transformation class label, the syntax error recognition method in this embodiment further includes: and generating error correction suggestion information corresponding to the sentence to be recognized based on the grammar error type of the sentence to be recognized and the sentence to be recognized.
In one embodiment, after the grammar error type of the sentence to be recognized is determined, error correction suggestion information corresponding to the sentence to be recognized is generated based on the grammar error type of the sentence to be recognized and the sentence to be recognized. The error correction suggestion information describes the grammar error types of the sentence to be recognized and the way of correcting each grammar error type. For example, if the sentence to be recognized is "I eat apples yesterday" and the grammar error type corresponding to the sentence to be recognized is a verb tense error, the generated error correction suggestion information corresponding to the sentence to be recognized may be: "[verb tense error] 'eat' is suspected to be incorrect and should be replaced with 'ate'."
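A minimal sketch of assembling the error correction suggestion string from the error type and the token-level change is given below; the message format merely mirrors the example above and is an assumption:

```python
def build_suggestion(error_type, wrong_word, corrected_word):
    """Assemble a human-readable error correction suggestion string."""
    return f'[{error_type}] "{wrong_word}" is suspected to be incorrect and should be replaced with "{corrected_word}".'

print(build_suggestion("verb tense error", "eat", "ate"))
# [verb tense error] "eat" is suspected to be incorrect and should be replaced with "ate".
```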
Through the generated error correction suggestion information, a suggestion for correcting the grammar error of the sentence to be recognized can be obtained in time, so that the goals of grammar learning or text revision can be achieved in scenes such as language learning or text modification.
The following describes a process of the syntax error recognition method by taking a scenario of composition modification as an example.
Fig. 9 is a schematic overall flow chart illustrating composition correction according to an embodiment of the present application, and fig. 10 is a schematic interface diagram illustrating an embodiment of the present application for inputting text or pictures containing composition.
With reference to fig. 9 and fig. 10, when a composition is corrected, the grammar error recognition method in the embodiment of the present application may be used to perform grammar error recognition on each sentence in the composition. First, the user inputs the composition to be corrected into a client such as a smartphone or a notebook computer.
As shown in fig. 10, the user may input a composition containing sentences to be recognized in the display interface provided by the client. Specifically, the composition to be corrected may be entered as text in the text editing bar 1001 of the display interface, or entered as a picture by clicking the picture input button 1002 (one-click picture upload) on the display interface. After the composition is entered in text or picture form, grammar error recognition on the entered composition may be triggered by clicking the correction button 1003.
After the composition text is input to the client, the client splits it into sentences at specific characters such as punctuation marks, obtaining a plurality of sentences to be recognized. For a picture input, the client first performs OCR (optical character recognition) on the picture to obtain the composition text, and then performs the same sentence splitting on the recognition result to obtain the sentences to be recognized.
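A minimal sketch of the sentence-splitting step follows, assuming splitting on common Chinese and English sentence-ending punctuation; the concrete delimiter set is an assumption, not a list prescribed by this application.

```python
import re

# Minimal sketch: split composition text into sentences to be recognized on
# sentence-ending punctuation; the delimiter set below is an assumption.
SENTENCE_DELIMITERS = r"(?<=[.!?。！？])\s*"

def split_into_sentences(composition_text):
    parts = re.split(SENTENCE_DELIMITERS, composition_text)
    return [part.strip() for part in parts if part.strip()]

# split_into_sentences("I eat apples yesterday. They were sweet!")
# -> ["I eat apples yesterday.", "They were sweet!"]
```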
A pre-trained machine learning model is stored in advance on the client, and the plurality of sentences to be recognized are input into it. The pre-trained machine learning model may specifically include a Transformer network layer, a fully connected layer, and a Softmax network layer, which together process the sentences to be recognized; for each sentence, the model outputs the transformation category label of every vocabulary it contains when the sentence is transformed into the corresponding grammar-correct sentence.
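A minimal PyTorch sketch of such a tagger follows; every size (vocabulary, label count, model width, depth) and the exact layer arrangement are assumptions for illustration rather than the model configuration of this application.

```python
import torch
import torch.nn as nn

class TransformationTagger(nn.Module):
    """Transformer encoder + fully connected layer + Softmax over per-token
    transformation category labels. All hyperparameters are assumed values."""
    def __init__(self, vocab_size=30000, num_labels=5000,
                 d_model=256, nhead=8, num_layers=4, max_len=128):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, d_model)
        self.position_embedding = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.fully_connected = nn.Linear(d_model, num_labels)

    def forward(self, token_ids):
        # Sentence feature vector: word vectors plus position feature vectors.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        features = self.word_embedding(token_ids) + self.position_embedding(positions)
        semantic = self.encoder(features)        # fusion with context semantics
        logits = self.fully_connected(semantic)  # fully connected processing
        return torch.softmax(logits, dim=-1)     # per-token label probabilities
```

At inference time, the candidate label with the maximum probability for each token would be taken as the target transformation category label, and that probability would be compared against the predetermined probability threshold discussed earlier.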
FIG. 11 illustrates an overall flow of training the pre-trained machine learning model according to one embodiment of the present application. The model stored on the client must first be trained. Training requires a large number of sample sentence pairs, where a sample sentence pair consists of a grammar-error sample sentence and the corresponding grammar-correct sample sentence. Sample sentence pairs may be obtained from existing question banks of grammar exercises, or from other databases containing a large number of such pairs. For each sample sentence pair, the transformation category label of every vocabulary in the grammar-error sample sentence when transformed into the grammar-correct sample sentence is determined; training sample data is then generated from the sample sentence pair and these transformation category labels, and the model is trained on this data through multiple rounds of iteration so that, once training is complete, it can be used to process sentences to be recognized.
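A minimal training sketch, reusing the tagger sketched above, is given below; the loss function, optimiser, padding convention, and hyperparameters are assumptions, not details disclosed in this application.

```python
import torch
import torch.nn as nn

def train(model, data_loader, epochs=3, lr=1e-4, label_pad_id=0):
    """Fit the tagger on (token_ids, label_ids) batches, both shaped
    (batch, seq_len); label_pad_id marks padding positions to ignore."""
    criterion = nn.NLLLoss(ignore_index=label_pad_id)   # model outputs probabilities
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):                             # multiple rounds of iteration
        for token_ids, label_ids in data_loader:
            probs = model(token_ids)                    # (batch, seq_len, num_labels)
            log_probs = torch.log(probs.clamp_min(1e-9))
            loss = criterion(log_probs.transpose(1, 2), label_ids)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```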
The client then obtains the model output, namely the transformation category label of each vocabulary of the sentence to be recognized when it is transformed into the corresponding grammar-correct sentence. If the sentence to be recognized has a grammar error, the client generates its grammar error type, an error correction suggestion, and an error probability from these transformation category labels. It will be appreciated that the error probability is compared with the predetermined probability threshold, and only when it is higher than the threshold are the grammar error type and the error correction suggestion output. The client displays the grammar error types and the error correction suggestions on the display interface, so that the user can conveniently view, within the composition, the sentences containing grammar errors, their grammar error types, and the corresponding error correction suggestions.
Fig. 12 is a schematic interface diagram illustrating the result of performing grammar error recognition on a composition according to an embodiment of the present application. As shown in the figure, the interface marks the sentence containing a grammar error 1201, displays the corresponding grammar error type 1202 (for example, a verb tense error), and shows the corresponding error correction suggestion 1203 indicating which word is suspected to be wrong and what it should be replaced with.
Compared with directly identifying grammar errors from the features contained in the sentences of the composition, first determining the transformation category label of each vocabulary when a sentence is transformed into its corresponding grammar-correct sentence reflects the grammar errors in the sentence more distinctly, and therefore effectively improves the accuracy of grammar error recognition for the composition.
The following describes apparatus embodiments of the present application, which can be used to perform the grammar error recognition method in the above embodiments. For details not disclosed in the apparatus embodiments, reference is made to the method embodiments described above.
FIG. 13 shows a block diagram of a syntax error recognition apparatus according to an embodiment of the present application.
Referring to fig. 13, a syntax error recognition apparatus 1300 according to an embodiment of the present application includes: a first obtaining unit 1310, a first generating unit 1320, a predicting unit 1330, and a syntax error determining unit 1340. The first obtaining unit 1310 is configured to obtain a sentence to be recognized; the first generating unit 1320 is configured to generate, based on the sentence to be recognized, a sentence feature vector corresponding to it, where the sentence feature vector includes a word vector corresponding to each vocabulary contained in the sentence to be recognized and a position feature vector for the position of each vocabulary within the sentence; the predicting unit 1330 is configured to predict a target transformation category label based on the sentence feature vector, where the target transformation category label is the transformation category label of each vocabulary contained in the sentence to be recognized when it is transformed into the corresponding grammar-correct sentence; the syntax error determining unit 1340 is configured to determine the grammar error type of the sentence to be recognized based on the sentence to be recognized and the target transformation category label.
In some embodiments of the present application, based on the foregoing scheme, the syntax error determination unit 1340 includes: a grammar correct sentence determining subunit, configured to determine, based on the target transformation class label and the sentence to be recognized, a grammar correct sentence corresponding to the sentence to be recognized; and the grammar error recognition subunit is used for carrying out grammar error classification recognition on the sentences to be recognized based on the grammar correct sentences and preset grammar error classification rules and determining grammar error types of the sentences to be recognized.
In some embodiments of the present application, based on the foregoing scheme, the prediction unit 1330 includes: a fusion subunit, configured to perform fusion processing on the sentence feature vector corresponding to the sentence to be recognized to generate a semantic feature vector containing context semantic information; a full-connection subunit, configured to perform full-connection processing on the semantic feature vector to obtain a processed semantic feature vector; and a predicting subunit, configured to predict the target transformation category label based on the processed semantic feature vector.
In some embodiments of the present application, based on the foregoing scheme, the prediction subunit includes: a generating module, configured to perform normalization processing on the processed semantic feature vector and generate, for each vocabulary contained in the sentence to be recognized, the probability corresponding to each candidate transformation category label for transforming the sentence into its corresponding grammar-correct sentence; and a selecting module, configured to select the candidate transformation category label with the maximum probability as the target transformation category label.
In some embodiments of the application, based on the foregoing scheme, the grammar error recognition subunit is specifically configured to, if the probability corresponding to the target transformation category label is higher than a predetermined probability threshold, perform grammar error classification recognition on the sentence to be recognized based on the grammar-correct sentence and a preset grammar error classification rule, and determine the grammar error type of the sentence to be recognized.
In some embodiments of the present application, based on the foregoing scheme, the syntax error recognition apparatus further includes: and the error correction suggestion generation unit is used for generating error correction suggestion information corresponding to the sentence to be recognized based on the grammar error type of the sentence to be recognized and the sentence to be recognized.
In some embodiments of the present application, based on the foregoing scheme, the target transformation class labels and the sentence feature vectors corresponding to the sentences to be recognized are generated through a pre-trained machine learning model.
In some embodiments of the present application, based on the foregoing scheme, the syntax error recognition apparatus further includes: a second obtaining unit, configured to obtain training set sample data for training a machine learning model to be trained, where each piece of sample data in the training set comprises a sample sentence pair consisting of a grammar error sample sentence and a grammar correct sample sentence; a word segmentation unit, configured to perform word segmentation processing on the grammar error sample sentence and the grammar correct sample sentence respectively, to obtain a first word segmentation result corresponding to the grammar error sample sentence and a second word segmentation result corresponding to the grammar correct sample sentence; a second generating unit, configured to generate a sentence feature vector corresponding to the grammar error sample sentence based on the first word segmentation result; a third generating unit, configured to generate a sample transformation category label based on the first and second word segmentation results, where the sample transformation category label is the transformation category label of each vocabulary contained in the grammar error sample sentence when it is transformed into the grammar correct sample sentence; and a training unit, configured to train the machine learning model to be trained through the sample transformation category labels and the sentence feature vectors corresponding to the grammar error sample sentences, to obtain the pre-trained machine learning model.
In some embodiments of the present application, based on the foregoing scheme, the third generating unit includes: an editing subunit, configured to, for each vocabulary contained in the first word segmentation result, perform editing processing on the vocabulary so that it is converted into a vocabulary contained in the second word segmentation result; an edit distance determining subunit, configured to determine the edit distance between the first word segmentation result and the second word segmentation result based on the categories of editing processing applied to the vocabularies; and a selecting subunit, configured to select, as the transformation category label, the editing processing category applied to the vocabulary under the minimum edit distance.
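The label derivation can be pictured with a minimal sketch based on a standard sequence alignment; the concrete label names (KEEP, DELETE, REPLACE_*, APPEND_*) and the use of difflib are illustrative assumptions, not the alignment procedure specified by this application.

```python
from difflib import SequenceMatcher

def transformation_labels(error_tokens, correct_tokens):
    """Derive a transformation category label for every vocabulary of the
    grammar-error sample sentence from a minimal-edit alignment with the
    grammar-correct sample sentence. Label names are assumed for illustration."""
    labels = ["KEEP"] * len(error_tokens)
    matcher = SequenceMatcher(a=error_tokens, b=correct_tokens)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            for offset, i in enumerate(range(i1, i2)):
                j = min(j1 + offset, j2 - 1)
                labels[i] = f"REPLACE_{correct_tokens[j]}"
        elif op == "delete":
            for i in range(i1, i2):
                labels[i] = "DELETE"
        elif op == "insert":
            anchor = max(i1 - 1, 0)  # append to the preceding vocabulary
            labels[anchor] += "".join(f"|APPEND_{t}" for t in correct_tokens[j1:j2])
    return labels

# transformation_labels(["I", "eat", "apples", "yesterday"],
#                       ["I", "ate", "apples", "yesterday"])
# -> ["KEEP", "REPLACE_ate", "KEEP", "KEEP"]
```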
FIG. 14 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1400 of the electronic device shown in fig. 14 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 14, a computer system 1400 includes a Central Processing Unit (CPU)1401, which can perform various appropriate actions and processes, such as executing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1402 or a program loaded from a storage portion 1408 into a Random Access Memory (RAM) 1403. In the RAM 1403, various programs and data necessary for system operation are also stored. The CPU 1401, ROM 1402, and RAM 1403 are connected to each other via a bus 1404. An Input/Output (I/O) interface 1405 is also connected to the bus 1404.
The following components are connected to the I/O interface 1405: an input portion 1406 including a keyboard, a mouse, and the like; an output portion 1407 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1408 including a hard disk and the like; and a communication section 1409 including a network interface card such as a LAN (Local Area Network) card, a modem, and the like. The communication section 1409 performs communication processing via a network such as the Internet. A drive 1410 is also connected to the I/O interface 1405 as necessary. A removable medium 1411 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1410 as necessary, so that a computer program read out therefrom is installed into the storage portion 1408 as needed.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1409 and/or installed from the removable medium 1411. When the computer program is executed by the Central Processing Unit (CPU) 1401, the various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in certain cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functionality of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for identifying a syntax error, comprising:
obtaining a sentence to be recognized;
generating a sentence characteristic vector corresponding to the sentence to be recognized based on the sentence to be recognized, wherein the sentence characteristic vector comprises word vectors corresponding to all vocabularies contained in the sentence to be recognized and position characteristic vectors of all vocabularies contained in the sentence to be recognized at the position of the sentence to be recognized;
predicting a target transformation category label based on the sentence characteristic vector corresponding to the sentence to be recognized, wherein the target transformation category label is a transformation category label of each vocabulary contained in the sentence to be recognized when the vocabulary is transformed into a grammar correct sentence corresponding to the sentence to be recognized;
and determining the grammar error type of the sentence to be recognized based on the sentence to be recognized and the target transformation class label.
2. The method for recognizing the grammar error as claimed in claim 1, wherein the determining the grammar error type of the sentence to be recognized based on the sentence to be recognized and the target transformation class label comprises:
determining a grammar correct statement corresponding to the statement to be recognized based on the target transformation category label and the statement to be recognized;
and carrying out grammar error classification recognition on the sentence to be recognized based on the grammar correct sentence and a preset grammar error classification rule, and determining the grammar error type of the sentence to be recognized.
3. The syntax error recognition method of claim 2, wherein predicting a target transformation class label based on the sentence feature vector corresponding to the sentence to be recognized comprises:
performing fusion processing on the sentence characteristic vectors corresponding to the sentences to be recognized to generate semantic characteristic vectors containing context semantic information;
performing full-connection processing on the semantic feature vector to obtain a processed semantic feature vector;
and predicting the target transformation class label based on the processed semantic feature vector.
4. The method for recognizing syntax errors as claimed in claim 3, wherein said predicting the target transformation class label based on the processed semantic feature vector comprises:
normalizing the processed semantic feature vector to generate the probability corresponding to the candidate transformation category label when each vocabulary contained in the sentence to be recognized is transformed into a grammar correct sentence corresponding to the sentence to be recognized;
and selecting the candidate transformation class label with the maximum probability as the target transformation class label.
5. The method for recognizing the grammar error as claimed in claim 4, wherein the recognizing the grammar error classification of the sentence to be recognized based on the grammar correct sentence and a preset grammar error classification rule, and determining the grammar error type of the sentence to be recognized comprises:
and if the probability corresponding to the target transformation category label is higher than a preset probability threshold, carrying out grammar error classification recognition on the sentence to be recognized based on the grammar correct sentence and a preset grammar error classification rule, and determining the grammar error type of the sentence to be recognized.
6. The syntax error recognition method according to claim 1, wherein after determining the syntax error type of the sentence to be recognized based on the sentence to be recognized and the target transformation class label, the syntax error recognition method further comprises:
and generating error correction suggestion information corresponding to the sentence to be recognized based on the grammar error type of the sentence to be recognized and the sentence to be recognized.
7. The method according to any one of claims 1 to 6, wherein the target transformation class labels and the sentence feature vectors corresponding to the sentences to be recognized are generated by a pre-trained machine learning model.
8. The syntax error recognition method according to claim 7, further comprising:
acquiring training set sample data for training a machine learning model to be trained, wherein each piece of sample data in the training set sample data comprises a sample sentence pair, and the sample sentence pair comprises a grammar error sample sentence and a grammar correct sample sentence;
respectively performing word segmentation processing on the grammar error sample sentences and the grammar correct sample sentences to obtain first word segmentation results corresponding to the grammar error sample sentences and second word segmentation results corresponding to the grammar correct sample sentences;
generating a sentence characteristic vector corresponding to the grammar error sample sentence based on the first word segmentation result;
generating a sample transformation class label based on the first word segmentation result and the second word segmentation result, wherein the sample transformation class label is a transformation class label of each vocabulary contained in the grammar error sample sentence when the vocabulary is transformed into the grammar correct sample sentence;
and training the machine learning model to be trained through the sample transformation class labels and the sentence characteristic vectors corresponding to the grammar error sample sentences to obtain a pre-trained machine learning model.
9. The method of claim 8, wherein the generating a sample transformation class label based on the first and second segmentation results comprises:
for each vocabulary contained in the first word segmentation result, performing editing processing on the vocabulary so that the vocabulary is converted into a vocabulary contained in the second word segmentation result;
determining an editing distance between the first segmentation result and the second segmentation result based on an editing processing category for editing the vocabulary;
and selecting, as the transformation category label, the editing processing category applied to the vocabulary under the premise of the minimum editing distance.
10. A syntax error recognition apparatus, comprising:
the first acquisition unit is used for acquiring a sentence to be recognized;
a first generating unit, configured to generate, based on the to-be-recognized sentence, a sentence feature vector corresponding to the to-be-recognized sentence, where the sentence feature vector includes a word vector corresponding to each vocabulary included in the to-be-recognized sentence and a position feature vector of each vocabulary included in the to-be-recognized sentence at a position of the to-be-recognized sentence;
the prediction unit is used for predicting a target transformation category label based on the sentence characteristic vector corresponding to the sentence to be recognized, wherein the target transformation category label is a transformation category label of each vocabulary contained in the sentence to be recognized when the vocabulary is transformed into a grammar correct sentence corresponding to the sentence to be recognized;
and the grammar error determining unit is used for determining the grammar error type of the sentence to be recognized based on the sentence to be recognized and the target transformation class label.
CN202110282569.1A 2021-03-16 2021-03-16 Grammar error recognition method and device Pending CN113705207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110282569.1A CN113705207A (en) 2021-03-16 2021-03-16 Grammar error recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110282569.1A CN113705207A (en) 2021-03-16 2021-03-16 Grammar error recognition method and device

Publications (1)

Publication Number Publication Date
CN113705207A true CN113705207A (en) 2021-11-26

Family

ID=78647823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110282569.1A Pending CN113705207A (en) 2021-03-16 2021-03-16 Grammar error recognition method and device

Country Status (1)

Country Link
CN (1) CN113705207A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116011461A (en) * 2023-03-02 2023-04-25 文灵科技(北京)有限公司 Concept abstraction system and method based on event classification model
CN117350283A (en) * 2023-10-11 2024-01-05 西安栗子互娱网络科技有限公司 Text defect detection method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination