CN106649294B

CN106649294B - Classification model training and clause recognition method and device

Info

Publication number: CN106649294B
Application number: CN201611250331.6A
Authority: CN
Inventors: 郭祥; 杨君; 赵博洋; 田东东; 王思月; 柴静
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: 3600 Technology Group Co ltd
Priority date: 2016-12-29
Filing date: 2016-12-29
Publication date: 2020-11-06
Anticipated expiration: 2036-12-29
Also published as: CN106649294A

Abstract

The embodiments of the invention provide a method and a device for training a classification model and for identifying clauses, wherein the training method comprises the following steps: setting English sentences containing English clauses as training samples; converting the training samples into a characteristic text sequence; and training, with the characteristic text sequence, a classification model for identifying English clauses. The method and the device can automatically identify the types of clauses contained in an English sentence, which enriches the information available about the sentence, spares the user from manually checking the sentence against other reference materials, saves time, improves efficiency, and reduces the probability of error when the user has limited knowledge.

Description

Classification model training and clause recognition method and device

Technical Field

The invention relates to the technical field of computer processing, in particular to a method for training a classification model of an English clause, a method for identifying the English clause based on the classification model, a corresponding device for training the classification model of the English clause and a device for identifying the English clause based on the classification model.

Background

With the development of globalization, English, as one of the international common languages, has become one of the basic subjects that people study.

When people read English articles or watch English movies and encounter English sentences they do not understand, most of them turn to a translation application.

At present, translation applications usually only translate an English sentence to give its meaning. People who are learning the language, students in particular, may have other needs regarding the sentence. They then have to check the sentence manually against other reference materials, which takes more time, is inefficient, and is error-prone when they have limited knowledge.

Disclosure of Invention

In view of the above problems, the present invention is proposed to provide a method for training a classification model of English clauses, a method for identifying English clauses based on a classification model, a corresponding apparatus for training a classification model of English clauses, and an apparatus for identifying English clauses based on a classification model, which overcome or at least partially solve the above problems.

According to an aspect of the present invention, there is provided a method for training a classification model of English clauses, comprising:

setting English sentences with English clauses as training samples;

converting the training samples into a characteristic text sequence;

and training a classification model for identifying English clauses by adopting the characteristic text sequence.

Optionally, the step of converting the training samples into a characteristic text sequence includes:

identifying a composition structure of the training sample;

and forming the characteristic text sequence by adopting the composition structure.

Optionally, the step of training a classification model for identifying English clauses by adopting the characteristic text sequence includes:

inputting the characteristic text sequence into a convolutional neural network;

and training a classification model for identifying English clauses in the convolutional neural network by adopting the characteristic text sequence based on the sequence of the words in the training sample.
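To make the phrase "based on the sequence of the words" concrete, the following is a minimal sketch of how a single convolutional filter with max-over-time pooling responds to the order of items in a sequence. It is an illustration under stated assumptions, not the network described in the patent: the tag names `NP`, `VP` and `SBAR`, the one-hot encoding, and the hand-set kernel weights are all hypothetical, and no training is performed.

```python
# Hypothetical tag vocabulary standing in for a characteristic text sequence.
VOCAB = ["NP", "VP", "SBAR"]

# Hand-set kernel of width 2 that responds most strongly to "SBAR" followed by "VP".
# A real model would learn such weights during training.
KERNEL = [[0.0, 0.0, 1.0],   # first position: weight on SBAR
          [0.0, 1.0, 0.0]]   # second position: weight on VP


def one_hot(tag, vocabulary):
    """Encode a tag as a one-hot vector over the vocabulary."""
    return [1.0 if tag == v else 0.0 for v in vocabulary]


def conv_max_pool(tags, vocabulary, kernel):
    """Slide the kernel over consecutive tags and keep the largest response."""
    vectors = [one_hot(t, vocabulary) for t in tags]
    width = len(kernel)
    best = 0.0  # starting from 0.0 also acts as a ReLU on the responses
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        activation = sum(
            weight * value
            for kernel_row, vector in zip(kernel, window)
            for weight, value in zip(kernel_row, vector)
        )
        best = max(best, activation)
    return best


in_order = conv_max_pool(["SBAR", "VP", "NP"], VOCAB, KERNEL)
swapped = conv_max_pool(["VP", "SBAR", "NP"], VOCAB, KERNEL)
print(in_order, swapped)
```

The two inputs contain the same tags, yet the filter output differs because the kernel looks at adjacent positions; this is the sense in which a convolutional model can exploit word order.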

According to another aspect of the present invention, there is provided a method for identifying English clauses based on a classification model, comprising:

determining an English sentence to be recognized;

converting the English sentence into a characteristic text sequence;

and inputting the characteristic text sequence into a preset classification model to identify the clause types contained in the English sentence.

Optionally, the step of converting the English sentence into a characteristic text sequence includes:

identifying the composition structure of the English sentence;

and forming the characteristic text sequence by adopting the composition structure.

Optionally, the step of inputting the characteristic text sequence into a preset classification model to identify the clause types contained in the English sentence includes:

inputting the characteristic text sequence into a classification model trained by a convolutional neural network;

and identifying the clause types contained in the English sentence by adopting the characteristic text sequence in the classification model based on the sequence of the words in the English sentence.

According to another aspect of the present invention, there is provided an apparatus for training a classification model of English clauses, comprising:

the training sample setting module is suitable for setting English sentences with English clauses as training samples;

a training sample conversion module, which is suitable for converting the training sample into a characteristic text sequence;

and the classification model training module is suitable for training a classification model for identifying English clauses by adopting the characteristic text sequence.

Optionally, the training sample conversion module includes:

the sample structure identification submodule is suitable for identifying the composition structure of the training sample;

and the sample characteristic forming submodule is suitable for forming the characteristic text sequence by adopting the composition structure.

Optionally, the classification model training module includes:

a convolutional neural network input submodule adapted to input the characteristic text sequence into a convolutional neural network;

and the convolutional neural network training submodule is suitable for training a classification model for identifying English clauses in the convolutional neural network by adopting the characteristic text sequence based on the sequence of the words in the training sample.

According to another aspect of the present invention, there is provided an apparatus for recognizing English clauses based on a classification model, comprising:

the English sentence determining module is suitable for determining an English sentence to be recognized;

the English sentence conversion module is suitable for converting the English sentence into a characteristic text sequence;

and the clause type identification module is suitable for inputting the characteristic text sequence into a preset classification model so as to identify the clause type contained in the English sentence.

Optionally, the English sentence conversion module includes:

the sentence structure recognition submodule is suitable for recognizing the composition structure of the English sentence;

and the sentence characteristic forming submodule is suitable for forming the characteristic text sequence by adopting the composition structure.

Optionally, the clause type identifying module includes:

the classification model input submodule is suitable for inputting the characteristic text sequence into a classification model trained by a convolutional neural network;

and the classification model identification submodule is suitable for identifying clause types contained in the English sentence in the classification model by adopting the characteristic text sequence based on the sequence of the words in the English sentence.

The embodiment of the invention sets the English sentence with the English clause as the training sample and converts the training sample into the characteristic text sequence, and trains the classification model for identifying the English clause by adopting the characteristic text sequence, so that the type of the clause contained in the English sentence can be automatically identified, the information diversity of the English sentence is improved, the comparison of the English sentence by inquiring other information manually by a user is reduced, the time spent can be reduced, the efficiency is improved, and the error probability is reduced under the condition of less knowledge mastery.

The embodiment of the invention converts the English sentences into the characteristic text sequence and inputs the preset classification model to identify the clause types contained in the English sentences, thereby realizing the automatic identification of the clause types contained in the English sentences, improving the information diversity of the English sentences, reducing the comparison of the English sentences by a user manually inquiring other data, reducing the time spent and improving the efficiency, and reducing the error probability under the condition of less knowledge mastery.

The foregoing description is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be understood more clearly, and that the above and other objects, features, and advantages may be more readily apparent, embodiments of the present invention are described below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flowchart illustrating steps of a method for identifying English information according to an embodiment of the present invention;

FIGS. 2A-2E are diagrams illustrating an example of an English sentence recognizing operation according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating steps of another method for identifying English information according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating the steps of a method for training a classification model of English clauses according to an embodiment of the present invention;

FIG. 5 illustrates an example view of an identification of a composition structure according to one embodiment of the invention;

FIG. 6 is a flowchart illustrating steps of a method for identifying English clauses based on a classification model according to an embodiment of the present invention;

FIG. 7 is a block diagram illustrating an apparatus for recognizing English information according to an embodiment of the present invention;

FIG. 8 is a block diagram illustrating another apparatus for recognizing English information according to an embodiment of the present invention;

FIG. 9 is a block diagram of an apparatus for training a classification model of English clauses according to an embodiment of the present invention; and

FIG. 10 is a block diagram illustrating an apparatus for recognizing English clauses based on a classification model according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Referring to FIG. 1, a flowchart illustrating steps of a method for identifying English information according to an embodiment of the present invention is shown, which may specifically include the following steps:

Step 101, selecting target image data.

In a specific implementation, the embodiment of the present invention may be applied to a mobile terminal, for example, a mobile phone, a PDA (Personal Digital Assistant), a laptop computer, or a palmtop computer, which is not limited in the embodiment of the present invention.

The mobile terminal may run an operating system such as Windows, Android, iOS, or Windows Phone. An English recognition application may be installed in the operating system to recognize English information, and it may be a system application of the operating system or a third-party application.

In the embodiment of the present invention, the English recognition application may select, according to an operation instruction of the user, the target image data in which the English information to be recognized is recorded.

In a specific implementation, the English recognition application may select the target image data in the following ways:

First, taking a photograph.

In this manner, the mobile terminal is equipped with a camera. As shown in FIG. 2A, after starting the English recognition application, the user clicks the "photograph recognition sentence" control on its interface, and a menu bar pops up as shown in FIG. 2B, in which the user can click the "photograph" control.

The English recognition application can call the camera to acquire preview image data in response to the "photograph" control.

Taking the Android system as an example, the English recognition application declares its use of the camera and other related features (such as auto-focus) in the manifest file.

In the main activity of the English recognition application, an intent (e.g., MediaStore.ACTION_IMAGE_CAPTURE) is used to notify the camera application built into the operating system, and the camera intent is launched through the startActivityForResult() method. After the user takes a picture with the camera, the preview image data is returned to the main activity. A method for receiving the preview image data (e.g., the onActivityResult() method) is added to the main activity to process the returned preview image data.

Since the English information may occupy only a small part of the image, a preview frame, for example the rectangle with white dots at its four corners shown in FIG. 2C, may be loaded over the preview image data in order to reduce interference from other objects and improve recognition accuracy. The user may adjust the shape, position, and size of the preview frame so that the English information lies inside the frame and other objects are excluded.

Of course, the user may also directly select the whole frame of preview image data as the target image data, which is not limited in the embodiment of the present invention.

If the user clicks the "√" control shown in fig. 2C, the preview image data in the preview box can be extracted as target image data.
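Extracting the content of the preview box amounts to cropping a rectangle out of the image. The following is a minimal sketch, assuming the image is held as a list of pixel rows and the box is given by a hypothetical `left`, `top`, `width`, `height` description; the patent does not specify the data representation.

```python
def crop_preview(pixels, left, top, width, height):
    """Return the sub-image inside the preview box, clamped to the image bounds."""
    rows = len(pixels)
    cols = len(pixels[0]) if rows else 0
    x0, y0 = max(0, left), max(0, top)
    x1, y1 = min(cols, left + width), min(rows, top + height)
    return [row[x0:x1] for row in pixels[y0:y1]]


# A 4x5 toy image whose pixel values encode their own row and column.
image = [[r * 10 + c for c in range(5)] for r in range(4)]
target = crop_preview(image, left=1, top=1, width=3, height=2)
print(target)
```

Clamping keeps the crop valid even if the user drags the box partly outside the image.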

Second, uploading a local image.

In this manner, as shown in FIG. 2A, after starting the English recognition application, the user clicks the "photograph recognition sentence" control on its interface, and a menu bar pops up as shown in FIG. 2B, in which the user can click the "select from the mobile phone album" control to select local image data.

The English recognition application may import the locally stored image data as the target image data according to the user's selection.

It should be noted that the image data locally stored in the mobile terminal may be image data obtained by taking a picture in advance, image data obtained by capturing a picture, or image data obtained by other methods, which is not limited in this embodiment of the present invention.

Of course, the above ways of selecting the target image data are only examples. When implementing the embodiment of the present invention, those skilled in the art may adopt other ways of selecting the target image data according to actual situations and needs, which is not limited in the embodiment of the present invention.

Step 102, identifying English information from the target image data, and splitting it into one or more English sentences.

For the target image data, English information can be recognized from the target image data by OCR (Optical Character Recognition).

In this process, preprocessing such as binarization, noise removal, and tilt correction may first be performed on the target image data to improve recognition accuracy.
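As an illustration of the binarization step only, the following sketch maps a grayscale image to ink and background pixels. The fixed threshold of 128 and the convention that 1 means a dark pixel are assumptions made for this example; the patent does not state how the threshold is chosen.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 1 (dark, likely ink) or 0 (light background)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# Toy 3x3 grayscale patch: a dark vertical stroke with one dark pixel to its left.
gray = [
    [250, 30, 240],
    [20, 25, 245],
    [255, 40, 235],
]
binary = binarize(gray)
print(binary)
```

In practice an adaptive threshold is often preferred for unevenly lit photographs, but a global threshold is enough to show the idea.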

For the preprocessed target image data, character features can be extracted, which generally fall into the following two types:

1. Statistical features. For example, the ratio of black to white points in the text area; when the text area is divided into several regions, the black/white point ratios of the individual regions combine into a numerical vector in feature space.

2. Structural features. For example, after the character image is thinned, the number and positions of the stroke end points and intersection points of the character are obtained, or stroke segments are used as features.
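The statistical feature described in item 1 can be sketched as a zoning computation: split a binary character image into a grid and record the proportion of ink pixels per zone. The grid size and the assumption that the image dimensions divide evenly into zones are choices made for this example, not details taken from the patent.

```python
def zone_features(binary, zone_rows, zone_cols):
    """Split a binary glyph image into a grid and return the ink ratio of each zone.

    Assumes the image height and width are divisible by zone_rows and zone_cols.
    """
    height, width = len(binary), len(binary[0])
    zone_h, zone_w = height // zone_rows, width // zone_cols
    features = []
    for zr in range(zone_rows):
        for zc in range(zone_cols):
            cells = [
                binary[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            ]
            features.append(sum(cells) / len(cells))
    return features


# Toy 4x4 glyph roughly shaped like the letter "L" (1 = ink).
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
vector = zone_features(glyph, 2, 2)
print(vector)
```

The resulting list is the numerical vector that later comparison steps can operate on.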

The extracted features are compared with the features of all English letters to be recognized that are stored in a database, and the English letter corresponding to the features is identified by a comparison method such as Euclidean-space comparison, relaxation comparison, or dynamic programming (DP) comparison.
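Of the comparison methods listed, the Euclidean one is the simplest to sketch: choose the stored letter whose feature vector lies closest to the extracted one. The template vectors below are invented for illustration and do not come from the patent; `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_letter(features, templates):
    """Return the template letter whose feature vector is closest in Euclidean distance."""
    return min(templates, key=lambda letter: math.dist(features, templates[letter]))


# Hypothetical stored feature vectors for three letters.
templates = {
    "L": [1.0, 0.0, 1.0, 0.5],
    "T": [1.0, 1.0, 0.5, 0.5],
    "I": [0.5, 0.5, 0.5, 0.5],
}
best = nearest_letter([0.9, 0.1, 1.0, 0.4], templates)
print(best)
```

A real recognizer would typically also return the runner-up candidates, so that a later correction step can choose among similar-looking letters.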

Then, using the recognized English letters and their groups of similar candidate characters, the most plausible English letter can be chosen according to the letters recognized before and after it, and corrections are made.

In the embodiment of the present invention, the target image data may include one or more English sentences, and each sentence may be identified and split off based on a period or similar punctuation.
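Splitting on sentence-ending punctuation can be sketched with a regular expression. Treating ".", "!" and "?" followed by whitespace as sentence boundaries is an assumption of this example; the text only mentions the period "or the like".

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace; drop empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in pieces if p]


ocr_text = "He runs quickly. I saw a film yesterday. They made the girl angry."
sentences = split_sentences(ocr_text)
print(sentences)
```

This simple rule would also split after abbreviations such as "Mr.", so a production splitter would need extra handling for such cases.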

In practical applications, in order to reduce resource consumption on the mobile terminal, the recognition of the English information and the splitting of the English sentences may be performed by the server.

In this manner, the English recognition application may send the target image data to the server. The server recognizes the English information from the target image data by optical character recognition, splits the English information into one or more English sentences, and returns the English sentences to the English recognition application.

The English recognition application receives, from the server, the English information recognized from the target image data by optical character recognition and the one or more English sentences split from the English information.

As shown in FIG. 2D, since it may take the server a long time to recognize the English information and split the English sentences, a message such as "recognizing …" is displayed on the interface of the English recognition application to prompt the user to wait.

Of course, the recognition of the English information and the splitting of the English sentences may also be performed by the English recognition application itself, and the embodiment of the present invention is not limited thereto.

Step 103, splitting the English sentence into clickable interactive elements, one per word, and identifying the sentence pattern factors of the English sentence.

In the embodiment of the present invention, each word constituting an English sentence may be split off, and a clickable interactive element may then be generated for it, for example as JSON (JavaScript Object Notation, a lightweight data-interchange format) data.

An independent interactive element can be generated for each word; that is, the interactive element represents the word by recording the word and related information. The interactive elements are arranged in the same order as the words, so that together they form the complete English sentence.

The user may select one or more interactive elements, for example by clicking, and thereby select one or more words on which to perform translation or other operations.

For example, as shown in FIG. 2E, for the English sentence "The question whether it is right or wrong depends on the result", a selectable interactive element may be generated for each of "The", "question", "whether", "it", "is", "right", "or", "wrong", "depends", "on", "the", "result".
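One possible shape for the per-word interactive elements is a JSON array with one object per word. The field names `index`, `word` and `selected` are hypothetical; the patent only states that each element records its word and that JSON may be used.

```python
import json


def build_elements(sentence):
    """Create one selectable element per word, preserving the word order."""
    return [
        {"index": i, "word": word, "selected": False}
        for i, word in enumerate(sentence.split())
    ]


elements = build_elements(
    "The question whether it is right or wrong depends on the result"
)
payload = json.dumps(elements)
print(payload)
```

Joining the `word` fields in `index` order reproduces the original sentence, which matches the requirement that the elements together form the complete sentence.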

In addition, the sentence pattern factors of the English sentence, that is, the linguistic attributes of the English sentence, can be identified to facilitate lookup by the user.

In embodiments of the present invention, the sentence pattern factors may include one or more of the following:

1. Sentence structure

The structure of the English sentence may include one or more of:

1.1, the subject-predicate structure, in which the predicate is an intransitive verb, e.g., He runs quickly.

1.2, the subject-linking verb-predicative structure, in which the predicate is a linking verb, e.g., He is older than it seems.

1.3, the subject-predicate-object structure, in which the predicate is a transitive verb and therefore takes an object, e.g., I saw a film yesterday.

1.4, the subject-predicate-double object structure, in which the predicate is a transitive verb with two objects, e.g., He gave me a book / He gave a book to me.

1.5, the subject-predicate-object-complement structure, in which the predicate is a transitive verb with an object complement, e.g., They made the girl angry.

2. Clause type

A subordinate clause is defined relative to a main clause: in a complex sentence, the subordinate clause depends on a main clause and cannot stand alone as a sentence, but it has its own subject and predicate and is introduced by a connective such as that, who, or when.

In English, there are mainly three kinds of clauses, namely, noun clauses (including subject clauses, object clauses, predicative clauses, and appositive clauses), adjective clauses (i.e., attributive clauses), and adverbial clauses (including clauses of time, condition, result, purpose, reason, concession, place, manner, etc.).

Specifically, the method comprises the following steps:

2.1, subject clauses: a clause that serves as the subject of a complex sentence is called a subject clause.

For example, That he finished writing the composition in such a short time surprised us all.

2.2, object clauses: a clause that serves as the object in a complex sentence is called an object clause.

For example, Tell him which class you are in.

2.3, predicative clauses: a clause that serves as the predicative in a complex sentence is called a predicative clause.

For example, China is no longer what it used to be.

2.4, appositive clauses: a clause that serves as an appositive in a complex sentence is called an appositive clause.

For example, I heard the news that our team had won.

2.5, attributive clauses: a clause that serves as an attributive in a complex sentence is called an attributive clause.

For example, The dog that/which was lost has been found.

2.6, adverbial clauses: a clause that serves as an adverbial in a complex sentence is called an adverbial clause.

For example, I will not go to her party if she doesn't invite me.

In one embodiment of the invention, clause types may be identified by:

sub-step S1031, determining an English sentence to be recognized;

sub-step S1032, converting the English sentence into a characteristic text sequence;

and sub-step S1033, inputting the characteristic text sequence into a preset classification model to identify the clause types contained in the English sentence.
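The three sub-steps form a small pipeline: take a sentence, convert it, and query a model. The sketch below shows only that data flow. Everything in it is a stand-in: tagging words as `CONN` or `W` is a hypothetical simplification of the characteristic text sequence, the connective list contains only the three words named earlier in this description, and `stub_model` is a placeholder rather than the trained classification model.

```python
from typing import Callable, List

# Connectives named in the description; a real system would need a fuller inventory.
CONNECTIVES = {"that", "who", "when"}


def to_feature_sequence(sentence: str) -> List[str]:
    """Sub-step S1032 (stand-in): tag each word as CONN (connective) or W (other)."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return ["CONN" if w in CONNECTIVES else "W" for w in words if w]


def stub_model(features: List[str]) -> List[str]:
    """Placeholder for the preset classification model; it only flags a possible clause."""
    return ["possible clause"] if "CONN" in features else []


def recognize_clauses(sentence: str, model: Callable[[List[str]], List[str]]) -> List[str]:
    """Sub-steps S1031 to S1033: accept a sentence, convert it, and query the model."""
    return model(to_feature_sequence(sentence))


labels = recognize_clauses("I heard the news that our team had won.", stub_model)
print(labels)
```

Passing the model in as a parameter keeps the conversion step independent of whichever classifier is eventually plugged in.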

In the embodiment of the present invention, since sub-steps S1031, S1032 and S1033 are substantially similar to steps 501, 502 and 503, they are described only briefly here; for the relevant details, refer to the descriptions of steps 501, 502 and 503.

3. Sentence tense

The tense of the English sentence may include one or more of:

3.1, the simple present tense, which represents a frequently occurring event, a habitual action, or a general fact.

For example, She doesn't often write to her family, only once a month.

3.2, the simple past tense, which describes an action or state that occurred at a certain time in the past, and can also represent a habitual action that occurred frequently during a certain period in the past.

For example, He got his driving license last month.

3.3, the simple future tense, which describes an action that will occur or a state that will exist in the future.

For example, He will arrive here this evening.

3.4, the present continuous tense, which describes an action taking place at the moment of speaking or writing, or an action in progress during the present period.

For example, They are having a football match.

3.5, the past continuous tense, which indicates an action that was occurring or in progress at a certain point in time in the past.

For example, At this moment yesterday, I was packing for camp.

3.6, the past perfect tense, which means that an action had occurred or been completed before a certain time or action in the past.

For example, When I woke up, it had stopped raining.

4. Part of speech

Parts of speech are also called word classes. According to their roles in sentences, English words may belong to one or more of the following:

4.1, nouns (noun, n.), e.g., student.

4.2, pronouns (pronoun, pron.), e.g., you.

4.3, adjectives (adjective, adj.), e.g., happy.

4.4, adverbs (adverb, adv.), e.g., quickly.

4.5, verbs (verb, v.), e.g., cut.

4.6, numerals (numeral, num.), e.g., three.

4.7, articles (article, art.), e.g., a.

4.8, prepositions (preposition, prep.), e.g., at.

4.9, conjunctions (conjunction, conj.), e.g., and.

4.10, interjections (interjection, interj.), e.g., oh.

It should be noted that one English word may have multiple parts of speech. The part of speech in the embodiment of the present invention refers to the part of speech that the English word has in the English sentence to be recognized, and context information may be used to help recognize it.

Of course, the above sentence pattern factors are only examples. When implementing the embodiment of the present invention, those skilled in the art may adopt other sentence pattern factors according to actual situations and needs, which is not limited in the embodiment of the present invention.

Because the data size of the sentence pattern factors may be large, the sentence pattern factors may be identified and displayed in batches, or identified and displayed all together, which is not limited in the embodiment of the present invention.

For example, in the interface shown in FIG. 2E, if the user clicks the "sentence analysis" control, the sentence structure and the clause type may be displayed; if the user clicks the "tense analysis" control, the sentence tense may be displayed; and if the user clicks the "part of speech analysis" control, the part of speech may be displayed.
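The relationship between the three controls and the sentence pattern factors they reveal can be sketched as a lookup table. The key names such as `sentence_structure` and the shape of the `analysis` dictionary are hypothetical; only the control labels and their groupings come from the description.

```python
# Which sentence pattern factors each control displays, per the description.
CONTROL_TO_FACTORS = {
    "sentence analysis": ["sentence_structure", "clause_type"],
    "tense analysis": ["sentence_tense"],
    "part of speech analysis": ["part_of_speech"],
}


def factors_for_control(control, analysis):
    """Return only the already-recognized factors that the clicked control displays."""
    return {
        name: analysis[name]
        for name in CONTROL_TO_FACTORS[control]
        if name in analysis
    }


# Hypothetical partial result for "I saw a film yesterday."; part of speech has
# not been recognized yet, as could happen when factors are processed in batches.
analysis = {
    "sentence_structure": "subject-predicate-object",
    "clause_type": [],
    "sentence_tense": "simple past",
}
shown = factors_for_control("tense analysis", analysis)
print(shown)
```

Returning only the factors present in `analysis` lets the same function serve both the batch and the all-at-once display strategies.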

In practical applications, in order to reduce resource consumption on the mobile terminal, the splitting of the English words and the recognition of the sentence pattern factors can be performed by the server.

In this manner, the English recognition application may send the English sentence to the server. The server splits each word from the English sentence, recognizes one or more of the sentence structure, the clause type, the sentence tense, and the parts of speech of the words in the English sentence, and returns this information to the English recognition application.

The English recognition application receives, from the server, each word split from the English sentence together with one or more of the sentence structure, clause type, sentence tense, and parts of speech of the words recognized from the English sentence.

Thereafter, the English recognition application generates a clickable interactive element for each word in the interface.

Of course, the splitting of the English words and the recognition of the sentence pattern factors may also be performed by the English recognition application itself, and the embodiment of the present invention is not limited thereto.

According to the embodiment of the invention, English information is identified from the selected target image data and split into one or more English sentences, each English sentence is split into selectable interactive elements, one per word, and the sentence pattern factors of the English sentence are identified. On the one hand, the user can select one or more required words through the interactive elements for subsequent operations such as translation. On the other hand, the sentence pattern factors of the English sentence are identified automatically, which enriches the information available about the sentence, spares the user from manually checking the sentence against other reference materials, saves time, improves efficiency, and reduces the probability of error when the user has limited knowledge.

Referring to FIG. 3, a flowchart illustrating steps of another method for identifying English information according to an embodiment of the present invention is shown, which may specifically include the following steps:

Step 301, selecting target image data.

Step 302, identifying English information from the target image data, and splitting it into one or more English sentences.

Step 303, splitting the English sentence into clickable interactive elements, one per word, and identifying the sentence pattern factors of the English sentence.

Step 304, selecting one or more target English sentences from the one or more English sentences.

Step 305, translating the one or more target English sentences to obtain target language information.

In the embodiment of the invention, the user can select the target English sentence from the recognized English sentences for translation, and obtain the required target language information, such as Chinese translation, Korean translation, meglumine translation and the like.

For example, as shown in FIG. 2E, for The English sentence "The question heat it is right or The right depends on The result", it can be translated into "whether The question is right or wrong, depending on The result".

In addition, a single English sentence or multiple English sentences may be translated.

In practical applications, in order to reduce the resource consumption of the mobile terminal, the translation of the target English sentence may be performed by the server.

In this manner, the English recognition application may send the one or more target English sentences to the server, and the server translates them to obtain target language information and returns the target language information to the English recognition application.

The English recognition application receives the target language information, returned by the server, that was obtained by translating the one or more target English sentences.

Of course, the translation of the target English sentence may also be performed by the English recognition application itself, and the embodiment of the present invention is not limited thereto.

Step 306, selecting a target word from the words in the English sentence based on the interactive element.

Step 307, translating the target word to obtain target language information.

In the embodiment of the invention, the user can select a target word from a certain English sentence for translation and obtain the required target language information, such as a Chinese translation, a Korean translation, and the like.

For example, as shown in FIG. 2E, for the English sentence "The question whether it is right or wrong depends on the result", the user can click to select "question", "depends" and "on" as the target words, and click the "translate" control to translate them.

In practical applications, in order to reduce the resource consumption of the mobile terminal, the translation of the target word may be performed by the server.

In this manner, the English recognition application may send the target word to the server, and the server translates the target word to obtain target language information and returns the target language information to the English recognition application.

The English recognition application receives the target language information, returned by the server, that was obtained by translating the target word.

Of course, the translation of the target word may also be performed by the English recognition application itself, and the embodiment of the present invention is not limited thereto.

Referring to fig. 4, a flowchart illustrating the steps of a method for training a classification model of English clauses according to an embodiment of the present invention is shown, which may specifically include the following steps:

Step 401, setting an English sentence with an English clause as a training sample.

In the embodiment of the present invention, English sentences containing clauses (Subordinate Clauses) may be collected as training samples of the classification model.

A clause is defined relative to the main clause: in a complex sentence, a clause is subordinate to a main clause and cannot stand alone as a sentence, but it has its own subject and predicate and is introduced by a connective such as that, who, or while.

Specifically, the clause types include the following:

Subject clauses: a clause used as the subject in a complex sentence is called a subject clause.

Object clauses: a clause used as the object in a complex sentence is called an object clause.

For example, "Tell him which class you are in."

Predicative clauses: a clause used as the predicative in a complex sentence is called a predicative clause.

For example, "China is no longer what it used to be."

Appositive clauses: a clause used as an appositive in a complex sentence is called an appositive clause.

Attributive clauses: a clause used as an attributive in a complex sentence is called an attributive clause.

Adverbial clauses: a clause used as an adverbial in a complex sentence is called an adverbial clause.

For example, "I will not go to her party if she doesn't invite me."

Step 402, converting the training samples into a characteristic text sequence.

In a specific implementation, the features of the training sample (i.e., the English clause) may be identified, and the training sample may be replaced with those features to form a feature text sequence.

In one embodiment of the present invention, step 402 may include the following sub-steps:

Substep S4021, identifying the constituent structure of the training sample;

Substep S4022, forming a feature text sequence using the constituent structure.

In the embodiment of the present invention, a Stanford Parser may be configured in advance. The Stanford Parser is a lexicalized probabilistic context-free grammar parser that also supports dependency parsing.

Through the Stanford Parser, dependency parsing can be performed on the training samples (i.e., English clauses), and the dependency relationships of the English sentences are output.

The Stanford Parser is used for natural language processing and mainly implements the following functions:

1) identifying and marking the part of speech of each word in the sentence;

2) generating the grammatical relations (Stanford Dependencies) between pairs of words in the sentence;

3) obtaining the syntactic structure of the sentence.

Further, the Stanford Parser can output the syntactic parse tree of a sentence, together with the part of speech and the constituent of each word.

For English clauses, the specific English words carry little information, whereas the constituent structure of the clause is a strong feature. The embodiment of the invention therefore extracts this strong feature and discards the uninformative ones.

In one example, as shown in fig. 5, the English sentence "The boy who is presenting the powerpoint is the most handsome man" can be parsed by the Stanford Parser and converted into the feature text sequence "ROOT S NP DT NN SBAR WHNP WP S VP VBZ VP VBG NP DT JJ VP VBZ NP DT RBS JJ NN.", wherein ROOT represents the text sentence to be processed, NP represents a noun phrase, DT represents a determiner, NN represents a common noun, and so on.

Besides the Stanford Parser, other ways may be adopted to identify the constituent structure of the training sample, and the embodiment of the present invention is not limited thereto.
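The conversion in sub-steps S4021 and S4022 can be sketched as follows. This is only a minimal illustration, assuming the parser output is available as a Penn Treebank style bracketed string; the function name `feature_text_sequence` and the hand-written sample parse are hypothetical and are not taken from the embodiment.

```python
def feature_text_sequence(bracketed_parse: str) -> str:
    """Collect constituent and part-of-speech labels in pre-order.

    Assumes a Penn Treebank style bracketed parse, in which every label
    directly follows an opening parenthesis and words never do.
    """
    tokens = bracketed_parse.replace("(", " ( ").replace(")", " ) ").split()
    labels = [
        tok
        for prev, tok in zip(tokens, tokens[1:])
        if prev == "(" and tok not in ("(", ")")
    ]
    return " ".join(labels)


# Hypothetical parse of a short sentence, written by hand for illustration.
sample_parse = "(ROOT (S (NP (DT The) (NN boy)) (VP (VBZ is) (ADJP (JJ tall)))))"
sequence = feature_text_sequence(sample_parse)
```

The words themselves are dropped and only the structural labels are kept, which matches the idea that the constituent structure, rather than the vocabulary, is the strong feature.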

Step 403, training a classification model for identifying English clauses by using the characteristic text sequence.

In practical application, the characteristic text sequence can be used for training through a machine learning method to obtain a classification model for identifying English clauses.

In one embodiment of the present invention, step 403 may include the following sub-steps:

Substep S4031, inputting the characteristic text sequence into a convolutional neural network;

Substep S4032, training, in the convolutional neural network, a classification model for identifying English clauses by using the characteristic text sequence based on the order of the words in the training sample.

The Convolutional Neural Network (CNN) is a feedforward neural network that can extract topological structure from a two-dimensional image, and it uses the back propagation algorithm to optimize the network structure and solve for the unknown parameters in the network.

For Natural Language Processing (NLP), the input to the convolutional neural network is no longer a pixel but a characteristic text sequence represented in the form of a matrix or the like, and the matrix is equivalent to an "image".

When the convolutional neural network performs classification, it can take the order of the words in the English sentence into account, so that the sentence pattern structure of English clauses can be learned.
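As an illustration of representing a characteristic text sequence in matrix form, the sketch below one-hot encodes each tag as one row. The embodiment does not specify the encoding, so the one-hot scheme, the fixed number of rows, the function name `encode_as_matrix`, and the small tag vocabulary are all assumptions made for this example.

```python
from typing import List


def encode_as_matrix(sequence: str, vocabulary: List[str], length: int) -> List[List[int]]:
    """One-hot encode each tag as a row; pad or truncate to `length` rows."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    rows = []
    for tag in sequence.split()[:length]:
        row = [0] * len(vocabulary)
        if tag in index:  # tags outside the vocabulary stay as all-zero rows
            row[index[tag]] = 1
        rows.append(row)
    while len(rows) < length:  # pad short sequences with all-zero rows
        rows.append([0] * len(vocabulary))
    return rows


# Hypothetical tag vocabulary; a real one would cover every parser label.
vocabulary = ["ROOT", "S", "NP", "VP", "DT", "NN", "VBZ"]
matrix = encode_as_matrix("ROOT S NP DT NN VP VBZ", vocabulary, length=8)
```

Because the rows keep the order of the tags, a filter sliding over adjacent rows sees the tags in sentence order.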

In a particular implementation, a convolutional neural network structure includes convolutional layers, downsampling layers, and a fully connected layer. Each layer has a plurality of feature maps; each feature map extracts one feature of the input through a convolution filter, and each feature map has a plurality of neurons.

Convolutional layer: convolutional layers are used because an important property of the convolution operation is that it can enhance the characteristics of the original signal and reduce noise.

Downsampling layer: downsampling is used because, according to the principle of local correlation in images, subsampling an image reduces the amount of computation while preserving the rotation invariance of the image.

The main purpose of sampling is to blur the exact position of a feature. Once a feature has been found, its exact position is no longer important; only its position relative to other features is needed. Take an "8" as an example: once the upper "o" has been found, there is no need to know its exact position in the image; it is enough to know that there is another "o" below it to know that the character is an "8", because shifting the "8" to the left or right in the image does not affect its recognition. This strategy of blurring exact positions makes it possible to recognize deformed and distorted images.

Fully connected layer: softmax is used for full connection, and the resulting activation values are the image features extracted by the convolutional neural network.
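The three kinds of layers described above can be sketched in plain Python as follows. This is a simplified illustration rather than the network of the embodiment: the kernel shapes, the ReLU activation, max pooling over the whole feature map, and all numeric values are assumptions chosen to keep the example small.

```python
import math
from typing import List

Matrix = List[List[float]]


def convolve(matrix: Matrix, kernel: Matrix) -> List[float]:
    """Slide a kernel that spans full rows down the matrix (valid positions only)."""
    height = len(kernel)
    outputs = []
    for start in range(len(matrix) - height + 1):
        total = 0.0
        for r in range(height):
            for value, weight in zip(matrix[start + r], kernel[r]):
                total += value * weight
        outputs.append(max(0.0, total))  # ReLU activation (an assumption)
    return outputs


def max_pool(feature_map: List[float]) -> float:
    """Downsample a feature map to its strongest response, discarding position."""
    return max(feature_map)


def softmax(scores: List[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(matrix: Matrix, kernels: List[Matrix], class_weights: Matrix) -> List[float]:
    """Convolutional layer, then downsampling, then a fully connected softmax layer."""
    pooled = [max_pool(convolve(matrix, k)) for k in kernels]
    scores = [sum(w * p for w, p in zip(ws, pooled)) for ws in class_weights]
    return softmax(scores)


# Toy input: three rows, two columns; values are illustrative only.
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
class_weights = [[1.0, 0.0], [0.0, 0.5]]
probabilities = forward(matrix, kernels, class_weights)
```

Pooling keeps only the strongest response of each filter, which is one way of discarding the exact position of a feature while keeping the fact that it occurred.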

After the convolutional neural network is constructed, it needs to be solved. Training mainly comprises four steps divided into two stages:

First stage, forward propagation:

1) taking a sample from the sample set and inputting it into the convolutional neural network;

2) calculating the corresponding actual output; at this stage, information is passed from the input layer to the output layer through stepwise transformation.

Second stage, back propagation:

1) calculating the difference between the actual output and the corresponding ideal output;

2) adjusting the weight matrix in a manner that minimizes the error.

Further, the training process of the network is as follows:

(1) selecting a training set: randomly selecting N samples from the sample set as the training set;

(2) setting each weight and each threshold to a small random value close to 0, and initializing the precision control parameter and the learning rate;

(3) taking an input pattern from the training set, feeding it to the network, and giving its target output vector;

(4) calculating the intermediate layer output vector and calculating the actual output vector of the network;

(5) comparing elements in the output vector with elements in the target vector to calculate an output error; errors also need to be calculated for hidden units in the middle layer;

(6) sequentially calculating the adjustment quantity of each weight and the adjustment quantity of the threshold;

(7) adjusting the weight and the threshold;

(8) after M iterations, judging whether the index meets the precision requirement; if not, returning to step (3) and continuing to iterate; if so, proceeding to the next step;

(9) after training, storing the weights and thresholds in a file. At this point, the weights can be considered to have reached stable values and a classifier has been formed. When training again, the weights and thresholds can be loaded directly from the file without initialization.
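Steps (1) to (9) can be sketched as the loop below. To keep the example short, a single softmax layer stands in for the full convolutional neural network, so this is not the network of the embodiment; the function names, the cross-entropy precision check, the JSON file format, and all hyperparameter values are assumptions.

```python
import json
import math
import os
import random
import tempfile


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, thresholds, x):
    """Forward propagation: weighted sums plus thresholds, then softmax."""
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, thresholds)]
    return softmax(scores)


def train(samples, n_classes, n, m=10, precision=0.05, rate=0.5,
          max_iters=5000, seed=0):
    rng = random.Random(seed)
    training_set = rng.sample(samples, n)                      # step (1)
    dim = len(training_set[0][0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]  # step (2)
               for _ in range(n_classes)]
    thresholds = [rng.uniform(-0.01, 0.01) for _ in range(n_classes)]
    for iteration in range(1, max_iters + 1):
        x, target = training_set[iteration % n]                # step (3)
        output = forward(weights, thresholds, x)               # step (4)
        errors = [o - (1.0 if c == target else 0.0)            # step (5)
                  for c, o in enumerate(output)]
        for c, err in enumerate(errors):                       # steps (6)-(7)
            for j, v in enumerate(x):
                weights[c][j] -= rate * err * v
            thresholds[c] -= rate * err
        if iteration % m == 0:                                 # step (8)
            loss = sum(-math.log(forward(weights, thresholds, sx)[st])
                       for sx, st in training_set) / n
            if loss < precision:
                break
    return weights, thresholds


def save(path, weights, thresholds):                           # step (9)
    with open(path, "w") as f:
        json.dump({"weights": weights, "thresholds": thresholds}, f)


def load(path):
    with open(path) as f:
        data = json.load(f)
    return data["weights"], data["thresholds"]


# Toy samples: (input vector, class index); values are illustrative only.
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights, thresholds = train(samples, n_classes=2, n=2)
path = os.path.join(tempfile.mkdtemp(), "model.json")
save(path, weights, thresholds)
restored = load(path)
```

Loading the stored weights and thresholds gives back the same classifier, so a later training run can start from them instead of from random values.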

In addition to the convolutional neural network, other machine learning methods may be used to train a classification model for recognizing English clauses, for example, SVM (Support Vector Machine), AdaBoost, and the like, which is not limited in this embodiment of the present invention.

Referring to fig. 6, a flowchart illustrating the steps of a method for identifying English clauses based on a classification model according to an embodiment of the present invention is shown, which may specifically include the following steps:

Step 601, determining an English sentence to be recognized.

In a specific implementation, as shown in fig. 2E, if the user clicks the "sentence analysis" control for a certain English sentence, that English sentence can be used as the English sentence to be recognized, so as to recognize its sentence structure and clause type.

At this time, if the recognition of the sentence pattern factors (including the clause type) is performed by the server, the server may receive the English sentence uploaded by the English recognition application as the English sentence to be recognized.

Of course, if the recognition of the sentence pattern factors (including the clause type) is performed by the English recognition application, the English recognition application can directly extract the English sentence as the English sentence to be recognized.

In addition to the above manners, the English sentence to be recognized may be determined in other manners, for example, the user may directly input the English sentence to be recognized, which is not limited in this embodiment of the present invention.

Step 602, converting the English sentence into a characteristic text sequence.

In a specific implementation, the features of the English sentence can be identified, and the English sentence is replaced with those features to form a characteristic text sequence.

In one embodiment of the present invention, step 602 may include the following sub-steps:

Substep S6021, identifying the composition structure of the English sentence;

Substep S6022, forming a feature sequence text by using the composition structure.

As in step 402, a Stanford parser may be used to identify and mark the part of speech of the words in the sentence and to obtain the syntactic structure of the sentence.

Besides the Stanford parser, other manners may be adopted to identify the composition structure of the English sentence, which is not limited in the embodiment of the present invention.

Step 603, inputting the characteristic text sequence into a preset classification model to identify the clause types contained in the English sentence.

In the embodiment of the invention, the classification model for identifying English clauses may be obtained in advance by training, through a machine learning method, on the characteristic text sequences converted from the training samples.

In one embodiment of the invention, the classification model may be trained by:

Substep S6031, setting an English sentence with an English clause as a training sample;

Substep S6032, converting the training sample into a characteristic text sequence;

Substep S6033, training a classification model for identifying English clauses by using the characteristic text sequence.

In the embodiment of the present invention, since sub-steps S6031, S6032, and S6033 are basically similar to steps 401, 402, and 403, they are described only briefly here; for relevant details, refer to the descriptions of steps 401, 402, and 403.

In a specific implementation, the feature text sequence can be input into the classification model to identify the clause type contained in the English sentence.

In one embodiment of the present invention, step 603 may include the following sub-steps:

Substep S6034, inputting the characteristic text sequence into a classification model trained by a convolutional neural network;

Substep S6035, identifying, in the classification model, the clause types contained in the English sentence by using the characteristic text sequence based on the order of the words in the English sentence.

In an embodiment of the invention, the classification model is trained based on a convolutional neural network.

When the convolutional neural network performs classification, it can take the order of the words in the English sentence into account, so that the sentence pattern structure of English clauses can be learned and the clause types contained in the English sentence can be identified.
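Steps 601 to 603 can be tied together as in the sketch below. The parser and the classifier are passed in as plain functions so that the example stays self-contained; `recognize_clause_type`, the stub functions, the label list, and the returned probabilities are all hypothetical stand-ins rather than the trained model of the embodiment.

```python
from typing import Callable, List, Sequence


def recognize_clause_type(
    sentence: str,
    to_feature_sequence: Callable[[str], str],
    classify: Callable[[str], Sequence[float]],
    labels: List[str],
) -> str:
    """Convert a sentence to its feature text sequence and pick the likeliest label."""
    sequence = to_feature_sequence(sentence)   # step 602
    probabilities = classify(sequence)         # step 603
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


# Stubs for illustration only: a real system would call a parser and a trained model.
def stub_parser(sentence: str) -> str:
    return "ROOT S NP DT NN SBAR WHNP WP S VP VBZ"


def stub_classifier(sequence: str) -> List[float]:
    return [0.8, 0.2] if "SBAR" in sequence.split() else [0.2, 0.8]


labels = ["attributive clause", "adverbial clause"]  # hypothetical label set
result = recognize_clause_type("The boy who is presenting ...", stub_parser,
                               stub_classifier, labels)
```

Keeping the parser and the classifier behind function parameters reflects that either component may run on the server or in the English recognition application.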

For simplicity of explanation, the method embodiments are described as a series of combined acts, but those skilled in the art will appreciate that the embodiments of the invention are not limited by the described order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the invention.

Referring to fig. 7, a block diagram of a device for recognizing English information according to an embodiment of the present invention is shown, which may specifically include the following modules:

a target image data selection module 701 adapted to select target image data;

a sentence splitting module 702 adapted to identify English information from the target image data and split it into one or more English sentences;

a sentence attribute identifying module 703 adapted to split the English sentence into interactive elements in which each word can be clicked, and identify a sentence pattern factor of the English sentence.

In one embodiment of the present invention, the target image data selection module 701 includes:

the preview image data acquisition submodule is suitable for calling a camera to acquire preview image data;

the preview frame loading submodule is suitable for loading a preview frame in the preview image data;

the preview image data extraction submodule is suitable for extracting the preview image data in the preview frame as target image data;

and/or,

and the image data importing submodule is suitable for importing the locally stored image data as target image data.

In one embodiment of the present invention, the sentence splitting module 702 comprises:

the target image data sending submodule is suitable for sending the target image data to a server;

and the splitting information receiving submodule is suitable for receiving the English information which is returned by the server and is identified from the target image data by means of optical character recognition, and the one or more English sentences split from the English information.

In one embodiment of the present invention, the sentence attribute identification module 703 includes:

the English sentence sending submodule is suitable for sending the English sentence to a server;

the sentence attribute receiving submodule is suitable for receiving each word which is returned by the server and is split from the English sentence, and one or more information of sentence structures, clause types, sentence tenses and parts of speech of the words in the English sentence, wherein the information is identified from the English sentence;

and a submodule suitable for generating clickable interactive elements from the respective words.

Referring to fig. 8, a block diagram of another apparatus for identifying English information according to an embodiment of the present invention is shown, which may specifically include the following modules:

a target image data selection module 801 adapted to select target image data;

a sentence splitting module 802 adapted to identify English information from the target image data and split it into one or more English sentences;

a sentence attribute identifying module 803 adapted to split the English sentence into interactive elements in which each word can be clicked, and identify a sentence pattern factor of the English sentence;

a target English sentence selecting module 804 adapted to select one or more target English sentences from the one or more English sentences;

a target English sentence translation module 805 adapted to translate the one or more target English sentences to obtain target language information;

a target word selection module 806 adapted to select a target word from the words in the English sentence based on the interactive element;

a target word translation module 807 adapted to translate the target word to obtain target language information.

In an embodiment of the present invention, the target English sentence translation module 805 includes:

the target English sentence sending submodule is suitable for sending the one or more target English sentences to the server;

and the target English sentence translation information receiving submodule is suitable for receiving target language information which is returned by the server and obtained by translating the one or more target English sentences.

In one embodiment of the present invention, the target word translation module 807 includes:

a target word sending submodule adapted to send the target word to a server;

and the target word translation information receiving submodule is suitable for receiving target language information which is returned by the server and obtained by translating the target word.

Referring to fig. 9, a block diagram of a training apparatus for a classification model of English clauses according to an embodiment of the present invention is shown, which may specifically include the following modules:

a training sample setting module 901 adapted to set an English sentence with an English clause as a training sample;

a training sample conversion module 902 adapted to convert the training samples into a feature text sequence;

and the classification model training module 903 is suitable for training a classification model for identifying English clauses by adopting the characteristic text sequence.

In one embodiment of the present invention, the training sample conversion module 902 comprises:

In one embodiment of the present invention, the classification model training module 903 comprises:

Referring to fig. 10, a block diagram of a device for recognizing English clauses based on a classification model according to an embodiment of the present invention is shown, which may specifically include the following modules:

an English sentence determination module 1001 adapted to determine an English sentence to be recognized;

an English sentence conversion module 1002 adapted to convert the English sentence into a feature text sequence;

a clause type identifying module 1003 adapted to input the characteristic text sequence into a preset classification model to identify a clause type contained in the English sentence.

In an embodiment of the present invention, the English sentence conversion module 1002 includes:

In one embodiment of the present invention, the clause type identifying module 1003 includes:

For the device embodiments, since they are basically similar to the method embodiments, the description is relatively brief; for relevant details, refer to the corresponding description of the method embodiments.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the training apparatus for classification models of English clauses, the apparatus for identifying English clauses based on classification models, and the like according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.

Claims

1. A training method of a classification model of English clauses comprises the following steps:

setting English sentences with English clauses as training samples;

converting the training samples into a characteristic text sequence;

training a classification model for identifying English clauses by adopting the characteristic text sequence;

the method further comprises the following steps:

identifying English information from the target image data, and splitting one or more English sentences;

splitting the English sentence into interactive elements which can be clicked by each word, and identifying sentence pattern factors of the English sentence;

translating the clicked interactive elements; or

Selecting one or more target English sentences from the one or more English sentences;

translating the one or more target English sentences to obtain target language information;

the converting the training samples into a feature text sequence comprises:

and converting the training sample into a characteristic text sequence according to the part of speech of each word in the training sample and the grammatical relation between every two words in the training sample.

2. The method of claim 1, wherein the step of converting the training samples into a feature text sequence comprises:

identifying a composition structure of the training sample;

and forming a feature sequence text by adopting the composition structure.

3. The method according to claim 1 or 2, wherein the step of training a classification model for identifying English clauses by adopting the characteristic text sequence comprises:

inputting the characteristic text sequence into a convolutional neural network;

4. A method of identifying English clauses based on a classification model, comprising:

determining an English sentence to be recognized;

converting the English sentence into a characteristic text sequence;

inputting the characteristic text sequence into a preset classification model to identify clause types contained in the English sentence;

the method further comprises the following steps:

translating the clicked interactive elements; or

the converting the English sentence into a characteristic text sequence comprises:

and converting the English sentence into a characteristic text sequence according to the part of speech of each word in the English sentence and the grammatical relation between every two words in the English sentence.

5. The method of claim 4, wherein the step of converting the English sentence into a characteristic text sequence comprises:

identifying the composition structure of the English sentence;

and forming a feature sequence text by adopting the composition structure.

6. The method of claim 5, wherein the step of inputting the characteristic text sequence into a preset classification model to identify the clause types contained in the English sentence comprises:

7. An apparatus for training a classification model of English clauses, comprising:

the classification model training module is suitable for adopting the characteristic text sequence to train a classification model for identifying English clauses;

the device also includes:

the first identification module is suitable for identifying English information from the target image data and splitting one or more English sentences; splitting the English sentence into interactive elements which can be clicked by each word, and identifying sentence pattern factors of the English sentence;

the training sample setting module is suitable for translating the clicked interactive elements;

the target English sentence selection module is suitable for selecting one or more target English sentences from the one or more English sentences;

the target English sentence translation module is suitable for translating the one or more target English sentences to obtain target language information;

the training sample conversion module is used for converting the training samples into characteristic text sequences according to the part of speech of each word in the training samples and the grammatical relation between every two words in the training samples.

8. The apparatus of claim 7, wherein the training sample conversion module comprises:

9. The apparatus of claim 7 or 8, wherein the classification model training module comprises:

10. An apparatus for recognizing English clauses based on a classification model, comprising:

the clause type identification module is suitable for inputting the characteristic text sequence into a preset classification model so as to identify the clause type contained in the English sentence;

the device also includes:

the second identification module is used for identifying English information from the target image data and splitting one or more English sentences; splitting the English sentence into interactive elements which can be clicked by each word, and identifying sentence pattern factors of the English sentence;

the English sentence determining module is suitable for translating the clicked interactive elements;

and the English sentence conversion module is used for converting the English sentence into a characteristic text sequence according to the part of speech of each word in the English sentence and the grammatical relation between every two words in the English sentence.

11. The apparatus of claim 10, wherein the English sentence conversion module comprises:

12. The apparatus of claim 10 or 11, wherein the clause type identification module comprises: