CN109670178A - Sentence-level bilingual alignment method and device, computer readable storage medium - Google Patents
Sentence-level bilingual alignment method and device, computer readable storage medium
- Publication number
- CN109670178A (application CN201811562126.2A)
- Authority
- CN
- China
- Prior art keywords
- text
- sentence
- punctuate
- handled
- aligned
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
Abstract
The invention discloses a sentence-level bilingual alignment method and device, and a computer-readable storage medium. The method comprises: step S1: obtaining Z trained convolution kernels, where Z is an integer greater than or equal to 1; step S2: performing sentence segmentation on each of two texts to be aligned, and establishing a text similarity matrix U of the two texts to be aligned; step S3: convolving the text similarity matrix U with each of the Z trained convolution kernels to obtain Z optimized text similarity matrices; step S4: obtaining the sentence alignment result of the two texts to be aligned from the Z optimized text similarity matrices. The invention helps to improve the efficiency of sentence alignment between texts.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a sentence-level bilingual alignment method and device, and a computer-readable storage medium.
Background technique
Parallel corpora are important data for natural-language-processing-based translation algorithms. A parallel (or corresponding) corpus is a bilingual or multilingual corpus consisting of source texts and their parallel translations. By degree of alignment, corpora can be divided into word-level, sentence-level, paragraph-level, and document-level corpora, of which sentence-level parallel corpora are the most commonly used; paragraph-level and document-level parallel corpora therefore usually have to be converted into sentence-level ones. In a corpus, however, source text and translation do not necessarily correspond one to one: because of differences in text structure and in authors' writing habits, 15 Chinese sentences may correspond to 22 English sentences, or 16 Chinese sentences to 50 English sentences, so complex and varied sentence-matching situations must be handled. At present, paragraph- and document-level corpora are mainly split and recombined into one-to-one sentence pairs manually, which consumes considerable manpower and time and thus hinders improvement of sentence alignment efficiency.
Summary of the invention
In view of this, one object of the present invention is to provide a sentence-level bilingual alignment method and device, and a computer-readable storage medium, which help to improve sentence alignment efficiency.
In order to achieve the above object, the technical solution of the present invention provides a sentence-level bilingual alignment method, comprising:
Step S1: obtaining Z trained convolution kernels, where Z is an integer greater than or equal to 1, each trained convolution kernel being obtained through steps S11 to S15;
Step S11: performing sentence segmentation (splitting at sentence-ending punctuation) on each of two training texts, and establishing the text similarity matrix B of the two training texts:
where n is the number of sentences obtained by segmenting one of the two training texts, m is the number of sentences obtained by segmenting the other training text, and the element Kij of the text similarity matrix B is the text similarity between the i-th sentence obtained from the one training text and the j-th sentence obtained from the other training text;
Step S12: initializing a convolution kernel;
Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and computing a loss value loss; if the loss value meets a preset requirement, executing step S14; otherwise, executing step S16;
where Lij is 1 if the i-th sentence obtained by segmenting the one training text matches the j-th sentence obtained by segmenting the other training text, and 0 otherwise;
Step S14: verifying the current convolution kernel on a validation set, and judging whether the verification result meets a preset requirement; if so, executing step S15; if not, executing step S16;
Step S15: taking the current convolution kernel as a trained convolution kernel;
Step S16: adjusting the weights of the current convolution kernel according to the loss value, and judging whether the current number of training iterations has reached a preset number; if so, executing step S15; if not, returning to step S13;
Step S2: performing sentence segmentation on each of two texts to be aligned, and establishing the text similarity matrix U of the two texts to be aligned:
where a is the number of sentences obtained by segmenting one of the two texts to be aligned, b is the number of sentences obtained by segmenting the other text to be aligned, and the element Kij of the text similarity matrix U is the text similarity between the i-th sentence obtained from the one text to be aligned and the j-th sentence obtained from the other text to be aligned;
Step S3: convolving the text similarity matrix U with each of the Z trained convolution kernels to obtain Z optimized text similarity matrices;
Step S4: obtaining the sentence alignment result of the two texts to be aligned from the Z optimized text similarity matrices.
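Steps S2 to S4 can be illustrated with a rough sketch. This is not the patented implementation: the similarity values in U and the two kernel weight sets below are made-up placeholders, and real kernels would come from the training of steps S11 to S16.

```python
import numpy as np

def conv2d_same(M, kernel):
    # Zero-padded convolution whose output has the same size as the input
    # (cross-correlation form; the kernel flip is omitted for simplicity).
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(M, ((ph, ph), (pw, pw)))
    out = np.zeros_like(M, dtype=float)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# Step S2: a hypothetical 3x3 text similarity matrix U of two texts to be aligned
U = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.8, 0.3],
              [0.0, 0.2, 0.7]])

# Step S3: Z = 2 "trained" kernels (placeholder weights)
kernels = [np.array([[0.0, 0.25, 0.0],
                     [0.25, 1.0, 0.25],
                     [0.0, 0.25, 0.0]]),
           np.eye(3) * 0.5]
optimized = [conv2d_same(U, k) for k in kernels]

# Step S4 (see also S41/S42 below): average the optimized matrices into a
# matching degree matrix T, then match each row to its largest-valued column
T = np.mean(optimized, axis=0)
pairs = [(i, int(np.argmax(T[i]))) for i in range(T.shape[0])]
print(pairs)  # → [(0, 0), (1, 1), (2, 2)]
```

With this diagonal-dominant U, the convolution reinforces the diagonal and each sentence pairs with its counterpart of the same index.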
Further, Z is an integer greater than or equal to 2, and different trained convolution kernels differ in size and weights.
Further, step S4 comprises:
Step S41: calculating a text matching degree matrix T from the Z optimized text similarity matrices, where the element Yij of the text matching degree matrix T is the text matching degree between the i-th sentence obtained by segmenting the one text to be aligned and the j-th sentence obtained by segmenting the other text to be aligned, and the value of each element of T is the average of the elements at the same position in the Z optimized text similarity matrices;
Step S42: traversing the rows of the text matching degree matrix T in turn, choosing the element with the largest value from each row, and matching the two sentences corresponding to the chosen element.
Further, after step S42 the method further comprises:
Step S43: judging whether any of the b sentences obtained by segmenting the other text to be aligned remains unmatched; if so, looking up in the text matching degree matrix T the sentence with the highest text matching degree to the unmatched sentence, and matching the found sentence with it.
Further, after step S4 the method further comprises:
Step S5: detecting the sentence alignment result according to the position order, within the other text to be aligned, of the b sentences obtained by segmenting the other text to be aligned, and the position order, within the one text to be aligned, of the a sentences obtained by segmenting the one text to be aligned.
Further, step S5 comprises:
Step S51: sorting the a sentences according to the position order of the b sentences in the other text to be aligned and the sentence alignment result;
Step S52: if two of the a sentences have a position order after sorting that is opposite to their position order in the one text to be aligned, judging that an error exists.
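Steps S51 and S52 amount to an order-consistency check on the pairing result. A minimal sketch (the pair-list representation and 1-based index convention are illustrative assumptions, not fixed by the patent text):

```python
def detect_order_errors(pairs):
    """pairs: list of (a_index, b_index) sentence pairings, where each index is
    the position of the sentence in its own text. Sort by the b-side position
    (step S51); any inversion on the a-side between two consecutive pairs
    signals a likely alignment error (step S52)."""
    ordered = sorted(pairs, key=lambda p: p[1])
    errors = []
    for (a1, _), (a2, _) in zip(ordered, ordered[1:]):
        if a1 > a2:  # relative order reversed between the two texts
            errors.append((a1, a2))
    return errors

print(detect_order_errors([(1, 1), (3, 2), (2, 3)]))  # → [(3, 2)]
```

A monotone pairing such as [(1, 1), (2, 2), (3, 3)] yields no errors; the inversion between sentences 3 and 2 above is flagged.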
Further, each of the two training texts and the two texts to be aligned comprises one English text and one non-English text, and the text similarity K between each sentence obtained by segmenting the English text and each sentence obtained by segmenting the non-English text is calculated as follows:
translating each sentence obtained by segmenting the non-English text to obtain a corresponding English rendering;
for the two sentences whose text similarity is to be calculated, comparing the number of words in the sentence obtained by segmenting the English text with the number of words in the English rendering obtained by translating the sentence obtained by segmenting the non-English text;
calculating
K = (N1 + N2 + ... + NE) / E
where E is the word count of whichever of the two compared sentences has more words, and Nv is the value for the v-th word of that longer sentence: Nv is 1 if the shorter of the two compared sentences contains a word with the same root as the v-th word, and 0 otherwise.
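The computation of K can be sketched as follows. The translation step is stubbed out (the second argument is assumed to be the English rendering already produced by a translation tool), and `crude_root` is only a rough stand-in for root extraction, since the patent does not fix a particular stemmer:

```python
def crude_root(word):
    """Very rough stand-in for root extraction (illustrative assumption)."""
    for suf in ("ing", "ed", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def similarity_k(english_sentence, translated_sentence):
    """K = (number of root-matched words) / (word count of the longer sentence)."""
    w1 = english_sentence.lower().split()
    w2 = translated_sentence.lower().split()
    # E is the length of the longer sentence; on a tie either may be chosen
    longer, shorter = (w1, w2) if len(w1) >= len(w2) else (w2, w1)
    shorter_roots = {crude_root(w) for w in shorter}
    matched = sum(1 for w in longer if crude_root(w) in shorter_roots)
    return matched / len(longer)

print(round(similarity_k("the cats sat on the mat",
                         "the cat sits on a mat"), 3))  # → 0.833
```

Here 5 of the 6 words root-match ("sat" does not reduce to "sit" under this crude rule), giving K = 5/6.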
To achieve the above object, the technical solution of the present invention further provides a sentence-level bilingual alignment device, comprising:
an obtaining module, for obtaining Z trained convolution kernels, where Z is an integer greater than or equal to 1, each trained convolution kernel being obtained through steps S11 to S15;
Step S11: performing sentence segmentation on each of two training texts, and establishing the text similarity matrix B of the two training texts:
where n is the number of sentences obtained by segmenting one of the two training texts, m is the number of sentences obtained by segmenting the other training text, and the element Kij of the text similarity matrix B is the text similarity between the i-th sentence obtained from the one training text and the j-th sentence obtained from the other training text;
Step S12: initializing a convolution kernel;
Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and computing a loss value loss; if the loss value meets a preset requirement, executing step S14; otherwise, executing step S16;
where Lij is 1 if the i-th sentence obtained by segmenting the one training text matches the j-th sentence obtained by segmenting the other training text, and 0 otherwise;
Step S14: verifying the current convolution kernel on a validation set, and judging whether the verification result meets a preset requirement; if so, executing step S15; if not, executing step S16;
Step S15: taking the current convolution kernel as a trained convolution kernel;
Step S16: adjusting the weights of the current convolution kernel according to the loss value, and judging whether the current number of training iterations has reached a preset number; if so, executing step S15; if not, returning to step S13;
a first processing module, for performing sentence segmentation on each of two texts to be aligned, and establishing the text similarity matrix U of the two texts to be aligned:
where a is the number of sentences obtained by segmenting one of the two texts to be aligned, b is the number of sentences obtained by segmenting the other text to be aligned, and the element Kij of the text similarity matrix U is the text similarity between the i-th sentence obtained from the one text to be aligned and the j-th sentence obtained from the other text to be aligned;
a second processing module, for convolving the text similarity matrix U with each of the Z trained convolution kernels to obtain Z optimized text similarity matrices;
a third processing module, for obtaining the sentence alignment result of the two texts to be aligned from the Z optimized text similarity matrices.
To achieve the above object, the technical solution of the present invention further provides a sentence-level bilingual alignment device, comprising a processor and a memory coupled to the processor, wherein the processor is configured to execute instructions in the memory so as to implement the above sentence-level bilingual alignment method.
To achieve the above object, the technical solution of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above sentence-level bilingual alignment method.
In the sentence-level bilingual alignment method provided by the invention, the text similarity matrix of two texts to be aligned is convolved with trained convolution kernels, and sentence alignment of the two texts is performed according to the convolution result. This not only reduces manual involvement and achieves automatic sentence alignment, but also improves alignment accuracy, thereby helping to improve the efficiency of sentence alignment between texts.
Brief description of the drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of a sentence-level bilingual alignment method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of training a convolution kernel provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a text similarity matrix provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of an objective matrix provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of calculating a text matching degree matrix provided by an embodiment of the present invention.
Specific embodiment
The present invention is described below on the basis of embodiments, but the present invention is not limited to these embodiments. Some specific details are described in the following detailed description of the invention; to avoid obscuring the essence of the invention, well-known methods, procedures, flows and elements are not described in detail.
In addition, those skilled in the art should understand that the drawings provided herein are for purposes of illustration and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the specification and claims words such as "include" and "comprise" should be construed in an inclusive rather than an exclusive or exhaustive sense, that is, in the sense of "including but not limited to".
In the description of the present invention, it should be understood that the terms "first", "second", etc. are used for descriptive purposes only and are not to be interpreted as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise indicated, "multiple" means two or more.
Referring to Fig. 1, which is a flowchart of a sentence-level bilingual alignment method provided by an embodiment of the present invention, the method comprises:
Step S1: obtaining Z trained convolution kernels, where Z is an integer greater than or equal to 1, each trained convolution kernel being obtained through steps S11 to S15;
Step S11: performing sentence segmentation on each of two training texts, and establishing the text similarity matrix B of the two training texts:
where n is the number of sentences obtained by segmenting one of the two training texts, m is the number of sentences obtained by segmenting the other training text, and the element Kij of the text similarity matrix B is the text similarity between the i-th sentence obtained from the one training text and the j-th sentence obtained from the other training text;
Step S12: initializing a convolution kernel;
Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and computing a loss value loss; if the loss value meets a preset requirement, executing step S14; otherwise, executing step S16;
where Lij is 1 if the i-th sentence obtained by segmenting the one training text matches the j-th sentence obtained by segmenting the other training text, and 0 otherwise;
Step S14: verifying the current convolution kernel on a validation set, and judging whether the verification result meets a preset requirement; if so, executing step S15; if not, executing step S16;
Step S15: taking the current convolution kernel as a trained convolution kernel;
Step S16: adjusting the weights of the current convolution kernel according to the loss value, and judging whether the current number of training iterations has reached a preset number; if so, executing step S15; if not, returning to step S13;
Step S2: performing sentence segmentation on each of two texts to be aligned, and establishing the text similarity matrix U of the two texts to be aligned:
where a is the number of sentences obtained by segmenting one of the two texts to be aligned, b is the number of sentences obtained by segmenting the other text to be aligned, and the element Kij of the text similarity matrix U is the text similarity between the i-th sentence obtained from the one text to be aligned and the j-th sentence obtained from the other text to be aligned;
Step S3: convolving the text similarity matrix U with each of the Z trained convolution kernels to obtain Z optimized text similarity matrices;
Step S4: obtaining the sentence alignment result of the two texts to be aligned from the Z optimized text similarity matrices.
In the sentence-level bilingual alignment method provided by the embodiment of the present invention, the text similarity matrix of two texts to be aligned is convolved with trained convolution kernels, and sentence alignment of the two texts is performed according to the convolution result. This not only reduces manual involvement and achieves automatic sentence alignment, but also improves alignment accuracy, helping to improve the efficiency of sentence alignment between texts.
Each trained convolution kernel in the embodiment of the present invention can be obtained by training a convolutional neural network. As shown in Fig. 2, the text similarity matrix B of two training texts whose sentence alignment result is known is used as the training-set input, and an objective matrix is also input; the objective matrix (i.e., the model answer) is compared with the matrix returned by the neural network, so that the output of the network approaches the objective matrix as closely as possible, thereby obtaining the required convolution kernel. The detailed process is as follows:
Step A1: obtaining two training texts from the training set, for example one English training text (the source text) and one Chinese training text (the translation), the sentence alignment result of the two training texts being known;
Step A2: performing sentence segmentation on each of the two training texts;
Sentence segmentation can use the punctuation marks that delimit sentences in the text. Taking Chinese-English alignment as an example, Chinese sentences end with "。" or "！" and English sentences end with ".": wherever such a mark occurs, the text is split. Segmentation yields two lists: an English (source) sentence list containing n English sentences and a Chinese (translation) sentence list containing m Chinese sentences. Each sentence in the English list is an independent sentence of the source text, and each sentence in the Chinese list is an independent sentence of the translation. In addition, for ease of processing, the sentences in each list can be numbered according to their order in the text (i.e., the position order of the sentences in the text) as sentence indices: in the English sentence list, the sentence at the beginning of the English text is numbered 1, ..., and the sentence at the end is numbered n; in the Chinese sentence list, the sentence at the beginning of the Chinese text is numbered 1, ..., and the sentence at the end is numbered m;
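Step A2 can be sketched as follows. The terminator sets are examples (the patent mentions "。" and "！" for Chinese and "." for English), and the regex-based splitting is an illustrative choice:

```python
import re

def split_sentences(text, terminators):
    """Split text into sentences at the given end-of-sentence marks (step A2),
    keeping each mark attached and numbering sentences by position (1-based)."""
    pattern = "([" + re.escape("".join(terminators)) + "])"
    parts = re.split(pattern, text)
    sentences = []
    # re.split with a capturing group alternates text and terminator
    for i in range(0, len(parts) - 1, 2):
        s = (parts[i] + parts[i + 1]).strip()
        if s:
            sentences.append(s)
    return {idx + 1: s for idx, s in enumerate(sentences)}

print(split_sentences("Hello world. How are you? Fine!", ".?!"))
# → {1: 'Hello world.', 2: 'How are you?', 3: 'Fine!'}
```

For the Chinese side, the same function can be called with terminators such as "。！".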
Step A3: establishing the text similarity matrix B of the two training texts, i.e., comparing each of the m sentences in the Chinese list with each of the n sentences in the English list for similarity. The detailed process is as follows:
First, a translation tool is used to translate the Chinese into the same language as the source (English), i.e., each sentence in the Chinese sentence list is translated to obtain its corresponding English rendering;
For the two sentences whose text similarity is to be calculated (one Chinese sentence and one English sentence), the number of words in the English sentence is compared with the number of words in the English rendering obtained by translating the Chinese sentence;
Then K = (N1 + N2 + ... + NE) / E is calculated, where E is the word count of whichever of the two compared sentences has more words, and Nv is the value for the v-th word of that longer sentence: Nv is 1 if the shorter of the two compared sentences contains a word with the same root as the v-th word, and 0 otherwise;
It should be noted that if the two word counts are equal, either sentence may be taken as the one with more words and the other as the one with fewer words;
That is, the words in the sentences are matched exactly by taking their roots, and the text similarity between the two sentences is calculated with the above formula: whenever the roots are identical the match count is incremented by 1; the sum of matches is the numerator, and the length of the sentence (i.e., the number of words in it) is the denominator; if the lengths differ, the word count of the longer sentence is taken as the denominator;
In the above manner, m*n text similarities are obtained and represented by a matrix of size m*n, which serves as the text similarity matrix B;
where the element Kij of the text similarity matrix B is the text similarity between the i-th sentence in the above English sentence list (i.e., the sentence numbered i) and the j-th sentence in the above Chinese sentence list (i.e., the sentence numbered j);
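Under the same assumptions (the translated sentences already available, and a caller-supplied pairwise similarity function), building the matrix B can be sketched as:

```python
import numpy as np

def build_similarity_matrix(en_sentences, zh_translations, sim):
    """Build an m x n text similarity matrix B (step A3): n English source
    sentences against m target sentences already rendered into English.
    sim(s1, s2) -> float is the pairwise similarity, e.g. the root-match K.
    The row/column orientation here is an illustrative choice."""
    m, n = len(zh_translations), len(en_sentences)
    B = np.zeros((m, n))
    for j, zh in enumerate(zh_translations):
        for i, en in enumerate(en_sentences):
            B[j, i] = sim(en, zh)
    return B

# Toy similarity: word-set overlap over the longer sentence (illustrative only)
def toy_sim(s1, s2):
    a, b = set(s1.split()), set(s2.split())
    return len(a & b) / max(len(a), len(b))

B = build_similarity_matrix(["a b c", "d e f"],
                            ["a b c", "x d e", "f g"], toy_sim)
print(B.shape)  # → (3, 2)
```

Any sentence-level similarity can be plugged in for `sim`; the matrix shape m*n matches the two sentence counts.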
For example, after the two training texts are processed, the text similarity matrix shown in Fig. 3 is obtained. It can be seen that the elements with larger values (i.e., higher text similarity) cluster along the diagonal running from the upper-left corner to the lower-right corner, because the Chinese and English texts have the same sentence ordering;
Step A4: initializing a convolution kernel, taking the initialized kernel as the current convolution kernel, and executing step A5;
Step A5: establishing an objective matrix J according to the sentence alignment result of the above two training texts;
where the element Lij of the objective matrix J corresponds to the i-th sentence in the above English sentence list and the j-th sentence in the above Chinese sentence list, and its value is determined by the known sentence alignment result: if the i-th sentence in the English sentence list matches the j-th sentence in the Chinese sentence list, Lij is 1, and otherwise 0;
For example, the objective matrix J established from the above two training texts is shown in Fig. 4;
Step A6: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculating the loss value loss against the established objective matrix J; if the loss value meets a preset requirement (e.g., is below a threshold), executing step A7; otherwise, executing step A9;
Step A7: verifying the current convolution kernel on a validation set, and judging whether the verification result meets a preset requirement; if so, executing step A8; if not, executing step A9;
where the validation set comprises several validation text pairs, each consisting of an English text (source) and a Chinese text (translation);
where the verification process is essentially the same as the training process and is not described again here; when, in the verification result, the loss value on the validation set is below a certain threshold and the accuracy on the validation set exceeds a certain threshold, the verification result is judged to meet the preset requirement;
Step A8: taking the current convolution kernel as a trained convolution kernel;
Step A9: adjusting the weights of the current convolution kernel according to the loss value, and judging whether the current number of training iterations has reached a preset number; if so, executing step A8; if not, returning to step A6.
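The loop of steps A4, A6 and A9 can be sketched in miniature. The patent does not specify the loss function or the weight update rule, so mean squared error and a finite-difference gradient step are illustrative stand-ins for the backpropagation of a real convolutional network:

```python
import numpy as np

def conv2d_same(M, kernel):
    # Zero-padded "same"-size convolution (cross-correlation form).
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(M, ((ph, ph), (pw, pw)))
    out = np.zeros_like(M, dtype=float)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def train_kernel(B, J, size=3, lr=0.05, max_iters=200, tol=1e-3, seed=0):
    """Steps A4/A6/A9 in miniature: initialise a kernel, convolve B, compare
    the result P with the objective matrix J via an MSE loss, and adjust the
    kernel weights by finite-difference gradient descent until the loss is
    small enough or the iteration budget runs out."""
    rng = np.random.default_rng(seed)
    kernel = rng.normal(scale=0.1, size=(size, size))  # A4: initialisation
    losses = []
    for _ in range(max_iters):
        P = conv2d_same(B, kernel)                     # A6: convolution
        loss = float(np.mean((P - J) ** 2))            # A6: loss vs objective
        losses.append(loss)
        if loss < tol:                                 # A6: preset requirement
            break
        grad = np.zeros_like(kernel)                   # A9: numerical gradient
        eps = 1e-5
        for r in range(size):
            for c in range(size):
                k2 = kernel.copy()
                k2[r, c] += eps
                loss2 = np.mean((conv2d_same(B, k2) - J) ** 2)
                grad[r, c] = (loss2 - loss) / eps
        kernel -= lr * grad                            # A9: weight adjustment
    return kernel, losses

B = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.8, 0.3],
              [0.0, 0.2, 0.7]])
J = np.eye(3)  # known alignment: sentence i pairs with sentence i
kernel, losses = train_kernel(B, J)
print(losses[-1] < losses[0])  # → True
```

Because the loss is a convex quadratic in the kernel weights, this small learning rate makes the loss decrease monotonically toward the objective matrix.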
Preferably, in one embodiment, Z is an integer greater than or equal to 2, and different trained convolution kernels differ in size and weights; for example, Z may be 3, 5 or 6.
To obtain multiple trained convolution kernels, multiple convolution kernels can be initialized separately (the initialized kernels differing in size and weights), and each kernel is then used to convolve the text similarity matrix B of the above two training texts. The results are several matrices with changed values, each of which is compared with the objective matrix to obtain the loss value of the corresponding convolutional neural network. A larger loss value indicates that a network performs worse and needs a larger parameter adjustment; a smaller loss value indicates that it performs better and needs a smaller adjustment. The different loss values can therefore be propagated back to their respective convolutional neural networks, and each network adjusts its parameters layer by layer in reverse according to its own loss value, i.e., adjusts the weights of its convolution kernel; the weight adjustment of each kernel differs from one backpropagation pass to another, until the loss value meets the expected requirement.
It should be noted that a convolution kernel trained in the above manner can be stored in a memory and, when needed, read directly from the memory.
For example, in one embodiment, of the two texts to be aligned, one is an English text (the source) and the other is a Chinese text (the translation). The method of establishing the text similarity matrix U of the two texts to be aligned is the same as that of establishing the text similarity matrix B of the two training texts (i.e., steps A1, A2 and A3 above) and is not described again here.
In step S3 above, the text similarity matrix U of the two texts to be aligned is convolved with the trained convolution kernels, so that the matrix U is optimized and corrected, yielding the optimized text similarity matrices.
For example, in one embodiment, the above step S4 includes:

Step S41: calculating a text matching degree matrix T from the Z optimization text similarity matrices, wherein the element Yij of T is the text matching degree between the i-th sentence obtained by sentence-breaking the one text to be aligned and the j-th sentence obtained by sentence-breaking the other text to be aligned, and the value of each element of T is the average of the elements at the same position in the Z optimization text similarity matrices;

that is, the Z optimization text similarity matrices are added position by position and each element is divided by Z to obtain the text matching degree matrix T.

It should be noted that if Z is 1, the single optimization text similarity matrix can be used directly as the text matching degree matrix.

For example, referring to Fig. 5, convolving the text similarity matrix U of the two texts to be aligned with 3 trained convolution kernels yields 3 optimization text similarity matrices, from which the text matching degree matrix is then calculated;
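The averaging of step S41 is a single element-wise mean over the Z matrices. A minimal numpy sketch, using the three first-row vectors of the Fig. 5 discussion (the only values the text states) as stand-ins for full matrices:

```python
import numpy as np

# Z = 3 optimization text similarity matrices, one per trained kernel.
# Only the first rows are given in the Fig. 5 discussion, so single-row
# matrices stand in for the full ones here.
opt = [np.array([[0.7, 0.6, 0.3]]),
       np.array([[0.7, 0.6, 0.2]]),
       np.array([[0.7, 0.9, 0.4]])]

# Step S41: element-wise average over the Z matrices.
T = np.mean(np.stack(opt), axis=0)  # first row of T becomes [0.7, 0.7, 0.3]
```

The averaged row [0.7, 0.7, 0.3] matches the first row of the text matching degree matrix discussed below.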
Step S42: traversing the rows of the text matching degree matrix T in turn, choosing from each row the element with the maximum value, and pairing the two sentences corresponding to the chosen element;

For example, for the text matching degree matrix obtained in Fig. 5, the maximum element is chosen from each row and the two corresponding sentences are paired, yielding three pairings: row 1 (the 1st sentence obtained by sentence-breaking the one text to be aligned) with column 1 (the 1st sentence obtained by sentence-breaking the other text to be aligned), row 2 (the 2nd sentence of the one text to be aligned) with column 3 (the 3rd sentence of the other text to be aligned), and row 3 (the 3rd sentence of the one text to be aligned) with column 3 (the 3rd sentence of the other text to be aligned).

In this step, if a row of the text matching degree matrix T contains several elements sharing the maximum value, the maximum value of that row is first determined and taken as the current lookup value. The elements at the same positions as the tied elements are then looked up in the Z optimization text similarity matrices, the position at which the current lookup value occurs most often is determined, and the two sentences corresponding to that position are paired. For example, the first row of the text matching degree matrix in Fig. 5 is [0.7, 0.7, 0.3], so the elements at row 1 column 1 and row 1 column 2 share the maximum value 0.7. Looking up those two positions in the 3 optimization text similarity matrices, whose first rows are [0.7, 0.6, 0.3], [0.7, 0.6, 0.2] and [0.7, 0.9, 0.4], the value 0.7 occurs most often at row 1 column 1; row 1 (the 1st sentence of the one text to be aligned) is therefore paired with column 1 (the 1st sentence of the other text to be aligned). Alternatively, when a row of T contains several maximal elements, one of them may simply be selected at random as the maximal element;
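The row-wise selection of step S42, including the tie-breaking rule just described (among tied columns, prefer the one at which the tied value occurs most often across the Z optimization matrices; a random choice is the stated fallback), can be sketched as follows. The function and variable names are ours:

```python
import numpy as np

def pair_rows(T, opt_mats):
    """Row-wise pairing of step S42: for each row of T take the maximal
    column; ties are broken in favour of the column at which the tied
    value occurs most often across the Z optimization matrices."""
    pairs = {}
    for i, row in enumerate(T):
        best = np.max(row)
        cols = list(np.flatnonzero(np.isclose(row, best)))
        if len(cols) > 1:
            counts = [sum(bool(np.isclose(m[i, j], best)) for m in opt_mats)
                      for j in cols]
            cols = [cols[int(np.argmax(counts))]]
        pairs[i] = int(cols[0])
    return pairs
```

On the Fig. 5 first-row example ([0.7, 0.7, 0.3] averaged from [0.7, 0.6, 0.3], [0.7, 0.6, 0.2], [0.7, 0.9, 0.4]), the tie between columns 1 and 2 resolves to column 1, as in the text.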
Through the above step S42, every sentence of the one text to be aligned can be paired, but one or more sentences of the other text to be aligned may remain unpaired. Preferably, therefore, step S4 further includes, after step S42:

Step S43: judging whether any of the b sentences obtained by sentence-breaking the other text to be aligned is unpaired and, if so, finding in the text matching degree matrix T the sentence with which it has the greatest text matching degree and pairing the two, thereby detecting missed pairings in the columns of the matrix;

For example, after the pairing of step S42, column 2 of the text matching degree matrix obtained in Fig. 5 still has no matched row (i.e. the 2nd sentence obtained by sentence-breaking the other text to be aligned is unpaired). The maximum element of column 2 of T is therefore looked up; the result is the element at row 1, column 2, so row 1 (the 1st sentence of the one text to be aligned) is paired with column 2 (the 2nd sentence of the other text to be aligned). Through the above steps S42-S43, the pairing result for the text matching degree matrix in Fig. 5 is: row 1 with column 1, row 1 with column 2, row 2 with column 3, and row 3 with column 3;
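The column check of step S43 can be sketched on top of the step S42 row pairing. `repair_columns` is our name; it returns the extra (row, column) pairs for columns left unmatched:

```python
import numpy as np

def repair_columns(T, pairs):
    """Step S43 sketch: any column of T absent from the row-wise pairing
    is paired with the row where that column's value is largest.
    `pairs` maps row index -> matched column index (output of step S42)."""
    matched = set(pairs.values())
    return [(int(np.argmax(T[:, j])), j)
            for j in range(T.shape[1]) if j not in matched]
```

With a pairing that leaves column 2 unmatched (as in the Fig. 5 walk-through), the repair pairs it with the row holding the column maximum.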
Preferably, in one embodiment, after step S4 the method further includes:

Step S5: detecting the sentence alignment result according to the positional order, within the other text to be aligned, of the b sentences obtained by sentence-breaking it and the positional order, within the one text to be aligned, of the a sentences obtained by sentence-breaking it;

For example, step S5 may specifically include:

Step S51: sorting the a sentences according to the positional order of the b sentences in the other text to be aligned and the sentence alignment result;

Step S52: if two of the a sentences appear in the order produced by the sorting in the opposite order to their positions in the one text to be aligned, judging that an error exists. It should be noted that "opposite positional order" here means: for two sentences of the one text to be aligned, if the order produced by the sorting of step S51 places one sentence before the other but, in the one text to be aligned, that sentence is located after the other, the positional orders are deemed opposite.
For example, suppose the one text to be aligned is an English text and the other is a Chinese text. After sentence alignment of the Chinese and English texts, matching pairs of the form [Chinese 20, English 25] are normally obtained. To further improve pairing accuracy, the matching result can be checked as follows: first the matching pairs are sorted in ascending order of the Chinese sentence numbers (the positional order, within the Chinese text, of all sentences obtained by breaking the Chinese text), which effectively sorts all the English sentences broken out of the English text; then the variation of the English sentence numbers (the positional order, within the English text, of all sentences obtained by breaking the English text) along this sorted order is examined to judge whether it is monotonically increasing. Monotonic increase here means: within a sorted sequence, each number in a later position is greater than the number in the position before it. Matching pairs that do not conform to the monotonic increase can be marked, so that an error prompt is given to the user.
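The monotonicity check above can be sketched as follows. The (Chinese number, English number) pair representation and the strict "greater than" test follow the description; treating a repeated English number as a violation is our reading, since the text defines monotonic increase strictly:

```python
def check_monotonic(pairs):
    """Flag matches that break the expected monotonic increase: sort the
    (chinese_no, english_no) pairs by Chinese sentence number and report
    every pair whose English number is not greater than the last
    accepted one."""
    flagged, prev = [], float("-inf")
    for zh, en in sorted(pairs):
        if en <= prev:
            flagged.append((zh, en))
        else:
            prev = en
    return flagged
```

For [(1, 1), (2, 3), (3, 2)] the English numbers run 1, 3, 2, so the pair (3, 2) is flagged for the user; a fully monotone alignment yields no flags.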
The sentence-level bilingual alignment method provided by the embodiments of the present invention takes into account that diverse text structures and authors' writing habits produce complicated and varied sentence-pairing situations during sentence alignment. By convolving the text similarity matrix of the two texts to be aligned with multiple trained convolution kernels, the text similarity matrix is optimized and corrected so that the optimized matrices take account of the temporal order (i.e. positional order) in which sentences occur in the text. This not only avoids the interference that identical sentences cause during sentence matching, but also avoids the interference caused by complicated and varied sentence-pairing situations, guaranteeing the accuracy of sentence matching and greatly improving the robustness of the algorithm.
An embodiment of the present invention further provides a sentence-level bilingual alignment device, comprising:

an obtaining module for obtaining Z trained convolution kernels, wherein Z is an integer greater than or equal to 1 and each trained convolution kernel is obtained through steps S11-S15;

Step S11: performing sentence-breaking on two training texts respectively and establishing the text similarity matrix B of the two training texts:

wherein n is the number of sentences obtained by sentence-breaking one of the two training texts, m is the number of sentences obtained by sentence-breaking the other training text, and the element Kij of the text similarity matrix B is the text similarity between the i-th sentence obtained by sentence-breaking the one training text and the j-th sentence obtained by sentence-breaking the other training text;

Step S12: initializing a convolution kernel;

Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculating a loss value loss; if the loss meets a preset requirement, executing step S14, otherwise executing step S16;

wherein Lij is 1 if the i-th sentence obtained by sentence-breaking the one training text matches the j-th sentence obtained by sentence-breaking the other training text, and 0 otherwise;

Step S14: verifying the current convolution kernel with a verification set and judging whether the verification result meets a preset requirement; if so, executing step S15; if not, executing step S16;

Step S15: taking the current convolution kernel as a trained convolution kernel;

Step S16: adjusting the weights of the current convolution kernel according to the loss value loss and judging whether the current number of training iterations has reached a preset number; if so, executing step S15; if not, repeating step S13;

a first processing module for performing sentence-breaking on two texts to be aligned respectively and establishing the text similarity matrix U of the two texts to be aligned:

wherein a is the number of sentences obtained by sentence-breaking one of the two texts to be aligned, b is the number of sentences obtained by sentence-breaking the other text to be aligned, and the element Kij of the text similarity matrix U is the text similarity between the i-th sentence obtained by sentence-breaking the one text to be aligned and the j-th sentence obtained by sentence-breaking the other text to be aligned;

a second processing module for convolving the text similarity matrix U with each of the Z trained convolution kernels to obtain Z optimization text similarity matrices;

a third processing module for obtaining the sentence alignment result of the two texts to be aligned using the Z optimization text similarity matrices.
In one embodiment, Z is an integer greater than or equal to 2 and the different trained convolution kernels differ in size and weight.

In one embodiment, the third processing module includes:

a computing unit for calculating a text matching degree matrix T from the Z optimization text similarity matrices, wherein the element Yij of T is the text matching degree between the i-th sentence obtained by sentence-breaking the one text to be aligned and the j-th sentence obtained by sentence-breaking the other text to be aligned, and the value of each element of T is the average of the elements at the same position in the Z optimization text similarity matrices;

a first pairing unit for traversing the rows of the text matching degree matrix T in turn, choosing from each row the element with the maximum value, and pairing the two sentences corresponding to the chosen element.

In one embodiment, the third processing module further includes:

a second pairing unit for judging whether any of the b sentences obtained by sentence-breaking the other text to be aligned is unpaired and, if so, finding in the text matching degree matrix T the sentence with which it has the greatest text matching degree and pairing the two.

In one embodiment, the sentence-level bilingual alignment device further includes:

a result detection module for detecting the sentence alignment result according to the positional order, within the other text to be aligned, of its b sentence-broken sentences and the positional order, within the one text to be aligned, of its a sentence-broken sentences.

In one embodiment, the result detection module includes:

a sorting unit for sorting the a sentences according to the positional order of the b sentences in the other text to be aligned and the sentence alignment result;

a detection unit for judging that an error exists if two of the a sentences appear in the sorted order in the opposite order to their positions in the one text to be aligned.
In one embodiment, the two training texts include one English text and one non-English-language text, and so do the two texts to be aligned, wherein the text similarity K between each sentence obtained by sentence-breaking the English text and each sentence obtained by sentence-breaking the non-English-language text is calculated as follows:

the sentences obtained by sentence-breaking the non-English-language text are translated, giving corresponding English text;

for the two sentences whose text similarity is to be calculated, the word counts of the sentence obtained by breaking the English text and of the English text obtained by translating the sentence broken out of the non-English-language text are compared;

K is then calculated as K = (Σv Nv) / E,

wherein E is the word count of whichever of the two compared sentences has more words, Nv is the value taken for the v-th word of that longer sentence, and Nv is 1 if the sentence with fewer words contains a word with the same root as the v-th word, and 0 otherwise.
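A sketch of this similarity is given below. Two points are assumptions of ours, not fixed by the text: K is read as the mean of the Nv values (the proportion of words in the longer sentence with a root match in the shorter), and root matching is approximated by a shared character prefix, since the text names no stemming method:

```python
def similarity_k(english_sentence, translated_sentence, root_len=4):
    """Sketch of the similarity K: E is the word count of the longer of
    the two word lists; Nv is 1 when the shorter list contains a word
    sharing a root with the v-th word of the longer, else 0; K is taken
    as the mean of the Nv.  Root matching by a shared prefix of
    `root_len` characters is a simplification of ours."""
    a = english_sentence.lower().split()
    b = translated_sentence.lower().split()
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)

    def root(word):
        return word[:root_len]

    shorter_roots = {root(w) for w in shorter}
    hits = sum(1 for w in longer if root(w) in shorter_roots)  # sum of Nv
    return hits / len(longer)                                  # divide by E
```

For a sentence and its own translation-copy, K is 1; when only half of the longer sentence's words find a root match, K is 0.5.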
An embodiment of the present invention further provides a sentence-level bilingual alignment device including a processor and a memory coupled to the processor, wherein the processor executes instructions in the memory to implement the sentence-level bilingual alignment method described above.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the sentence-level bilingual alignment method described above.

Those skilled in the art will readily recognize that the preferred embodiments above can be freely combined and superposed provided they do not conflict.

It should be understood that the embodiments above are merely exemplary and not restrictive; without departing from the basic principle of the present invention, those skilled in the art can make various obvious or equivalent modifications or replacements to the details above, all of which fall within the scope of the claims of the present invention.
Claims (10)
1. A sentence-level bilingual alignment method, characterized by comprising:

Step S1: obtaining Z trained convolution kernels, wherein Z is an integer greater than or equal to 1 and each trained convolution kernel is obtained through steps S11-S15;

Step S11: performing sentence-breaking on two training texts respectively and establishing the text similarity matrix B of the two training texts:

wherein n is the number of sentences obtained by sentence-breaking one of the two training texts, m is the number of sentences obtained by sentence-breaking the other training text, and the element Kij of the text similarity matrix B is the text similarity between the i-th sentence obtained by sentence-breaking the one training text and the j-th sentence obtained by sentence-breaking the other training text;

Step S12: initializing a convolution kernel;

Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculating a loss value loss; if the loss meets a preset requirement, executing step S14, otherwise executing step S16;

wherein Lij is 1 if the i-th sentence obtained by sentence-breaking the one training text matches the j-th sentence obtained by sentence-breaking the other training text, and 0 otherwise;

Step S14: verifying the current convolution kernel with a verification set and judging whether the verification result meets a preset requirement; if so, executing step S15; if not, executing step S16;

Step S15: taking the current convolution kernel as a trained convolution kernel;

Step S16: adjusting the weights of the current convolution kernel according to the loss value loss and judging whether the current number of training iterations has reached a preset number; if so, executing step S15; if not, repeating step S13;

Step S2: performing sentence-breaking on two texts to be aligned respectively and establishing the text similarity matrix U of the two texts to be aligned:

wherein a is the number of sentences obtained by sentence-breaking one of the two texts to be aligned, b is the number of sentences obtained by sentence-breaking the other text to be aligned, and the element Kij of the text similarity matrix U is the text similarity between the i-th sentence obtained by sentence-breaking the one text to be aligned and the j-th sentence obtained by sentence-breaking the other text to be aligned;

Step S3: convolving the text similarity matrix U with each of the Z trained convolution kernels to obtain Z optimization text similarity matrices;

Step S4: obtaining the sentence alignment result of the two texts to be aligned using the Z optimization text similarity matrices.
2. The method according to claim 1, characterized in that Z is an integer greater than or equal to 2 and the different trained convolution kernels differ in size and weight.
3. The method according to claim 1, characterized in that step S4 comprises:

Step S41: calculating a text matching degree matrix T from the Z optimization text similarity matrices, wherein the element Yij of T is the text matching degree between the i-th sentence obtained by sentence-breaking the one text to be aligned and the j-th sentence obtained by sentence-breaking the other text to be aligned, and the value of each element of T is the average of the elements at the same position in the Z optimization text similarity matrices;

Step S42: traversing the rows of the text matching degree matrix T in turn, choosing from each row the element with the maximum value, and pairing the two sentences corresponding to the chosen element.
4. The method according to claim 3, characterized in that after step S42 it further comprises:

Step S43: judging whether any of the b sentences obtained by sentence-breaking the other text to be aligned is unpaired and, if so, finding in the text matching degree matrix T the sentence with which it has the greatest text matching degree and pairing the two.
5. The method according to claim 1, characterized in that after step S4 it further comprises:

Step S5: detecting the sentence alignment result according to the positional order, within the other text to be aligned, of its b sentence-broken sentences and the positional order, within the one text to be aligned, of its a sentence-broken sentences.
6. The method according to claim 5, characterized in that step S5 comprises:

Step S51: sorting the a sentences according to the positional order of the b sentences in the other text to be aligned and the sentence alignment result;

Step S52: if two of the a sentences appear in the sorted order in the opposite order to their positions in the one text to be aligned, judging that an error exists.
7. The method according to any one of claims 1-6, characterized in that the two training texts include one English text and one non-English-language text, and so do the two texts to be aligned, wherein the text similarity K between each sentence obtained by sentence-breaking the English text and each sentence obtained by sentence-breaking the non-English-language text is calculated as follows:

the sentences obtained by sentence-breaking the non-English-language text are translated, giving corresponding English text;

for the two sentences whose text similarity is to be calculated, the word counts of the sentence obtained by breaking the English text and of the English text obtained by translating the sentence broken out of the non-English-language text are compared;

K is then calculated as K = (Σv Nv) / E,

wherein E is the word count of whichever of the two compared sentences has more words, Nv is the value taken for the v-th word of that longer sentence, and Nv is 1 if the sentence with fewer words contains a word with the same root as the v-th word, and 0 otherwise.
8. A sentence-level bilingual alignment device, characterized by comprising:

an obtaining module for obtaining Z trained convolution kernels, wherein Z is an integer greater than or equal to 1 and each trained convolution kernel is obtained through steps S11-S15;

Step S11: performing sentence-breaking on two training texts respectively and establishing the text similarity matrix B of the two training texts:

wherein n is the number of sentences obtained by sentence-breaking one of the two training texts, m is the number of sentences obtained by sentence-breaking the other training text, and the element Kij of the text similarity matrix B is the text similarity between the i-th sentence obtained by sentence-breaking the one training text and the j-th sentence obtained by sentence-breaking the other training text;

Step S12: initializing a convolution kernel;

Step S13: convolving the text similarity matrix B of the two training texts with the current convolution kernel to obtain a matrix P, and calculating a loss value loss; if the loss meets a preset requirement, executing step S14, otherwise executing step S16;

wherein Lij is 1 if the i-th sentence obtained by sentence-breaking the one training text matches the j-th sentence obtained by sentence-breaking the other training text, and 0 otherwise;

Step S14: verifying the current convolution kernel with a verification set and judging whether the verification result meets a preset requirement; if so, executing step S15; if not, executing step S16;

Step S15: taking the current convolution kernel as a trained convolution kernel;

Step S16: adjusting the weights of the current convolution kernel according to the loss value loss and judging whether the current number of training iterations has reached a preset number; if so, executing step S15; if not, repeating step S13;

a first processing module for performing sentence-breaking on two texts to be aligned respectively and establishing the text similarity matrix U of the two texts to be aligned:

wherein a is the number of sentences obtained by sentence-breaking one of the two texts to be aligned, b is the number of sentences obtained by sentence-breaking the other text to be aligned, and the element Kij of the text similarity matrix U is the text similarity between the i-th sentence obtained by sentence-breaking the one text to be aligned and the j-th sentence obtained by sentence-breaking the other text to be aligned;

a second processing module for convolving the text similarity matrix U with each of the Z trained convolution kernels to obtain Z optimization text similarity matrices;

a third processing module for obtaining the sentence alignment result of the two texts to be aligned using the Z optimization text similarity matrices.
9. A sentence-level bilingual alignment device, characterized by comprising a processor and a memory coupled to the processor, wherein the processor is configured to execute instructions in the memory to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811562126.2A CN109670178B (en) | 2018-12-20 | 2018-12-20 | Sentence-level bilingual alignment method and device, computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109670178A true CN109670178A (en) | 2019-04-23 |
CN109670178B CN109670178B (en) | 2019-10-08 |
Family
ID=66144024
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723587A (en) * | 2020-06-23 | 2020-09-29 | 桂林电子科技大学 | Chinese-Thai entity alignment method oriented to cross-language knowledge graph |
CN112906371A (en) * | 2021-02-08 | 2021-06-04 | 北京有竹居网络技术有限公司 | Parallel corpus acquisition method, device, equipment and storage medium |
CN113657421A (en) * | 2021-06-17 | 2021-11-16 | 中国科学院自动化研究所 | Convolutional neural network compression method and device and image classification method and device |
CN114564932A (en) * | 2021-11-25 | 2022-05-31 | 阿里巴巴达摩院(杭州)科技有限公司 | Chapter alignment method, apparatus, computer device and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105868187A (en) * | 2016-03-25 | 2016-08-17 | 北京语言大学 | A multi-translation version parallel corpus establishing method |
US20170004121A1 (en) * | 2015-06-30 | 2017-01-05 | Facebook, Inc. | Machine-translation based corrections |
CN108897740A (en) * | 2018-05-07 | 2018-11-27 | 内蒙古工业大学 | A kind of illiteracy Chinese machine translation method based on confrontation neural network |
Non-Patent Citations (2)
Title |
---|
WU Honglin et al.: "A Sentence Alignment Model Based on Combined Clues and Kernel Extensional Matrix Matching Method", AASRI Procedia |
DING Ying et al.: "Sentence Alignment Research Based on Word-Pair Modeling", Computer Engineering (online first) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670178B (en) | Sentence-level bilingual alignment method and device, computer readable storage medium | |
US11016966B2 (en) | Semantic analysis-based query result retrieval for natural language procedural queries | |
Wu et al. | Learning to extract coherent summary via deep reinforcement learning | |
CN109032375A (en) | Candidate text sorting method, device, equipment and storage medium |
CN110489523B (en) | Fine-grained sentiment analysis method based on online shopping reviews |
CN108804677A (en) | Deep learning question classification method and system combining a multi-layer attention mechanism |
CN106815252A (en) | Search method and device |
CN104636466A (en) | Entity attribute extraction method and system for open web pages |
CN105068997B (en) | Construction method and device for parallel corpora |
CN104679728A (en) | Text similarity detection device | |
JP2005122533A (en) | Question-answering system and question-answering processing method | |
CN106569993A (en) | Method and device for mining hypernym-hyponym relation between domain-specific terms | |
CN109657204A (en) | Automatic font matching using asymmetric metric learning |
CN105760359B (en) | Question processing system and method thereof | |
CN109697288B (en) | Instance alignment method based on deep learning | |
Rodríguez-Fernández et al. | Semantics-driven recognition of collocations using word embeddings | |
CN113010657B (en) | Answer processing method and answer recommendation method based on answer text | |
CN110489554B (en) | Attribute-level emotion classification method based on location-aware mutual attention network model | |
WO2022151594A1 (en) | Intelligent recommendation method and apparatus, and computer device | |
Rücklé et al. | Representation learning for answer selection with LSTM-based importance weighting | |
CN111026815B (en) | Entity pair specific relation extraction method based on user-assisted correction | |
CN110633467A (en) | Semantic relation extraction method based on improved feature fusion | |
CN109766547B (en) | Sentence similarity calculation method | |
Rahman et al. | NLP-based automatic answer script evaluation | |
CN112434134A (en) | Search model training method and device, terminal equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder |
Address after: Office 1316, No. 1 Lian'ao Road, Hengqin New Area, Zhuhai, Guangdong 519031 |
Patentee after: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd. |
Address before: Room 417, Building 20, Creative Valley, Hengqin New District, Zhuhai City, Guangdong Province 519031 |
Patentee before: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd. |
|
PP01 | Preservation of patent right |
Effective date of registration: 2024-07-18 |
Granted publication date: 2019-10-08 |
|