CN108287824A - Semantic similarity calculation method and device - Google Patents
- Publication number
- CN108287824A CN108287824A CN201810188175.8A CN201810188175A CN108287824A CN 108287824 A CN108287824 A CN 108287824A CN 201810188175 A CN201810188175 A CN 201810188175A CN 108287824 A CN108287824 A CN 108287824A
- Authority
- CN
- China
- Prior art keywords
- sentence
- similarity
- preliminary
- word
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The present invention relates to a semantic similarity calculation method and device. The method includes: preprocessing the first sentence and the second sentence of a sentence pair respectively, and extracting the first syntactic features, the second syntactic features, and the statistical features between the first sentence and the second sentence; converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix; determining a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to a preset first deep neural network model; determining the similarity between the first sentence and the second sentence according to a preset second deep neural network model; and determining whether the first sentence and the second sentence are similar according to the similarity between them. This solution fuses word features, word-order features, phrase features, and sentence-level statistical features, so the similarity between sentences can be determined more accurately.
Description
Technical field
The present invention relates to the technical field of semantic recognition, and in particular to a semantic similarity calculation method and device.
Background technology
Semantic similarity calculation mainly judges whether two sentences are semantically similar, for example, whether "what animals does the Arctic have" and "which animals live in the Arctic" are similar. Current semantic similarity approaches are mainly based on literal syntactic features: through feature selection, each sentence is expressed as a vector, the cosine similarity of the two sentence vectors is then calculated, and the sentences are judged similar if it exceeds a set similarity threshold and dissimilar otherwise.
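The conventional pipeline described above can be sketched as follows. This is a minimal illustration in which the feature-selection step is reduced to bag-of-words counts; the example sentences are paraphrases of the ones above, and the function names are not from the patent.

```python
import math
from collections import Counter

def cosine_similarity(tokens1, tokens2):
    """Bag-of-words cosine similarity between two tokenized sentences."""
    v1, v2 = Counter(tokens1), Counter(tokens2)
    dot = sum(v1[w] * v2[w] for w in v1)  # Counter returns 0 for missing words
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

s1 = ["what", "animals", "does", "the", "arctic", "have"]
s2 = ["which", "animals", "live", "in", "the", "arctic"]
print(cosine_similarity(s1, s2))  # ≈ 0.5: only surface overlap, word order ignored
```

As the example shows, this baseline only measures token overlap, which motivates the two problems listed next.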
Existing similarity calculation mainly suffers from the following problems:
1) it lacks a characterization of the word order and semantics of sentences;
2) it relies on large quantities of high-accuracy synonym or aligned-phrase resources.
Summary of the invention
Embodiments of the present invention provide a semantic similarity calculation method and device, so as to determine the similarity between sentences more accurately.
According to a first aspect of the embodiments of the present invention, a semantic similarity calculation method is provided, including:
preprocessing the first sentence and the second sentence of a sentence pair respectively, and extracting the first syntactic features corresponding to the first sentence, the second syntactic features corresponding to the second sentence, and the statistical features between the first sentence and the second sentence;
converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determining a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to determine the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
splicing the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and splicing the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, obtaining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the feature vector corresponding to the statistical features, and the preset second deep neural network model includes:
performing pointwise subtraction and pointwise multiplication between the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
splicing the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix, to obtain a splicing result;
taking the splicing result as the input of the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, determining that the first sentence and the second sentence are dissimilar.
According to a second aspect of the embodiments of the present invention, a semantic similarity calculation device is provided, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
preprocess the first sentence and the second sentence of a sentence pair respectively, and extract the first syntactic features corresponding to the first sentence, the second syntactic features corresponding to the second sentence, and the statistical features between the first sentence and the second sentence;
convert the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determine a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determine the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determine whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to determine the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
splicing the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and splicing the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, obtaining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the feature vector corresponding to the statistical features, and the preset second deep neural network model includes:
performing pointwise subtraction and pointwise multiplication between the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
splicing the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix, to obtain a splicing result;
taking the splicing result as the input of the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, determining that the first sentence and the second sentence are dissimilar.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present invention.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood through practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the present invention will be described in further detail below with reference to the drawings and embodiments.
Description of the drawings
The drawings herein are incorporated into and constitute a part of this specification; they show embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a semantic similarity calculation method according to an exemplary embodiment.
Fig. 2 is a flowchart of step S102 in a semantic similarity calculation method according to an exemplary embodiment.
Fig. 3 is a flowchart of another semantic similarity calculation method according to an exemplary embodiment.
Fig. 4 is a flowchart of step S104 in a semantic similarity calculation method according to an exemplary embodiment.
Fig. 5 is a flowchart of step S105 in a semantic similarity calculation method according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. In the following description, when drawings are referred to, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
Fig. 1 is a flowchart of a semantic similarity calculation method according to an exemplary embodiment. The semantic similarity calculation method can be applied to a terminal device or a server, where the terminal device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or similar equipment. As shown in Fig. 1, the method includes steps S101-S105:
In step S101, the first sentence and the second sentence of a sentence pair are preprocessed respectively, and the first syntactic features corresponding to the first sentence, the second syntactic features corresponding to the second sentence, and the statistical features between the first sentence and the second sentence are extracted.
Here, the syntactic features are the gram statistical features of a sentence, including part-of-speech features, word-order features, similarity features between parts of speech, matching-degree features between sentences, matching-degree features of words, and so on.
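As an illustration of such sentence-pair statistics, the helper below computes a few features of the kinds named above. The exact feature set is not specified by the text, so these particular measures (Jaccard word overlap, length ratio, part-of-speech tag overlap) and the function name are assumptions for the sketch:

```python
def statistical_features(words1, words2, pos1, pos2):
    """Toy sentence-pair statistics: word overlap, length ratio, POS-tag overlap."""
    set1, set2 = set(words1), set(words2)
    word_match = len(set1 & set2) / len(set1 | set2)   # Jaccard overlap of words
    length_ratio = min(len(words1), len(words2)) / max(len(words1), len(words2))
    pos_match = len(set(pos1) & set(pos2)) / len(set(pos1) | set(pos2))
    return [word_match, length_ratio, pos_match]

feats = statistical_features(
    ["what", "animals", "does", "the", "arctic", "have"],
    ["which", "animals", "live", "in", "the", "arctic"],
    ["PRON", "NOUN", "VERB", "DET", "NOUN", "VERB"],   # illustrative POS tags
    ["PRON", "NOUN", "VERB", "ADP", "DET", "NOUN"],
)
print(feats)
```

Such a list of scalar statistics is what later gets encoded as the statistical feature vector in step S402.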
In step S102, the words and parts of speech of the first sentence and the second sentence are converted into vectors respectively, to obtain the corresponding first feature matrix and second feature matrix.
In step S103, the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence are determined according to the first feature matrix, the second feature matrix, and a preset first deep neural network model.
In step S104, the similarity between the first sentence and the second sentence is determined according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and a preset second deep neural network model.
In step S105, whether the first sentence and the second sentence are similar is determined according to the similarity between the first sentence and the second sentence.
In this embodiment, the spatial distance and cosine distance between the sentences are determined according to the words and parts of speech of each sentence, the statistical features between the sentence pair, and the first deep neural network model, and the similarity between the sentences is then determined according to that spatial distance and cosine distance. In this way, word features, word-order features, phrase features, and sentence-level statistical features are fused, so the similarity between sentences can be determined more accurately.
Fig. 2 is a flowchart of step S102 in a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 2, in one embodiment, the above step S102 includes steps S201-S203:
In step S201, the words in the first sentence and the second sentence are converted into word vectors respectively using word2vec, to obtain the first word feature matrix corresponding to the first sentence and the second word feature matrix corresponding to the second sentence.
In step S202, the parts of speech in the first sentence and the second sentence are converted into part-of-speech vectors respectively using pos2vec, to obtain the first part-of-speech feature matrix corresponding to the first sentence and the second part-of-speech feature matrix corresponding to the second sentence.
In step S203, the first word feature matrix and the first part-of-speech feature matrix are spliced to obtain the first feature matrix, and the second word feature matrix and the second part-of-speech feature matrix are spliced to obtain the second feature matrix.
In this embodiment, the words and parts of speech of each sentence are represented as vectors, and the corresponding feature matrix of the sentence is thereby obtained, so that the similarity between the sentences can subsequently be determined from the feature matrices.
Fig. 3 is a flowchart of another semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 3, in one embodiment, the above step S103 includes step S301:
In step S301, the first feature matrix and the second feature matrix are taken respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In this embodiment, the first feature matrix is used as the input of the first deep neural network model to obtain the preliminary representation of the first sentence, and the second feature matrix is used as the input of the same model to obtain the preliminary representation of the second sentence, so that the similarity between the sentences can subsequently be determined from the preliminary sentence representations.
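The role of the first deep neural network, mapping a variable-length feature matrix to a fixed-size preliminary representation, might be sketched like this. The patent does not specify the architecture, so the mean-pool-plus-dense layer below is only a stand-in; a CNN or recurrent encoder would fill the same role:

```python
import numpy as np

rng = np.random.default_rng(1)
IN_DIM, REPR_DIM = 6, 8                        # toy sizes, matching the sketch above
W = rng.standard_normal((IN_DIM, REPR_DIM)) * 0.1
b = np.zeros(REPR_DIM)

def encode(feature_matrix):
    """Preliminary sentence representation: mean-pool token rows, then a tanh layer."""
    pooled = feature_matrix.mean(axis=0)       # (IN_DIM,) regardless of sentence length
    return np.tanh(pooled @ W + b)             # (REPR_DIM,) fixed-size representation

R_A = encode(rng.standard_normal((6, IN_DIM)))  # first sentence, 6 tokens
R_B = encode(rng.standard_normal((7, IN_DIM)))  # second sentence, 7 tokens
print(R_A.shape, R_B.shape)
```

Both sentences pass through the same network, so their representations are directly comparable in step S104.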
Fig. 4 is a flowchart of step S104 in a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 4, in one embodiment, the above step S104 includes steps S401-S404:
In step S401, pointwise subtraction and pointwise multiplication are performed between the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain the corresponding geometric distance feature matrix and angular distance feature matrix.
If the input is two sentences A and B whose preliminary representations are denoted R_A and R_B, then the geometric distance between them is expressed as dist(|R_A - R_B|), and the angular distance is expressed as angle(R_A ⊙ R_B).
In step S402, the statistical features are encoded into a vector, to obtain the corresponding statistical feature vector.
In step S403, the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix are spliced, to obtain a splicing result.
In step S404, the splicing result is taken as the input of the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
In this embodiment, the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix between the sentences are used as the input of the second deep neural network model, which transforms the final representations of the sentences and then yields the probability that the two sentences are similar, i.e., the similarity between the sentences. In this way, word features, word-order features, phrase features, and sentence-level statistical features are fused, so the similarity between sentences can be determined more accurately.
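Steps S401-S404 might be sketched as follows, with the elementwise |R_A - R_B| and R_A ⊙ R_B features spliced with the statistical feature vector, and a single logistic unit standing in for the second deep neural network (whose real architecture is unspecified):

```python
import numpy as np

rng = np.random.default_rng(2)
REPR_DIM, STAT_DIM = 8, 3                       # toy sizes, matching the sketches above
w = rng.standard_normal(2 * REPR_DIM + STAT_DIM) * 0.1
b0 = 0.0

def similarity(R_A, R_B, stat_vec):
    """S401-S404: distance features + statistics -> spliced vector -> sigmoid score."""
    geo = np.abs(R_A - R_B)                     # geometric-distance features |R_A - R_B|
    ang = R_A * R_B                             # angular-distance features  R_A (.) R_B
    x = np.concatenate([stat_vec, geo, ang])    # S403: splice into one input vector
    return 1.0 / (1.0 + np.exp(-(w @ x + b0)))  # S404: similarity score in (0, 1)

R_A, R_B = rng.standard_normal(REPR_DIM), rng.standard_normal(REPR_DIM)
score = similarity(R_A, R_B, np.array([0.33, 1.0, 0.8]))
print(score)
```

The sigmoid output plays the role of the similar-probability value that step S105 compares against the preset threshold.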
Fig. 5 is a flowchart of step S105 in a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 5, in one embodiment, the above step S105 includes steps S501-S502:
In step S501, when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, it is determined that the first sentence and the second sentence are similar.
In step S502, when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, it is determined that the first sentence and the second sentence are dissimilar.
In this embodiment, a similarity threshold can be set, for example 80%; the two sentences are then determined to be similar when the similarity between them is greater than 80%, and dissimilar otherwise.
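The threshold decision of steps S501-S502 reduces to a single comparison; note that a score exactly equal to the threshold is classified as dissimilar, matching the "less than or equal to" branch:

```python
def is_similar(score, threshold=0.8):
    """S501-S502: similar iff the score strictly exceeds the preset threshold."""
    return score > threshold

print(is_similar(0.85))  # True
print(is_similar(0.80))  # False: equal to the threshold counts as dissimilar
```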
The following is a device embodiment of the present invention, which can be used to execute the method embodiments of the present invention.
According to a second aspect of the embodiments of the present invention, a semantic similarity calculation device is provided, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
preprocess the first sentence and the second sentence of a sentence pair respectively, and extract the first syntactic features corresponding to the first sentence, the second syntactic features corresponding to the second sentence, and the statistical features between the first sentence and the second sentence;
convert the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determine a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determine the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determine whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to determine the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
splicing the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and splicing the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, obtaining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the feature vector corresponding to the statistical features, and the preset second deep neural network model includes:
performing pointwise subtraction and pointwise multiplication between the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
splicing the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix, to obtain a splicing result;
taking the splicing result as the input of the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, determining that the first sentence and the second sentence are dissimilar.
It should be understood by those skilled in the art that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (10)
1. A semantic similarity calculation method, characterized by including:
preprocessing the first sentence and the second sentence of a sentence pair respectively, and extracting the first syntactic features corresponding to the first sentence, the second syntactic features corresponding to the second sentence, and the statistical features between the first sentence and the second sentence;
converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determining a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
2. The semantic similarity calculation method according to claim 1, characterized in that converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to determine the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
splicing the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and splicing the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
3. The semantic similarity calculation method according to claim 1, characterized in that obtaining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
4. The semantic similarity calculation method according to claim 1, wherein determining the similarity between the first sentence and the second sentence according to the first sentence preliminary representation, the second sentence preliminary representation, the feature vector corresponding to the statistical features and the preset second deep neural network model comprises:
performing pointwise subtraction and pointwise multiplication on the first sentence preliminary representation and the second sentence preliminary representation, to obtain a corresponding geometric-distance feature matrix and angular-distance feature matrix;
encoding the statistical features into a vector, to obtain a corresponding statistical feature vector;
concatenating the statistical feature vector, the geometric-distance feature matrix and the angular-distance feature matrix, to obtain a concatenation result;
and taking the concatenation result as the input to the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
5. The semantic similarity calculation method according to any one of claims 1 to 4, wherein determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence comprises:
determining that the first sentence and the second sentence are similar when the similarity between them is greater than a preset similarity threshold;
and determining that the first sentence and the second sentence are dissimilar when the similarity between them is less than or equal to the preset similarity threshold.
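The decision rule in claim 5 amounts to a simple threshold comparison; a sketch with an illustrative threshold of 0.5 (the patent does not specify a value):

```python
# Illustrative preset similarity threshold; the patent leaves the value open.
PRESET_SIMILARITY = 0.5

def is_similar(similarity: float) -> bool:
    """Similar only when the score strictly exceeds the preset threshold."""
    return similarity > PRESET_SIMILARITY

print(is_similar(0.73))  # True
print(is_similar(0.5))   # False: equal to the threshold counts as dissimilar
```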
6. A semantic similarity calculation device, comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
preprocess the first sentence and the second sentence of a sentence pair respectively, and extract a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
convert the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determine a corresponding first sentence preliminary representation and second sentence preliminary representation according to the first feature matrix, the second feature matrix and a preset first deep neural network model;
determine the similarity between the first sentence and the second sentence according to the first sentence preliminary representation, the second sentence preliminary representation, the statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
and determine whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
7. The semantic similarity calculation device according to claim 6, wherein converting the words and parts of speech in the first sentence and the second sentence into vectors and determining the corresponding first feature matrix and second feature matrix comprises:
converting the words in the first sentence and the second sentence into word vectors using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
and concatenating the first word feature matrix with the first part-of-speech feature matrix to obtain the first feature matrix, and concatenating the second word feature matrix with the second part-of-speech feature matrix to obtain the second feature matrix.
8. The semantic similarity calculation device according to claim 6, wherein obtaining the corresponding first sentence preliminary representation and second sentence preliminary representation according to the first feature matrix, the second feature matrix and the preset first deep neural network model comprises:
taking the first feature matrix and the second feature matrix respectively as inputs to the first deep neural network model, to obtain the corresponding first sentence preliminary representation and second sentence preliminary representation.
9. The semantic similarity calculation device according to claim 6, wherein determining the similarity between the first sentence and the second sentence according to the first sentence preliminary representation, the second sentence preliminary representation, the feature vector corresponding to the statistical features and the preset second deep neural network model comprises:
performing pointwise subtraction and pointwise multiplication on the first sentence preliminary representation and the second sentence preliminary representation, to obtain a corresponding geometric-distance feature matrix and angular-distance feature matrix;
encoding the statistical features into a vector, to obtain a corresponding statistical feature vector;
concatenating the statistical feature vector, the geometric-distance feature matrix and the angular-distance feature matrix, to obtain a concatenation result;
and taking the concatenation result as the input to the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
10. The semantic similarity calculation device according to any one of claims 6 to 9, wherein determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence comprises:
determining that the first sentence and the second sentence are similar when the similarity between them is greater than a preset similarity threshold;
and determining that the first sentence and the second sentence are dissimilar when the similarity between them is less than or equal to the preset similarity threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810188175.8A CN108287824A (en) | 2018-03-07 | 2018-03-07 | Semantic similarity calculation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108287824A true CN108287824A (en) | 2018-07-17 |
Family
ID=62833315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810188175.8A Pending CN108287824A (en) | 2018-03-07 | 2018-03-07 | Semantic similarity calculation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108287824A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100228540A1 (en) * | 1999-11-12 | 2010-09-09 | Phoenix Solutions, Inc. | Methods and Systems for Query-Based Searching Using Spoken Input |
CN106445920A (en) * | 2016-09-29 | 2017-02-22 | 北京理工大学 | Sentence similarity calculation method based on sentence meaning structure characteristics |
CN107291699A (en) * | 2017-07-04 | 2017-10-24 | 湖南星汉数智科技有限公司 | A kind of sentence semantic similarity computational methods |
Non-Patent Citations (4)
Title |
---|
DANIEL BÄR et al.: "UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures", First Joint Conference on Lexical and Computational Semantics * |
KAI SHENG TAI et al.: "Improved Semantic Representations from Tree-Structured Long Short-Term Memory Networks", https://arxiv.org/abs/1503.00075 * |
NHZSN: "A Tree-LSTM-Based Sentence Similarity Calculation Method", http://www.doc88.com/p-9025669443117.html * |
QIAN CHEN et al.: "Enhanced LSTM for Natural Language Inference", Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101494A (en) * | 2018-08-10 | 2018-12-28 | 哈尔滨工业大学(威海) | A method of it is calculated for Chinese sentence semantic similarity, equipment and computer readable storage medium |
CN110646763A (en) * | 2019-10-10 | 2020-01-03 | 出门问问信息科技有限公司 | Sound source positioning method and device based on semantics and storage medium |
CN111144129A (en) * | 2019-12-26 | 2020-05-12 | 成都航天科工大数据研究院有限公司 | Semantic similarity obtaining method based on autoregression and self-coding |
CN111144129B (en) * | 2019-12-26 | 2023-06-06 | 成都航天科工大数据研究院有限公司 | Semantic similarity acquisition method based on autoregressive and autoencoding |
CN112380830A (en) * | 2020-06-18 | 2021-02-19 | 达而观信息科技(上海)有限公司 | Method, system and computer readable storage medium for matching related sentences in different documents |
CN111737988A (en) * | 2020-06-24 | 2020-10-02 | 深圳前海微众银行股份有限公司 | Method and device for recognizing repeated sentences |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108287824A (en) | Semantic similarity calculation method and device | |
US11803711B2 (en) | Depthwise separable convolutions for neural machine translation | |
US11093813B2 (en) | Answer to question neural networks | |
CN105719649B (en) | Audio recognition method and device | |
CN109065054A (en) | Speech recognition error correction method, device, electronic equipment and readable storage medium | |
CN113051371B (en) | Chinese machine reading understanding method and device, electronic equipment and storage medium | |
CN107301866B (en) | Information input method | |
US20240028893A1 (en) | Generating neural network outputs using insertion commands | |
CN110704597B (en) | Dialogue system reliability verification method, model generation method and device | |
WO2019084558A1 (en) | Selecting answer spans from electronic documents using machine learning | |
CN111696521A (en) | Method for training speech clone model, readable storage medium and speech clone method | |
CN111508478B (en) | Speech recognition method and device | |
US11816443B2 (en) | Method, device, and storage medium for generating response | |
CN111402864A (en) | Voice processing method and electronic equipment | |
CN113887253A (en) | Method, apparatus, and medium for machine translation | |
CN110929532B (en) | Data processing method, device, equipment and storage medium | |
KR102621436B1 (en) | Voice synthesizing method, device, electronic equipment and storage medium | |
CN106599637A (en) | Method and device for inputting verification code into verification interface | |
CN109800286B (en) | Dialog generation method and device | |
CN116384412A (en) | Dialogue content generation method and device, computer readable storage medium and terminal | |
CN115965791A (en) | Image generation method and device and electronic equipment | |
US20210019477A1 (en) | Generating neural network outputs using insertion operations | |
CN111966803A (en) | Dialogue simulation method, dialogue simulation device, storage medium and electronic equipment | |
CN111128234A (en) | Spliced voice recognition detection method, device and equipment | |
US20220188163A1 (en) | Method for processing data, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180717 |