CN108287824A - Semantic similarity calculation method and device - Google Patents


Info

Publication number
CN108287824A
Authority
CN
China
Prior art keywords
sentence
similarity
preliminary
word
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810188175.8A
Other languages
Chinese (zh)
Inventor
李勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunzhisheng Information Technology Co Ltd
Original Assignee
Beijing Yunzhisheng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunzhisheng Information Technology Co Ltd
Priority to CN201810188175.8A
Publication of CN108287824A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Abstract

The present invention relates to a semantic similarity calculation method and device. The method includes: preprocessing the first sentence and the second sentence of a sentence pair respectively, and extracting a first syntax, a second syntax, and statistical features between the first sentence and the second sentence; converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix; determining a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to a preset first deep neural network model; determining the similarity between the first sentence and the second sentence according to a preset second deep neural network model; and determining whether the first sentence and the second sentence are similar according to the similarity between them. This solution fuses word features, word-order features, phrase features, and sentence-level statistical features, so the similarity between sentences can be determined more accurately.

Description

Semantic similarity calculation method and device
Technical field
The present invention relates to the technical field of semantic recognition, and in particular to a semantic similarity calculation method and device.
Background technology
Semantic similarity calculation mainly judges whether two sentences are semantically similar, for example whether "What animals does the Arctic have" and "Which animals live in the Arctic" are similar. Current semantic similarity methods are mainly based on surface syntactic features: through feature selection, each sentence is expressed as a vector, the cosine similarity of the two sentence vectors is calculated, and the sentences are judged similar if the value exceeds a set similarity threshold and dissimilar otherwise; a minimal sketch of this baseline appears below.
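The conventional baseline described above can be illustrated with a short sketch. This is an illustration only: the bag-of-words vocabulary, the tokenization, and the 0.6 threshold are assumptions for the example, not values taken from the patent.

```python
# Conventional baseline: bag-of-words vectors plus a cosine-similarity threshold.
import numpy as np

def bag_of_words(tokens, vocabulary):
    """Count-based sentence vector over a fixed vocabulary."""
    vec = np.zeros(len(vocabulary))
    for t in tokens:
        if t in vocabulary:
            vec[vocabulary[t]] += 1.0
    return vec

vocab = {w: i for i, w in enumerate(
    ["what", "which", "animals", "does", "the", "arctic", "have", "live", "in"])}
a = bag_of_words(["what", "animals", "does", "the", "arctic", "have"], vocab)
b = bag_of_words(["which", "animals", "live", "in", "the", "arctic"], vocab)

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
similar = cosine > 0.6   # judged similar only if the cosine exceeds a set threshold
```

Because such a representation ignores word order and deeper semantics, the baseline motivates the problems listed next.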
Existing similarity calculation methods mainly have the following problems:
1) they lack modeling of the word order and semantics of a sentence;
2) they rely on large, highly accurate synonym or aligned-phrase resources.
Summary of the invention
Embodiments of the present invention provide a semantic similarity calculation method and device, so as to determine the similarity between sentences more accurately.
According to a first aspect of the embodiments of the present invention, a semantic similarity calculation method is provided, including:
preprocessing the first sentence and the second sentence of a sentence pair respectively, and extracting a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
converting the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determining a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
concatenating the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and concatenating the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, determining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and the preset second deep neural network model includes:
performing point-wise subtraction and point-wise multiplication on the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
concatenating the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix, to obtain a concatenation result;
taking the concatenation result as the input of the second deep neural network model, and calculating the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, determining that the first sentence and the second sentence are dissimilar.
According to a second aspect of the embodiments of the present invention, a semantic similarity calculation device is provided, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
preprocess the first sentence and the second sentence of a sentence pair respectively, and extract a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
convert the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determine a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determine the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determine whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
concatenating the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and concatenating the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, determining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and the preset second deep neural network model includes:
performing point-wise subtraction and point-wise multiplication on the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
concatenating the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix, to obtain a concatenation result;
taking the concatenation result as the input of the second deep neural network model, and calculating the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, determining that the first sentence and the second sentence are dissimilar.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present invention.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the present invention. The objectives and other advantages of the present invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 is a flowchart of a semantic similarity calculation method according to an exemplary embodiment.
Fig. 2 is a flowchart of step S102 of a semantic similarity calculation method according to an exemplary embodiment.
Fig. 3 is a flowchart of another semantic similarity calculation method according to an exemplary embodiment.
Fig. 4 is a flowchart of step S104 of a semantic similarity calculation method according to an exemplary embodiment.
Fig. 5 is a flowchart of step S105 of a semantic similarity calculation method according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples thereof are illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. On the contrary, they are merely examples of devices and methods consistent with some aspects of the present invention as detailed in the appended claims.
Fig. 1 is a flowchart of a semantic similarity calculation method according to an exemplary embodiment. The semantic similarity calculation method can be applied to a terminal device or a server, where the terminal device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like. As shown in Fig. 1, the method includes steps S101-S105:
In step S101, the first sentence and the second sentence of a sentence pair are preprocessed respectively, and a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence are extracted;
Here, the syntax, i.e. the n-gram statistical characteristics of a sentence, includes part-of-speech features and word-order features; the statistical features include similarity features between parts of speech, matching-degree features between the sentences, word matching-degree features, and the like. A minimal illustration of such pair-level statistics is sketched below.
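The patent does not fix a concrete statistical feature set, so the sketch below assumes three plausible pair-level statistics (word matching degree, part-of-speech similarity, and a length ratio) purely for illustration.

```python
# Illustrative sentence-pair statistics; the exact feature set is an assumption.
from collections import Counter

def pair_statistics(words_a, pos_a, words_b, pos_b):
    """Compute simple pair-level statistics from tokenized words and POS tags."""
    def overlap(xs, ys):
        common = Counter(xs) & Counter(ys)            # multiset intersection
        return 2.0 * sum(common.values()) / (len(xs) + len(ys))

    return [
        overlap(words_a, words_b),                    # word matching degree
        overlap(pos_a, pos_b),                        # part-of-speech similarity
        min(len(words_a), len(words_b)) / max(len(words_a), len(words_b)),  # length ratio
    ]

# Example: tokenized forms of the two Arctic questions from the background section.
features = pair_statistics(
    ["what", "animals", "does", "the", "arctic", "have"],
    ["PRON", "NOUN", "VERB", "DET", "NOUN", "VERB"],
    ["which", "animals", "live", "in", "the", "arctic"],
    ["PRON", "NOUN", "VERB", "ADP", "DET", "NOUN"],
)
```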
In step S102, the words and parts of speech in the first sentence and the second sentence are converted into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
In step S103, a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence are determined according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
In step S104, the similarity between the first sentence and the second sentence is determined according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
In step S105, whether the first sentence and the second sentence are similar is determined according to the similarity between the first sentence and the second sentence.
In this embodiment, the spatial distance and cosine distance between the sentences are determined according to the words, parts of speech, and statistical features of the sentence pair and the first deep neural network model, and the similarity between the sentences is then determined according to this spatial distance and cosine distance. In this way, word features, word-order features, phrase features, and sentence-level statistical features are fused, so the similarity between the sentences can be determined more accurately.
Fig. 2 is a flowchart of step S102 of a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 2, in one embodiment, the above step S102 includes steps S201-S203:
In step S201, the words in the first sentence and the second sentence are converted into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
In step S202, the parts of speech in the first sentence and the second sentence are converted into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
In step S203, the first word feature matrix and the first part-of-speech feature matrix are concatenated to obtain the first feature matrix, and the second word feature matrix and the second part-of-speech feature matrix are concatenated to obtain the second feature matrix.
In this embodiment, the words and parts of speech of each sentence are represented as vectors, and the corresponding feature matrix of the sentence is then obtained, so that the similarity between the sentences can subsequently be determined from the feature matrices.
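A minimal sketch of steps S201-S203, assuming the trained word2vec and pos2vec models are available as simple lookup tables; the 300- and 50-dimensional embedding sizes and the toy vocabulary are assumptions, not values from the patent.

```python
import numpy as np

def build_feature_matrix(words, pos_tags, word_vectors, pos_vectors):
    """Concatenate the word-vector row and the POS-vector row for every token."""
    word_mat = np.stack([word_vectors[w] for w in words])     # (seq_len, word_dim)
    pos_mat = np.stack([pos_vectors[p] for p in pos_tags])    # (seq_len, pos_dim)
    return np.concatenate([word_mat, pos_mat], axis=1)        # (seq_len, word_dim + pos_dim)

# Toy lookup tables standing in for the trained word2vec / pos2vec models.
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=300) for w in ["the", "arctic", "animals"]}
pos_vectors = {p: rng.normal(size=50) for p in ["DET", "NOUN"]}

first_feature_matrix = build_feature_matrix(
    ["the", "arctic", "animals"], ["DET", "NOUN", "NOUN"], word_vectors, pos_vectors)
```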
Fig. 3 is a flowchart of another semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 3, in one embodiment, the above step S103 includes step S301:
In step S301, the first feature matrix and the second feature matrix are respectively taken as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In this embodiment, the first feature matrix is taken as the input of the first deep neural network model to obtain the preliminary representation of the first sentence, and the second feature matrix is taken as the input of the first deep neural network model to obtain the preliminary representation of the second sentence, so that the similarity between the sentences can subsequently be determined from the preliminary sentence representations.
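The patent does not specify the architecture of the first deep neural network model. The sketch below assumes a bidirectional LSTM encoder with max pooling as one plausible instantiation (PyTorch); the input size 350 simply matches the assumed 300-dimensional word vectors plus 50-dimensional part-of-speech vectors above.

```python
# Assumed encoder for the "first deep neural network model"; architecture is illustrative.
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, feature_matrix):             # (batch, seq_len, input_dim)
        outputs, _ = self.lstm(feature_matrix)     # (batch, seq_len, 2 * hidden_dim)
        return outputs.max(dim=1).values           # max-pool over time -> preliminary representation

encoder = SentenceEncoder(input_dim=350, hidden_dim=128)
r_a = encoder(torch.randn(1, 6, 350))              # preliminary representation of the first sentence
r_b = encoder(torch.randn(1, 6, 350))              # preliminary representation of the second sentence
```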
Fig. 4 is a flowchart of step S104 of a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 4, in one embodiment, the above step S104 includes steps S401-S404:
In step S401, point-wise subtraction and point-wise multiplication are performed on the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
If the input is a pair of sentences A and B whose preliminary representations are denoted R_A and R_B, the geometric distance between them is expressed as dist(|R_A - R_B|), and the angular distance is expressed as angle(R_A ⊙ R_B).
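A minimal sketch of step S401; r_a and r_b stand in for the preliminary representations produced by the encoder sketch above (shape (1, 256) there), so random tensors are used to keep the snippet self-contained.

```python
import torch

r_a = torch.randn(1, 256)           # stands in for R_A from the encoder sketch
r_b = torch.randn(1, 256)           # stands in for R_B from the encoder sketch
geometric = torch.abs(r_a - r_b)    # point-wise subtraction -> geometric distance features
angular = r_a * r_b                 # point-wise multiplication (R_A ⊙ R_B) -> angular distance features
```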
In step S402, the statistical features are encoded into a vector, to obtain the corresponding statistical feature vector;
In step S403, the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix are concatenated to obtain a concatenation result;
In step S404, the concatenation result is taken as the input of the second deep neural network model, and the similarity between the first sentence and the second sentence is calculated.
In this embodiment, the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix between the sentences are taken as the input of the second deep neural network model, which transforms the final representation of the sentences and outputs the probability that the two sentences are similar, i.e. the similarity between the sentences. In this way, word features, word-order features, phrase features, and sentence-level statistical features are fused, so the similarity between the sentences can be determined more accurately.
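The second deep neural network model is likewise not specified in the patent; the sketch below assumes a small feed-forward classifier with a sigmoid output, and uses random stand-ins for the statistical feature vector and the two distance feature matrices.

```python
# Assumed form of the "second deep neural network model": concatenate, score, squash to [0, 1].
import torch
import torch.nn as nn

stats_vec = torch.randn(1, 3)        # stands in for the encoded statistical feature vector
geometric = torch.randn(1, 256)      # stands in for |R_A - R_B|
angular = torch.randn(1, 256)        # stands in for R_A ⊙ R_B
combined = torch.cat([stats_vec, geometric, angular], dim=1)   # concatenation result

scorer = nn.Sequential(
    nn.Linear(combined.shape[1], 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)
similarity = scorer(combined).item()  # similar-probability value between the two sentences
```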
Fig. 5 is a flowchart of step S105 of a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 5, in one embodiment, the above step S105 includes steps S501-S502:
In step S501, when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, it is determined that the first sentence and the second sentence are similar;
In step S502, when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, it is determined that the first sentence and the second sentence are dissimilar.
In this embodiment, a preset similarity threshold can be set, for example 80%; the two sentences are then determined to be similar when the similarity between them is greater than 80%, and dissimilar otherwise.
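The decision step with the 80% example threshold can be sketched in a few lines; the similarity value here is a placeholder standing in for the output of the second network.

```python
PRESET_SIMILARITY = 0.8
similarity = 0.91                            # stands in for the model's output similarity
is_similar = similarity > PRESET_SIMILARITY  # True: the sentence pair is judged similar
```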
The following are device embodiments of the present invention, which can be used to execute the method embodiments of the present invention.
According to a second aspect of the embodiments of the present invention, a semantic similarity calculation device is provided, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
preprocess the first sentence and the second sentence of a sentence pair respectively, and extract a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
convert the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determine a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determine the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determine whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
concatenating the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and concatenating the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, determining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and the preset second deep neural network model includes:
performing point-wise subtraction and point-wise multiplication on the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
concatenating the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix, to obtain a concatenation result;
taking the concatenation result as the input of the second deep neural network model, and calculating the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, determining that the first sentence and the second sentence are dissimilar.
It should be understood by those skilled in the art that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims (10)

1. A semantic similarity calculation method, characterized by comprising:
preprocessing the first sentence and the second sentence of a sentence pair respectively, and extracting a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
converting the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determining a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
2. The semantic similarity calculation method according to claim 1, characterized in that converting the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain the corresponding first feature matrix and second feature matrix, comprises:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
concatenating the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and concatenating the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
3. The semantic similarity calculation method according to claim 1, characterized in that determining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model comprises:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
4. The semantic similarity calculation method according to claim 1, characterized in that determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and the preset second deep neural network model comprises:
performing point-wise subtraction and point-wise multiplication on the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
concatenating the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix, to obtain a concatenation result;
taking the concatenation result as the input of the second deep neural network model, and calculating the similarity between the first sentence and the second sentence.
5. The semantic similarity calculation method according to any one of claims 1 to 4, characterized in that determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence comprises:
when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, determining that the first sentence and the second sentence are dissimilar.
6. A semantic similarity calculation device, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
preprocess the first sentence and the second sentence of a sentence pair respectively, and extract a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
convert the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determine a corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and a preset first deep neural network model;
determine the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determine whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
7. The semantic similarity calculation device according to claim 6, characterized in that converting the words and parts of speech in the first sentence and the second sentence into vectors respectively, to obtain the corresponding first feature matrix and second feature matrix, comprises:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
concatenating the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and concatenating the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
8. The semantic similarity calculation device according to claim 6, characterized in that determining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix, and the preset first deep neural network model comprises:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
9. The semantic similarity calculation device according to claim 6, characterized in that determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the statistical feature vector corresponding to the statistical features, and the preset second deep neural network model comprises:
performing point-wise subtraction and point-wise multiplication on the preliminary representation of the first sentence and the preliminary representation of the second sentence, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
concatenating the statistical feature vector, the geometric distance feature matrix, and the angular distance feature matrix, to obtain a concatenation result;
taking the concatenation result as the input of the second deep neural network model, and calculating the similarity between the first sentence and the second sentence.
10. The semantic similarity calculation device according to any one of claims 6 to 9, characterized in that determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence comprises:
when the similarity between the first sentence and the second sentence is greater than a preset similarity threshold, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity threshold, determining that the first sentence and the second sentence are dissimilar.
CN201810188175.8A 2018-03-07 2018-03-07 Semantic similarity calculation method and device Pending CN108287824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810188175.8A CN108287824A (en) 2018-03-07 2018-03-07 Semantic similarity calculation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810188175.8A CN108287824A (en) 2018-03-07 2018-03-07 Semantic similarity calculation method and device

Publications (1)

Publication Number Publication Date
CN108287824A true CN108287824A (en) 2018-07-17

Family

ID=62833315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810188175.8A Pending CN108287824A (en) 2018-03-07 2018-03-07 Semantic similarity calculation method and device

Country Status (1)

Country Link
CN (1) CN108287824A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100228540A1 (en) * 1999-11-12 2010-09-09 Phoenix Solutions, Inc. Methods and Systems for Query-Based Searching Using Spoken Input
CN106445920A (en) * 2016-09-29 2017-02-22 北京理工大学 Sentence similarity calculation method based on sentence meaning structure characteristics
CN107291699A (en) * 2017-07-04 2017-10-24 湖南星汉数智科技有限公司 A kind of sentence semantic similarity computational methods

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DANIEL BÄR 等: "UKP:computing semantic textual similarity by combining multiple content similarity measures", 《FIRST JOINT CONFERENCE ON LEXICAL AND COMPUTATIONAL SEMANTICS》 *
KAI SHENG TAI 等: "Improved semantic representations from tree-structured long short-term memory networks", 《HTTPS://ARXIV.ORG/ABS/1503.00075》 *
NHZSN: "A sentence similarity calculation method based on Tree-LSTM" (in Chinese), 《HTTP://WWW.DOC88.COM/P-9025669443117.HTML》 *
QIAN CHEN 等: "Enhanced LSTM for Natural Language Inference", 《PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101494A (en) * 2018-08-10 2018-12-28 哈尔滨工业大学(威海) A method of it is calculated for Chinese sentence semantic similarity, equipment and computer readable storage medium
CN110646763A (en) * 2019-10-10 2020-01-03 出门问问信息科技有限公司 Sound source positioning method and device based on semantics and storage medium
CN111144129A (en) * 2019-12-26 2020-05-12 成都航天科工大数据研究院有限公司 Semantic similarity obtaining method based on autoregression and self-coding
CN111144129B (en) * 2019-12-26 2023-06-06 成都航天科工大数据研究院有限公司 Semantic similarity acquisition method based on autoregressive and autoencoding
CN112380830A (en) * 2020-06-18 2021-02-19 达而观信息科技(上海)有限公司 Method, system and computer readable storage medium for matching related sentences in different documents
CN111737988A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Method and device for recognizing repeated sentences

Similar Documents

Publication Publication Date Title
CN108287824A (en) Semantic similarity calculation method and device
US11803711B2 (en) Depthwise separable convolutions for neural machine translation
US11093813B2 (en) Answer to question neural networks
CN105719649B (en) Audio recognition method and device
CN109065054A (en) Speech recognition error correction method, device, electronic equipment and readable storage medium storing program for executing
CN113051371B (en) Chinese machine reading understanding method and device, electronic equipment and storage medium
CN107301866B (en) Information input method
US20240028893A1 (en) Generating neural network outputs using insertion commands
CN110704597B (en) Dialogue system reliability verification method, model generation method and device
WO2019084558A1 (en) Selecting answer spans from electronic documents using machine learning
CN111696521A (en) Method for training speech clone model, readable storage medium and speech clone method
CN111508478B (en) Speech recognition method and device
US11816443B2 (en) Method, device, and storage medium for generating response
CN111402864A (en) Voice processing method and electronic equipment
CN113887253A (en) Method, apparatus, and medium for machine translation
CN110929532B (en) Data processing method, device, equipment and storage medium
KR102621436B1 (en) Voice synthesizing method, device, electronic equipment and storage medium
CN106599637A (en) Method and device for inputting verification code into verification interface
CN109800286B (en) Dialog generation method and device
CN116384412A (en) Dialogue content generation method and device, computer readable storage medium and terminal
CN115965791A (en) Image generation method and device and electronic equipment
US20210019477A1 (en) Generating neural network outputs using insertion operations
CN111966803A (en) Dialogue simulation method, dialogue simulation device, storage medium and electronic equipment
CN111128234A (en) Spliced voice recognition detection method, device and equipment
US20220188163A1 (en) Method for processing data, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20180717)