CN110674256A - Detection method and system for relevancy of comment and reply of OTA hotel

Info

Publication number: CN110674256A
Application number: CN201910909573.9A
Authority: CN
Inventors: 江小林; 罗超; 胡泓
Original assignee: Ctrip Computer Technology Shanghai Co Ltd
Current assignee: Ctrip Computer Technology Shanghai Co Ltd
Priority date: 2019-09-25
Filing date: 2019-09-25
Publication date: 2020-01-10
Anticipated expiration: 2039-09-25
Also published as: CN110674256B

Abstract

The invention discloses a method and a system for detecting the relevancy of comment and reply of an OTA hotel, wherein the detection method comprises the following steps: obtaining comment and reply; respectively converting the comment and the reply into a comment vector sequence and a reply vector sequence; coding the comment vector sequence to obtain a coding comment vector at each moment; coding the reply vector sequence to obtain a coded reply vector at each moment; matching the coding comment vector at each moment with the coding reply vector at each moment to obtain a plurality of matching vectors; capturing the relation between the matched vectors in the vector sequence and aggregating the matched vectors into a splicing vector; inputting the splicing vector into a full connection layer to obtain a target vector; and calculating the relevance probability of the comment and the reply according to the target vector. The invention can effectively, quickly and accurately calculate whether the reply aiming at the comment is matched with the comment content, thereby not only helping a hotel improve the existing product according to the effective comment, but also reducing the labor cost.

Description

Detection method and system for relevancy of comment and reply of OTA hotel

Technical Field

The invention relates to the field of OTA (online travel agency) hotel services, and in particular to a method and a system for detecting the relevancy of comments and replies of OTA hotels.

Background

For a service enterprise, user consultation and feedback are important. Many products offer a comment function, and user comments, negative ones in particular, expose the problems of a product, so the merchant needs to reply to them properly. When a customer who left a negative comment (malicious ones aside) receives a proper reply, the customer feels that the merchant takes the opinion seriously, and many such customers change their negative attitude toward the merchant. It is therefore necessary to examine the replies to existing product comments and detect which replies fail to address the comment and which are targeted.

At present, most methods for judging the relevance of a comment and its reply rely on manually defined keyword rules, and some filter out irrelevant question sentences and answer sentences by setting a threshold.

Disclosure of Invention

The invention aims to overcome the defect that the matching between the comment of a user and the response of a merchant is inaccurate in the prior art, and provides a detection method and a detection system capable of efficiently and accurately detecting the correlation degree between the comment and the response of an OTA hotel.

The invention solves the technical problems through the following technical scheme:

the invention provides a detection method for the relevancy of comment and reply of an OTA hotel, which comprises the following steps:

obtaining comments and replies on the OTA hotel;

converting the comment and the reply into a comment vector sequence and a reply vector sequence respectively;

coding the semantic relation among the vectors in the comment vector sequence to obtain a coded comment vector at each moment;

coding the semantic relation among the vectors in the reply vector sequence to obtain a coded reply vector at each moment;

matching the coding comment vector of each moment with the coding reply vector of each moment to obtain a plurality of matching vectors, wherein the matching vectors form a matching vector sequence;

capturing the relation between the matched vectors in the vector sequence and aggregating the matched vector sequence into a splicing vector according to the relation;

inputting the splicing vectors into a full-connection layer to obtain target vectors, wherein the dimensionality of the target vectors is the same as the number of preset categories;

and calculating the relevance probability of the comment and the reply according to the target vector.

The semantic relations among the words in the comment vector sequence and in the reply vector sequence are each coded by using a neural network model.

The relevance probability of the comment and the reply is calculated by softmax (the normalized exponential function).
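The last two steps above (fully connected layer, then softmax) can be illustrated with a minimal sketch. The weights, bias, vector sizes and the choice of two preset categories are hypothetical values chosen only for illustration; the source does not specify them.

```python
import math

def fully_connected(x, weights, bias):
    """Map the splicing vector x to a target vector with one entry per category."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: a 3-dimensional splicing vector and 2 preset categories.
splice = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, 0.3],
     [-0.3, 0.1, 0.0]]   # one row of weights per category (assumed values)
b = [0.0, 0.1]

target = fully_connected(splice, W, b)  # dimensionality equals the number of categories
probs = softmax(target)                 # probabilities that sum to 1
```

In a trained model the weights and bias would be learned; here they only show that the target vector has one entry per preset category and that softmax turns it into a probability distribution.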

In the invention, the comment and the reply of the OTA hotel are vectorized, the semantic relation between the quantified comment and reply is analyzed, and machine learning is used to analyze and compare each word and the whole sentence both between the comment and the reply and within each of them. Whether a reply matches the content of the comment it answers can thus be calculated effectively, quickly and accurately. This helps the hotel improve its existing products according to effective comments and reduces labor cost; with improved identification precision and recall, it also raises the merchant's service quality and helps the merchant increase revenue.

Preferably, before the step of inputting the splicing vector into the fully-connected layer to obtain the target vector, the method further includes:

calculating the text similarity of each reply and other replies to obtain a similarity sequence;

obtaining a similarity average value according to the similarity sequence;

appending the similarity average value to the splicing vector as one dimension of the splicing vector;

the step of inputting the stitching vector into a full connection layer to obtain a target vector comprises:

inputting the spliced vector spliced with the similarity average value into a full-connection layer to obtain a target vector;

and/or,

the step of matching the coding comment vector of each time with the coding reply vector of each time to obtain a plurality of matching vectors comprises:

and obtaining a plurality of matching vectors according to the cosine similarity of the weighted coding comment vector of each dimension of each moment and the weighted coding reply vector of the corresponding dimension of each moment.

The text similarity can be calculated by means such as edit distance;

wherein the cosine similarity is calculated as $m_k = \operatorname{cosine}(w_k \circ v_1,\; w_k \circ v_2)$, where $v_1$ and $v_2$ are the vectors to be compared, $k$ denotes a certain dimension of the vector, and $w_k$ is a trainable parameter that is learned by back-propagation through the neural network.
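The weighted cosine similarity can be sketched as follows. The sketch assumes that the operator between w_k and each vector is element-wise multiplication and that each w_k is a weight vector of the same length as the vectors being compared; the function names and the weight matrix `W` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def weighted_cosine_match(v1, v2, W):
    """Return the matching vector [m_1, ..., m_l].

    Each m_k is the cosine similarity of v1 and v2 after both are scaled
    element-wise by the k-th weight vector W[k] (assumed trainable).
    """
    matching = []
    for w_k in W:
        scaled_1 = [w * x for w, x in zip(w_k, v1)]
        scaled_2 = [w * x for w, x in zip(w_k, v2)]
        matching.append(cosine(scaled_1, scaled_2))
    return matching
```

For example, `weighted_cosine_match([1.0, 5.0], [1.0, -5.0], [[1.0, 0.0]])` compares only the first component, because the weight vector zeroes out the second one. In training, the rows of `W` would be updated by back-propagation rather than fixed by hand.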

In the invention, the average value of the similarity sequence can be obtained by comparing the similarity of the text currently replied to other replies of a specific hotel, and the average value is used as a dimension in the splicing vector, so that the correlation probability calculated by the splicing vector is more accurate and more accords with the actual requirement.
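A minimal sketch of this similarity feature is given below. It assumes that text similarity is derived from the Levenshtein edit distance normalized by the longer text's length; the normalization and all function names are illustrative choices, not taken from the source.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ch_a != ch_b)))  # substitution
        prev = cur
    return prev[-1]

def text_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest

def mean_similarity(reply, other_replies):
    """Average similarity of one reply to the other replies of the same hotel."""
    if not other_replies:
        return 0.0
    sims = [text_similarity(reply, other) for other in other_replies]
    return sum(sims) / len(sims)

def append_similarity(splice_vector, mean_sim):
    """Append the similarity average as one extra dimension of the splicing vector."""
    return list(splice_vector) + [mean_sim]
```

A reply that is nearly identical to the hotel's other replies yields a high average, which gives the classifier one extra signal, for example that the reply may be a template answer.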

Preferably, the step of matching the coded comment vector at each moment with the coded reply vector at each moment to obtain a plurality of matching vectors comprises:

sequentially matching the coding comment vector at the current moment and the coding reply vector at the last moment from the first comment moment to obtain a first matching vector at each moment;

sequentially matching the coding reply vector of the current moment and the coding comment vector of the last moment from the first reply moment to obtain a second matching vector of each moment;

the plurality of matching vectors comprises the first matching vector and the second matching vector;

and/or,

sequentially calculating a coding comment vector at the current moment and a coding reply vector at each moment from the first comment moment to obtain the cosine similarity of each reply moment;

calculating a weighted coding reply vector according to the cosine similarity of each reply moment of the current commenting moment;

matching the coding comment vector of each comment moment with the corresponding weighted coding reply vector from the first comment moment to obtain a third matching vector of each moment;

sequentially calculating the coding reply vector of the current moment and the coding comment vector of each moment from the first reply moment to obtain the cosine similarity of each comment moment;

calculating a weighted coding comment vector according to the cosine similarity of each comment moment of the current reply moment;

matching the coded reply vector of each reply moment with the corresponding weighted coded comment vector, starting from the first reply moment, to obtain a fourth matching vector of each moment;

the plurality of matching vectors includes the third matching vector and the fourth matching vector.

Here, for each comment moment, the vectors of all reply moments are averaged with weights given by the cosine similarity between that comment moment and each reply moment, which yields the weighted reply vector. The cosine similarity serves as the weight, that is, the relevance of a given word in the comment to the reply content; weighting the reply-moment vectors by it yields the relation between the comment and the reply.

In the invention, starting from the first moment, the vector at the current moment of the comment is fully matched against the vector at the last moment of the reply, and the vector at the current moment of the reply against the vector at the last moment of the comment. In addition, the vectors of the reply or of the comment are weighted by their cosine similarity with the comment or the reply. In this way the real relation between the comment and the reply can be obtained, the prior-art defect of neglecting detail-level correlation is overcome, and a truer measure of the correlation between the comment and the reply is obtained.
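The two matching strategies can be sketched as follows for the comment-to-reply direction; the reply-to-comment direction is symmetric. For brevity the sketch uses a plain cosine similarity as the final matching function instead of the trainable weighted cosine, and it guards against a zero weight sum. Both simplifications, and all names, are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def full_match(comment_states, reply_states):
    """Match every comment moment against the LAST reply moment."""
    last_reply = reply_states[-1]
    return [cosine(c, last_reply) for c in comment_states]

def weighted_vector(query, states):
    """Cosine-weighted average of `states`, weighted by similarity to `query`."""
    weights = [cosine(query, s) for s in states]
    total = sum(weights)
    if abs(total) < 1e-12:          # assumed guard: no usable weights
        return [0.0] * len(query)
    return [sum(w * s[d] for w, s in zip(weights, states)) / total
            for d in range(len(query))]

def attentive_match(comment_states, reply_states):
    """Match every comment moment against its own weighted reply vector."""
    return [cosine(c, weighted_vector(c, reply_states)) for c in comment_states]
```

Swapping the two arguments, for example `full_match(reply_states, comment_states)`, gives the reply-to-comment direction described in the text.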

Preferably, in the step of encoding the semantic relationship between the vectors in the comment vector sequence to obtain the encoded comment vector at each moment,

the coding comment vector comprises a forward coding comment vector and a reverse coding comment vector;

in the step of encoding the semantic relationship between the vectors in the reply vector sequence to obtain the encoded reply vector at each time instant,

the coded reply vector comprises a forward coded reply vector and a reverse coded reply vector;

capturing the relation between the matched vectors in the vector sequence and aggregating the vector sequence into a splicing vector according to the relation, wherein the step comprises the following steps:

inputting the matching vector sequence into a bidirectional LSTM (long short-term memory) model;

obtaining the relation among the plurality of matching vectors at each moment according to the bidirectional LSTM model, and intercepting the comment forward relation vector, the comment reverse relation vector, the reply forward relation vector and the reply reverse relation vector at the last moment in the LSTM model;

and aggregating the comment forward relation vector, the comment reverse relation vector, the reply forward relation vector and the reply reverse relation vector into the splicing vector.

In the invention, obtaining both the forward coded comment vector and the reverse coded comment vector avoids the inaccuracy of relying on a unidirectional vector only. Inputting the matching vector sequence into the bidirectional LSTM model and intercepting the four specific vectors captures the complete semantics of the whole passage and improves aggregation efficiency, and the bidirectional model makes the subsequent relevancy calculation more accurate.

Preferably, the step of converting the comment and the reply into a comment vector sequence and a reply vector sequence, respectively, comprises:

preprocessing the comment and the reply;

inputting the comment and the reply into a word segmentation tool respectively to obtain a first word segmentation comment sequence and a first word segmentation reply sequence;

adding preset professional vocabulary for the current scene to the first word segmentation comment sequence and the first word segmentation reply sequence, respectively, to form a second word segmentation comment sequence and a second word segmentation reply sequence;

inputting the second word segmentation comment sequence and the second word segmentation reply sequence into a word vector model, respectively, to obtain the comment vector sequence and the reply vector sequence;

the step of pre-treating comprises: at least one of filtering special characters, filtering pure numbers, filtering statements not containing Chinese characters, filtering invalid statements, and standardizing statements;

and/or,

after the step of calculating the relevance probability of the comment and the reply according to the target vector, the method further comprises:

judging whether the relevance probability is greater than the preset probability, and if so, determining that the comment and the reply do not match.

The word segmentation tool is an open-source word segmentation tool, for example hand (a word segmentation tool).

In the process of word segmentation, some preset professional vocabularies in the current scene may be added, for example: in the hotel scene of OTA industry, the professional vocabularies corresponding to the scene, such as pre-authorization, credit, withholding money, cash-returning ticket, big bed room, account arrival, two-in-one, three-in-one, four-in-one, five-in-one, six-in-one, seven-in-one, eight-in-one, nine-in-one, ten-in-one, shop-free, house-up, land-occupied price, apartment, receiving and sending machine, and the like, are added during word segmentation processing.
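One way to picture the effect of the preset vocabulary is a post-processing pass that re-joins adjacent tokens whenever their concatenation is a known domain term. This is only an illustrative interpretation: in practice a word segmentation tool would more likely load the terms as a user dictionary. The function name, the `max_span` limit and the sample Chinese term are hypothetical.

```python
def merge_domain_terms(tokens, lexicon, max_span=4):
    """Re-join adjacent tokens whose concatenation is a preset domain term.

    tokens   -- first word segmentation sequence (list of strings)
    lexicon  -- set of preset professional terms for the current scene
    max_span -- maximum number of adjacent tokens to try merging (assumed limit)
    """
    merged = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first so that longer domain terms win.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in lexicon:
                merged.append(candidate)
                i += span
                matched = True
                break
        if not matched:
            merged.append(tokens[i])
            i += 1
    return merged

# Hypothetical example: a generic segmenter split a room-type term into characters.
first_sequence = ["大", "床", "房", "很", "好"]
second_sequence = merge_domain_terms(first_sequence, {"大床房"})
```

Here `second_sequence` keeps the room-type term as a single token, so the word vector model sees it as one unit rather than three unrelated characters.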

The word vector model includes word2vec and GloVe.

The preprocessing step comprises filtering out special characters such as emoticons, filtering out sentences that contain no Chinese characters, collecting a set of invalid chit-chat sentences and filtering out sentences similar to them by edit-distance similarity, and standardizing sentences, for example converting full-width characters to half-width, Traditional Chinese to Simplified Chinese, and between upper and lower case.
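Part of this preprocessing can be sketched with the Python standard library. The sketch covers full-width to half-width conversion (via Unicode NFKC normalization), case folding, removal of special characters, and filtering of pure-number sentences and sentences without Chinese characters. Traditional-to-Simplified conversion and the edit-distance filter need an external mapping table or a list of invalid sentences and are left out; the character ranges and kept punctuation are assumptions.

```python
import re
import unicodedata

# Assumed range for common Chinese characters (CJK Unified Ideographs).
CHINESE = re.compile(r"[\u4e00-\u9fff]")
# Anything that is not a word character, whitespace or basic punctuation
# is treated as a "special character" (for example emoji).
SPECIAL = re.compile(r"[^\w\s,.!?;:，。！？；：、]")

def normalize(text):
    """Standardize one sentence."""
    text = unicodedata.normalize("NFKC", text)  # full-width -> half-width
    text = text.lower()                          # unify letter case
    text = SPECIAL.sub("", text)                 # drop emoji and other symbols
    return text.strip()

def is_valid(text):
    """Keep only non-empty sentences that are not pure numbers and contain Chinese."""
    if not text or text.isdigit():
        return False
    return CHINESE.search(text) is not None

def preprocess(sentences):
    """Normalize every sentence, then drop the invalid ones."""
    cleaned = (normalize(s) for s in sentences)
    return [s for s in cleaned if is_valid(s)]
```

For instance, `preprocess(["房间很干净！😀", "12345", "good hotel", "ＷｉＦｉ很快"])` keeps only the first and last sentences, with the emoji removed and the full-width letters converted to half-width lower case.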

According to the invention, the accuracy of vector sequence conversion can be improved by processing the comment and reply, the accuracy of the word segmentation processing step can be improved by adding the preset professional vocabulary, and the influence on the accuracy of subsequent relevance judgment due to objective reasons can be avoided by the preprocessing step.

In the invention, the predicted relevance probability is compared with the preset probability to determine which replies fail to address the comment, which helps the merchant improve and thus avoids the loss of potential customers.

The invention also provides a detection system for the relevancy of the comment and the reply of the OTA hotel, which comprises the following components: the system comprises an information acquisition module, a conversion module, a comment coding module, a reply coding module, a matching module, a first splicing module, a target vector acquisition module and a probability calculation module;

the information acquisition module is used for acquiring comments and replies of the OTA hotel;

the conversion module is used for converting the comment and the reply into a comment vector sequence and a reply vector sequence respectively;

the comment coding module is used for coding semantic relations among vectors in the comment vector sequence to obtain a coding comment vector at each moment;

the reply coding module is used for coding the semantic relation among the vectors in the reply vector sequence to obtain a coded reply vector at each moment;

the matching module is used for matching the coding comment vector at each moment with the coding reply vector at each moment to obtain a plurality of matching vectors, and the matching vectors form a matching vector sequence;

the first splicing module is used for capturing the relation between the matching vectors in the vector sequence and aggregating the matching vector sequence into a splicing vector according to the relation;

the target vector acquisition module is used for inputting the splicing vector to a full-connection layer to obtain a target vector, and the dimensionality of the target vector is the same as the number of preset categories;

the probability calculation module is used for calculating the relevance probability of the comment and the reply according to the target vector.

The comment coding module and the reply coding module respectively code semantic relations among all words in the comment vector sequence and the reply vector sequence by using a neural network model.

Wherein the probability calculation module calculates a relevance probability of the comment and the reply by softmax.

In the invention, the conversion module vectorizes the comment and the reply of the OTA hotel, the comment coding module and the reply coding module analyze the semantic relation between the quantified comment and reply, and the matching module and the first splicing module use machine learning to analyze and compare each word and the whole sentence between the comment and the reply. Whether a reply matches the content of the comment it answers can thus be calculated effectively, quickly and accurately. This helps the hotel improve its existing products according to effective comments and reduces labor cost; with improved identification precision and recall, it also raises the merchant's service quality and helps the merchant increase revenue.

Preferably, the detection system further comprises: the text similarity calculation module, the average value obtaining module and the second splicing module;

the text similarity calculation module is used for calculating the text similarity of each reply and other replies to obtain a similarity sequence;

the average value obtaining module is used for obtaining an average value of the similarity according to the similarity sequence;

the second splicing module is used for splicing the similarity average value serving as one dimension of the splicing vector with the splicing vector;

the target vector acquisition module is further used for inputting the spliced vector spliced with the similarity average value to a full-connection layer to obtain a target vector;

and/or,

the matching module is further used for obtaining a plurality of matching vectors according to the cosine similarity between the weighted coding comment vector of each dimension of each moment and the weighted coding reply vector of the corresponding dimension of each moment.

The text similarity calculation module can calculate the text similarity by means such as edit distance.

In the invention, the text similarity calculation module is used for comparing the text similarity of the current reply to other replies of a specific hotel, so that the average value of the similarity sequence can be obtained through the average value acquisition module, and the average value is used as one dimension in the splicing vector through the second splicing module, thereby further ensuring that the correlation probability calculated through the splicing vector is more accurate and more accords with the actual requirement.

Preferably, the matching module comprises a first comment matching unit and a first reply matching unit;

the first comment matching unit is used for sequentially matching the coding comment vector at the current moment and the coding reply vector at the last moment from the first comment moment to obtain a first matching vector at each moment;

the first reply matching unit is used for sequentially matching the coding reply vector at the current moment and the coding comment vector at the last moment from the first reply moment to obtain a second matching vector at each moment;

and/or,

the matching module comprises a first matching unit and a second matching unit;

the first matching unit is used for sequentially matching the coding comment vector at the current moment and the coding reply vector at the last moment from the first comment moment to obtain a first matching vector at each moment;

the second matching unit is used for sequentially matching the coding reply vector at the current moment and the coding comment vector at the last moment from the first reply moment to obtain a second matching vector at each moment;

and/or,

the matching module comprises a reply cosine calculating unit, a weighted reply calculating unit, a third matching unit, a comment cosine calculating unit, a weighted comment calculating unit and a fourth matching unit;

the reply cosine calculation unit is used for sequentially calculating the coding comment vector at the current moment and the coding reply vector at each moment from the first comment moment to obtain the cosine similarity of each reply moment;

the weighted reply calculation unit is used for calculating a weighted coding reply vector according to the cosine similarity of each reply moment of the current comment moment;

the third matching unit is used for matching the coding comment vector of each comment moment with the corresponding weighted coding reply vector from the first comment moment so as to obtain a third matching vector of each moment;

the comment cosine calculation unit is used for sequentially calculating the coding reply vector at the current moment and the coding comment vector at each moment from the first reply moment to obtain the cosine similarity of each comment moment;

the weighted comment calculation unit is used for calculating a weighted coding comment vector according to the cosine similarity of each comment moment of the current reply moment;

the fourth matching unit is used for matching the coded reply vector of each reply moment with the corresponding weighted coded comment vector, starting from the first reply moment, to obtain a fourth matching vector of each moment;

In the invention, starting from the first moment, the first matching unit and the second matching unit fully match the vector at the current moment of the comment against the vector at the last moment of the reply, and the vector at the current moment of the reply against the vector at the last moment of the comment. The third matching unit or the fourth matching unit weights the vectors of the reply or of the comment by their cosine similarity with the comment or the reply. In this way the real relation between the comment and the reply can be obtained, the prior-art defect of neglecting detail-level correlation is overcome, and a truer measure of the correlation between the comment and the reply is obtained.

Preferably, the coding comment vector comprises a forward coding comment vector and a backward coding comment vector;

the first splicing module comprises: an input unit, an interception unit and an aggregation unit;

the input unit is used for inputting the matching vector sequence into a bidirectional LSTM model;

the intercepting unit is used for acquiring the relationship among the plurality of matching vectors at each moment according to the bidirectional LSTM model, and intercepting the comment forward relationship vector, the comment reverse relationship vector, the reply forward relationship vector and the reply reverse relationship vector at the last moment in the LSTM model;

the aggregation unit is used for aggregating the forward relation vector, the comment reverse relation vector, the reply forward relation vector and the reply reverse relation vector into the splicing vector.

In the invention, obtaining both the forward coded comment vector and the reverse coded comment vector avoids the inaccuracy of relying on a unidirectional vector only. The input unit inputs the matching vector sequence into the bidirectional LSTM model and the intercepting unit intercepts the four specific vectors, so that the complete semantics of the whole passage are captured and the aggregation efficiency of the aggregation unit is improved; the bidirectional model also makes the subsequent relevancy calculation more accurate.

Preferably, the conversion module comprises a preprocessing unit, a word segmentation processing unit, a vocabulary adding unit and a vector sequence obtaining unit;

the preprocessing unit is used for preprocessing the comment and the reply;

the word segmentation processing unit is used for respectively inputting the comment and the reply into a word segmentation tool so as to obtain a first word segmentation comment sequence and a first word segmentation reply sequence;

the vocabulary adding unit is used for adding preset professional vocabulary for the current scene to the first word segmentation comment sequence and the first word segmentation reply sequence, respectively, to form a second word segmentation comment sequence and a second word segmentation reply sequence;

the vector sequence acquisition unit is used for inputting the second word segmentation comment sequence and the second word segmentation reply sequence into a word vector model, respectively, to obtain the comment vector sequence and the reply vector sequence;

the preprocessing comprises at least one of filtering special characters, filtering pure numbers, filtering sentences which do not contain Chinese characters, filtering invalid sentences and standardizing sentences;

and/or,

the detection system further comprises a judging module for judging whether the relevance probability is greater than the preset probability, and if so, determining that the comment and the reply do not match.

In the process of word segmentation, the vocabulary adding unit may add some preset professional vocabularies in a current scene, for example: in the hotel scene of OTA industry, professional vocabularies such as pre-authorization, credit, withholding, cash-back ticket, big bed room, account arrival, two-in-one, three-in-one, four-in-one, five-in-one, six-in-one, seven-in-one, eight-in-one, nine-in-one, ten-in-one, shop-in-no, room-expansion-free, land-occupied price, apartment, receiving and sending machine and the like corresponding to the scene are added during word segmentation processing.

The word vector model includes word2vec and GloVe.

The preprocessing unit is used for filtering out special characters such as emoticons, filtering out sentences that contain no Chinese characters, collecting a set of invalid chit-chat sentences and filtering out sentences similar to them by edit-distance similarity, and standardizing sentences, for example converting full-width characters to half-width, Traditional Chinese to Simplified Chinese, and between upper and lower case.

According to the invention, the accuracy of vector sequence conversion can be improved by processing the comment and reply by the preprocessing unit, the accuracy of word segmentation processing steps can be improved by adding the preset professional vocabulary by the vocabulary adding unit, and the influence on the accuracy of subsequent relevancy judgment due to objective reasons can be avoided by the preprocessing process in the preprocessing unit.

In the invention, the judging module compares the predicted relevance probability with the preset probability to determine which replies fail to address the comment, which helps the merchant improve and thus avoids the loss of potential customers.

The positive effects of the invention are as follows: the comment and the reply of the OTA hotel are vectorized, the semantic relation between the quantified comment and reply is analyzed, and machine learning is used to analyze and compare each word and the whole sentence both between the comment and the reply and within each of them. Whether a reply matches the content of the comment it answers can thus be calculated effectively, quickly and accurately. This helps the hotel improve its existing products according to effective comments and reduces labor cost; with improved identification precision and recall, it also raises the merchant's service quality and helps the merchant increase revenue.

Drawings

Fig. 1 is a flowchart of a method for detecting the correlation between the comment and the response of the OTA hotel in embodiment 1 of the present invention.

Fig. 2 is a detailed flowchart of step 102 in embodiment 2.

Fig. 3 is a detailed flowchart of step 104 in embodiment 2.

Fig. 4 is a detailed flowchart of step 105 in embodiment 2.

Fig. 5 is a schematic diagram of the principle of the detection method in embodiment 2.

Fig. 6 is a module diagram of a system for detecting the correlation between the comment and the response of the OTA hotel in embodiment 3 of the present invention.

Fig. 7 is a block diagram of a conversion module in embodiment 4.

Fig. 8 is a block diagram of a matching block in embodiment 4.

Fig. 9 is a block diagram of a first splicing module in embodiment 4.

Detailed Description

The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.

Example 1

This embodiment provides a method for detecting the relevancy of comments and replies of an OTA hotel. As shown in Fig. 1, the method comprises the following steps:

step 101, obtaining the comment and the reply of the OTA hotel;

step 102, converting the comment and the reply into a comment vector sequence and a reply vector sequence respectively;

step 103, coding the semantic relations among the vectors in the comment vector sequence to obtain a coded comment vector at each moment; coding the semantic relations among the vectors in the reply vector sequence to obtain a coded reply vector at each moment;

step 104, matching the coded comment vector of each moment with the coded reply vector of each moment to obtain a plurality of matching vectors, wherein the matching vectors form a matching vector sequence;

step 105, capturing the relation between the matching vectors in the matching vector sequence and aggregating the matching vector sequence into a splicing vector according to the relation;

step 106, inputting the splicing vector into a fully-connected layer to obtain a target vector, wherein the dimensionality of the target vector is the same as the number of preset categories;

step 107, calculating the relevance probability of the comment and the reply according to the target vector.

In step 106, the dimensions of the target vector are the same as the number of preset categories.

In this embodiment, the comment and the reply of the OTA hotel are vectorized, the semantic relation between the quantified comment and reply is analyzed, and machine learning is used to analyze and compare each word and the whole sentence between the comment and the reply, within the comment, and within the reply. Whether a reply matches the content of the comment it answers can thus be calculated effectively, quickly and accurately. This helps the hotel improve its existing products according to effective comments and reduces labor cost; in particular, with improved identification precision and recall, it raises the merchant's service quality and helps the merchant increase revenue.

Example 2

This embodiment is a further improvement on the basis of embodiment 1, specifically, as shown in fig. 2, in this embodiment, step 102 includes:

step 201, preprocessing the comment and the reply;

step 202, inputting the comment and the reply into a word segmentation tool respectively to obtain a first word segmentation comment sequence and a first word segmentation reply sequence;

step 203, adding preset professional vocabulary of the current scene to the first word segmentation comment sequence and the first word segmentation reply sequence respectively to form a second word segmentation comment sequence and a second word segmentation reply sequence;

step 204, inputting the second word segmentation comment sequence and the second word segmentation reply sequence into a word vector model respectively to obtain the comment vector sequence and the reply vector sequence.

In step 201, preprocessing is performed by filtering out special characters such as emoticons, filtering out sentences that contain no Chinese characters, collecting a set of invalid boilerplate sentences and filtering out sentences whose edit-distance similarity to them is high, and normalizing sentences, for example by converting full-width characters to half-width, traditional Chinese to simplified Chinese, and uppercase to lowercase.
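The normalization and filtering of step 201 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `normalize` and `preprocess`, the set of characters that is kept, and the sample strings are all hypothetical. Traditional-to-simplified conversion and the edit-distance filtering of boilerplate sentences are omitted here.

```python
import re
import unicodedata

# Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")
# Assumption: keep ideographs, ASCII letters and digits, whitespace and common
# punctuation; everything else (emoticons, symbols) counts as a special character.
NOT_ALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s,.!?;:，。！？；：、]")


def normalize(text: str) -> str:
    """Normalize one sentence: full-width to half-width, lowercase, no emoticons."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width "Ａ" becomes "A"
    text = text.lower()                         # uppercase to lowercase
    text = NOT_ALLOWED.sub("", text)            # drop emoticons and other symbols
    return text.strip()


def preprocess(text: str):
    """Return the normalized sentence, or None if it contains no Chinese characters."""
    cleaned = normalize(text)
    if not CJK.search(cleaned):
        return None
    return cleaned


if __name__ == "__main__":
    print(preprocess("毛巾很脏😊ＡＢＣ１２３"))
    print(preprocess("Nice hotel!!"))
```

NFKC normalization is used here because it maps full-width Latin letters, digits and punctuation to their half-width forms in a single call; a production system might prefer an explicit mapping table to control exactly which characters are converted.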

In step 202, the first word segmentation comment sequence and the first word segmentation reply sequence are obtained through a word segmentation tool, which includes hand (a word segmentation tool).

In step 203, some preset specialized vocabularies in the current scene may be added, for example: in the hotel scene of OTA industry, professional vocabularies such as pre-authorization, credit, withholding, cash-back ticket, big bed room, account arrival, two-in-one, three-in-one, four-in-one, five-in-one, six-in-one, seven-in-one, eight-in-one, nine-in-one, ten-in-one, shop-in-no, room-expansion-free, land-occupied price, apartment, receiving and sending machine and the like corresponding to the scene are added during word segmentation processing.

In step 204, the word vector model includes word2vec (a word vector model) and GloVe (a word vector model).

In this embodiment, preprocessing the comment and the reply improves the accuracy of the conversion into vector sequences, adding the preset professional vocabulary improves the accuracy of the word segmentation step, and the preprocessing step prevents objective factors in the raw text from affecting the accuracy of the subsequent relevance judgment.

In this embodiment, by comparing the predicted relevance probability with the preset probability, it can be determined which replies do not address the comment they respond to, so that the merchant can be helped to improve and the loss of potential customers can be avoided.

In this embodiment, through step 204, a comment vector sequence composed of word vectors in each sentence comment and a reply vector sequence composed of word vectors in each sentence reply can be obtained respectively.

In this embodiment, in order to obtain a more accurate semantic relationship between each vector and the whole sentence between the comment vector sequence and the reply vector sequence, in step 103, the coding comment vector includes a forward coding comment vector and a reverse coding comment vector, and the coding reply vector includes a forward coding reply vector and a reverse coding reply vector.

In this embodiment, in order to perform more proper matching on each of the encoded comment vectors and the semantics of the encoded reply vector and to perform more proper matching on each of the encoded reply vectors and the semantics of the encoded comment vector, step 104 obtains a plurality of matching vectors according to the cosine similarity between the weighted encoded comment vector of each dimension of each time and the weighted encoded reply vector of the corresponding dimension of each time.

The cosine similarity is calculated as $m_k = \operatorname{cosine}(w_k \circ v_1,\ w_k \circ v_2)$, where $v_1$ and $v_2$ are the vectors to be compared, $k$ denotes a certain dimension of the vector, and $w_k$ is a trainable parameter that is learned through back-propagation of the neural network.
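A minimal sketch of this weighted cosine matching is given below. It assumes that the circle operator denotes element-wise multiplication and that the trainable parameter is stored as a list `W` with one weight vector $w_k$ per output dimension $k$; the text above does not spell out either point. The names `cosine`, `match` and `W` are illustrative, and the weights are fixed by hand instead of being learned.

```python
import math


def cosine(a, b, eps=1e-8):
    """Cosine similarity of two equal-length vectors (eps guards against zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def match(v1, v2, W):
    """Return [m_1, ..., m_l] with m_k = cosine(w_k * v1, w_k * v2), element-wise."""
    result = []
    for w_k in W:
        weighted_v1 = [w * x for w, x in zip(w_k, v1)]
        weighted_v2 = [w * x for w, x in zip(w_k, v2)]
        result.append(cosine(weighted_v1, weighted_v2))
    return result


if __name__ == "__main__":
    W = [[1.0, 1.0], [1.0, 0.0]]  # hand-picked weights, normally learned
    print(match([1.0, 0.0], [1.0, 1.0], W))
```

Each row of `W` rescales the two vectors before they are compared, so every output dimension looks at the pair from a differently weighted point of view.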

As shown in fig. 3, in this embodiment, the step 104 may specifically include the following steps:

step 1041, starting from the first comment moment, sequentially matching the encoded comment vector at the current moment with the encoded reply vector at the last moment to obtain a first matching vector at each moment;

step 1042, starting from the first reply moment, sequentially matching the encoded reply vector at the current moment with the encoded comment vector at the last moment to obtain a second matching vector at each moment;

step 1043, starting from the first comment moment, sequentially calculating the cosine similarity between the encoded comment vector at the current moment and the encoded reply vector at each moment to obtain the cosine similarity of each reply moment;

step 1044, calculating a weighted encoded reply vector according to the cosine similarity of each reply moment for the current comment moment;

step 1045, starting from the first comment moment, matching the encoded comment vector of each comment moment with the corresponding weighted encoded reply vector to obtain a third matching vector at each moment;

step 1046, starting from the first reply moment, sequentially calculating the cosine similarity between the encoded reply vector at the current moment and the encoded comment vector at each moment to obtain the cosine similarity of each comment moment;

step 1047, calculating a weighted encoded comment vector according to the cosine similarity of each comment moment for the current reply moment;

step 1048, starting from the first reply moment, matching the encoded reply vector of each reply moment with the corresponding weighted encoded comment vector to obtain a fourth matching vector at each moment;

the plurality of matching vectors includes the first matching vector, a second matching vector, the third matching vector, and the fourth matching vector.

Wherein, the steps 1041-1042 and 1043-1048 can be performed simultaneously.

The vectors are matched by the cosine similarity formula above; that is, after steps 1041 to 1048 have all been performed, a matching vector sequence composed of the multi-dimensional cosine similarities at each moment is obtained.

In steps 1044 and 1045, the vectors of all reply moments are weighted and averaged by the cosine similarity of each reply moment. The cosine similarity serves as the weight, i.e. the correlation between a given word in the comment and the reply content, and weighting the reply-moment vectors by it yields the relationship between the comment and the reply. On the same principle, in steps 1046 and 1047, the vectors of all comment moments are weighted and averaged by the cosine similarity of each comment moment, which yields the relationship between the reply and the comment.

In this embodiment, starting from the first moment, the vector at the current moment of the comment is fully matched against the vector at the last moment of the reply, and the vector at the current moment of the reply is fully matched against the vector at the last moment of the comment; in addition, the vectors of the reply (or of the comment) are weighted by their cosine similarity to the comment (or to the reply). The actual relationship between the comment and the reply can thereby be obtained, which overcomes the prior-art defect of neglecting correlation at the level of detail and yields a more faithful measure of the correlation between the comment and the reply.
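The two kinds of matching in steps 1041 to 1048 can be sketched as follows. This is a simplified illustration under several assumptions: only one encoding direction is handled, the weight list `W` is fixed instead of trained, the weighted average divides by the sum of the cosine similarities (with a plain mean as fallback when that sum is close to zero), and all function names are hypothetical.

```python
import math


def cosine(a, b, eps=1e-8):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def match(v1, v2, W):
    """m_k = cosine(w_k * v1, w_k * v2) for every weight vector w_k in W."""
    return [cosine([w * x for w, x in zip(w_k, v1)],
                   [w * x for w, x in zip(w_k, v2)]) for w_k in W]


def full_matching(source, target, W):
    """Steps 1041/1042: match every source moment with the last target moment."""
    last = target[-1]
    return [match(h, last, W) for h in source]


def attentive_matching(source, target, W):
    """Steps 1043-1045 / 1046-1048: match with a similarity-weighted target vector."""
    matched = []
    for h in source:
        sims = [cosine(h, g) for g in target]  # one similarity per target moment
        total = sum(sims)
        if abs(total) < 1e-8:                  # assumption: fall back to a plain mean
            sims, total = [1.0] * len(target), float(len(target))
        weighted = [sum(s * g[d] for s, g in zip(sims, target)) / total
                    for d in range(len(h))]
        matched.append(match(h, weighted, W))
    return matched


def all_matchings(comment, reply, W):
    """Collect the first to fourth matching vectors for every moment."""
    return {
        "first": full_matching(comment, reply, W),
        "second": full_matching(reply, comment, W),
        "third": attentive_matching(comment, reply, W),
        "fourth": attentive_matching(reply, comment, W),
    }


if __name__ == "__main__":
    comment = [[1.0, 0.0], [0.0, 1.0]]  # toy encoded comment vectors
    reply = [[1.0, 0.0], [1.0, 1.0]]    # toy encoded reply vectors
    print(all_matchings(comment, reply, [[1.0, 1.0]]))
```

Full matching compares each moment against a single summary vector, whereas the weighted variant lets every moment of the other text contribute in proportion to its similarity, which is how correlations at the level of individual words are retained.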

In this embodiment, after obtaining a matched vector sequence, step 105 is executed, as shown in fig. 4, where step 105 specifically includes:

step 1051, inputting the matching vector sequence into a bidirectional LSTM model;

step 1052, obtaining the relations among the matching vectors at each moment according to the bidirectional LSTM model, and intercepting the comment forward relation vector, the comment reverse relation vector, the reply forward relation vector and the reply reverse relation vector at the last moment in the LSTM model;

step 1053, aggregating the comment forward relation vector, the comment reverse relation vector, the reply forward relation vector and the reply reverse relation vector into the spliced vector.

In this embodiment, obtaining both the forward and the reverse encoded comment vectors avoids the inaccuracy of relying on a unidirectional vector only. Inputting the matching vector sequence into the bidirectional LSTM model and intercepting the four specific vectors captures the complete semantics of the whole passage and improves aggregation efficiency, and the bidirectional model makes the subsequent relevance calculation more accurate.
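The interception and splicing of steps 1052 and 1053 can be illustrated as below. The sketch assumes that the hidden states of the bidirectional LSTM are already available as lists indexed by position in the sequence, and that each state has 100 dimensions; both the layout and the size are assumptions chosen for illustration, as are the function names.

```python
def last_states(forward_states, backward_states):
    """Return the final state of each direction.

    The forward pass ends at the last position. The backward pass reads the
    sequence from right to left, so its final state is the one stored at the
    first position when states are indexed by position.
    """
    return forward_states[-1], backward_states[0]


def splice(comment_fwd, comment_bwd, reply_fwd, reply_bwd):
    """Concatenate the four intercepted vectors into one fixed-length vector."""
    c_fwd, c_bwd = last_states(comment_fwd, comment_bwd)
    r_fwd, r_bwd = last_states(reply_fwd, reply_bwd)
    return c_fwd + c_bwd + r_fwd + r_bwd


if __name__ == "__main__":
    dim = 100  # assumed hidden size per direction
    comment_fwd = [[0.1] * dim for _ in range(3)]  # 3 comment moments
    comment_bwd = [[0.2] * dim for _ in range(3)]
    reply_fwd = [[0.3] * dim for _ in range(5)]    # 5 reply moments
    reply_bwd = [[0.4] * dim for _ in range(5)]
    spliced = splice(comment_fwd, comment_bwd, reply_fwd, reply_bwd)
    print(len(spliced))
```

The length of the spliced vector depends only on the hidden size, not on the number of words in the comment or the reply, which is what allows it to be fed to a layer with a fixed input size.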

In addition, the embodiment further comprises the following steps:

determining whether the relevance probability is greater than the preset probability; if so, the comment and the reply do not match; if not, the comment and the reply match.

For a better understanding of the present embodiment, the principle of the present embodiment will be briefly described below.

As shown in fig. 5, in this embodiment the comment is segmented into words and each word is converted into a word vector 301; similarly, the reply is converted into word vectors 311. The comment word vector sequence composed of the word vectors 301 and the reply word vector sequence composed of the word vectors 311 are then input into the LSTM model 302 respectively and encoded, so as to obtain the relationship between each word vector and the whole sentence, including both the forward and the backward relations among the word vectors. The encoded comment word vectors and the encoded reply word vectors are then matched by the matching layer, so as to obtain the correlation between the word vector at each moment of the comment and the reply, and the correlation between the word vector at each moment of the reply and the comment. The matching vectors containing this correlation information are spliced, input into the bidirectional LSTM model, and aggregated into a vector of fixed length. In this model, the vector 304 at the last moment of the comment in the forward direction, the vector 305 at the last moment of the comment in the backward direction, the vector 314 at the last moment of the reply in the forward direction and the vector 315 at the last moment of the reply in the backward direction are intercepted and spliced into a spliced vector. These four vectors are intercepted because they contain all the information of the whole sentence and can reflect the relations of the whole sentence, which improves calculation efficiency. Then, the similarity value obtained from the comment reply similarity calculation is spliced with the spliced vector, and the result is sent to a fully connected layer and a softmax layer to obtain the final similarity probability, from which the relationship between the comment and the reply is judged.

The present embodiment will be further described with reference to a specific example.

Suppose a user's comment on a hotel stay is "the towel is very dirty" and the reply to the comment is "sanitary and clean needed". After the comment and its reply are obtained in step 101, they are preprocessed, for example by removing special characters such as emoticons from the text. In step 202 the comment and the reply are then segmented by the word segmentation tool: the comment can be divided into the three words "towel, very, dirty", which form a word sequence, and the reply "sanitary and clean needed" can likewise be divided into three words, which form a word sequence. Preset professional vocabulary of the current scene can then be added to the comment or the reply; for example, for an apartment in this scene, the word "public rental house" can be added before the three words "towel, very, dirty". The comment word sequence and the reply word sequence are then input into the word vector model respectively to obtain the comment vector sequence and the reply vector sequence. For example, in this embodiment the three words "towel, very, dirty" are converted into the vectors $a_k$, $b_k$ and $c_k$, so the comment vector sequence formed by the three vectors is $a_k b_k c_k$; likewise, the reply vector sequence is $A_k B_k C_k$, where $k$ indexes the dimensions.

Then, the LSTM model is used to encode the comment vector sequence $a_k b_k c_k$ and the reply vector sequence $A_k B_k C_k$ separately, so as to obtain the relationship between each word in each sentence and the whole sentence. A comment sentence is regarded as a sequence of words in order, each word being represented by its word embedding, with an intermediate representation at the corresponding position. The intermediate representation of each word is then obtained; it represents the semantics from the beginning of the sentence up to that position and is composed from the word embedding of the current word and the intermediate representation of the previous word. Finally, the intermediate representation of the word at the end of the sentence is taken as the vector representation of the whole sentence. The forward and backward operations are carried out separately, and the forward and backward vectors of the same word are fused to obtain the vector representations of the sentence at a plurality of moments. Similarly, vector representations at a plurality of moments are obtained for the reply by the bidirectional LSTM. For example, at the moment $b_k$ the resulting forward vector is $a_k b_k$ and the resulting backward vector is $b_k c_k$. In this way, the encoded comment vector at each moment and the encoded reply vector at each moment can be obtained respectively.
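The idea that each intermediate representation is composed from the current word embedding and the previous intermediate representation, computed once in each direction, can be sketched with a toy recurrence. The code below deliberately replaces the LSTM cell with a much simpler element-wise update, $h_t = \tanh(w_{in} x_t + w_{rec} h_{t-1})$, so it illustrates only the bidirectional structure; the scalar weights `w_in` and `w_rec` and the function names are hypothetical.

```python
import math


def recurrent_states(sequence, w_in, w_rec):
    """Run the toy recurrence over the sequence and return one state per moment."""
    hidden = [0.0] * len(sequence[0])
    states = []
    for x in sequence:
        # New state depends on the current embedding and the previous state.
        hidden = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, hidden)]
        states.append(hidden)
    return states


def bidirectional_encode(sequence, w_in=1.0, w_rec=0.5):
    """Fuse forward and backward states of the same word by concatenation."""
    forward = recurrent_states(sequence, w_in, w_rec)
    # Process the reversed sequence, then flip back so index t means word t.
    backward = recurrent_states(sequence[::-1], w_in, w_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    comment_vectors = [[0.5, -0.5], [1.0, 0.0], [0.0, 2.0]]  # stand-ins for a, b, c
    for state in bidirectional_encode(comment_vectors):
        print(state)
```

With this layout, the forward half of the state at the middle word has seen the first two words and the backward half has seen the last two, which mirrors the forward vector $a_k b_k$ and the backward vector $b_k c_k$ in the example above.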

Next, the encoded comment vectors and the encoded reply vectors obtained in the previous step are matched. In this embodiment there are two matching methods: one is the full matching method described in steps 1041-1042, and the other is the attention matching method described in steps 1043-1048. For example, at the comment moment $b_k$, the full matching method matches the encoded comment vector at the moment $b_k$, i.e. $b_k c_k$, with the vector at the last moment of the reply, i.e. $A_k B_k C_k$. Since this embodiment requires forward and backward operations respectively, the comment moment $b_k$ also has a forward encoded vector $a_k b_k$, and correspondingly the last moment of the reply also has a forward encoded vector, so there are essentially 4 comparison values. In the attention matching method, the cosine similarity between the encoded comment vector at the current moment and the encoded reply vector at each moment is calculated in sequence to obtain the cosine similarity of each reply moment. For example, starting from the comment moment $a_k$, the backward encoded vector $a_k b_k c_k$ is compared with the backward encoded vectors of the reply, i.e. with $A_k B_k C_k$, $B_k C_k$ and $C_k$ respectively (there are essentially 4 comparison values, but this example is simplified to one direction), and the cosine similarities obtained are 0.1, 0.2 and 0.3 respectively. These cosine similarities are used as weights, by which the weighted average reply vector $M_k$ is obtained, and the comment vector $a_k b_k c_k$ at that moment is then matched with the weighted average reply vector $M_k$. In this embodiment, vectors are matched according to the formula $m_k = \operatorname{cosine}(w_k \circ v_1,\ w_k \circ v_2)$, where $v_1$ and $v_2$ are the vectors to be compared, $k$ denotes a certain dimension of the vector, and $w_k$ is a trainable parameter that is learned through back-propagation of the neural network.

For example, when $a_k b_k c_k$ and $M_k$ are compared, a weighted cosine similarity comparison is performed on the two vectors for each dimension. If the cosine similarity of the first dimension is 0.1, that of the second dimension is 0.2 and that of the third dimension is 0.3, a three-dimensional cosine similarity vector is formed at the moment $a_k$; multi-dimensional vectors are formed at the other moments in the same way. All the compared vectors are spliced to obtain a matching vector sequence that represents the relationship between the comment and the reply based on the cosine similarity of the two vectors. This matching vector sequence is then put into a bidirectional LSTM model, and the last vector in the forward direction and the last vector in the reverse direction of the overall semantic relationship of the comment, together with the last vector in the forward direction and the last vector in the reverse direction of the overall semantic relationship of the reply, are intercepted; these four vectors are spliced together to form the spliced vector.

In addition, in this embodiment the text similarity between the current reply and other replies may also be obtained by edit-distance similarity, and this similarity is then spliced onto the vector obtained in the previous step as one additional dimension. For example, if a 400-dimensional vector was obtained in the previous step, a 401-dimensional vector is obtained after the similarity calculation in this step.
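A possible way to compute such a similarity and append it is sketched below. The normalization `1 - distance / max(length)` and the function names are assumptions made for illustration; averaging over the other replies follows the "similarity average value" wording of the claims, but the exact formula is not given in the text.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,        # deletion
                               current[j - 1] + 1,     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical texts, 0.0 for entirely different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def append_similarity(spliced, reply, other_replies):
    """Append the mean similarity to the other replies as one extra dimension."""
    sims = [edit_similarity(reply, other) for other in other_replies]
    mean = sum(sims) / len(sims) if sims else 0.0
    return spliced + [mean]


if __name__ == "__main__":
    vector = [0.0] * 400  # placeholder for the 400-dimensional spliced vector
    extended = append_similarity(vector, "thank you for staying",
                                 ["thank you for staying!", "sorry for the trouble"])
    print(len(extended), extended[-1])
```

A reply that is nearly identical to many other replies receives a value close to 1 in the extra dimension, which gives the classifier a signal that the reply may be a reused template.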

Then, the 401-dimensional vector of the previous step is input into the fully connected layer to obtain a vector whose dimensionality equals the number of categories (two categories in this embodiment: the reply is beside the point, or the reply is not beside the point). The probabilities of the two categories are then obtained by a softmax calculation. For example, if in this embodiment the probability that the reply is beside the point is 0.6 and the probability that it is not is 0.4, the reply is judged to be beside the point.
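The last two layers can be sketched as a linear map followed by softmax. The weights, the bias and the tiny input below are placeholders chosen by hand, not trained values, and the function names are illustrative.

```python
import math


def fully_connected(x, weights, bias):
    """One output per category: dot product of x with a weight row, plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn the target vector into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    x = [0.2, -0.1, 0.7]        # stand-in for the 401-dimensional vector
    weights = [[0.5, 0.0, 1.0],  # row for the first category
               [0.1, 0.3, 0.2]]  # row for the second category
    bias = [0.0, 0.1]
    target = fully_connected(x, weights, bias)
    print(softmax(target))
```

The number of weight rows fixes the dimensionality of the target vector, which is why it equals the number of preset categories.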

Example 3

The embodiment provides a system for detecting the relevancy of the comment and the reply of the OTA hotel, as shown in fig. 6, the system comprises an information acquisition module 401, a conversion module 402, a comment encoding module 403, a reply encoding module 404, a matching module 405, a first splicing module 406, a target vector acquisition module 407 and a probability calculation module 408;

the information acquisition module 401 is configured to acquire comments and replies to the OTA hotel;

the conversion module 402 is configured to convert the comment and the reply into a comment vector sequence and a reply vector sequence, respectively;

the comment encoding module 403 is configured to encode semantic relationships between vectors in the comment vector sequence to obtain an encoded comment vector at each time;

the reply encoding module 404 is configured to encode semantic relationships between vectors in the reply vector sequence to obtain an encoded reply vector at each time;

the matching module 405 is configured to match the coding comment vector at each time with the coding reply vector at each time to obtain a plurality of matching vectors, where the matching vectors form a matching vector sequence;

the first splicing module 406 is configured to capture the relations among the matching vectors in the matching vector sequence and aggregate the matching vector sequence into a spliced vector according to the relations;

the target vector acquisition module 407 is configured to input the spliced vector into a fully connected layer to obtain a target vector, where the dimensionality of the target vector is the same as the number of preset categories;

the probability calculation module 408 is configured to calculate a relevance probability between the comment and the reply according to the target vector.

The comment encoding module 403 and the reply encoding module 404 use a neural network model to encode semantic relationships between all words in the comment vector sequence and the reply vector sequence, respectively.

Wherein the probability calculation module 408 calculates a relevance probability of the comment to the reply by softmax.

In this embodiment, the comment and the reply of the OTA hotel are vectorized by the conversion module, the semantic relations between the comment and the reply are analyzed quantitatively by the comment encoding module and the reply encoding module, and each word and the whole sentence within and between the comment and the reply are analyzed and compared through machine learning by the matching module and the first splicing module. Whether a reply matches the content of the comment it responds to can therefore be calculated effectively, quickly and accurately. This helps the hotel improve its existing products according to valid comments and reduces labor cost; with improved recognition precision and recall, it raises the service quality of the merchant and helps the merchant increase profit.

Example 4

The present embodiment is a further improvement on the basis of embodiment 3, specifically, as shown in fig. 7, in the present embodiment, the conversion module 402 includes: the system comprises a preprocessing unit 4021, a word segmentation processing unit 4022, a vocabulary adding unit 4023 and a vector sequence acquiring unit 4024;

the preprocessing unit 4021 is used for preprocessing the comment and the reply;

the word segmentation processing unit 4022 is configured to input the comment and the reply to a word segmentation tool respectively to obtain a first word segmentation comment sequence and a first word segmentation reply sequence;

the vocabulary adding unit 4023 is configured to add preset professional vocabulary of the current scene to the first word segmentation comment sequence and the first word segmentation reply sequence respectively to form a second word segmentation comment sequence and a second word segmentation reply sequence;

the vector sequence obtaining unit 4024 is configured to input the second word segmentation comment sequence and the second word segmentation reply sequence into a word vector model respectively to obtain a comment vector sequence and a reply vector sequence;

the preprocessing unit 4021 is configured to perform preprocessing by at least one of filtering special characters, filtering pure numbers, filtering sentences that do not include chinese characters, filtering invalid sentences, and normalizing sentences;

the word segmentation tool is an open source word segmentation tool and includes hand.

In the process of word segmentation, the vocabulary adding unit 4023 may add some preset professional vocabularies in the current scene, for example: in the hotel scene of OTA industry, professional vocabularies such as pre-authorization, credit, withholding, cash-back ticket, big bed room, account arrival, two-in-one, three-in-one, four-in-one, five-in-one, six-in-one, seven-in-one, eight-in-one, nine-in-one, ten-in-one, shop-in-no, room-expansion-free, land-occupied price, apartment, receiving and sending machine and the like corresponding to the scene are added during word segmentation processing.

The word vector model includes word2vec and GloVe.

The preprocessing unit 4021 is configured to perform preprocessing by filtering out special characters such as emoticons, filtering out sentences that contain no Chinese characters, collecting a set of invalid boilerplate sentences and filtering out sentences whose edit-distance similarity to them is high, and normalizing sentences, for example by converting full-width characters to half-width, traditional Chinese to simplified Chinese, and uppercase to lowercase.

In this embodiment, processing the comment and the reply in the preprocessing unit improves the accuracy of the conversion into vector sequences, adding the preset professional vocabulary in the vocabulary adding unit improves the accuracy of word segmentation, and the preprocessing performed in the preprocessing unit prevents objective factors in the raw text from affecting the accuracy of the subsequent relevance judgment.

In this embodiment, the vector sequence obtaining unit 4024 may obtain a comment vector sequence consisting of word vectors in each sentence comment and a reply vector sequence consisting of word vectors in each sentence reply.

In this embodiment, in order to obtain a more accurate semantic relationship between each vector and the whole sentence between the comment vector sequence and the reply vector sequence, the coding comment vector includes a forward coding comment vector and a reverse coding comment vector, and the coding reply vector includes a forward coding reply vector and a reverse coding reply vector.

In this embodiment, in order to perform more proper matching on each vector in the coding comment vectors and the semantics of the coding reply vector and to perform more proper matching on each vector in the coding reply vector and the semantics of the coding comment vector, the matching module 405 obtains a plurality of matching vectors according to the cosine similarity between the weighted coding comment vector of each dimension of each time and the weighted coding reply vector of the corresponding dimension of each time.

As shown in fig. 8, in this embodiment, the matching module 405 includes: a first matching unit 4051, a second matching unit 4052, a reply cosine calculation unit 4053, a weighted reply calculation unit 4054, a third matching unit 4055, a comment cosine calculation unit 4056, a weighted comment calculation unit 4057 and a fourth matching unit 4058;

the first matching unit 4051 is configured to sequentially match, starting from the first comment moment, the encoded comment vector at the current moment with the encoded reply vector at the last moment to obtain a first matching vector at each moment;

the second matching unit 4052 is configured to sequentially match, starting from the first reply moment, the encoded reply vector at the current moment with the encoded comment vector at the last moment to obtain a second matching vector at each moment;

the reply cosine calculating unit 4053 is configured to sequentially calculate, starting from the first comment moment, the cosine similarity between the encoded comment vector at the current moment and the encoded reply vector at each moment to obtain the cosine similarity of each reply moment;

the weighted reply calculation unit 4054 is configured to calculate a weighted encoded reply vector according to the cosine similarity of each reply moment for the current comment moment;

the third matching unit 4055 is configured to match, starting from the first comment moment, the encoded comment vector of each comment moment with the corresponding weighted encoded reply vector to obtain a third matching vector at each moment;

the comment cosine calculating unit 4056 is configured to sequentially calculate, starting from the first reply moment, the cosine similarity between the encoded reply vector at the current moment and the encoded comment vector at each moment to obtain the cosine similarity of each comment moment;

the weighted comment calculating unit 4057 is configured to calculate a weighted encoded comment vector according to the cosine similarity of each comment moment for the current reply moment;

the fourth matching unit 4058 is configured to match, starting from the first reply moment, the encoded reply vector of each reply moment with the corresponding weighted encoded comment vector to obtain a fourth matching vector at each moment;

the plurality of matching vectors includes the first matching vector, the second matching vector, the third matching vector, and the fourth matching vector.

The vectors are matched by the cosine similarity formula above; that is, after all of the units 4051 to 4058 have been applied, a matching vector sequence composed of the multi-dimensional cosine similarities at each moment is obtained.

For each comment moment, the reply cosine calculating unit 4053 and the weighted reply calculation unit 4054 compute a weighted average of the vectors of all reply moments according to the cosine similarity of each reply moment. The cosine similarity serves as the weight, i.e. the correlation between a given word in the comment and the reply content, and weighting the reply-moment vectors by it yields the relationship between the comment and the reply. In the same way, for each reply moment, the comment cosine calculating unit 4056 and the weighted comment calculating unit 4057 compute a weighted average of the vectors of all comment moments according to the cosine similarity of each comment moment, which yields the relationship between the reply and the comment.

In this embodiment, starting from the first moment, the first matching unit and the second matching unit fully match the vector at the current moment of the comment (or the reply) against the vector at the last moment of the reply (or the comment), and the third matching unit and the fourth matching unit match against the vectors of the reply or the comment weighted by their cosine similarity to the comment or the reply. The actual relationship between the comment and the reply can thereby be obtained, which overcomes the prior-art defect of neglecting correlation at the level of detail and yields a more faithful measure of the correlation between the comment and the reply.

In this embodiment, after the matching vector sequence is obtained, the first splicing module 406 is called. As shown in fig. 9, the first splicing module 406 includes an input unit 4061, an intercepting unit 4062, and an aggregation unit 4063.

The input unit 4061 is configured to input the matching vector sequence into a bidirectional LSTM model;

the intercepting unit 4062 is configured to obtain a relationship between the multiple matching vectors at each time according to the bidirectional LSTM model, and intercept a comment forward relationship vector, a comment reverse relationship vector, a reply forward relationship vector, and a reply reverse relationship vector at a last time in the LSTM model;

the aggregation unit 4063 is configured to aggregate the comment forward relation vector, the comment reverse relation vector, the reply forward relation vector and the reply reverse relation vector into the spliced vector.

In this embodiment, obtaining both the forward and the reverse encoded comment vectors avoids the inaccuracy of relying on a unidirectional vector only. The input unit inputs the matching vector sequence into the bidirectional LSTM model and the intercepting unit intercepts the four specific vectors, which captures the complete semantics of the whole passage and improves the aggregation efficiency of the aggregation unit, and the bidirectional model makes the subsequent relevance calculation more accurate.

In addition, this embodiment further includes a determining unit configured to determine whether the relevance probability is greater than the preset probability; if so, the comment and the reply do not match; if not, the comment and the reply match.

While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims

1. A detection method for relevancy of comment and reply of an OTA hotel is characterized by comprising the following steps:

obtaining comments and replies on the OTA hotel;

2. The detection method according to claim 1,

before the step of inputting the splicing vector to the full-connection layer to obtain the preset dimension vector, the method further comprises the following steps:

obtaining a similarity average value according to the similarity sequence;

and/or,

3. The detection method according to claim 1,

and/or,

4. The detection method according to claim 1,

in the step of coding the semantic relation among the vectors in the comment vector sequence to obtain the coding comment vector at each moment,

inputting the sequence of matching vectors into a bi-directional LSTM model;

5. The detection method according to claim 1,

the step of converting the comment and the reply into a comment vector sequence and a reply vector sequence respectively comprises:

preprocessing the comment and the reply;

and/or,

6. A detection system for relevancy of comment and reply of an OTA hotel, characterized in that the detection system comprises: an information acquisition module, a conversion module, a comment coding module, a reply coding module, a matching module, a first splicing module, a target vector acquisition module and a probability calculation module;

7. The detection system of claim 6, further comprising: a text similarity calculation module, an average value obtaining module and a second splicing module;

and/or,

8. The detection system of claim 6, wherein the matching module comprises a first matching unit and a second matching unit;

and/or,

9. The detection system of claim 6,

10. The detection system of claim 6, wherein the conversion module comprises a preprocessing unit, a word segmentation processing unit, a vocabulary adding unit and a vector sequence acquisition unit;

the preprocessing unit is used for preprocessing the comment and the reply;

the preprocessing unit is used for preprocessing by at least one of the following means: filtering special characters, filtering pure numbers, filtering sentences that do not contain Chinese characters, filtering invalid sentences, and standardizing sentences;

and/or,