CN106776782B - Semantic similarity obtaining method and device based on artificial intelligence - Google Patents

Publication number
CN106776782B
Authority
CN
China
Prior art keywords
granularity
search
weight
similarity
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611042515.3A
Other languages
Chinese (zh)
Other versions
CN106776782A (en)
Inventor
周坤胜
何径舟
石磊
冯仕堃
朱志凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201611042515.3A
Publication of CN106776782A
Application granted
Publication of CN106776782B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a semantic similarity obtaining method and device based on artificial intelligence. The method comprises: after obtaining the multi-granularity features of a query and a title, obtaining a weight for each granularity feature, the weight reflecting the importance of features of different granularities; and applying the weight of each granularity feature when constructing the multi-granularity representations of the query and the title. Features of different granularities thus contribute according to their importance when the similarity between the query and the title is calculated, which improves the precision of the similarity calculation, optimizes the existing semantic similarity model, makes search results more accurate, and better meets users' needs.

Description

Semantic similarity obtaining method and device based on artificial intelligence
Technical Field
The invention relates to the technical field of information processing, in particular to a semantic similarity obtaining method and device based on artificial intelligence.
Background
Artificial Intelligence, abbreviated AI, is a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
Analysis of users' search behavior based on artificial intelligence shows that a user's purpose when searching is to obtain, as quickly as possible, content related to the entered search term from the search results.
Retrieval according to the semantics of search terms is one of the keys to realizing knowledge retrieval, and similarity calculation is the basis of semantic retrieval. A semantic similarity model calculates the similarity between a search term (query) entered by a user and a candidate search item (title); once the similarities between the query and the titles are obtained, a search engine can sort them and return search results according to the sorted order. Fig. 1 is a schematic structural diagram of a conventional semantic similarity model. From bottom to top, the model comprises an embedding layer, a transformation (BOW) layer, a comparison (FC) layer, and a scoring (Score) layer. The embedding layer consists of the vectorized representations of all dictionary words; when a user enters a sentence during a search, the embedding layer maps the sentence into a two-dimensional vector in which each sub-vector is the term embedding of one term. The BOW layer transforms the two-dimensional vector into a one-dimensional vector; this layer can also be replaced by convolution and pooling. The FC layer is a fully connected layer that applies a linear transformation to the one-dimensional vector; an activation function can optionally be added after the linear transformation to introduce a nonlinear transformation. The Score layer measures the similarity between the query and the title. For example, if the query is "Baidu Brazilian gulf" and the title is "Brazilian gulf", word segmentation yields discrete word sequences: the query's sequence may include "Baidu", "Brazil" and "Konjac", while the title's sequence includes "Brazil" and "Konjac".
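The embedding and BOW layers described above can be sketched as follows. This is a minimal illustration only: the embedding table, its three-dimensional values and the function names `embed` and `bow` are assumptions made for the example, and the BOW transformation is assumed to be a plain element-wise sum of the term vectors.

```python
# Hypothetical toy embedding table; a real model learns these vectors.
EMBEDDING = {
    "Baidu": [0.2, 0.1, 0.0],
    "Brazil": [0.5, 0.3, 0.1],
    "Konjac": [0.1, 0.4, 0.6],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumed fallback for out-of-dictionary terms


def embed(terms):
    """Embedding layer: map each term to its vector (one row per term)."""
    return [EMBEDDING.get(term, UNKNOWN) for term in terms]


def bow(matrix):
    """BOW layer: sum the term vectors into one fixed-length vector."""
    return [sum(column) for column in zip(*matrix)]


# Segmented query and title from the example in the text.
query_vec = bow(embed(["Baidu", "Brazil", "Konjac"]))
title_vec = bow(embed(["Brazil", "Konjac"]))
```

Both results have the embedding dimension regardless of how many terms the input contains, which is what lets the later layers compare a query and a title of different lengths.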
When the similarity between the query and the title is calculated with the semantic similarity model shown in Fig. 1, each word obtained by segmenting the query is treated as one granularity and all of the query's words are used to form a single-granularity vector representation of the query; correspondingly, each word obtained by segmenting the title is treated as one granularity and all of the title's words are used to form a single-granularity vector representation of the title. Semantic similarity calculated at a single granularity has poor precision, so the search results are not ideal.
To improve search precision, the semantic similarity model has been improved as shown in Fig. 2. In the similarity calculation, after the query and the title are segmented into words, feature extraction is performed on the segmented corpus to obtain several granularity features of the query and the title, such as the basic granularity feature of the query (query-basic), the bigram feature of the query's basic granularity (query-basic-bigram), the basic granularity feature of the title (title-basic), and the bigram feature of the title's basic granularity (title-basic-bigram). Although the model of Fig. 2 introduces multiple granularities to represent the query and the title, it does not distinguish among the multi-granularity features before the similarity between the query and the title is calculated: the multi-granularity features of the query are added directly in the BOW layer to obtain the multi-granularity representation of the query, and the multi-granularity features of the title are added directly to obtain the multi-granularity representation of the title.
Because the existing semantic similarity model adds the features of multiple granularities directly, without distinguishing among them, to obtain the multi-granularity representations of the query and the title, the search results obtained by a search engine have poor accuracy.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present invention is to provide a semantic similarity obtaining method based on artificial intelligence that optimizes the existing semantic similarity model and solves the prior-art problem that, because the semantic similarity model does not distinguish among the features of multiple granularities and adds them directly to obtain the multi-granularity representations of the query and the title, the search results obtained by a search engine have poor accuracy.
The second purpose of the invention is to provide a semantic similarity obtaining device based on artificial intelligence.
The third purpose of the invention is to provide another semantic similarity acquiring device based on artificial intelligence.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a semantic similarity obtaining method based on artificial intelligence, including:
acquiring granularity features of a search term and a search item;
performing similarity calculation between each granularity feature of the search term and the granularity features of the search item to obtain a weight corresponding to each granularity feature;
performing weighted calculation on the search term and the search item using the weight corresponding to each granularity feature to obtain a granularity vector of the search term and a granularity vector of the search item;
calculating the similarity between the search term and the search item based on the granularity vector of the search term and the granularity vector of the search item.
According to the semantic similarity obtaining method based on artificial intelligence of the embodiment of the present invention, after the granularity features of the query and the title are obtained, a weight is obtained for each granularity feature. The weight reflects the importance of features of different granularities and is applied to each granularity feature when the multi-granularity representations of the query and the title are constructed. Features of different granularities therefore contribute according to their importance when the similarity between the query and the title is calculated, which improves the precision of the similarity calculation, optimizes the existing semantic similarity model, makes search results more accurate, and better meets users' needs.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides an artificial intelligence based semantic similarity obtaining apparatus, including:
a feature acquisition module, configured to acquire granularity features of a search term and a search item;
a weight calculation module, configured to perform similarity calculation between each granularity feature of the search term and the granularity features of the search item to obtain a weight corresponding to each granularity feature;
a vector acquisition module, configured to perform weighted calculation on the search term and the search item using the weight corresponding to each granularity feature to obtain a granularity vector of the search term and a granularity vector of the search item;
and a similarity calculation module, configured to calculate the similarity between the search term and the search item based on the granularity vector of the search term and the granularity vector of the search item.
According to the semantic similarity obtaining device based on artificial intelligence of the embodiment of the present invention, after the granularity features of the query and the title are obtained, a weight is obtained for each granularity feature. The weight reflects the importance of features of different granularities and is applied to each granularity feature when the multi-granularity representations of the query and the title are constructed. Features of different granularities therefore contribute according to their importance when the similarity between the query and the title is calculated, which improves the precision of the similarity calculation, optimizes the existing semantic similarity model, makes search results more accurate, and better meets users' needs.
In order to achieve the above object, an embodiment of a third aspect of the present invention provides another semantic similarity obtaining device based on artificial intelligence, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to: acquire granularity features of a search term and a search item; perform similarity calculation between each granularity feature of the search term and the granularity features of the search item to obtain a weight corresponding to each granularity feature; perform weighted calculation on the search term and the search item using the weight corresponding to each granularity feature to obtain a granularity vector of the search term and a granularity vector of the search item; and calculate the similarity between the search term and the search item based on the granularity vector of the search term and the granularity vector of the search item.
In order to achieve the above object, an embodiment of a fourth aspect of the present invention provides a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor on a server side, enable the server side to perform a semantic similarity obtaining method based on artificial intelligence, the method including: acquiring granularity features of a search term and a search item; performing similarity calculation between each granularity feature of the search term and the granularity features of the search item to obtain a weight corresponding to each granularity feature; performing weighted calculation on the search term and the search item using the weight corresponding to each granularity feature to obtain a granularity vector of the search term and a granularity vector of the search item; and calculating the similarity between the search term and the search item based on the granularity vector of the search term and the granularity vector of the search item.
In order to achieve the above object, an embodiment of a fifth aspect of the present invention provides a computer program product, where, when instructions in the computer program product are executed by a processor, a semantic similarity obtaining method based on artificial intelligence is performed, the method including: acquiring granularity features of a search term and a search item; performing similarity calculation between each granularity feature of the search term and the granularity features of the search item to obtain a weight corresponding to each granularity feature; performing weighted calculation on the search term and the search item using the weight corresponding to each granularity feature to obtain a granularity vector of the search term and a granularity vector of the search item; and calculating the similarity between the search term and the search item based on the granularity vector of the search term and the granularity vector of the search item.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Fig. 1 is a schematic structural diagram of a single-granularity semantic similarity model in the prior art;
Fig. 2 is a schematic structural diagram of a conventional multi-granularity semantic similarity model;
Fig. 3 is a schematic flowchart of a semantic similarity obtaining method based on artificial intelligence according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a semantic similarity model according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of another semantic similarity obtaining method based on artificial intelligence according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of another semantic similarity model according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a semantic similarity obtaining device based on artificial intelligence according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a vector acquisition module according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a similarity calculation module according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.
The semantic similarity obtaining method and device based on artificial intelligence according to the embodiment of the invention are described below with reference to the accompanying drawings.
Fig. 3 is a schematic flowchart of a semantic similarity obtaining method based on artificial intelligence according to an embodiment of the present invention. The semantic similarity obtaining method based on artificial intelligence comprises the following steps:
S101, acquiring granularity features of the search term and the search item.
Specifically, word segmentation is performed on the search term (query) and the search item (title) to obtain segmented corpora for the query and the title, and a neural network performs feature extraction on the segmented corpora to obtain the granularity features of the query and the granularity features of the title.
Before the neural network is used to extract features from the segmented corpora of the query and the title, it is first trained on a training corpus: the training corpus is segmented into words, the segmented words are used to form a multi-granularity segmented corpus, and the neural network is trained on that corpus until a stable, converged network is obtained.
After the input query and title are segmented into words, the corresponding segmented corpora can be input into the neural network for feature extraction to obtain the granularity features of the search term and the search item. For example, if the query is "Baidu Brazilian gulf" and the title is "Brazilian gulf", word segmentation yields discrete word sequences: the query's sequence may include "Baidu", "Brazil" and "Konjac", while the title's sequence includes "Brazil" and "Konjac". After the discrete word sequences of the query and the title are obtained, the segmented words can be combined to obtain the granularity features of the query and the title. For example, the granularity features of the query may include the basic granularity feature of the query (query-basic, qb), the phrase granularity feature of the query (query-phrase, qp), and the bigram feature of the query's basic granularity (query-basic-bigram, qbb); correspondingly, the granularity features of the title may include the basic granularity feature of the title (title-basic, tb), the phrase granularity feature of the title (title-phrase, tp), and the bigram feature of the title's basic granularity (title-basic-bigram, tbb).
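How segmented words might be combined into basic and bigram granularities can be sketched as follows. This only illustrates the token-level grouping: the function names, the underscore-joined bigram format and the dictionary keys are assumptions for the example, the phrase granularity is omitted because the text does not say how phrases are identified, and the neural-network feature extraction that turns each group into a vector is not shown.

```python
def bigrams(terms):
    """Join each pair of adjacent basic terms into one bigram token."""
    return [f"{left}_{right}" for left, right in zip(terms, terms[1:])]


def granularity_terms(terms):
    """Group a segmented sentence by granularity (hypothetical key names)."""
    return {"basic": list(terms), "basic_bigram": bigrams(terms)}


# Segmented query and title from the example in the text.
query = granularity_terms(["Baidu", "Brazil", "Konjac"])
title = granularity_terms(["Brazil", "Konjac"])
```

In a full model each group would then be embedded and pooled separately, giving one feature vector per granularity (qb and qbb for the query, tb and tbb for the title).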
S102, performing similarity calculation between each granularity feature of the search term and the granularity features of the search item to obtain a weight corresponding to each granularity feature.
Specifically, similarity is calculated between each granularity feature of the query and the title's granularity feature of the same type, yielding the weight corresponding to each granularity feature. Preferably, cosine similarity is used. For example, the cosine similarity between query-basic and title-basic gives the weight of the basic granularity feature, and the cosine similarity between query-phrase and title-phrase gives the weight of the phrase granularity feature; these weights reflect the importance of features of different granularities.
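The per-granularity weight calculation of S102 can be sketched with plain cosine similarity. The feature vectors below are made-up two-dimensional values, and the names `cosine` and `granularity_weights` are assumptions for the example; returning 0.0 for a zero-length vector is likewise a choice made here, not something the text specifies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def granularity_weights(query_feats, title_feats):
    """Compare each query feature with the title feature of the same type."""
    return {name: cosine(query_feats[name], title_feats[name])
            for name in query_feats}


# Made-up feature vectors for the three granularities named in the text.
query_feats = {"basic": [1.0, 0.0], "phrase": [0.0, 2.0], "bigram": [1.0, 1.0]}
title_feats = {"basic": [2.0, 0.0], "phrase": [3.0, 0.0], "bigram": [1.0, 0.0]}
weights = granularity_weights(query_feats, title_feats)
```

A granularity at which the query and the title agree closely (here "basic") receives a larger raw weight than one at which they diverge (here "phrase").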
S103, performing weighted calculation on the search term and the search item using the weight corresponding to each granularity feature to obtain the granularity vector of the search term and the granularity vector of the search item.
Specifically, after the weight corresponding to each granularity feature is obtained, each granularity feature of the query is multiplied by its corresponding weight, and the products for all granularity features of the query are added to obtain the granularity vector of the query.
Correspondingly, each granularity feature of the title is multiplied by its corresponding weight, and the products for all granularity features of the title are added to obtain the granularity vector of the title.
In this embodiment, because the weight of each granularity feature is applied when the granularity vectors of the query and the title are obtained, the granularity vector of the query and the granularity vector of the title reflect the importance of each granularity, so the granularity vector representations are better and more accurate.
In practical applications, the weight of each granularity feature obtained from the cosine similarity calculation needs to be normalized to obtain the normalized weight of that granularity feature. Preferably, the weights of the granularity features may be normalized using the softmax function.
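The normalization and weighted sum of S103 can be sketched as follows. The raw weights and feature vectors are made-up values, and the names `softmax` and `weighted_sum` are assumptions for the example; subtracting the maximum before exponentiating is a standard numerical-stability step added here rather than something stated in the text.

```python
import math


def softmax(values):
    """Normalize raw weights so that they are positive and sum to 1."""
    peak = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(features, weights):
    """Multiply each feature vector by its weight and add the products."""
    dim = len(features[0])
    return [sum(w * f[i] for f, w in zip(features, weights))
            for i in range(dim)]


# Made-up cosine weights for the basic, phrase and bigram granularities.
raw_weights = [0.9, 0.5, 0.1]
norm_weights = softmax(raw_weights)

# Made-up query features in the same order; q_1 is the query granularity vector.
query_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q_1 = weighted_sum(query_features, norm_weights)
```

The same two calls applied to the title's features, with the same normalized weights, would give the title granularity vector.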
S104, calculating the similarity between the search term and the search item based on the granularity vector of the search term and the granularity vector of the search item.
After the granularity vector of the query and the granularity vector of the title are obtained, the similarity between the query and the title can be calculated based on the granularity vector of the query and the granularity vector of the title.
Fig. 4 is a schematic structural diagram of a semantic similarity model provided in this embodiment. As shown in Fig. 4, the semantic similarity model includes a granularity weighting layer, a comparison layer and a scoring layer, where the granularity weighting layer produces the granularity vector representations of the query and the title. The granularity weighting layer contains three cosine similarity calculation modules, a normalization module and a weighting module. The pairs qb and tb, qp and tp, and qbb and tbb each pass through one cosine similarity calculation module to obtain, respectively, the weight corresponding to the basic granularity feature, the weight corresponding to the phrase granularity feature, and the weight corresponding to the bigram feature of the basic granularity. After the weights are obtained, the normalization module normalizes the weights of the granularity features, and the weighting module then multiplies each granularity feature by its normalized weight and adds the products to obtain the granularity vector of the query (q_1) and the granularity vector of the title (t_1). This semantic similarity model can execute the semantic similarity obtaining method based on artificial intelligence provided by this embodiment.
Specifically, the granularity vector of the query and the granularity vector of the title are obtained in the granularity weighting layer and then input into the comparison layer, namely the FC layer, which applies a linear or nonlinear transformation to obtain the similarity between the query and the title. The score layer of the semantic similarity model then scores the similarity between the query and the title; once scoring is complete, the results are sorted by score and the sorted results are returned to the user.
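One possible reading of the comparison and scoring layers is sketched below. The text does not give the FC parameters or the scoring function, so everything specific here is an assumption: the weight matrix `W`, the bias `B`, the `tanh` activation, and the use of cosine similarity as the score are all hypothetical choices for illustration.

```python
import math

# Hypothetical FC parameters; a trained model would learn these.
W = [[0.5, -0.2],
     [0.1, 0.8]]
B = [0.0, 0.1]


def fully_connected(x, weights, bias, activation=math.tanh):
    """FC layer: linear transformation, then an optional activation."""
    linear = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return [activation(v) for v in linear] if activation else linear


def score(u, v):
    """Assumed score layer: cosine similarity of the transformed vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up granularity vectors standing in for q_1 and t_1.
q_1 = [1.0, 2.0]
t_1 = [0.5, 1.5]
similarity = score(fully_connected(q_1, W, B), fully_connected(t_1, W, B))
```

A search engine would compute such a score for every candidate title and sort the candidates by it before returning results.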
According to the semantic similarity obtaining method based on artificial intelligence of this embodiment, after the multi-granularity features of the query and the title are obtained, a weight is obtained for each granularity feature. The weight reflects the importance of features of different granularities and is applied to each granularity feature when the multi-granularity representations of the query and the title are constructed. Features of different granularities therefore contribute according to their importance when the similarity between the query and the title is calculated, which improves the precision of the similarity calculation, optimizes the existing semantic similarity model, makes search results more accurate, and better meets users' needs.
Fig. 5 is a schematic flow chart of another semantic similarity obtaining method based on artificial intelligence according to an embodiment of the present invention. The semantic similarity obtaining method based on artificial intelligence comprises the following steps:
S201, acquiring granularity features of the search term and the search item.
S202, performing similarity calculation between each granularity feature of the search term and the granularity features of the search item to obtain a weight corresponding to each granularity feature.
S203, performing weighted calculation on the search term and the search item using the weight corresponding to each granularity feature to obtain the granularity vector of the search term and the granularity vector of the search item.
For S201 to S203, refer to the description of the corresponding content in the above embodiment; details are not repeated here.
S204, performing similarity calculation between each granularity feature of the search term and the granularity vector of the search item to obtain a first weight, and performing similarity calculation between each granularity feature of the search item and the granularity vector of the search term to obtain a second weight.
S205, performing weighted calculation on each granularity feature of the search term with the first weight, and performing weighted calculation on each granularity feature of the search item with the second weight.
Specifically, similarity calculation is performed between each granularity feature of the query and the granularity vector of the title to obtain the first weight, and between each granularity feature of the title and the granularity vector of the query to obtain the second weight. Each granularity feature of the query is then weighted with the first weight, that is, each granularity feature of the query is multiplied by the first weight and all the products are added. Correspondingly, each granularity feature of the title is weighted with the second weight, that is, each granularity feature of the title is multiplied by the second weight and all the products are added.
S206, updating the granularity vector of the search term and the granularity vector of the search item using the results of the weighted calculation.
After the results of the weighted calculation for the query and the title are obtained, the result for the query is used to update the granularity vector of the query, and the result for the title is used to update the granularity vector of the title.
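Steps S204 to S206 can be sketched as a cross-weighting step in which each side's features are compared with the other side's granularity vector. The feature values and the starting vectors are made up, and the helper names are assumptions for the example; cosine similarity and softmax normalization are used here because they are the preferred choices named elsewhere in the description.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(values):
    """Normalize raw weights so that they are positive and sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_weighted_vector(own_features, other_vector):
    """Weight own features by similarity to the other side's vector, then sum."""
    weights = softmax([cosine(f, other_vector) for f in own_features])
    dim = len(own_features[0])
    return [sum(w * f[i] for f, w in zip(own_features, weights))
            for i in range(dim)]


query_features = [[1.0, 0.0], [1.0, 1.0]]  # made-up qb, qbb
title_features = [[0.0, 1.0], [1.0, 1.0]]  # made-up tb, tbb
q_1, t_1 = [1.0, 0.5], [0.5, 1.0]          # made-up first-layer vectors

q_2 = cross_weighted_vector(query_features, t_1)  # uses the first weight
t_2 = cross_weighted_vector(title_features, q_1)  # uses the second weight
```

Here q_2 and t_2 play the role of the updated granularity vectors of S206: features of the query that resemble the title's overall representation are emphasized, and vice versa.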
S207, calculating the similarity of the granularity vector of the search word and the granularity vector of the search item to obtain the similarity between the search word and the search item.
After the granularity vector of the query and the granularity vector of the title are obtained, the similarity between the query and the title can be calculated based on the granularity vector of the query and the granularity vector of the title.
Fig. 6 is a schematic structural diagram of another speech similarity model provided in this embodiment. As shown in fig. 6, the semantic similar model includes: the system comprises a first granularity weighting layer, a second granularity weighting layer, a comparison layer and a grading layer, wherein the first granularity weighting layer and the second granularity weighting layer realize the granularity vector representation of the query and the title. Two first cosine similarity calculation modules, a first normalization module and a first weighting module are arranged in the first granularity weighting layer, wherein qb and tb, and qbb and tbb respectively obtain the weight corresponding to the basic granularity characteristic and the weight corresponding to the digram characteristic of the basic granularity through one first cosine similarity calculation module. After the weight corresponding to each granularity feature is obtained, the weight of each granularity feature is normalized through a first normalization module, and then the granularity features and the corresponding normalization weights are multiplied and added through the first weighting module to obtain a query granularity vector (q _1) and a title granularity vector (t _ 1).
Further, the second granularity weighting layer comprises: four second cosine similarity calculation modules, two second normalization modules and two second weighting modules.
For the query, two second cosine similarity calculation modules compute the similarity between the basic granularity feature (qb) and the title granularity vector t_1 obtained by the first granularity weighting layer, and between the digram feature of the basic granularity (qbb) and t_1, to obtain the first weights. The first weights are normalized by a second normalization module, and a second weighting module then multiplies qb and qbb of the query by their corresponding normalized first weights and adds the products to obtain the granularity vector (q_2) of the query.
Correspondingly, for the title, two second cosine similarity calculation modules compute the similarity between the basic granularity feature (tb) and the query granularity vector q_1 obtained by the first granularity weighting layer, and between the digram feature of the basic granularity (tbb) and q_1, to obtain the second weights. The second weights are normalized by a second normalization module, and a second weighting module then multiplies tb and tbb of the title by their corresponding normalized second weights and adds the products to obtain the granularity vector (t_2) of the title.
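A minimal sketch of the second granularity weighting layer follows. It assumes softmax normalization (the normalization scheme is not named in this passage) and uses toy vectors; the names `second_layer`, `first_weight` and `second_weight` are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(weights):
    """Assumed normalization: positive weights that sum to 1."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(features, weights):
    """Multiply each feature vector by its weight and add the products."""
    dim = len(features[0])
    return [sum(w * f[i] for f, w in zip(features, weights)) for i in range(dim)]

def second_layer(qb, qbb, tb, tbb, q_1, t_1):
    """Weight query features against t_1 and title features against q_1."""
    first_weight = softmax([cosine(qb, t_1), cosine(qbb, t_1)])
    second_weight = softmax([cosine(tb, q_1), cosine(tbb, q_1)])
    q_2 = weighted_sum([qb, qbb], first_weight)
    t_2 = weighted_sum([tb, tbb], second_weight)
    return q_2, t_2

# Toy inputs: q_1 and t_1 would normally come from the first weighting layer.
q_2, t_2 = second_layer([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0],
                        q_1=[1.0, 0.0], t_1=[0.0, 1.0])
```

Because each side is weighted against the other side's granularity vector, a query feature that resembles the title as a whole gains weight in q_2, and likewise for the title features in t_2.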
After q_2 and t_2 are obtained, they are input to the comparison layer, namely the FC layer, which applies a linear or nonlinear transformation to obtain the similarity between the query and the title. The scoring layer in the semantic similarity model then scores the similarity between the query and the title; once scoring is finished, the results are ranked according to the scores and the ranking result is fed back to the user.
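The comparison and ranking step can be illustrated with the sketch below. Everything specific in it is an assumption made for illustration: the FC layer is reduced to a single linear unit over the concatenation of q_2 and t_2, the nonlinearity is a sigmoid, and the values of `weights` and `bias` are invented (in a trained model they would be learned parameters).

```python
import math

def fc_score(q_2, t_2, weights, bias):
    """One linear unit over [q_2; t_2] followed by a sigmoid (assumed form)."""
    x = q_2 + t_2  # list concatenation of the two granularity vectors
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def rank_titles(q_2, titles, weights, bias):
    """Score every candidate title and sort by score, highest first."""
    scored = [(name, fc_score(q_2, t_2, weights, bias))
              for name, t_2 in titles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical parameters and candidate title vectors.
weights = [0.5, 0.0, 1.0, -1.0]
bias = 0.0
titles = {"title_a": [1.0, 0.0], "title_b": [0.0, 1.0]}
ranked = rank_titles([1.0, 0.0], titles, weights, bias)
```

Here `ranked` lists the candidate titles in descending score order, which corresponds to the ranking result that is fed back to the user.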
Further, in this embodiment, an iterative operation may be performed in the second granularity weighting layer. That is, starting from the query granularity vector and the title granularity vector obtained in S206 (the granularity vector of the search term and the granularity vector of the search entry updated with the result of the weighted calculation), the following operations are executed iteratively for a preset number of iterations: similarity calculation is performed on each granularity feature of the query and the granularity vector of the title to obtain a first weight; similarity calculation is performed on each granularity feature of the title and the granularity vector of the title to obtain a second weight; weighted calculation is performed on each granularity feature of the query and the first weight; and weighted calculation is performed on each granularity feature of the title and the second weight. The process ends when the preset number of iterations is completed.
For example, with the number of iterations set to 2, q_2 and t_2 are first used as the new q_1 and t_1; cosine similarity calculation is performed on qb and qbb with q_1 and on tb and tbb with t_1, followed by the subsequent operations, to obtain q_3 and t_3, which completes the first iteration. Then q_3 and t_3 are used as the new q_1 and t_1, and the same cosine similarity calculation and subsequent operations are performed to obtain q_4 and t_4, which completes the second iteration.
q_4 and t_4 are then used as the new q_1 and t_1 to calculate the similarity between the query and the title.
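The iteration can be sketched as a loop that feeds each pass's output vectors back in as the next pass's input. The sketch assumes that every pass reuses the pairing described for the second granularity weighting layer of Fig. 6 (query features weighted against the title vector, title features weighted against the query vector) and that normalization is a softmax; the names `hop` and `multi_hop` and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(weights):
    """Assumed normalization: positive weights that sum to 1."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(features, weights):
    """Multiply each feature vector by its weight and add the products."""
    dim = len(features[0])
    return [sum(w * f[i] for f, w in zip(features, weights)) for i in range(dim)]

def hop(q_feats, t_feats, q_vec, t_vec):
    """One weighting pass: returns the updated query and title vectors."""
    first_weight = softmax([cosine(f, t_vec) for f in q_feats])
    second_weight = softmax([cosine(f, q_vec) for f in t_feats])
    return weighted_sum(q_feats, first_weight), weighted_sum(t_feats, second_weight)

def multi_hop(q_feats, t_feats, q_1, t_1, iterations):
    """Initial pass gives q_2, t_2; each iteration then replaces them in turn."""
    q, t = hop(q_feats, t_feats, q_1, t_1)
    for _ in range(iterations):
        q, t = hop(q_feats, t_feats, q, t)
    return q, t

q_feats = [[1.0, 0.0], [0.0, 1.0]]   # stand-ins for qb, qbb
t_feats = [[1.0, 0.0], [0.6, 0.8]]   # stand-ins for tb, tbb
q_4, t_4 = multi_hop(q_feats, t_feats, [0.5, 0.5], [0.8, 0.2], iterations=2)
```

With `iterations=2` the initial pass yields q_2 and t_2 and the two loop passes yield q_3, t_3 and then q_4, t_4, matching the numbering used in the example above.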
According to the semantic similarity obtaining method based on artificial intelligence of this embodiment, after the multi-granularity features of the query and the title are obtained, a weight is obtained for each granularity feature, and the weight reflects the importance of features of different granularities. The weight of each granularity feature is then applied when the query and the title are given their multi-granularity representations, so that features of different granularities contribute according to their importance when the similarity between the query and the title is calculated. As a result, the similarity calculation is more precise, the existing semantic similarity model is optimized, and the search results are more accurate and better meet users' needs.
In this embodiment, multi-hop granularity weighting is performed using the weights of the granularity features, which further improves the accuracy of the similarity calculation, so that the search results better meet users' needs.
Fig. 7 is a schematic structural diagram of a semantic similarity obtaining apparatus based on artificial intelligence according to an embodiment of the present invention. The semantic similarity obtaining device based on artificial intelligence comprises: a feature acquisition module 11, a weight calculation module 12, a vector acquisition module 13 and a similarity calculation module 14.
The feature obtaining module 11 is configured to obtain granularity features of the search terms and the search entries.
The feature obtaining module 11 is specifically configured to:
and performing word segmentation on the search word and the search item to obtain a word segmentation corpus of the search word and the search item.
And carrying out feature extraction on the word segmentation linguistic data by utilizing a neural network to obtain the granularity features of the search words and the granularity features of the search items.
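The two steps above can be illustrated with a toy sketch. It is not the feature extractor of this embodiment: whitespace splitting stands in for a real word segmenter, and averaging vectors from a hand-written lookup table stands in for the neural network. The names `segment`, `digrams`, `average_embedding` and `table`, and all vector values, are hypothetical.

```python
def segment(text):
    """Stand-in segmenter: split on whitespace (a real system would use a word segmenter)."""
    return text.split()

def digrams(tokens):
    """Adjacent token pairs, i.e. the digram units built on the basic granularity."""
    return list(zip(tokens, tokens[1:]))

def average_embedding(units, table, dim):
    """Toy replacement for the neural feature extractor: mean of known unit vectors."""
    vectors = [table[u] for u in units if u in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

tokens = segment("cheap flight tickets")
basic_units = tokens
digram_units = digrams(tokens)

# Hypothetical 2-dimensional vectors for a few units.
table = {
    "cheap": [1.0, 0.0],
    "flight": [0.0, 1.0],
    ("cheap", "flight"): [0.2, 0.8],
}
qb = average_embedding(basic_units, table, 2)    # basic granularity feature
qbb = average_embedding(digram_units, table, 2)  # digram feature
```

The same functions applied to a search entry would give tb and tbb, so that the query and the entry carry features of the same granularity types.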
The weight calculation module 12 is configured to perform similarity calculation based on each granularity feature of the search term and the granularity feature of the search entry to obtain a weight corresponding to each granularity feature;
The weight calculation module 12 is specifically configured to perform similarity calculation on each granularity feature of the search term and the granularity feature of the same type of the search entry, respectively, to obtain a weight corresponding to each granularity feature.
Further, the weight calculating module 12 is specifically configured to:
and respectively carrying out cosine similarity calculation on each granularity characteristic of the search word and the granularity characteristics of the same type of the search entry to obtain the weight corresponding to each granularity characteristic.
And the vector obtaining module 13 is configured to perform weighted calculation on the search term and the search entry by using the weight corresponding to each granularity feature to obtain a granularity vector of the search term and a granularity vector of the search entry.
Fig. 8 is a schematic diagram of an alternative structure of the vector obtaining module in this embodiment. The vector acquisition module 13 includes: a normalization unit 131 and a vector acquisition unit 132.
The normalizing unit 131 is configured to perform normalization processing on the weight corresponding to each granularity feature to obtain a normalized weight corresponding to each granularity.
A vector obtaining unit 132, configured to add, for the search term and the search entry, products of each granularity feature and the corresponding normalization weight to obtain a granularity vector of the search term and a granularity vector of the search entry.
And a similarity calculation module 14, configured to calculate a similarity between the search term and the search entry based on the granularity vector of the search term and the granularity vector of the search entry.
Fig. 9 is a schematic diagram of an alternative structure of the similarity calculation module in this embodiment. The similarity calculation module 14 includes: a weighting unit 141, a weighting calculation unit 142, an updating unit 143, a similarity calculation unit 144, and an iteration unit 145.
The weighting unit 141 is configured to perform similarity calculation on each granularity feature of the search term and the granularity vector of the search entry to obtain a first weight, and perform similarity calculation on each granularity feature of the search entry and the granularity vector of the search entry to obtain a second weight.
A weighting calculation unit 142, configured to perform weighted calculation on each granularity feature of the search term and the first weight, and perform weighted calculation on each granularity feature of the search entry and the second weight.
An updating unit 143, configured to update the granularity vector of the search term and the granularity vector of the search entry using a result of the weighting calculation.
The similarity calculation unit 144 is configured to calculate similarities of the granularity vectors of the search terms and the granularity vectors of the search entries, so as to obtain similarities between the search terms and the search entries.
Further, the iteration unit 145 is configured to iteratively execute, according to a preset number of iterations, the following operations until the number of iterations is completed: performing similarity calculation on each granularity feature of the search term and the granularity vector of the search entry to obtain a first weight; performing similarity calculation on each granularity feature of the search entry and the granularity vector of the search entry to obtain a second weight; performing weighted calculation on each granularity feature of the search term and the first weight; and performing weighted calculation on each granularity feature of the search entry and the second weight.
Further, the weighting calculation unit 142 is specifically configured to:
normalizing the first weight and the second weight;
performing weighted calculation on each granularity characteristic of the search term and the normalized first weight;
and carrying out weighted calculation on each granularity characteristic of the search item and the normalized second weight.
According to the semantic similarity obtaining method based on artificial intelligence of this embodiment, after the multi-granularity features of the query and the title are obtained, a weight is obtained for each granularity feature, and the weight reflects the importance of features of different granularities. The weight of each granularity feature is then applied when the query and the title are given their multi-granularity representations, so that features of different granularities contribute according to their importance when the similarity between the query and the title is calculated. As a result, the similarity calculation is more precise, the existing semantic similarity model is optimized, and the search results are more accurate and better meet users' needs.
In this embodiment, multi-hop granularity weighting is performed using the weights of the granularity features, which further improves the accuracy of the similarity calculation, so that the search results better meet users' needs.
In order to implement the above embodiment, the present invention further provides another semantic similarity obtaining apparatus based on artificial intelligence, including: a processor, and a memory for storing processor-executable instructions.
Wherein the processor is configured to: acquiring granularity characteristics of search terms and search items; similarity calculation is carried out on each granularity characteristic of the search word and the granularity characteristics of the search items, and a weight corresponding to each granularity characteristic is obtained; carrying out weighted calculation on the search terms and the search items by utilizing the weight corresponding to each granularity characteristic to obtain granularity vectors of the search terms and the granularity vectors of the search items; calculating a similarity between the search term and the search entry based on the granularity vector of the search term and the granularity vector of the search entry.
In order to implement the foregoing embodiments, the present invention further proposes a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor on a server side, enable the server side to execute a semantic similarity obtaining method based on artificial intelligence, the method comprising: acquiring granularity characteristics of search terms and search items; similarity calculation is carried out on each granularity characteristic of the search word and the granularity characteristics of the search items, and a weight corresponding to each granularity characteristic is obtained; carrying out weighted calculation on the search terms and the search items by utilizing the weight corresponding to each granularity characteristic to obtain granularity vectors of the search terms and the granularity vectors of the search items; calculating a similarity between the search term and the search entry based on the granularity vector of the search term and the granularity vector of the search entry.
In order to implement the foregoing embodiments, the present invention further provides a computer program product. When instructions in the computer program product are executed by a processor, a semantic similarity obtaining method based on artificial intelligence is performed, where the method includes: acquiring granularity characteristics of search terms and search items; similarity calculation is carried out on each granularity characteristic of the search word and the granularity characteristics of the search items, and a weight corresponding to each granularity characteristic is obtained; carrying out weighted calculation on the search terms and the search items by utilizing the weight corresponding to each granularity characteristic to obtain granularity vectors of the search terms and the granularity vectors of the search items; calculating a similarity between the search term and the search entry based on the granularity vector of the search term and the granularity vector of the search entry.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (16)

1. A semantic similarity obtaining method based on artificial intelligence is characterized by comprising the following steps:
acquiring granularity characteristics of search terms and search items;
similarity calculation is carried out on each granularity characteristic of the search word and the granularity characteristics of the search items, and a weight corresponding to each granularity characteristic is obtained;
carrying out weighted calculation on the search terms and the search items by utilizing the weight corresponding to each granularity characteristic to obtain granularity vectors of the search terms and the granularity vectors of the search items;
calculating a similarity between the search term and the search entry based on the granularity vector of the search term and the granularity vector of the search entry.
2. The artificial intelligence based semantic similarity obtaining method according to claim 1, wherein the obtaining of the granularity characteristics of the search terms and the search entries comprises:
performing word segmentation on the search word and the search item to obtain a word segmentation corpus of the search word and the search item;
and carrying out feature extraction on the word segmentation linguistic data by utilizing a neural network to obtain the granularity features of the search words and the granularity features of the search items.
3. The method for obtaining semantic similarity based on artificial intelligence according to claim 1, wherein the calculating of similarity between each granular feature based on the search term and the granular feature of the search entry to obtain a weight corresponding to each granular feature comprises:
and performing similarity calculation on each granularity characteristic of the search term and the granularity characteristics of the same type of the search entry respectively to obtain the weight corresponding to each granularity characteristic.
4. The semantic similarity obtaining method based on artificial intelligence according to claim 3, wherein the performing similarity calculation on each granularity feature of the search term and the granularity features of the same type of the search entry to obtain a weight corresponding to each granularity feature comprises:
and respectively carrying out cosine similarity calculation on each granularity characteristic of the search word and the granularity characteristics of the same type of the search entry to obtain the weight corresponding to each granularity characteristic.
5. The artificial intelligence based semantic similarity obtaining method according to claim 4, wherein the obtaining of the granularity vector of the search term and the granularity vector of the search entry by performing weighted calculation on the search term and the search entry by using the weight corresponding to each granularity feature comprises:
carrying out normalization processing on the weight corresponding to each granularity characteristic to obtain a normalization weight corresponding to each granularity;
and adding products of each granularity feature and the corresponding normalization weight aiming at the search word and the search item to obtain a granularity vector of the search word and a granularity vector of the search item.
6. The artificial intelligence based semantic similarity obtaining method according to claim 5, wherein the calculating the similarity between the search term and the search item based on the granularity vector of the search term and the granularity vector of the search item comprises:
similarity calculation is carried out on each granularity feature of the search word and the granularity vector of the search item to obtain a first weight, and similarity calculation is carried out on each granularity feature of the search item and the granularity vector of the search item to obtain a second weight;
performing weighted calculation on each granularity characteristic of the search term and the first weight, and performing weighted calculation on each granularity characteristic of the search item and the second weight;
updating the granularity vector of the search term and the granularity vector of the search item by using the result of the weighted calculation;
and calculating the similarity of the granularity vector of the search word and the granularity vector of the search item to obtain the similarity between the search word and the search item.
7. The artificial intelligence based semantic similarity obtaining method according to claim 6, wherein before updating the granularity vector of the search term and the granularity vector of the search entry using the result of the weighting calculation, the method further comprises:
and iteratively executing, according to a preset number of iterations, the similarity calculation between each granularity feature of the search word and the granularity vector of the search entry to obtain a first weight, the similarity calculation between each granularity feature of the search entry and the granularity vector of the search entry to obtain a second weight, the weighted calculation on each granularity feature of the search word and the first weight, and the weighted calculation on each granularity feature of the search entry and the second weight, until the number of iterations is completed.
8. The artificial intelligence based semantic similarity obtaining method according to claim 6, wherein the performing weighted calculation on each granular feature of the search term and the first weight and each granular feature of the search item and the second weight comprises:
normalizing the first weight and the second weight;
performing weighted calculation on each granularity characteristic of the search term and the normalized first weight;
and carrying out weighted calculation on each granularity characteristic of the search item and the normalized second weight.
9. A semantic similarity obtaining device based on artificial intelligence is characterized by comprising:
the characteristic acquisition module is used for acquiring granularity characteristics of the search terms and the search items;
the weight calculation module is used for carrying out similarity calculation on each granularity characteristic of the search word and the granularity characteristics of the search items to obtain the weight corresponding to each granularity characteristic;
the vector acquisition module is used for performing weighted calculation on the search terms and the search items by utilizing the weight corresponding to each granularity feature to obtain granularity vectors of the search terms and the granularity vectors of the search items;
and the similarity calculation module is used for calculating the similarity between the search word and the search item based on the granularity vector of the search word and the granularity vector of the search item.
10. The artificial intelligence based semantic similarity obtaining apparatus according to claim 9, wherein the feature obtaining module is specifically configured to:
performing word segmentation on the search word and the search item to obtain a word segmentation corpus of the search word and the search item;
and carrying out feature extraction on the word segmentation linguistic data by utilizing a neural network to obtain the granularity features of the search words and the granularity features of the search items.
11. The artificial intelligence based semantic similarity acquisition apparatus according to claim 9, wherein the weight calculation module is specifically configured to:
and performing similarity calculation on each granularity characteristic of the search term and the granularity characteristics of the same type of the search entry respectively to obtain the weight corresponding to each granularity characteristic.
12. The artificial intelligence based semantic similarity acquisition apparatus according to claim 11, wherein the weight calculation module is specifically configured to:
and respectively carrying out cosine similarity calculation on each granularity characteristic of the search word and the granularity characteristics of the same type of the search entry to obtain the weight corresponding to each granularity characteristic.
13. The artificial intelligence based semantic similarity acquisition apparatus according to claim 12, wherein the vector acquisition module comprises:
the normalization unit is used for performing normalization processing on the weight corresponding to each granularity characteristic to obtain the normalization weight corresponding to each granularity;
and the vector acquisition unit is used for adding the products of each granularity characteristic and the corresponding normalization weight aiming at the search word and the search item to obtain the granularity vector of the search word and the granularity vector of the search item.
14. The artificial intelligence based semantic similarity obtaining apparatus according to claim 13, wherein the similarity calculating module includes:
the weighting unit is used for calculating the similarity between each granularity feature of the search word and the granularity vector of the search item to obtain a first weight, and calculating the similarity between each granularity feature of the search item and the granularity vector of the search item to obtain a second weight;
the weighting calculation unit is used for carrying out weighting calculation on each granularity characteristic of the search term and the first weight and carrying out weighting calculation on each granularity characteristic of the search item and the second weight;
an updating unit, configured to update the granularity vector of the search term and the granularity vector of the search entry using a result of the weighting calculation;
and the similarity calculation unit is used for calculating the similarity of the granularity vector of the search word and the granularity vector of the search item to obtain the similarity between the search word and the search item.
15. The artificial intelligence based semantic similarity obtaining apparatus according to claim 14, wherein the similarity calculating module further comprises:
and the iteration unit is used for iteratively executing similarity calculation on each granularity feature of the search word and the granularity vector of the search entry according to preset iteration times to obtain a first weight, and performing similarity calculation on each granularity feature of the search entry and the granularity vector of the search entry to obtain a second weight, and performing weighting calculation on each granularity feature of the search word and the first weight, and performing weighting calculation on each granularity feature of the search entry and the second weight until the iteration times are completed.
16. The artificial intelligence based semantic similarity acquisition apparatus according to claim 14 or 15, wherein the weighting calculation unit is specifically configured to:
normalizing the first weight and the second weight;
performing weighted calculation on each granularity characteristic of the search term and the normalized first weight;
and carrying out weighted calculation on each granularity characteristic of the search item and the normalized second weight.
CN201611042515.3A 2016-11-21 2016-11-21 Semantic similarity obtaining method and device based on artificial intelligence Active CN106776782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611042515.3A CN106776782B (en) 2016-11-21 2016-11-21 Semantic similarity obtaining method and device based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN106776782A (en) 2017-05-31
CN106776782B (en) 2020-05-22

Family

ID=58974777

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201611042515.3A Active CN106776782B (en) 2016-11-21 2016-11-21 Semantic similarity obtaining method and device based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN106776782B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111128376B (en) * 2019-11-21 2023-06-16 泰康保险集团股份有限公司 Method and device for recommending evaluation form
CN111078849B (en) * 2019-12-02 2023-07-25 百度在线网络技术(北京)有限公司 Method and device for outputting information
CN111259118B (en) * 2020-05-06 2020-09-01 广东电网有限责任公司 Text data retrieval method and device
CN111931034B (en) * 2020-08-24 2024-01-26 腾讯科技(深圳)有限公司 Data searching method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011248596A (en) * 2010-05-26 2011-12-08 Hitachi Ltd Searching system and searching method for picture-containing documents
CN103838789A (en) * 2012-11-27 2014-06-04 大连灵动科技发展有限公司 Text similarity computing method
CN104462060A (en) * 2014-12-03 2015-03-25 百度在线网络技术(北京)有限公司 Method and device for calculating text similarity and realizing search processing through computer
CN104899322A (en) * 2015-06-18 2015-09-09 百度在线网络技术(北京)有限公司 Search engine and implementation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
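Research on information retrieval based on a concept semantic similarity calculation model; Yang Chunlong et al.; Computer Applications and Software (《计算机应用与软件》); 2013-06-15; pp. 88-92 *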

Also Published As

Publication number Publication date
CN106776782A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
CN107102981B (en) Word vector generation method and device
CN110750640B (en) Text data classification method and device based on neural network model and storage medium
CN106649818B (en) Application search intention identification method and device, application search method and server
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN107239497B (en) Hot content search method and system
CN109933686B (en) Song label prediction method, device, server and storage medium
CN111539197B (en) Text matching method and device, computer system and readable storage medium
CN106776782B (en) Semantic similarity obtaining method and device based on artificial intelligence
CN108509474A (en) Search for the synonym extended method and device of information
CN108875074A (en) Based on answer selection method, device and the electronic equipment for intersecting attention neural network
EP3314461A1 (en) Learning entity and word embeddings for entity disambiguation
CN110795913B (en) Text encoding method, device, storage medium and terminal
CN105930413A (en) Training method for similarity model parameters, search processing method and corresponding apparatuses
CN109902156B (en) Entity retrieval method, storage medium and electronic device
CN108875065B (en) Indonesia news webpage recommendation method based on content
CN110134777B (en) Question duplication eliminating method and device, electronic equipment and computer readable storage medium
CN110019669B (en) Text retrieval method and device
CN107291895B (en) Quick hierarchical document query method
CN109829045A (en) A kind of answering method and device
JP2020512651A (en) Search method, device, and non-transitory computer-readable storage medium
CN110674301A (en) Emotional tendency prediction method, device and system and storage medium
CN115658851A (en) Medical literature retrieval method, system, storage medium and terminal based on theme
CN111881264B (en) Method and electronic equipment for searching long text in question-answering task in open field
WO2016210203A1 (en) Learning entity and word embeddings for entity disambiguation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant