CN113377922B - Method, device, electronic equipment and medium for matching information


Info

Publication number
CN113377922B
Authority
CN
China
Legal status
Active
Application number
CN202110711931.2A
Other languages
Chinese (zh)
Other versions
CN113377922A (en)
Inventor
安叶嵩
李雅楠
何伯磊
和为
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110711931.2A
Publication of CN113377922A
Application granted
Publication of CN113377922B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method, an apparatus, an electronic device and a medium for matching information. It relates to the field of computer technology, and in particular to information retrieval and intelligent office. The specific implementation scheme is as follows. A query text and a set of information to be matched are acquired. For each piece of information to be matched in the set, an overlapping matching degree with the query text is generated; this degree represents the literal matching degree between the information to be matched and the query text. The pieces whose overlapping matching degree is greater than a preset filtering threshold are selected to generate a quasi-matching information set. Finally, a first number of pieces of information to be matched are selected from the quasi-matching information set as the matching result according to the generated overlapping matching degrees, where the first number is not greater than a preset number threshold. This improves the accuracy of information matching while still meeting the matching-time requirement, and makes the matching result more concise and clear.

Description

Method, device, electronic equipment and medium for matching information
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to the field of information retrieval and intelligent office, and more particularly, to a method, an apparatus, an electronic device, and a medium for matching information.
Background
With the development of internet technology and the growth of data scale, retrieval technology has gradually settled on a framework that uses recall followed by ranking as its core strategy.
In the prior art, the recall stage generally uses a character matching method based on a trie (dictionary tree), and the ranking stage typically uses a large number of features.
Disclosure of Invention
Provided are a method, apparatus, electronic device, and storage medium for matching information.
According to a first aspect, there is provided a method for matching information. The method comprises:
- acquiring a query text and a set of information to be matched;
- generating, for each piece of information to be matched in the set, an overlapping matching degree between that information and the query text, the overlapping matching degree representing the literal matching degree between the information to be matched and the query text;
- selecting, from the information to be matched, the pieces whose overlapping matching degree is greater than a preset filtering threshold to generate a quasi-matching information set; and
- selecting, based on the generated overlapping matching degrees, a first number of pieces of information to be matched from the quasi-matching information set as a matching result, the first number being not greater than a preset number threshold.
According to a second aspect, there is provided an apparatus for matching information. The apparatus comprises:
- an acquisition unit configured to acquire a query text and a set of information to be matched;
- a generation unit configured to generate, for each piece of information to be matched in the set, an overlapping matching degree between that information and the query text, the overlapping matching degree representing the literal matching degree between the information to be matched and the query text;
- a first selecting unit configured to select, from the information to be matched, the pieces whose overlapping matching degree is greater than a preset filtering threshold to generate a quasi-matching information set; and
- a second selecting unit configured to select, based on the generated overlapping matching degrees, a first number of pieces of information to be matched from the quasi-matching information set as a matching result, the first number being not greater than a preset number threshold.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for enabling a computer to perform a method as described in any of the implementations of the first aspect.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
According to the technology of the present disclosure, the set of information to be matched is first filtered using the overlapping matching degree and the preset filtering threshold. Matching is then performed on the filtered information based on the overlapping matching degree, rather than on a large number of more complex features. This improves the accuracy of information matching while still meeting the matching-time requirement. In addition, the preset number threshold restricts the result to the most relevant information presented to the user. The matching result is therefore more concise and clear, and the user works more efficiently.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of one application scenario in which a method for matching information of embodiments of the present disclosure may be implemented;
FIG. 4 is a schematic diagram of an apparatus for matching information according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing a method for matching information of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram 100 illustrating a first embodiment according to the present disclosure. The method for matching information comprises the following steps:
S101, acquiring a query text and an information set to be matched.
In this embodiment, the query text may be a query word, a query sentence, or the like entered by the user. The set of information to be matched may be a set of candidate information from which the information matching the query text is returned. As an example, the query text may be a keyword that the user enters to search for. The set of information to be matched may then be the chat records within the query scope of that keyword. As another example, the query text may be a query statement generated when the user clicks a "commonly used page" entry. The set of information to be matched may then be a preset list of page information.
In this embodiment, the execution body of the method for matching information may acquire the query text and the set of information to be matched in various manners. As an example, the execution body may obtain both locally. As another example, the execution body may obtain the query text and the set of information to be matched from communicatively connected electronic devices, for example a user terminal and a database server respectively.
S102, respectively generating overlapping matching degrees between the information to be matched in the information set to be matched and the query text.
In this embodiment, the execution body may generate, in various manners, the overlapping matching degree between the query text and each piece of information to be matched in the set acquired in step S101. The overlapping matching degree represents the literal matching degree between the information to be matched and the query text. As an example, it may be determined from the intersection between the information to be matched and the query text, such as the ratio of the size of their intersection to the size of their union.
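As a minimal sketch of the intersection-over-union example above, the following Python function treats the query text and a piece of information to be matched as already-segmented word lists. It returns the size of the intersection of their word sets divided by the size of the union, which is the Jaccard coefficient. The function name, counting each distinct word once, and the whitespace splitting in the usage line are illustrative assumptions. For Chinese text, a word segmenter would produce the token lists.

```python
from typing import Iterable


def overlap_matching_degree(query_tokens: Iterable[str], candidate_tokens: Iterable[str]) -> float:
    """Literal overlap as |intersection| / |union| of the two word sets.

    Assumption (illustrative): duplicate words are counted once.
    """
    query_set = set(query_tokens)
    candidate_set = set(candidate_tokens)
    union = query_set | candidate_set
    if not union:  # both inputs empty: define the overlap as 0
        return 0.0
    return len(query_set & candidate_set) / len(union)


# Whitespace splitting stands in for word segmentation here.
print(overlap_matching_degree("expense report system".split(), "online expense report".split()))  # 0.5
```

The two word sets share 2 words ("expense", "report") out of 4 distinct words, so the overlap is 0.5.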
S103, selecting corresponding information to be matched with the overlapping matching degree larger than a preset filtering threshold value from the information to be matched to generate a quasi-matching information set.
In this embodiment, the execution body may, in various manners, select from the set of information to be matched the pieces whose overlapping matching degree, generated in step S102, is greater than the preset filtering threshold, and use them to generate the quasi-matching information set. As an example, the execution body may combine all such pieces into the quasi-matching information set. As another example, the execution body may select a target number of pieces of information to be matched in descending order of overlapping matching degree to generate the quasi-matching information set. The target number may be preset (e.g., 10) or determined by the actual application (e.g., twice the first number).
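The two ways of building the quasi-matching information set described above can be sketched in Python as follows. The sketch assumes the overlapping matching degrees have already been computed and are keyed by a candidate identifier. The identifiers, the function name, the `target_number` parameter, and applying the threshold and the optional cap together are illustrative assumptions, not details fixed by the disclosure.

```python
from typing import Dict, List, Optional


def build_quasi_matching_set(
    overlap_degrees: Dict[str, float],
    filter_threshold: float,
    target_number: Optional[int] = None,
) -> List[str]:
    """Keep candidates whose overlapping matching degree exceeds the threshold.

    If target_number (a hypothetical parameter) is given, keep at most that many,
    taking them in descending order of overlapping matching degree.
    """
    kept = [cid for cid, degree in overlap_degrees.items() if degree > filter_threshold]
    if target_number is not None:
        kept.sort(key=lambda cid: overlap_degrees[cid], reverse=True)
        kept = kept[:target_number]
    return kept


degrees = {"doc_a": 0.9, "doc_b": 0.4, "doc_c": 0.7}
print(build_quasi_matching_set(degrees, 0.5))                   # ['doc_a', 'doc_c']
print(build_quasi_matching_set(degrees, 0.5, target_number=1))  # ['doc_a']
```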
S104, selecting a first number of information to be matched from the quasi-matching information set as a matching result based on the generated overlapping matching degree.
In this embodiment, based on the overlapping matching degree generated in step S102, the execution body may select, in various manners, the first number of pieces of information to be matched from the quasi-matching information set as the matching result. The first number is not greater than the preset number threshold. As an example, the execution body may select the first number of pieces in descending order of overlapping matching degree. As another example, the execution body may combine the generated overlapping matching degree with other forms of matching degree into a weighted matching degree, and select the first number of pieces in descending order of that weighted matching degree. The other forms of matching degree may include, but are not limited to, at least one of the following: cosine similarity and the Jaccard similarity coefficient.
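A sketch of step S104 in Python follows. It covers both the plain descending-order selection and the weighted variant. A bag-of-words cosine similarity is included as one example of an "other form" of matching degree. The function names, the linear weighting, and the `overlap_weight` parameter are hypothetical choices for illustration; the disclosure does not fix how the weighted matching degree is formed.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Cosine similarity of two bag-of-words term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[term] for term, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def select_matching_result(
    quasi_set: List[str],
    overlap: Dict[str, float],
    first_number: int,
    number_threshold: int,
    other_degree: Optional[Dict[str, float]] = None,
    overlap_weight: float = 1.0,
) -> List[str]:
    """Return the top first_number candidates in descending order of score.

    Without other_degree the score is the overlapping matching degree; with it,
    the score is a weighted sum (a hypothetical weighting scheme).
    """
    if first_number > number_threshold:
        raise ValueError("first_number must not exceed the preset number threshold")

    def score(cid: str) -> float:
        if other_degree is None:
            return overlap[cid]
        return overlap_weight * overlap[cid] + (1 - overlap_weight) * other_degree[cid]

    return sorted(quasi_set, key=score, reverse=True)[:first_number]


overlap = {"doc_a": 0.9, "doc_c": 0.7}
print(select_matching_result(["doc_a", "doc_c"], overlap, 1, 3))  # ['doc_a']
cosine = {"doc_a": 0.0, "doc_c": 1.0}
print(select_matching_result(["doc_a", "doc_c"], overlap, 1, 3, cosine, 0.5))  # ['doc_c']
```

In the weighted call, doc_a scores 0.45 and doc_c scores 0.85, so the second measure changes the top result.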
In this embodiment, the preset number threshold may be set to a small number, for example 3 or 1. The smaller the preset number threshold, the higher the accuracy required of the matching and the more concise the presented matching result.
In this embodiment, the preset filtering threshold may be flexibly set according to the actual application requirement.
It should be noted that when the preset filtering threshold is set to a very low value (e.g., 0), the filtering effect is weakened or filtering is effectively skipped. In that case the quasi-matching information set may be the original set of information to be matched.
In the method provided by the embodiments of the present disclosure, the set of information to be matched is first filtered using the overlapping matching degree and the preset filtering threshold. Matching is then performed on the filtered information based on the overlapping matching degree, rather than on a large number of more complex features. This improves the accuracy of information matching while still meeting the matching-time requirement. In addition, the preset number threshold restricts the result to the most relevant information presented to the user. The matching result is therefore more concise and clear, and the user works more efficiently.
In some optional implementations of this embodiment, the execution body may generate the overlapping matching degree between each piece of information to be matched in the set and the query text through the following steps:
First, determine a first sub-overlap matching degree between the information to be matched in the set and the query text.
In these implementations, the first sub-overlap matching degree may be positively correlated with the number of words in the intersection of the query text and the information to be matched. It may be inversely correlated with the number of words obtained by segmenting the query text. As an example, the first sub-overlap matching degree may be cqr.
Second, determine a second sub-overlap matching degree between the information to be matched in the set and the query text.
In these implementations, the second sub-overlap matching degree may be positively correlated with the number of words in that intersection. It may be inversely correlated with the number of words obtained by segmenting the information to be matched. As an example, the second sub-overlap matching degree may be ctr.
Third, generate the overlapping matching degree from at least one of the first sub-overlap matching degree and the second sub-overlap matching degree.
In these implementations, the execution body may generate the overlapping matching degree in various manners from at least one of the first sub-overlap matching degree (first step) and the second sub-overlap matching degree (second step). As an example, the execution body may use the first and second sub-overlap matching degrees, individually or together, as the overlapping matching degree. As another example, the execution body may use the maximum or the minimum of the two as the overlapping matching degree.
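The three steps above can be sketched as follows. The disclosure names cqr and ctr but does not define them. This sketch assumes the common reading consistent with the correlations stated above: cqr = (number of intersection words) / (number of query words) and ctr = (number of intersection words) / (number of words in the information to be matched). The function names and counting distinct words only are also assumptions.

```python
from typing import Sequence, Tuple


def sub_overlap_degrees(query_words: Sequence[str], info_words: Sequence[str]) -> Tuple[float, float]:
    """Return (cqr, ctr) over distinct segmented words (assumed definitions)."""
    query_set, info_set = set(query_words), set(info_words)
    common = len(query_set & info_set)
    cqr = common / len(query_set) if query_set else 0.0  # falls as the query gets longer
    ctr = common / len(info_set) if info_set else 0.0    # falls as the information gets longer
    return cqr, ctr


def combine_sub_degrees(cqr: float, ctr: float, mode: str = "min") -> float:
    """Third step: take the minimum or maximum of the two sub-overlap matching degrees."""
    if mode == "min":
        return min(cqr, ctr)
    if mode == "max":
        return max(cqr, ctr)
    raise ValueError(f"unknown mode: {mode}")


cqr, ctr = sub_overlap_degrees(["expense", "report"], ["online", "expense", "report", "system"])
print(cqr, ctr)                                   # 1.0 0.5
print(combine_sub_degrees(cqr, ctr, mode="max"))  # 1.0
```

Taking the minimum rewards candidates that both cover the query and contain little extra text. Taking the maximum is more lenient toward long candidates that fully contain the query.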
This optional implementation provides a new way of calculating the overlapping matching degree and adds to the ways in which it can be generated.
Optionally, in the above implementation, the execution body may generate the overlapping matching degree from at least one of the first and second sub-overlap matching degrees through the following steps:
S3-1, in response to determining that the query text hits a keyword of a piece of information to be matched in the set, correcting at least one of the first and second sub-overlap matching degrees of that piece of information so as to increase its value.
In these implementations, as an example, suppose the set of information to be matched contains a piece of information whose keyword matches the query text. The execution body may then multiply at least one of the first and second sub-overlap matching degrees of that piece of information by a coefficient greater than 1.
S3-2, generating the overlapping matching degree according to at least one of the corrected first sub-overlapping matching degree and the corrected second sub-overlapping matching degree.
In these implementations, the execution body may generate the overlapping matching degree in various manners from at least one of the corrected first and second sub-overlap matching degrees obtained in step S3-1. The manner of generation may be the same as in the third step above and is not repeated here.
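The correction in step S3-1 can be sketched as below. The default coefficient value, boosting both sub-degrees rather than only one, and treating a keyword "hit" as a substring occurrence in the query text are illustrative assumptions. The disclosure only requires multiplying at least one sub-degree by a coefficient greater than 1. Boosted values can exceed 1, and any later threshold comparison should allow for this.

```python
from typing import Iterable, Tuple


def boost_on_keyword_hit(
    query_text: str,
    keywords: Iterable[str],
    cqr: float,
    ctr: float,
    coefficient: float = 1.2,
) -> Tuple[float, float]:
    """S3-1: if the query text hits any keyword, multiply both sub-degrees by coefficient (> 1).

    Assumption: a "hit" means a non-empty keyword occurs as a substring of the query text.
    """
    if coefficient <= 1:
        raise ValueError("coefficient must be greater than 1 to increase the degrees")
    if any(keyword and keyword in query_text for keyword in keywords):
        return cqr * coefficient, ctr * coefficient
    return cqr, ctr


# S3-2 can then combine the corrected values, e.g. with min() or max().
print(boost_on_keyword_hit("reimbursement portal", ["portal"], 0.5, 0.25, coefficient=2.0))  # (1.0, 0.5)
print(boost_on_keyword_hit("holiday calendar", ["portal"], 0.5, 0.25, coefficient=2.0))      # (0.5, 0.25)
```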
In some optional implementations of this embodiment, the preset filtering threshold may be inversely related to the number of segmented words obtained after the segmentation of the query text. As an example, when the number of segmented words obtained after the segmentation of the query text is not greater than 2, the preset filtering threshold may be set between 0.8 and 1. As yet another example, when the number of segmented words obtained after the segmentation of the query text is not less than 3, the preset filtering threshold may be set to 0.6.
With this optional implementation, the preset filtering threshold is adjusted according to the number of words obtained by segmenting the query text. This can improve both the accuracy and the success rate of matching.
It should be noted that, in the above optional implementation, the form of the preset filtering threshold follows the form of the overlapping matching degree. If the overlapping matching degree is a single value (for example, the first sub-overlap matching degree, the second sub-overlap matching degree, or a degree generated from both), the preset filtering threshold is also a single value. If the overlapping matching degree consists of two values (for example, the first and second sub-overlap matching degrees), a preset filtering threshold may be set for each of them separately. These thresholds may be the same (e.g., both 0.6) or different (e.g., 0.8 and 0.6, respectively), which is not limited herein.
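The length-dependent threshold and the per-sub-degree thresholds described above can be sketched as follows. The values 0.8 (within the stated 0.8 to 1 range for queries of at most 2 words) and 0.6 (for queries of at least 3 words) come from the text. The function names are hypothetical, and requiring both sub-degrees to pass their own thresholds is an assumption; the disclosure leaves the combination open.

```python
def preset_filter_threshold(
    num_query_words: int,
    short_query_threshold: float = 0.8,
    long_query_threshold: float = 0.6,
) -> float:
    """Threshold inversely related to query length: stricter for <= 2 words, looser for >= 3."""
    if not 0.8 <= short_query_threshold <= 1.0:
        raise ValueError("the short-query threshold is expected to lie between 0.8 and 1")
    return short_query_threshold if num_query_words <= 2 else long_query_threshold


def passes_filter(cqr: float, ctr: float, cqr_threshold: float, ctr_threshold: float) -> bool:
    """Separate thresholds per sub-degree; requiring both to pass is an assumption."""
    return cqr > cqr_threshold and ctr > ctr_threshold


print(preset_filter_threshold(2), preset_filter_threshold(4))  # 0.8 0.6
print(passes_filter(0.9, 0.7, 0.8, 0.6))                       # True
print(passes_filter(0.9, 0.5, 0.8, 0.6))                       # False
```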
In some optional implementations of this embodiment, the information to be matched in the set may include website information obtained by recall. The at least one keyword included in the website information may include the most frequent words among the title and the keywords of the website that corresponds to the website information.
In these implementations, the scheme may be applied to the ranking stage of information retrieval. In that case, the information to be matched in the set may include website information obtained in the recall stage. The at least one keyword included in the website information may include the highest-frequency words (terms) among the website title (title) and the website keywords (keywords) set in the source code of the corresponding website.
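The keyword rule above can be sketched with Python's standard-library `html.parser`. The sketch extracts the page `<title>` and the `content` of `<meta name="keywords">`, tokenizes both, and keeps the most frequent terms. The regex tokenizer, the `top_n` parameter, and the class and function names are illustrative assumptions. Chinese pages would need a word segmenter in place of the regex.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import List


class TitleKeywordsParser(HTMLParser):
    """Collects the <title> text and the <meta name="keywords"> content of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "keywords":
                self.keywords = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def _tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: lower-cased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def top_terms(html_source: str, top_n: int = 1) -> List[str]:
    """Return the top_n most frequent terms across the title and the meta keywords."""
    parser = TitleKeywordsParser()
    parser.feed(html_source)
    parser.close()
    counts = Counter(_tokenize(parser.title) + _tokenize(parser.keywords))
    return [term for term, _ in counts.most_common(top_n)]


page = ('<html><head><title>Expense Report Portal</title>'
        '<meta name="keywords" content="expense, reimbursement, expense report"></head></html>')
print(top_terms(page, top_n=2))  # ['expense', 'report']
```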
Optionally, the website information may further include a website address and custom description information of the website. The website information, the website name and the website title (title)
With this optional implementation, the scheme can be applied to the ranking stage of website retrieval and improves retrieval efficiency in that scenario. It is particularly suitable for searching the websites commonly used within an enterprise. It keeps information matching accurate while increasing matching speed, which helps present the matching results quickly.
In some optional implementations of this embodiment, the execution body may further send the matching result obtained in step S104 to a target device, for example the user terminal that sent the query text. The user can then conveniently obtain the matching result for the query text through the target device.
With continued reference to fig. 2, fig. 2 is a schematic diagram 200 according to a second embodiment of the present disclosure. The method for matching information comprises the following steps:
S201, acquiring a query text and an information set to be matched.
S202, respectively generating overlapping matching degrees between the information to be matched in the information set to be matched and the query text.
S203, selecting corresponding information to be matched with the overlapping matching degree larger than a preset filtering threshold value from the information to be matched to generate a quasi-matching information set.
The above S201, S202, S203 may be respectively identical to S101, S102, S103 and optional implementations thereof in the foregoing embodiments, and the descriptions of the above S101, S102, S103 and optional implementations thereof are also applicable to S201, S202, S203, which are not repeated herein.
S204, acquiring semantic matching degree between the information to be matched in the information set to be matched and the query text.
In this embodiment, the executing body may obtain the semantic matching degree between the information to be matched in the information set to be matched and the query text in various manners. The semantic matching degree can be used for representing similarity of a semantic layer. As an example, the above semantic matching degree may be obtained by various natural language processing models based on machine learning.
In some optional implementations of the present embodiment, the semantic matching degree may be generated by:
Input each piece of information to be matched in the set, together with the query text, into a pre-trained semantic matching model to generate the semantic matching degree between each input piece of information and the query text.
In these implementations, the semantic matching model may include a feature extraction layer, a feature fusion layer, and an output layer. The feature extraction layer may extract literal features and semantic features of the information to be matched and the query text input into the model. The literal features may include, but are not limited to, at least one of: BM25 (Best Match 25), TF-IDF (term frequency-inverse document frequency), and edit distance. The semantic features may include, but are not limited to, features extracted by models trained with the SimNet semantic matching architecture (e.g., SimNet-BOW, SimNet-RNN, SimNet-CNN). The feature fusion layer may fuse the features obtained by the feature extraction layer. The output layer may then generate, from the fused features, the semantic matching degree between the input information to be matched and the query text. This degree represents how semantically similar the two are. For example, a value of 1 indicates that the two are identical in meaning, and a value of 0 indicates that they are completely different.
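The three-layer structure described above can be illustrated with a deliberately small sketch. It is not the trained model of the disclosure. The extraction layer computes two literal features: a normalized edit-distance similarity and a word-set overlap. BM25 and TF-IDF are omitted because they need corpus statistics. `semantic_scorer` is a hypothetical callable standing in for a SimNet-style model score. The fusion layer is a linear combination, and the output layer is a sigmoid that maps the result into (0, 1). The weights and bias are hand-set here but would be learned in practice.

```python
import math
from typing import Callable, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def literal_features(query: str, candidate: str) -> List[float]:
    """Feature extraction (literal part): edit-distance similarity and word-set overlap."""
    longest = max(len(query), len(candidate)) or 1
    edit_sim = 1 - edit_distance(query, candidate) / longest
    q, c = set(query.split()), set(candidate.split())
    overlap = len(q & c) / len(q | c) if q | c else 0.0
    return [edit_sim, overlap]


def semantic_matching_degree(
    query: str,
    candidate: str,
    semantic_scorer: Callable[[str, str], float],  # hypothetical stand-in for a SimNet-style score
    weights: Sequence[float],                      # hypothetical fusion weights (learned in practice)
    bias: float,
) -> float:
    features = literal_features(query, candidate) + [semantic_scorer(query, candidate)]
    if len(weights) != len(features):
        raise ValueError("one weight per feature is required")
    fused = sum(w * f for w, f in zip(weights, features)) + bias  # feature fusion layer
    return 1 / (1 + math.exp(-fused))                            # output layer in (0, 1)
```

For example, with a stub scorer that returns 1.0 only for identical strings, weights `(2.0, 2.0, 2.0)` and bias `-3.0`, identical inputs score about 0.95 and unrelated inputs score well below 0.5.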
In these implementations, the semantic matching model may include various models trained based on machine learning methods, which are not described herein.
Based on the optional implementation manner, the scheme can generate the semantic matching degree between the information to be matched and the query text by utilizing the semantic matching model fused with the literal features and the semantic features, so that the accuracy of the semantic matching degree is improved, and the accuracy of the matching degree is further improved.
Optionally, based on the optional implementation manner, the information to be matched in the set of information to be matched may include a title and at least one keyword. The executing body may input the information to be matched in the information set to be matched and the query text to a pre-trained semantic matching model according to the following steps, and generate semantic matching degrees between each input information to be matched and the query text respectively:
First, input the title of each piece of information to be matched, and each of its at least one keyword, each paired with the query text, into the pre-trained semantic matching model. This generates a first sub-semantic matching degree and one or more second sub-semantic matching degrees for each input piece of information to be matched.
In these implementations, the first sub-semantic matching degree may be used to characterize a semantic matching degree between a title of the information to be matched and the query text. The second sub-semantic matching degree may be used to characterize a semantic matching degree between a keyword of the information to be matched and the query text.
As an example, the set of information to be matched may include a piece of information (text X) with title Y and keywords $K_1$ and $K_2$. The execution body may input the title Y and the query text into the pre-trained semantic matching model. The resulting semantic matching degree between Y and the query text is the first sub-semantic matching degree. Similarly, the execution body may input keyword $K_1$ and the query text into the model, and the resulting semantic matching degree between $K_1$ and the query text is a second sub-semantic matching degree. Likewise, inputting keyword $K_2$ and the query text yields the semantic matching degree between $K_2$ and the query text as another second sub-semantic matching degree.
Second, for each piece of information to be matched, select the maximum of its first sub-semantic matching degree and its second sub-semantic matching degrees as the semantic matching degree between that information and the query text.
In these implementations, as an example, the execution body may select the maximum of the first sub-semantic matching degree and the two second sub-semantic matching degrees generated in the first step. This maximum serves as the semantic matching degree of the information to be matched (e.g., text X) corresponding to title Y and keywords $K_1$ and $K_2$.
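The two steps above reduce to taking a maximum over one title score and one score per keyword. In the sketch below, `model` is a hypothetical callable standing in for the pre-trained semantic matching model. The stub scores in the usage example are made up purely for illustration.

```python
from typing import Callable, Sequence


def info_semantic_degree(
    query: str,
    title: str,
    keywords: Sequence[str],
    model: Callable[[str, str], float],  # hypothetical stand-in for the pre-trained model
) -> float:
    first = model(title, query)                                # first sub-semantic matching degree
    seconds = [model(keyword, query) for keyword in keywords]  # one second sub-degree per keyword
    return max([first, *seconds])


stub_scores = {"title Y": 0.4, "K1": 0.9, "K2": 0.6}  # illustrative values only
print(info_semantic_degree("query", "title Y", ["K1", "K2"], lambda text, q: stub_scores[text]))  # 0.9
```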
It should be noted that the pre-trained semantic matching model may be consistent with the foregoing description, and will not be described herein.
Based on the optional implementation manner, the scheme can provide a reference for information matching from the matching angle of the title and the keyword, thereby providing a new semantic matching degree generation manner.
S205, respectively generating comprehensive matching degrees between the information to be matched in the quasi matching information set and the query text according to the generated overlapping matching degrees and the acquired semantic matching degrees.
In this embodiment, according to the overlapping matching degree generated in the step S202 and the semantic matching degree obtained in the step S204, the execution subject may generate the comprehensive matching degree between the information to be matched in the quasi-matching information set generated in the step S203 and the query text in various manners.
In this embodiment, the execution body may combine the two degrees of each piece of information to be matched in the quasi-matching information set in several ways. As an example, it may use the maximum or the minimum of the generated overlapping matching degree and the acquired semantic matching degree as the comprehensive matching degree between that information and the query text. As another example, it may use a weighted sum or the product of the generated overlapping matching degree and the acquired semantic matching degree as the comprehensive matching degree.
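The four combination options named above (maximum, minimum, weighted sum, product) can be written as a single helper. The function name, the `mode` strings, and the default weight `alpha = 0.5` are hypothetical. The disclosure does not specify the weights of the weighted sum.

```python
def comprehensive_matching_degree(
    overlap: float,
    semantic: float,
    mode: str = "weighted_sum",
    alpha: float = 0.5,
) -> float:
    """Combine one candidate's overlapping and semantic matching degrees."""
    if mode == "max":
        return max(overlap, semantic)
    if mode == "min":
        return min(overlap, semantic)
    if mode == "weighted_sum":
        return alpha * overlap + (1 - alpha) * semantic  # alpha is a hypothetical weight
    if mode == "product":
        return overlap * semantic
    raise ValueError(f"unknown mode: {mode}")


print(comprehensive_matching_degree(0.6, 0.8, mode="max"))  # 0.8
```

The minimum and the product are conservative: a candidate must do well on both measures. The maximum lets a strong semantic match compensate for weak literal overlap.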
S206, selecting a first number of information to be matched from the quasi-matching information set as a matching result according to the generated comprehensive matching degree.
In this embodiment, based on the comprehensive matching degree generated in step S205, the execution subject may select, in various manners, the first number of pieces of information to be matched from the quasi-matching information set generated in step S203 as the matching result. As an example, the execution subject may select the N pieces of information to be matched with the highest comprehensive matching degrees as the matching result, where N is the first number.
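Selecting the N items with the highest comprehensive matching degrees is a standard top-N selection. The sketch below assumes, hypothetically, that each item in the quasi-matching information set is identified by a string ID mapped to its comprehensive matching degree.

```python
import heapq
from typing import Dict, List


def select_matching_result(comprehensive: Dict[str, float], first_number: int) -> List[str]:
    """Return the IDs of the `first_number` items with the highest scores.

    `comprehensive` maps a (hypothetical) item ID in the quasi-matching
    information set to its comprehensive matching degree.
    """
    # heapq.nlargest returns the items ordered from highest to lowest score.
    return heapq.nlargest(first_number, comprehensive, key=comprehensive.__getitem__)
```

`heapq.nlargest` avoids sorting the whole set, which helps when the quasi-matching set is large and N is small.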
In some optional implementations of this embodiment, the information to be matched in the set of information to be matched may include website information obtained by recall. The at least one keyword included in the website information may include the words with the highest occurrence frequency among the values of the website titles and website keywords of the websites corresponding to the website information.
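Deriving the keywords of a piece of website information from word frequencies might look like the sketch below. It assumes, hypothetically, that words are separated by whitespace or commas; real Chinese titles would need a proper word segmenter, and the cut-off `k` is not specified in the text.

```python
from collections import Counter
from typing import Iterable, List


def extract_keywords(titles: Iterable[str], keyword_values: Iterable[str], k: int = 3) -> List[str]:
    """Return the k most frequent words across website titles and keyword values."""
    counts = Counter()
    for text in [*titles, *keyword_values]:
        # Hypothetical segmentation: lower-case, treat commas as spaces, split.
        counts.update(text.lower().replace(",", " ").split())
    # most_common orders ties by first occurrence.
    return [word for word, _ in counts.most_common(k)]
```

For example, `extract_keywords(["XXX Networks Home", "XXX Networks Login"], ["xxx, cloud"], k=2)` returns `["xxx", "networks"]`.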
As can be seen from fig. 2, the flow 200 of the method for matching information in this embodiment highlights two steps: generating the comprehensive matching degrees between the query text and the information to be matched in the quasi-matching information set from the generated overlapping matching degrees and the acquired semantic matching degrees, and selecting the first number of pieces of information to be matched from the quasi-matching information set as the matching result according to those comprehensive matching degrees. Thus, after filtering by the overlapping matching degree, the scheme described in this embodiment jointly considers the semantic matching degree and the literal overlap when further matching the filtered information, which improves the matching effect while keeping the matching time as short as possible.
With continued reference to fig. 3, which is a schematic diagram of an application scenario of the method for matching information according to an embodiment of the present disclosure. In the application scenario of fig. 3, a user 301 may use a terminal device 302 to send the query text "xxx networks" 303 to a server 304. The server 304 may generate the overlapping matching degree 306 between the query text 303 and each piece of website information to be matched stored in the local website information set to be matched 305. Then, the server 304 selects, from the website information set to be matched 305, the website information to be matched whose overlapping matching degree is greater than a preset filtering threshold (for example, 0.6) to generate a quasi-matching website information set 307. Next, based on the generated overlapping matching degree 306, the server 304 selects, as the matching result, the website information to be matched 308 with the largest overlapping matching degree from the quasi-matching website information set 307. Optionally, the server 304 may also send the website information to be matched 308 to the terminal device 302.
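The fig. 3 scenario (filter at 0.6, then keep the single best candidate) can be sketched end to end as follows. The overlap function is passed in; `query_coverage` (the share of query words found in the candidate) is a hypothetical stand-in for the overlapping matching degree, and whitespace splitting stands in for word segmentation.

```python
from typing import Callable, Dict, List, Optional


def query_coverage(query: str, text: str) -> float:
    """Hypothetical overlap measure: fraction of query words present in text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def match_website(
    query: str,
    candidates: List[str],
    overlap: Callable[[str, str], float],
    filter_threshold: float = 0.6,
) -> Optional[str]:
    # Overlapping matching degree (306) for every candidate in set 305.
    degrees: Dict[str, float] = {c: overlap(query, c) for c in candidates}
    # Quasi-matching set (307): keep candidates strictly above the threshold.
    quasi = [c for c in candidates if degrees[c] > filter_threshold]
    if not quasi:
        return None
    # Matching result (308): the candidate with the largest overlap degree.
    return max(quasi, key=degrees.__getitem__)
```

With the query "xxx networks" and candidates "xxx networks official site", "xxx news", and "yyy networks", only the first candidate exceeds 0.6 and is returned.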
Currently, one existing technique ranks recalled results using a large number of features. Such features consume substantial computing resources and time, and in more constrained scenarios such as intra-enterprise search, too many features can instead degrade matching accuracy. In the method provided by the embodiments of the present disclosure, the information in the information set to be matched is filtered using the overlapping matching degree and the preset filtering threshold, and the matching result is then selected from the filtered information based on the overlapping matching degree. This improves matching accuracy while meeting the matching-time requirement, without relying on a large number of more complex features. In addition, the preset number threshold limits the result presented to the user to the most relevant information, making the matching result more concise and clear and improving the user's efficiency.
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for matching information. The apparatus embodiment corresponds to the method embodiment shown in fig. 1 or fig. 2 and can be applied in various electronic devices.
As shown in fig. 4, the apparatus 400 for matching information provided in this embodiment includes an acquisition unit 401, a generating unit 402, a first selection unit 403, and a second selection unit 404. The acquisition unit 401 is configured to acquire the query text and the information set to be matched; the generating unit 402 is configured to generate the overlapping matching degrees between the query text and the information to be matched in the information set to be matched, where the overlapping matching degree represents the degree of literal matching between the information to be matched and the query text; the first selection unit 403 is configured to select, from the information set to be matched, the information to be matched whose overlapping matching degree is greater than a preset filtering threshold to generate a quasi-matching information set; and the second selection unit 404 is configured to select, based on the generated overlapping matching degrees, a first number of pieces of information to be matched from the quasi-matching information set as the matching result, where the first number is not greater than a preset number threshold.
In this embodiment, for the specific processing of the acquisition unit 401, the generating unit 402, the first selection unit 403, and the second selection unit 404 in the apparatus 400 for matching information, and for their technical effects, reference may be made to the descriptions of steps S101, S102, S103, and S104 in the embodiment corresponding to fig. 1; details are not repeated here.
In some optional implementations of this embodiment, the generating unit 402 may include: a first determining module (not shown in the figure) configured to determine a first sub-overlapping matching degree between the query text and the information to be matched in the information set to be matched, where the first sub-overlapping matching degree is positively correlated with the intersection vocabulary (the words shared by the segmented query text and the segmented information to be matched) and negatively correlated with the number of segmented words obtained by segmenting the query text; a second determining module (not shown in the figure) configured to determine a second sub-overlapping matching degree between the query text and the information to be matched, where the second sub-overlapping matching degree is positively correlated with the intersection vocabulary and negatively correlated with the number of segmented words obtained by segmenting the information to be matched; and a generating module (not shown in the figure) configured to generate the overlapping matching degree according to at least one of the first sub-overlapping matching degree and the second sub-overlapping matching degree.
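The text only states the direction of each correlation. One simple reading, shown below as an assumption rather than the disclosed formula, is intersection size divided by the query's word count (first sub-degree) and by the item's word count (second sub-degree), which behave like recall and precision. Combining them by harmonic mean is likewise a hypothetical choice among the "at least one of" options.

```python
from typing import Iterable, Tuple


def sub_overlap_degrees(query_words: Iterable[str], info_words: Iterable[str]) -> Tuple[float, float]:
    """Assumed ratio form of the two sub-overlapping matching degrees."""
    q, d = set(query_words), set(info_words)
    inter = len(q & d)  # size of the intersection vocabulary
    # First: grows with the intersection, shrinks as the query gets longer.
    first = inter / len(q) if q else 0.0
    # Second: grows with the intersection, shrinks as the item gets longer.
    second = inter / len(d) if d else 0.0
    return first, second


def overlap_matching_degree(first: float, second: float) -> float:
    """Hypothetical combination: harmonic mean (an F1-style score)."""
    return 2 * first * second / (first + second) if first + second else 0.0
```

For query words `["a", "b"]` and item words `["a", "b", "c", "d"]`, the sub-degrees are `(1.0, 0.5)` and the harmonic mean is 2/3. Duplicate words are counted once here because sets are used.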
In some optional implementations of this embodiment, the generating module may be further configured to: in response to determining that the query text hits a keyword of the information to be matched in the information set to be matched, correct at least one of the first sub-overlapping matching degree and the second sub-overlapping matching degree of the information to be matched corresponding to the hit keyword so as to increase its value; and generate the overlapping matching degree according to at least one of the corrected first sub-overlapping matching degree and the corrected second sub-overlapping matching degree.
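A keyword-hit correction could be sketched as below. Both the notion of a "hit" (a segmented query word equal to one of the item's keywords) and the correction itself (a fixed additive `boost`, capped at 1.0) are hypothetical; the text only requires that the corrected values increase.

```python
from typing import Iterable, Tuple


def correct_on_keyword_hit(
    query_words: Iterable[str],
    keywords: Iterable[str],
    first: float,
    second: float,
    boost: float = 0.2,
) -> Tuple[float, float]:
    """Raise both sub-overlapping matching degrees if the query hits a keyword."""
    if set(query_words) & set(keywords):
        # Hypothetical correction: add a fixed boost, capped at 1.0.
        first = min(1.0, first + boost)
        second = min(1.0, second + boost)
    return first, second
```

The cap keeps corrected values on the same 0 to 1 scale as uncorrected ones, so a single filtering threshold still applies to both.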
In some optional implementations of this embodiment, the second selection unit 404 may be further configured to: acquire the semantic matching degree between the query text and the information to be matched in the information set to be matched; generate, according to the generated overlapping matching degrees and the acquired semantic matching degrees, the comprehensive matching degrees between the query text and the information to be matched in the quasi-matching information set; and select, according to the generated comprehensive matching degrees, the first number of pieces of information to be matched from the quasi-matching information set as the matching result.
In some optional implementations of this embodiment, the semantic matching degree may be generated as follows: the information to be matched in the information set to be matched and the query text are input into a pre-trained semantic matching model, which generates the semantic matching degree between each piece of input information to be matched and the query text. The semantic matching model includes a feature extraction layer, a feature fusion layer, and an output layer, where the feature extraction layer extracts the literal features and the semantic features of the information to be matched and the query text input into the model.
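The three-layer structure can be illustrated with a toy, untrained model. Everything concrete here is an assumption for illustration: word-level Jaccard overlap plays the literal feature, character-bigram cosine similarity stands in for a learned semantic feature, the fusion layer is a linear combination with made-up weights, and the output layer is a sigmoid. A real pre-trained model would learn these parts.

```python
import math
from collections import Counter
from typing import List, Tuple


def literal_features(query: str, info: str) -> List[float]:
    # Literal feature (assumed): word-level Jaccard overlap.
    q, d = set(query.lower().split()), set(info.lower().split())
    return [len(q & d) / len(q | d) if q | d else 0.0]


def semantic_features(query: str, info: str) -> List[float]:
    # Stand-in for learned embeddings: cosine of character-bigram counts.
    def bigrams(s: str) -> Counter:
        s = s.lower()
        return Counter(s[i:i + 2] for i in range(len(s) - 1))

    a, b = bigrams(query), bigrams(info)
    dot = sum(a[g] * b[g] for g in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return [dot / norm if norm else 0.0]


def model_semantic_degree(
    query: str, info: str, weights: Tuple[float, float] = (2.0, 4.0), bias: float = -3.0
) -> float:
    # Feature extraction layer: literal and semantic features.
    features = literal_features(query, info) + semantic_features(query, info)
    # Feature fusion layer: weighted linear combination (hypothetical weights).
    fused = bias + sum(w * f for w, f in zip(weights, features))
    # Output layer: sigmoid maps the fused score into (0, 1).
    return 1.0 / (1.0 + math.exp(-fused))
```

Identical strings score about 0.95 and fully disjoint strings about 0.05 with these weights; the numbers only show the shape of the pipeline, not real model behavior.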
In some optional implementations of this embodiment, the information to be matched in the set of information to be matched may include a title and at least one keyword. In this case, inputting the information to be matched and the query text into the pre-trained semantic matching model and generating the semantic matching degree between each piece of input information to be matched and the query text may include: inputting the title and the at least one keyword of the information to be matched, together with the query text, into the pre-trained semantic matching model to generate, for each piece of input information to be matched, a first sub-semantic matching degree and a second sub-semantic matching degree, where the first sub-semantic matching degree represents the semantic matching degree between the title of the information to be matched and the query text, and the second sub-semantic matching degree represents the semantic matching degree between a keyword of the information to be matched and the query text; and selecting the maximum of the first and second sub-semantic matching degrees corresponding to the same information to be matched as the semantic matching degree between that information and the query text.
In some optional implementations of this embodiment, the preset filtering threshold may be inversely related to the number of segmented words obtained after the query text is segmented.
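One possible threshold schedule with this inverse relation is sketched below. The formula `base / n` with a lower floor, and the values 0.9 and 0.3, are hypothetical; the text only says the threshold falls as the segmented query gets longer.

```python
def filtering_threshold(num_query_words: int, base: float = 0.9, floor: float = 0.3) -> float:
    """Hypothetical filtering threshold that decreases with query length."""
    if num_query_words < 1:
        raise ValueError("query must contain at least one segmented word")
    # Inverse relation: base / n, but never below the floor.
    return max(floor, base / num_query_words)
```

A plausible motivation is that long queries rarely share all of their words with a candidate, so a fixed threshold would filter out too much; the floor keeps very long queries from admitting almost everything.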
In some optional implementations of this embodiment, the information to be matched in the set of information to be matched may include website information obtained by recall. The at least one keyword included in the website information may include the words with the highest occurrence frequency among the values of the website titles and website keywords of the websites corresponding to the website information.
In the apparatus provided by the embodiments of the present disclosure, the first selection unit 403 filters the information to be matched in the information set to be matched acquired by the acquisition unit 401 according to the overlapping matching degrees generated by the generating unit 402, and the second selection unit 404 then selects the matching result from the filtered information based on those overlapping matching degrees. Compared with existing methods that match information using a large number of more complex features, this improves matching accuracy while meeting the matching-time requirement. In addition, the preset number threshold limits the result presented to the user to the most relevant information, making the matching result more concise and clear and improving the user's efficiency.
In the technical solution of the present disclosure, the acquisition, storage, and application of users' personal information all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, for example, a method for matching information. For example, in some embodiments, the method for matching information may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When a computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the method for matching information described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method for matching information by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. A method for matching information, comprising:
acquiring a query text and an information set to be matched;
respectively generating overlapping matching degrees between the information to be matched in the information set to be matched and the query text, wherein the overlapping matching degrees are used for representing the literal matching degree between the information to be matched and the query text;
selecting, from the information set to be matched, the information to be matched whose overlapping matching degree is greater than a preset filtering threshold to generate a quasi-matching information set;
selecting, based on the generated overlapping matching degree, a first number of pieces of information to be matched from the quasi-matching information set as a matching result, wherein the first number is not greater than a preset number threshold;
the selecting, based on the generated overlapping matching degree, a first number of information to be matched from the quasi-matching information set as a matching result includes: inputting the information to be matched in the information set to be matched and the query text into a pre-trained semantic matching model, and respectively generating semantic matching degrees between the input information to be matched and the query text, wherein the semantic matching model comprises a feature extraction layer, a feature fusion layer and an output layer, and the feature extraction layer is used for extracting the literal features and the semantic features of the information to be matched and the query text which are input into the semantic matching model; respectively generating comprehensive matching degrees between the information to be matched in the quasi-matching information set and the query text according to the generated overlapping matching degrees and semantic matching degrees; and selecting the first number of information to be matched from the quasi-matching information set as a matching result according to the generated comprehensive matching degree.
2. The method of claim 1, wherein the generating the overlap matching degree between the information to be matched in the information set to be matched and the query text respectively includes:
determining a first sub-overlapping matching degree between the information to be matched in the information set to be matched and the query text, wherein the first sub-overlapping matching degree is positively correlated with the intersection vocabulary, and the first sub-overlapping matching degree is negatively correlated with the number of segmented words obtained after word segmentation of the query text;
determining a second sub-overlapping matching degree between the information to be matched in the information set to be matched and the query text, wherein the second sub-overlapping matching degree is positively correlated with the intersection vocabulary, and the second sub-overlapping matching degree is negatively correlated with the number of segmented words obtained after the segmentation of the information to be matched in the information set to be matched;
and generating the overlapping matching degree according to at least one of the first sub-overlapping matching degree and the second sub-overlapping matching degree.
3. The method of claim 2, wherein the generating an overlap match from at least one of the first sub-overlap match and the second sub-overlap match comprises:
in response to determining that the query text hits a keyword of the information to be matched in the information set to be matched, correcting at least one of a first sub-overlap matching degree and a second sub-overlap matching degree of the information to be matched corresponding to the hit keyword so as to increase a numerical value of at least one of the first sub-overlap matching degree and the second sub-overlap matching degree;
and generating the overlapping matching degree according to at least one of the corrected first sub-overlapping matching degree and the corrected second sub-overlapping matching degree.
4. The method of claim 1, wherein the information to be matched in the set of information to be matched comprises a title and at least one keyword; and
the inputting of the information to be matched in the information set to be matched and the query text into the pre-trained semantic matching model, and the respective generating of the semantic matching degree between each piece of input information to be matched and the query text, comprise:
inputting the title and at least one keyword of the information to be matched in the information set to be matched and the query text into the pre-trained semantic matching model respectively, and generating a first sub-semantic matching degree and a second sub-semantic matching degree corresponding to each input information to be matched respectively, wherein the first sub-semantic matching degree is used for representing the semantic matching degree between the title of the information to be matched and the query text, and the second sub-semantic matching degree is used for representing the semantic matching degree between the keyword of the information to be matched and the query text;
and selecting the maximum value from the generated first sub-semantic matching degree and the second sub-semantic matching degree which correspond to the same information to be matched as the semantic matching degree between the information to be matched and the query text.
5. The method according to any one of claims 1 to 4, wherein the preset filtering threshold is inversely related to the number of segmented words obtained after segmentation of the query text.
6. The method according to any one of claims 1 to 4, wherein the information to be matched in the set of information to be matched comprises recalled website information, the website information comprising at least one keyword, and the at least one keyword comprises the words with the highest occurrence frequency in the values of the website titles and website keywords of the websites corresponding to the website information.
7. An apparatus for matching information, comprising:
the acquisition unit is configured to acquire the query text and the information set to be matched;
the generating unit is configured to generate overlapping matching degrees between the information to be matched in the information set to be matched and the query text respectively, wherein the overlapping matching degrees are used for representing the literal matching degree between the information to be matched and the query text;
the first selecting unit is configured to select, from the information set to be matched, the information to be matched whose overlapping matching degree is greater than a preset filtering threshold to generate a quasi-matching information set;
the second selecting unit is configured to select, based on the generated overlapping matching degree, a first number of pieces of information to be matched from the quasi-matching information set as a matching result, wherein the first number is not greater than a preset number threshold;
wherein the second selecting unit is further configured to: input the information to be matched in the information set to be matched and the query text into a pre-trained semantic matching model, and respectively generate semantic matching degrees between the input information to be matched and the query text, wherein the semantic matching model comprises a feature extraction layer, a feature fusion layer and an output layer, and the feature extraction layer is used for extracting the literal features and the semantic features of the information to be matched and the query text which are input into the semantic matching model; respectively generate comprehensive matching degrees between the information to be matched in the quasi-matching information set and the query text according to the generated overlapping matching degrees and semantic matching degrees; and select the first number of information to be matched from the quasi-matching information set as a matching result according to the generated comprehensive matching degree.
8. The apparatus of claim 7, wherein the generating unit comprises:
the first determining module is configured to determine a first sub-overlapping matching degree between the information to be matched in the information set to be matched and the query text, wherein the first sub-overlapping matching degree is positively correlated with intersection vocabulary, and the first sub-overlapping matching degree is negatively correlated with the number of segmented words obtained after word segmentation of the query text;
the second determining module is configured to determine a second sub-overlapping matching degree between the information to be matched in the information set to be matched and the query text, wherein the second sub-overlapping matching degree is positively correlated with the intersection vocabulary, and the second sub-overlapping matching degree is negatively correlated with the number of segmented words obtained after the segmentation of the information to be matched in the information set to be matched;
and the generating module is configured to generate the overlapping matching degree according to at least one of the first sub-overlapping matching degree and the second sub-overlapping matching degree.
9. The apparatus of claim 8, wherein the generation module is further configured to:
in response to determining that the query text hits a keyword of the information to be matched in the information set to be matched, correcting at least one of a first sub-overlap matching degree and a second sub-overlap matching degree of the information to be matched corresponding to the hit keyword so as to increase a numerical value of at least one of the first sub-overlap matching degree and the second sub-overlap matching degree;
and generating the overlapping matching degree according to at least one of the corrected first sub-overlapping matching degree and the corrected second sub-overlapping matching degree.
10. The apparatus of claim 7, wherein the information to be matched in the set of information to be matched comprises a title and at least one keyword; and
the inputting of the information to be matched in the information set to be matched and the query text into the pre-trained semantic matching model, and the respective generating of the semantic matching degree between each piece of input information to be matched and the query text, comprise:
inputting the title and at least one keyword of the information to be matched in the information set to be matched and the query text into the pre-trained semantic matching model respectively, and generating a first sub-semantic matching degree and a second sub-semantic matching degree corresponding to each input information to be matched respectively, wherein the first sub-semantic matching degree is used for representing the semantic matching degree between the title of the information to be matched and the query text, and the second sub-semantic matching degree is used for representing the semantic matching degree between the keyword of the information to be matched and the query text;
and selecting the maximum value from the generated first sub-semantic matching degree and the second sub-semantic matching degree which correspond to the same information to be matched as the semantic matching degree between the information to be matched and the query text.
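The max-selection step of claim 10 can be sketched with the model abstracted as any callable that scores a text against the query. The patent does not describe the pre-trained semantic matching model, so `toy_model` below is a hypothetical stand-in (word-set Jaccard similarity, which is not semantic at all), and joining multiple keywords into one input string is likewise an assumption.

```python
from typing import Callable, Sequence

# Any callable scoring (text, query) -> degree stands in for the model.
ScoreFn = Callable[[str, str], float]


def toy_model(text: str, query: str) -> float:
    """Hypothetical stand-in for the pre-trained semantic matching model:
    Jaccard similarity of lowercase word sets."""
    a, b = set(text.lower().split()), set(query.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def semantic_degree(
    query: str, title: str, keywords: Sequence[str], model: ScoreFn = toy_model
) -> float:
    first = model(title, query)                # first sub-degree: title vs. query
    second = model(" ".join(keywords), query)  # second sub-degree: keywords vs. query (joining assumed)
    return max(first, second)                  # keep the larger sub-degree
```

Taking the maximum lets a candidate whose title is poorly worded still match through its keywords, and vice versa; swapping in a real model only requires passing a different `model` callable.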
11. The apparatus according to any one of claims 7-10, wherein the preset filtering threshold is inversely related to the number of segmented words obtained by segmenting the query text.
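One possible reading of claim 11 is a threshold that shrinks as the query grows, so that long queries, which rarely overlap fully with a candidate, are not over-filtered. The form `base / n` with a lower bound, and the values of the hypothetical `base` and `floor` parameters, are assumptions for illustration only.

```python
from __future__ import annotations


def filtering_threshold(query: str, base: float = 0.9, floor: float = 0.3) -> float:
    """Threshold that falls as the query gets longer.

    Assumed form: base / n, clipped below at `floor`, where n is the number
    of whitespace-segmented query words (hypothetical segmenter).
    """
    n = len(query.split())
    if n == 0:
        return base
    return max(floor, base / n)


def filter_candidates(query: str, degrees: dict[str, float]) -> list[str]:
    # Keep candidates whose matching degree reaches the query-dependent threshold.
    threshold = filtering_threshold(query)
    return [cand for cand, degree in degrees.items() if degree >= threshold]
```

With these defaults a one-word query needs a degree of 0.9, a two-word query 0.45, and queries of three or more words bottom out at 0.3.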
12. The apparatus according to any one of claims 7-10, wherein the information to be matched in the information set to be matched comprises recalled website information, the website information comprising at least one keyword, and the at least one keyword comprising the words with the highest occurrence frequency among the values of the website title and the website keywords of the website corresponding to the website information.
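The keyword source of claim 12 can be sketched as a word count over a site's title and keywords value. This sketch assumes the keywords value is a comma-separated string, like the content of an HTML meta keywords tag, and that words are runs of regex word characters; Chinese text would need a proper word segmenter instead. Ties at the top count are all returned, sorted for a deterministic result.

```python
import re
from collections import Counter


def extract_keywords(title: str, meta_keywords: str) -> list[str]:
    """Return the word(s) occurring most often in a website's title and
    keywords value, with ties sorted alphabetically.

    Assumption: words are regex word-character runs, lowercased.
    """
    words = re.findall(r"\w+", f"{title} {meta_keywords}".lower())
    if not words:
        return []
    counts = Counter(words)
    top = max(counts.values())
    return sorted(word for word, count in counts.items() if count == top)
```

For a title "Baidu Cloud - Cloud Computing" with keywords value "cloud,AI,computing", the word "cloud" occurs three times and is returned as the keyword.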
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110711931.2A CN113377922B (en) 2021-06-25 2021-06-25 Method, device, electronic equipment and medium for matching information

Publications (2)

Publication Number Publication Date
CN113377922A (en) 2021-09-10
CN113377922B (en) 2024-04-02

Family

ID=77579214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110711931.2A Active CN113377922B (en) 2021-06-25 2021-06-25 Method, device, electronic equipment and medium for matching information

Country Status (1)

Country Link
CN (1) CN113377922B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103813279A (en) * 2012-11-14 2014-05-21 China Mobile Group Design Institute Co., Ltd. Junk short message detecting method and device
CN103886034A (en) * 2014-03-05 2014-06-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and equipment for building indexes and matching inquiry input information of user
CN110377714A (en) * 2019-07-18 2019-10-25 Taikang Insurance Group Co., Ltd. Text matching technique, device, medium and equipment based on transfer learning
CN112749255A (en) * 2020-12-30 2021-05-04 Kedaguochuang Cloud Network Technology Co., Ltd. Human-computer interaction semantic recognition intention matching method and system based on ES

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9836529B2 (en) * 2014-09-22 2017-12-05 Oracle International Corporation Semantic text search

Non-Patent Citations (1)

Title
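Context-based deep semantic sentence retrieval model; Fan Yixing; Guo Jiafeng; Lan Yanyan; Xu Jun; Cheng Xueqi; Journal of Chinese Information Processing (No. 05); full text *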

Also Published As

Publication number Publication date
CN113377922A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN114549874B (en) Training method of multi-target image-text matching model, image-text retrieval method and device
US20220027569A1 (en) Method for semantic retrieval, device and storage medium
CN114861889B (en) Deep learning model training method, target object detection method and device
CN112506864B (en) File retrieval method, device, electronic equipment and readable storage medium
CN113660541B (en) Method and device for generating abstract of news video
CN112989235B (en) Knowledge base-based inner link construction method, device, equipment and storage medium
CN113806660A (en) Data evaluation method, training method, device, electronic device and storage medium
EP3992814A2 (en) Method and apparatus for generating user interest profile, electronic device and storage medium
CN114244795B (en) Information pushing method, device, equipment and medium
CN113919424A (en) Training of text processing model, text processing method, device, equipment and medium
CN113806483A (en) Data processing method and device, electronic equipment and computer program product
CN117171296A (en) Information acquisition method and device and electronic equipment
CN115168537B (en) Training method and device for semantic retrieval model, electronic equipment and storage medium
CN114818736B (en) Text processing method, chain finger method and device for short text and storage medium
CN111666417A (en) Method and device for generating synonyms, electronic equipment and readable storage medium
CN116597443A (en) Material tag processing method and device, electronic equipment and medium
CN116049370A (en) Information query method and training method and device of information generation model
CN113377922B (en) Method, device, electronic equipment and medium for matching information
CN113377921B (en) Method, device, electronic equipment and medium for matching information
CN115098729A (en) Video processing method, sample generation method, model training method and device
CN114218431A (en) Video searching method and device, electronic equipment and storage medium
CN114329206A (en) Title generation method and device, electronic equipment and computer readable medium
CN114417862A (en) Text matching method, and training method and device of text matching model
CN114328855A (en) Document query method and device, electronic equipment and readable storage medium
CN113743112A (en) Keyword extraction method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant