CN116383340A - Information searching method, device, electronic equipment and storage medium - Google Patents


Publication number
CN116383340A
Authority
CN
China
Prior art keywords
search
candidate information
information
word
text
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310115644.4A
Other languages
Chinese (zh)
Inventor
戴松泰
姜文斌
孙卓
崔骁鹏
吕雅娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools

Abstract

The disclosure provides an information search method and relates to the technical field of artificial intelligence, in particular to deep learning and natural language processing. The specific implementation is as follows: generating text features of a search term and text features of candidate information in an information base, respectively; in response to determining that the search term and the candidate information contain the same target proper noun, generating text features of the target proper noun; generating fusion features of the search term and fusion features of the candidate information according to the text features of the search term, the text features of the candidate information, and the text features of the target proper noun; determining the similarity between the search term and the candidate information according to the fusion features of the search term and the fusion features of the candidate information; and determining an information search result for the search term according to the similarity. The disclosure also provides an information search apparatus, an electronic device, and a storage medium.

Description

Information searching method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the field of deep learning and natural language processing. More particularly, the present disclosure provides an information searching method, apparatus, electronic device, and storage medium.
Background
Information search (also called information retrieval or information query) is an important branch of natural language processing. Given a search term Query (also called a search word or query word), the information search task is to recall from a candidate information base the results most relevant to the Query.
Disclosure of Invention
The disclosure provides an information search method, an information search apparatus, an electronic device, and a storage medium.
According to a first aspect, there is provided an information search method comprising: generating text features of a search term and text features of candidate information in an information base, respectively; in response to determining that the search term and the candidate information contain the same target proper noun, generating text features of the target proper noun; generating fusion features of the search term and fusion features of the candidate information according to the text features of the search term, the text features of the candidate information, and the text features of the target proper noun; determining the similarity between the search term and the candidate information according to the fusion features of the search term and the fusion features of the candidate information; and determining an information search result for the search term according to the similarity.
According to a second aspect, there is provided an information search apparatus comprising: a first generation module configured to generate text features of a search term and text features of candidate information in an information base, respectively; a second generation module configured to generate text features of a target proper noun in response to determining that the search term and the candidate information contain the same target proper noun; a third generation module configured to generate fusion features of the search term and fusion features of the candidate information according to the text features of the search term, the text features of the candidate information, and the text features of the target proper noun; a first determining module configured to determine the similarity between the search term and the candidate information according to the fusion features of the search term and the fusion features of the candidate information; and a second determining module configured to determine an information search result for the search term according to the similarity.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to a fifth aspect, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a dual-tower model in the related art;
FIG. 2 is a flow chart of an information search method according to one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a method of determining a target proper noun according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a dual-tower model according to one embodiment of the present disclosure;
FIG. 5 is a block diagram of an information search apparatus according to one embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing an information search method according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An information search task generally uses a ranking model to calculate the similarity between the search term Query and each of a plurality of candidate information items, sorts the candidates in descending order of similarity, and takes the resulting ranking as the information search result.
Generally, the higher a search result is ranked, the more relevant it is to the search term Query. However, some search ranking tasks in special business scenarios involve domain-specific proper nouns (hereinafter, proper nouns) to which the ranking model is insensitive. As a result, search results containing these proper nouns are ranked inaccurately and the user experience suffers.
For example, each enterprise user may have its own frequently used proper nouns, which are not shared across users. When the ranking model is deployed in an enterprise user's private domain, it recognizes these proper nouns poorly.
For example, for a bank user, "XXX" in "XXX electronic consumer coupons" is one of the user's proper nouns, yet the ranking model may not recognize "XXX" as a single word. For a restaurant user, "ABC" is one of the user's proper nouns, yet the ranking model may treat it as three separate words "A", "B" and "C". Both cases can lead to incorrect search results.
To preserve the user experience, such errors need to be corrected. Several methods are currently available.
One method is to add a manual intervention strategy, such as hand-written rules, after the ranking model so that search results containing the proper noun are moved to the top of the ranking. However, this only fixes individual cases; the same problem can surface in many different forms, so the method generalizes poorly.
Another method is to retrain the ranking model on a better dataset. However, the time and labor costs of retraining the model for each user are relatively high, which makes this uneconomical.
The existing solutions therefore have a high intervention cost and cannot take effect immediately: an effective intervention requires retraining the model or redeploying rules, which takes considerable development time and labor. In addition, whether the model is retrained or extra rules are added, the scope of influence is hard to control, so search results that need no intervention may also be affected, producing unexpected side effects.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with relevant laws and regulations and do not violate public order and good morals.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
The ranking model in the information search task may be a dual-tower model. The dual-tower model includes a left tower and a right tower, which contain a first natural language processing model for the search term Query and a second natural language processing model for the candidate information, respectively. Both models may be ERNIE (Enhanced language Representation with Informative Entities) models.
The candidate information may come from a candidate information base that includes a plurality of candidate information items; each item may be the title of content such as news, resources, or events. Candidate information may therefore also be referred to as a candidate Title.
FIG. 1 is a schematic diagram of a dual-tower model in the related art.
As shown in FIG. 1, the search term Query is input into a first natural language processing model 110, resulting in text features 111 of the search term Query. The candidate Title is input into the second natural language processing model 120, resulting in text features 121 of the candidate Title.
The search term Query is matched with the candidate Title by computing the similarity between text features 111 and text features 121, which is used as the similarity between the search term Query and the candidate Title. In this way, the similarity between the search term Query and each candidate Title in the candidate information base can be obtained, and the candidate Titles are ranked in descending order of similarity to obtain the ranked search result.
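The related-art ranking step described above can be illustrated with a minimal Python sketch. The text does not name the similarity measure, so cosine similarity is an assumption here; the function names and the toy vectors are hypothetical stand-ins for text features 111 and 121, not outputs of a real model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_titles(query_feature, title_features):
    """Return (title, similarity) pairs in descending order of similarity."""
    scored = [(title, cosine(query_feature, feature))
              for title, feature in title_features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for the outputs of models 110 and 120.
query_feature = [1.0, 0.0, 1.0]
title_features = {
    "title_a": [1.0, 0.0, 0.9],
    "title_b": [0.0, 1.0, 0.1],
}
ranking = rank_titles(query_feature, title_features)
```

Here `ranking` lists "title_a" first because its feature vector points in almost the same direction as the query feature.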
The natural language processing model in module 130 may be either the first natural language processing model 110 or the second natural language processing model 120. When it represents the first natural language processing model 110, the model input is a search term Query, for example "XXX line coupon". When it represents the second natural language processing model 120, the model input is a candidate Title, for example "year XXX electronic consumer coupon".
The search term Query or the candidate Title is input into the natural language processing model, which extracts features from it to obtain the text features of the search term Query or the candidate Title.
Because the natural language processing model is insensitive to "XXX", the text features it extracts for "XXX" may be inaccurate, which in turn makes the search ranking inaccurate.
Fig. 2 is a flowchart of an information search method according to one embodiment of the present disclosure.
As shown in fig. 2, the information search method 200 includes operations S210 to S250.
In operation S210, text features of the search word and text features of candidate information in the information base are generated, respectively.
For example, for a "search term Query-candidate Title" pair to be matched, a dual-tower model may be used to generate the text features of the search term Query and the text features of the candidate Title, respectively.
In operation S220, in response to determining that the search term and the candidate information contain the same target proper noun, text features of the target proper noun are generated.
For each "search term Query-candidate Title" pair to be matched, it may be determined whether the search term Query and the candidate Title in the pair contain the same target proper noun. The target proper noun may come from a preset proper noun library.
For example, a corresponding proper noun library may be set for each business scenario. The proper noun library contains the set of proper nouns used in the actual business scenario. For example, for the business scenario of user A, a proper noun library A may be set, which includes proper noun A1, proper noun A2, proper noun A3, and so on. For the business scenario of user B, a proper noun library B may be set, which includes proper noun B1, proper noun B2, proper noun B3, and so on.
For example, in the business scenario of user A, if the search term Query and the candidate Title in a "search term Query-candidate Title" pair both contain the same target proper noun A1, the matching of that pair may be intervened on to raise the similarity of the pair and thereby raise the rank of the candidate Title in the ranking result.
For example, the text features of the target proper noun A1 may be generated before the similarity between the text features of the search term Query and the text features of the candidate Title is calculated. The text features of the target proper noun A1 are fused into the text features of the search term Query and into the text features of the candidate Title, and the similarity is calculated from the two fused features. This raises the similarity of the "search term Query-candidate Title" pair and thereby raises the rank of the candidate Title in the ranking result.
In operation S230, fusion features of the search word and fusion features of the candidate information are generated according to the text features of the search word, the text features of the candidate information, and the text features of the target proper nouns.
For example, the text features of the search term Query and the text features of the target proper noun A1 are fused by weighting to obtain the fusion features of the search term Query. Likewise, the text features of the candidate Title and the text features of the target proper noun A1 are fused by weighting to obtain the fusion features of the candidate Title. The fusion weights can be set manually or learned during model training.
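As a rough illustration of weighted fusion, the sketch below mixes each text feature with the proper noun feature using a single manually set weight. The exact fusion formula is not given in the text, so the convex combination, the cosine similarity, the name `fuse`, and the toy vectors are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fuse(text_feature, noun_feature, weight=0.5):
    """Assumed fusion: (1 - weight) * text feature + weight * proper noun feature."""
    if len(text_feature) != len(noun_feature):
        raise ValueError("features must have the same dimension")
    return [(1.0 - weight) * t + weight * n
            for t, n in zip(text_feature, noun_feature)]

# Toy features: the original Query and Title features are orthogonal.
query_feature = [1.0, 0.0]
title_feature = [0.0, 1.0]
noun_feature = [1.0, 1.0]  # hypothetical feature of target proper noun A1

fused_query = fuse(query_feature, noun_feature)
fused_title = fuse(title_feature, noun_feature)

before = cosine(query_feature, title_feature)  # similarity of original features
after = cosine(fused_query, fused_title)       # similarity of fused features
```

Because both fused features now share a component contributed by the same proper noun feature, `after` is larger than `before` in this toy case, which is the effect the embodiment relies on.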
In operation S240, a similarity between the search word and the candidate information is determined according to the fusion feature of the search word and the fusion feature of the candidate information.
In operation S250, information search results for the search term are determined according to the similarity.
For example, the similarity between the fusion features of the search term Query and the fusion features of the candidate Title is calculated and used as the similarity of the "search term Query-candidate Title" pair. The plurality of candidate Titles are then arranged in descending order of the similarity of their "search term Query-candidate Title" pairs to obtain the search result.
The search result can be displayed as an information recommendation list containing the candidate Titles in ranked order; the higher a candidate Title is ranked, the more relevant it is to the search term Query.
According to the embodiment of the present disclosure, when the search term Query and the candidate Title contain the same target proper noun, the matching between them is intervened on according to the target proper noun, which improves the accuracy of the ranking result and the user experience.
While retaining the generality and generalization ability of the ranking model, the present disclosure improves the model's ability to recognize proper nouns and increases their importance in ranking, thereby repairing the model's insensitivity to proper nouns and improving the accuracy of search results.
Compared with the related-art approach of intervening after the ranking result has been obtained, this embodiment generalizes well and takes effect immediately. Compared with the related-art approach of retraining the model, this embodiment requires no retraining, which saves labor and development time.
FIG. 3 is a schematic diagram of a method of determining a target proper noun according to one embodiment of the present disclosure.
According to an embodiment of the present disclosure, the information search method further includes: matching the search term against a proper noun library to obtain a first proper noun set, consisting of the proper nouns in the library hit by the search term; matching the candidate information against the proper noun library to obtain a second proper noun set, consisting of the proper nouns in the library hit by the candidate information; and, in response to the first proper noun set and the second proper noun set having an intersection, determining the proper nouns in the intersection as target proper nouns.
As shown in FIG. 3, the search term Query is matched against the proper noun library 310 to obtain the first proper noun set 311 hit by the search term Query. For example, the first proper noun set 311 is {proper noun A1, proper noun A2}.
The candidate Title is matched against the proper noun library 310 to obtain the second proper noun set 312 hit by the candidate Title. For example, the second proper noun set 312 is {proper noun A1, proper noun A3}.
The intersection 320 of the first proper noun set 311 and the second proper noun set 312 is {proper noun A1}, so proper noun A1 can be determined as the target proper noun.
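The matching and intersection steps of FIG. 3 can be sketched with Python sets. How a text "hits" a proper noun is not specified, so plain substring matching is an assumption, and the function names and example strings are hypothetical.

```python
def hit_proper_nouns(text, proper_noun_library):
    """Return the proper nouns in the library that occur in the text.

    Assumption: a hit is a plain substring match.
    """
    return {noun for noun in proper_noun_library if noun in text}

def target_proper_nouns(query, title, proper_noun_library):
    """Intersect the proper nouns hit by the Query with those hit by the Title."""
    first_set = hit_proper_nouns(query, proper_noun_library)
    second_set = hit_proper_nouns(title, proper_noun_library)
    return first_set & second_set

library = {"A1", "A2", "A3"}
query = "A1 A2 coupon"                      # hits {A1, A2}
title = "A1 A3 electronic consumer coupon"  # hits {A1, A3}
targets = target_proper_nouns(query, title, library)
```

With these inputs `targets` contains only "A1", matching the intersection 320 in the example above. An empty result means the pair receives no intervention.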
Then, the text features of the target proper noun are fused with the text features of the search term Query and of the candidate Title, respectively, and the similarity between the resulting fusion features is used in the ranking.
This embodiment intervenes only on "search term Query-candidate Title" pairs that contain the same proper noun. The similarity of other pairs is unaffected, so the ranking result is not negatively influenced.
According to an embodiment of the present disclosure, the information search method further includes determining a similarity between the search word and the candidate information according to a text feature of the search word and a text feature of the candidate information in response to the search word and the candidate information not including the same target proper noun.
For example, when the search term Query and the candidate Title do not contain the same proper noun, the similarity is calculated from the text features of the search term Query and the text features of the candidate Title; that is, the similarity between their original text features is used in the ranking.
For example, if the search term Query contains proper noun A1 and the candidate Title contains proper noun A2, both contain a proper noun but the proper nouns differ, so the original similarity between the search term Query and the candidate Title should be used in the ranking.
In this embodiment, when the search term Query and the candidate Title do not contain the same proper noun, their original similarity is used in the ranking, so the ranking result is not negatively influenced.
FIG. 4 is a schematic diagram of a dual tower model according to one embodiment of the present disclosure.
As shown in FIG. 4, compared with the dual-tower model in the related art, the left tower and the right tower of this embodiment are augmented with a plug-in knowledge model 401 and a plug-in knowledge model 402, respectively. Each plug-in knowledge model consists of a natural language processing model and a proper noun library. The plug-in knowledge models 401 and 402 share the same proper noun library.
The natural language processing model 431 in module 430 may be either the first natural language processing model 410 or the second natural language processing model 420. When it represents the first natural language processing model 410, the model input is a search term Query, for example "XXX line coupon". When it represents the second natural language processing model 420, the model input is a candidate Title, for example "certain-year XXX electronic consumer ticket".
The natural language processing model 432 and the proper noun library 433 in module 430 make up the plug-in knowledge model 401 or the plug-in knowledge model 402.
The search term Query is input into the plug-in knowledge model 401 and matched against the proper noun library 433 to obtain a first proper noun set. The candidate Title is input into the plug-in knowledge model 402 and matched against the proper noun library 433 to obtain a second proper noun set. If the two sets have an intersection, the proper nouns in the intersection are determined as target proper nouns. For example, the target proper noun is "XXX". The target proper noun "XXX" is input into the natural language processing model 432 to obtain the text features of the target proper noun.
The text features of the target proper noun are fused by weighting with the text features of the search term Query to obtain the fusion features 411 of the search term Query. The text features of the target proper noun are fused by weighting with the text features of the candidate Title to obtain the fusion features 421 of the candidate Title.
The similarity between the fusion features 411 and the fusion features 421 is calculated and used as the similarity between the search term Query and the candidate Title. In this way, the similarity between the search term Query and each candidate Title in the candidate information base can be obtained, and the candidate Titles are ranked in descending order of similarity to obtain the ranked search result.
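The flow of FIG. 4 can be put together in one small sketch. It is not the patented implementation: the encoder is replaced by a hypothetical lookup table of toy vectors, cosine similarity and convex-combination fusion are assumptions, and the rule for choosing among several shared proper nouns is not stated in the text, so the sketch simply takes the first in sorted order.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fuse(text_feature, noun_feature, weight):
    """Assumed fusion: convex combination of text and proper noun features."""
    return [(1.0 - weight) * t + weight * n
            for t, n in zip(text_feature, noun_feature)]

def search(query, titles, library, encode, weight=0.5):
    """Rank titles, intervening only on pairs that share a proper noun."""
    query_feature = encode(query)
    query_hits = {noun for noun in library if noun in query}
    results = []
    for title in titles:
        title_feature = encode(title)
        shared = query_hits & {noun for noun in library if noun in title}
        if shared:
            # Assumption: with several shared nouns, use the first in sorted order.
            noun_feature = encode(sorted(shared)[0])
            score = cosine(fuse(query_feature, noun_feature, weight),
                           fuse(title_feature, noun_feature, weight))
        else:
            # No shared proper noun: fall back to the original similarity.
            score = cosine(query_feature, title_feature)
        results.append((title, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy features standing in for the natural language processing models.
TOY_FEATURES = {
    "XXX coupon": [1.0, 0.0],
    "XXX electronic consumer coupon": [0.6, 0.8],
    "discount coupon": [0.8, 0.6],
    "XXX": [0.0, 1.0],
}
encode = TOY_FEATURES.__getitem__

query = "XXX coupon"
titles = ["XXX electronic consumer coupon", "discount coupon"]
without_library = search(query, titles, set(), encode)
with_library = search(query, titles, {"XXX"}, encode)
```

In this toy setup the title containing "XXX" ranks second when the library is empty and first once "XXX" is in the library, while the score of the other title stays unchanged.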
In the embodiment of the present disclosure, a proper noun library is set in the plug-in knowledge model, and interventions on the ranking result made through this library take effect in real time, because whether a new intervention is effective depends only on whether the corresponding proper noun is in the library. In other words, adding a proper noun to the library or removing it from the library enables or disables the intervention in real time, which is convenient and controllable.
In addition, the proper noun library may be updated in response to changes in the target business requirements. For example, proper nouns can be added to or removed from the library, which makes it easy to extend the set of proper nouns and suits a wider range of application scenarios. Updates to the proper nouns also take effect in real time and have no negative effect on the search method of this embodiment.
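The real-time behavior described above follows from the library being plain data that is consulted on every query. A minimal sketch, assuming an in-memory set and substring matching (the class name `ProperNounLibrary` and its methods are hypothetical):

```python
class ProperNounLibrary:
    """In-memory proper noun library; edits are visible to the next lookup."""

    def __init__(self, nouns=()):
        self._nouns = set(nouns)

    def add(self, noun):
        self._nouns.add(noun)

    def remove(self, noun):
        self._nouns.discard(noun)

    def shared(self, query, title):
        """Proper nouns contained in both texts (assumed substring matching)."""
        return {noun for noun in self._nouns if noun in query and noun in title}

library = ProperNounLibrary()
query, title = "XXX coupon", "XXX electronic consumer coupon"

before = library.shared(query, title)        # library is empty: no intervention
library.add("XXX")
after_add = library.shared(query, title)     # intervention now applies
library.remove("XXX")
after_remove = library.shared(query, title)  # intervention is disabled again
```

No model parameters change at any point; only the set of proper nouns does, which is why no retraining is needed for an intervention to start or stop.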
Fig. 5 is a block diagram of an information search apparatus according to one embodiment of the present disclosure.
As shown in fig. 5, the information search apparatus 500 includes a first generation module 501, a second generation module 502, a third generation module 503, a first determination module 504, and a second determination module 505.
The first generation module 501 is configured to generate text features of the search term and text features of candidate information in the information base, respectively.
The second generation module 502 is configured to generate text features of the target proper noun in response to determining that the search term and the candidate information contain the same target proper noun.
The third generation module 503 is configured to generate fusion features of the search term and fusion features of the candidate information according to the text features of the search term, the text features of the candidate information, and the text features of the target proper noun.
The first determining module 504 is configured to determine a similarity between the search term and the candidate information according to the fusion feature of the search term and the fusion feature of the candidate information.
The second determining module 505 is configured to determine information search results for the search term according to the similarity.
According to an embodiment of the present disclosure, the information search apparatus 500 further includes a first matching module, a second matching module, and a third determining module.
The first matching module is configured to match the search term against the proper noun library to obtain a first proper noun set hit by the search term in the library.
The second matching module is configured to match the candidate information against the proper noun library to obtain a second proper noun set hit by the candidate information in the library.
The third determining module is configured to determine the proper nouns in the intersection as target proper nouns in response to the first proper noun set and the second proper noun set having an intersection.
The proper nouns in the proper noun library are constructed according to target business requirements. The information search apparatus 500 further includes an update module.
The update module is configured to update the proper noun library in response to a change in the target business requirements.
The third generating module 503 includes a first processing unit and a second processing unit.
The first processing unit is used for carrying out weighting processing on the text characteristics of the search word and the text characteristics of the target proper nouns to obtain fusion characteristics of the search word.
And the second processing unit is used for carrying out weighting processing on the text characteristics of the candidate information and the text characteristics of the target proper nouns to obtain fusion characteristics of the candidate information.
The information search apparatus 500 further includes a fourth determination module.
The fourth determining module is used for determining the similarity between the search word and the candidate information according to the text characteristics of the search word and the text characteristics of the candidate information in response to the search word and the candidate information not containing the same target proper noun.
The information base includes a plurality of candidate information. The second determining module 505 includes an ordering unit, a generating unit, and an output unit.
The sorting unit is used for sorting the plurality of candidate information according to the similarity between the search words and each candidate information.
The generating unit is used for generating an information recommendation list according to the sorting result and taking the information recommendation list as an information searching result.
The output unit is used for outputting the information search result.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 6, the device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 may also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as an information search method. For example, in some embodiments, the information search method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the information search method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the information search method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed herein are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (15)

1. An information search method, comprising:
generating text features of a search word and text features of candidate information in an information base, respectively;
generating text features of a target proper noun in response to determining that the search word and the candidate information contain the same target proper noun;
generating fusion features of the search word and fusion features of the candidate information according to the text features of the search word, the text features of the candidate information, and the text features of the target proper noun;
determining a similarity between the search word and the candidate information according to the fusion features of the search word and the fusion features of the candidate information; and
determining an information search result for the search word according to the similarity.
2. The method of claim 1, further comprising:
matching the search word with a special word stock to obtain a first special word set hit by the search word in the special word stock;
matching the candidate information with the special word stock to obtain a second special word set which is hit by the candidate information in the special word stock;
and determining the special words in the intersection as the target proper nouns in response to the first special word set and the second special word set having an intersection.
3. The method of claim 2, wherein the special words in the special word stock are constructed according to target business requirements; the method further comprises:
updating the special word stock in response to a change in the target business requirements.
4. The method of claim 1, wherein the generating the fusion feature of the search term and the fusion feature of the candidate information based on the text feature of the search term, the text feature of the candidate information, and the text feature of the target term comprises:
weighting the text features of the search word and the text features of the target proper nouns to obtain fusion features of the search word;
and weighting the text features of the candidate information and the text features of the target proper nouns to obtain the fusion features of the candidate information.
5. The method of claim 1, further comprising:
determining the similarity between the search word and the candidate information according to the text features of the search word and the text features of the candidate information in response to the search word and the candidate information not containing the same target proper noun.
6. The method of claim 1, wherein the information base comprises a plurality of pieces of candidate information; the determining information search results for the search word according to the similarity comprises:
sorting the plurality of candidate information according to the similarity between the search word and each candidate information;
generating an information recommendation list according to the sorting result to serve as the information search result; and
outputting the information search result.
7. An information search apparatus, comprising:
the first generation module is used for respectively generating text features of the search words and text features of candidate information in the information base;
the second generation module is used for generating text features of a target proper noun in response to determining that the search word and the candidate information contain the same target proper noun;
the third generation module is used for generating fusion features of the search word and fusion features of the candidate information according to the text features of the search word, the text features of the candidate information and the text features of the target proper nouns;
the first determining module is used for determining the similarity between the search word and the candidate information according to the fusion features of the search word and the fusion features of the candidate information; and
the second determining module is used for determining information search results for the search word according to the similarity.
8. The apparatus of claim 7, further comprising:
the first matching module is used for matching the search word with a special word stock to obtain a first special word set hit by the search word in the special word stock;
the second matching module is used for matching the candidate information with the special word stock to obtain a second special word set which is hit by the candidate information in the special word stock;
and the third determining module is used for determining the special words in the intersection as the target proper nouns in response to the first special word set and the second special word set having an intersection.
9. The apparatus of claim 8, wherein the special words in the special word stock are constructed according to target business requirements; the apparatus further comprises:
the updating module is used for updating the special word stock in response to a change in the target business requirements.
10. The apparatus of claim 7, wherein the third generation module comprises:
the first processing unit is used for carrying out weighting processing on the text characteristics of the search word and the text characteristics of the target proper nouns to obtain fusion characteristics of the search word;
and the second processing unit is used for carrying out weighting processing on the text characteristics of the candidate information and the text characteristics of the target proper nouns to obtain the fusion characteristics of the candidate information.
11. The apparatus of claim 7, further comprising:
the fourth determining module is used for determining, in response to the search word and the candidate information not containing the same target proper noun, the similarity between the search word and the candidate information according to the text features of the search word and the text features of the candidate information.
12. The apparatus of claim 7, wherein the information base comprises a plurality of pieces of candidate information; the second determining module includes:
the sorting unit is used for sorting the plurality of candidate information according to the similarity between the search words and each candidate information;
the generation unit is used for generating an information recommendation list according to the sorting result and taking the information recommendation list as the information search result; and
the output unit is used for outputting the information search result.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.
15. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN202310115644.4A 2023-02-01 2023-02-01 Information searching method, device, electronic equipment and storage medium Pending CN116383340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310115644.4A CN116383340A (en) 2023-02-01 2023-02-01 Information searching method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310115644.4A CN116383340A (en) 2023-02-01 2023-02-01 Information searching method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116383340A true CN116383340A (en) 2023-07-04

Family

ID=86972015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310115644.4A Pending CN116383340A (en) 2023-02-01 2023-02-01 Information searching method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116383340A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680481A (en) * 2023-08-03 2023-09-01 腾讯科技(深圳)有限公司 Search ranking method, apparatus, device, storage medium and computer program product
CN116680481B (en) * 2023-08-03 2024-01-12 腾讯科技(深圳)有限公司 Search ranking method, apparatus, device, storage medium and computer program product

Similar Documents

Publication Publication Date Title
US11397772B2 (en) Information search method, apparatus, and system
US10210243B2 (en) Method and system for enhanced query term suggestion
US11782999B2 (en) Method for training fusion ordering model, search ordering method, electronic device and storage medium
US9934293B2 (en) Generating search results
US20220083874A1 (en) Method and device for training search model, method for searching for target object, and storage medium
JP2022050379A (en) Semantic retrieval method, apparatus, electronic device, storage medium, and computer program product
CN110147494B (en) Information searching method and device, storage medium and electronic equipment
CN114840671A (en) Dialogue generation method, model training method, device, equipment and medium
CN113326420B (en) Question retrieval method, device, electronic equipment and medium
CN113988157B (en) Semantic retrieval network training method and device, electronic equipment and storage medium
CN113660541B (en) Method and device for generating abstract of news video
CN111475725A (en) Method, apparatus, device, and computer-readable storage medium for searching for content
CN116383340A (en) Information searching method, device, electronic equipment and storage medium
CN112506864B (en) File retrieval method, device, electronic equipment and readable storage medium
CN114116997A (en) Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium
EP3992814A2 (en) Method and apparatus for generating user interest profile, electronic device and storage medium
CN114048315A (en) Method and device for determining document tag, electronic equipment and storage medium
CN111881255B (en) Synonymous text acquisition method and device, electronic equipment and storage medium
CN113377922B (en) Method, device, electronic equipment and medium for matching information
CN111539208B (en) Sentence processing method and device, electronic device and readable storage medium
CN116501841B (en) Fuzzy query method, system and storage medium for data model
CN117591741A (en) Implement retrieval method, device, electronic equipment and storage medium
CN113377921A (en) Method, apparatus, electronic device, and medium for matching information
CN117271884A (en) Method, device, electronic equipment and storage medium for determining recommended content
CN117932036A (en) Dialogue processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination