CN113486246A - Information searching method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113486246A
Authority
CN
China
Prior art keywords
search
webpages
value
keywords
basic
Legal status
Granted
Application number
CN202110844238.2A
Other languages
Chinese (zh)
Other versions
CN113486246B (en)
Inventor
卢春曦
王健宗
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110844238.2A
Publication of CN113486246A
Application granted
Publication of CN113486246B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9538 Presentation of query results
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/49 Data-driven translation using very large corpora, e.g. the web

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information searching method comprising the following steps: acquiring basic keywords; performing a preset automatic translation operation on the basic keywords to obtain the corresponding translation keywords; performing a preset search operation according to the basic keywords and the translation keywords to obtain search webpages corresponding to the basic keywords; and screening out target webpages from all the search webpages as the search result of the basic keywords based on the jump probability value between every two search webpages, where the jump probability value is calculated from the jump relation between the two webpages. The method and device thus enable multi-language search for the basic keywords, which widens the search scope, and screen the search webpages by the jump probability value between every two of them, which applies a more elaborate screening criterion and improves search accuracy. The invention also relates to the technical field of blockchain.

Description

Information searching method, device, equipment and storage medium
Technical Field
The present invention relates to the field of information search technologies, and in particular, to a method and an apparatus for searching information, a computer device, and a storage medium.
Background
With the arrival of the information society, information technologies have gradually penetrated people's daily lives and brought great convenience; for example, communication technology, artificial intelligence and Internet of Things technology have created better living conditions. The wide use of information technology also generates a large amount of information: there are countless websites on the Internet, each storing a large amount of information, and people are easily overwhelmed by it. Information search engine technologies (e.g., the Baidu and Google search engines) were developed to help users find the information they need. However, most existing information search technologies search in a single language only: after a user enters a search keyword, only results in the same language as the keyword are returned. For example, if the user enters a Chinese keyword, the search results are also in Chinese. The search breadth and search accuracy of current information search methods therefore leave room for further improvement.
Disclosure of Invention
The technical problem addressed by the invention is the low search breadth and search accuracy of existing information searching methods.
In order to solve the above technical problem, a first aspect of the present invention discloses an information searching method, including:
acquiring basic keywords;
executing preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords;
executing preset search operation according to the basic keywords and the translation keywords to obtain search webpages corresponding to the basic keywords;
and screening out target webpages from all the search webpages as search results of the basic keywords based on the jump probability value between every two search webpages, wherein the jump probability value is a numerical value calculated according to the jump relation between the two webpages.
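The four steps of the first aspect can be outlined as a small Python pipeline. This is a hypothetical sketch: the description does not prescribe a translator, a search backend or any data types, so `translate`, `search` and `screen` are caller-supplied functions whose names and signatures are assumptions.

```python
from typing import Callable


def information_search(
    basic_keyword: str,
    target_languages: list[str],
    translate: Callable[[str, str], str],                 # hypothetical: (text, target language) -> text
    search: Callable[[list[str]], list[str]],             # hypothetical: keywords -> search webpage ids
    screen: Callable[[list[str], list[str]], list[str]],  # hypothetical: (pages, keywords) -> target pages
) -> list[str]:
    """Sketch of steps 101-104; every backend is supplied by the caller."""
    # Step 101: the basic keyword is acquired (here it is simply passed in).
    # Step 102: translate the basic keyword into each selected language.
    translation_keywords = [translate(basic_keyword, lang) for lang in target_languages]
    all_keywords = [basic_keyword] + translation_keywords
    # Step 103: search with the basic keyword and all translation keywords.
    search_pages = search(all_keywords)
    # Step 104: screen the search webpages (the description uses jump probability values).
    return screen(search_pages, all_keywords)
```

Injecting the three backends keeps the skeleton testable with stubs, for example a dictionary lookup in place of a translation service.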
The second aspect of the present invention discloses an information searching apparatus, comprising:
the acquisition module is used for acquiring basic keywords;
the translation module is used for executing preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords;
the search module is used for executing preset search operation according to the basic keywords and the translation keywords to obtain search webpages corresponding to the basic keywords;
and the screening module is used for screening a target webpage from all the search webpages based on the jump probability value between every two search webpages to serve as the search result of the basic keyword, wherein the jump probability value is a numerical value obtained by calculation according to the jump relation between the two webpages.
A third aspect of the present invention discloses a computer apparatus, comprising:
a memory storing executable program code;
a processor coupled to the memory;
the processor calls the executable program code stored in the memory to execute part or all of the steps of the information searching method disclosed by the first aspect of the invention.
In a fourth aspect of the present invention, a computer storage medium is disclosed, which stores computer instructions for performing some or all of the steps of the information searching method disclosed in the first aspect of the present invention when the computer instructions are called.
In the embodiments of the invention, a basic keyword is first obtained and translated into the corresponding translation keywords. A search is then performed according to the basic keyword and the translation keywords to obtain the search webpages corresponding to the basic keyword, and finally target webpages are screened out from all the search webpages as the search result based on the jump probability value between every two search webpages. This enables multi-language search for the basic keyword, so that the final search result can contain multiple languages and the search breadth is improved. Screening the search webpages by the jump probability value between every two of them applies a more elaborate screening criterion and improves search accuracy.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart illustrating a method for searching information according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an information search apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used to distinguish between different objects, not to describe a particular order. Furthermore, the terms "include" and "have," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, or article that comprises a list of steps or elements is not limited to the listed steps or elements, but may optionally include steps or elements that are not listed or that are inherent to such process, method, apparatus, or article.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The invention discloses an information searching method and device, a computer device, and a storage medium. A basic keyword is first obtained and translated into the corresponding translation keywords; a search is then performed according to the basic keyword and the translation keywords to obtain the search webpages corresponding to the basic keyword; finally, target webpages are screened out from all the search webpages as the search result based on the jump probability value between every two search webpages. This enables multi-language search for the basic keyword, so the final search result can contain multiple languages and the search breadth is improved, while screening by jump probability value applies a more elaborate criterion and improves search accuracy. Details are given below.
Example one
Referring to FIG. 1, FIG. 1 is a schematic flow chart illustrating an information searching method according to an embodiment of the present invention. As shown in FIG. 1, the information searching method may include the following operations:
101. Acquiring basic keywords.
In step 101, the basic keyword may be input by the user; for example, the user may input the English term "Homomorphic encryption" as the basic keyword for the subsequent search. Optionally, the basic keywords input by the user may include words in multiple languages, for example, both Chinese and English words.
102. Executing a preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords.
In step 102, the basic keyword may be translated automatically by a preset translator (e.g., Baidu Translate or Google Translate) into words in other languages (i.e., translation keywords). After the user inputs the basic keyword, it may be denoted as keyword $I_1$, and its language denoted as $t_1$. The user may select in advance the languages into which the basic keyword is to be translated, forming a language list $\{t_1, t_2, t_3, \ldots, t_n\}$, and the translator then translates the basic keyword into each language in the list. For example, if $t_1$, $t_2$ and $t_3$ in the language list are English, Chinese and German respectively, and the basic keyword input by the user is "Homomorphic encryption", the translator translates it into the Chinese term "同态加密" and the German term "Homomorphe Verschlüsselung", which are set as keyword $I_2$ and keyword $I_3$ respectively.
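The mapping from the language list $\{t_1, \ldots, t_n\}$ to the keywords $I_1, \ldots, I_n$ can be sketched as below. This is a hypothetical illustration: the language codes `en`, `zh`, `de`, the `translate` signature and the dictionary-backed `GLOSSARY` standing in for a real translation service are all assumptions.

```python
from typing import Callable


def build_keyword_set(
    basic_keyword: str,
    language_list: list[str],
    translate: Callable[[str, str, str], str],  # hypothetical: (text, source, target) -> text
) -> dict[str, str]:
    """Map each language t_k to keyword I_k; t_1 is the basic keyword's own language."""
    source_lang, *target_langs = language_list
    keywords = {source_lang: basic_keyword}     # I_1 in language t_1
    for lang in target_langs:                   # I_2 ... I_n
        keywords[lang] = translate(basic_keyword, source_lang, lang)
    return keywords


# Dictionary-backed stand-in for a real translation service (assumption).
GLOSSARY = {
    ("Homomorphic encryption", "zh"): "同态加密",
    ("Homomorphic encryption", "de"): "Homomorphe Verschlüsselung",
}

keywords = build_keyword_set(
    "Homomorphic encryption",
    ["en", "zh", "de"],
    lambda text, src, dst: GLOSSARY[(text, dst)],
)
print(keywords)
# {'en': 'Homomorphic encryption', 'zh': '同态加密', 'de': 'Homomorphe Verschlüsselung'}
```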
103. Executing a preset search operation according to the basic keywords and the translation keywords to obtain search webpages corresponding to the basic keywords.
In step 103, after the basic keyword and the translation keywords are obtained, the Internet can be searched with them to obtain search webpages related to the basic keyword and the translation keywords. For example, for the keywords $I_1$, $I_2$, $I_3$ above, the search condition may be set as AND(TERM($I_1$), TERM($I_2$), TERM($I_3$)), FROM(LAN($t_1$), LAN($t_2$), LAN($t_3$)), and the search webpages are obtained by searching the Internet according to this condition. Here, AND denotes fetching web pages on the Internet that satisfy one of the conditions listed after AND; TERM($I_1$) indicates that the fetched web page contains keyword $I_1$, and likewise for TERM($I_2$) and TERM($I_3$). FROM sets the languages of the search webpages: LAN($t_1$) indicates that the language of the searched web page is $t_1$, and likewise for LAN($t_2$) and LAN($t_3$).
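A hypothetical rendering of this search condition follows: `build_search_condition` writes the condition in the notation used above, and `page_matches` applies it to a single page. The notation belongs to this description rather than to any real search engine's query language, and the {language: keyword} mapping and plain substring matching are assumptions. Following the text, AND is implemented as "satisfies one of the TERM conditions", which differs from a Boolean AND.

```python
def build_search_condition(keywords: dict[str, str]) -> str:
    """Render AND(TERM(...)), FROM(LAN(...)) from a {language: keyword} mapping."""
    terms = ", ".join(f"TERM({kw})" for kw in keywords.values())
    langs = ", ".join(f"LAN({lang})" for lang in keywords)
    return f"AND({terms}), FROM({langs})"


def page_matches(page_text: str, page_lang: str, keywords: dict[str, str]) -> bool:
    """Apply the condition to one page (substring matching is an assumption)."""
    # AND, as defined in the text: the page satisfies one of the TERM conditions.
    has_term = any(kw in page_text for kw in keywords.values())
    # FROM: the page language must be one of the LAN conditions.
    return has_term and page_lang in keywords


kws = {"en": "Homomorphic encryption", "zh": "同态加密"}
print(build_search_condition(kws))
# AND(TERM(Homomorphic encryption), TERM(同态加密)), FROM(LAN(en), LAN(zh))
```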
104. Screening out target webpages from all the search webpages as search results of the basic keywords based on the jump probability value between every two search webpages, wherein the jump probability value is a numerical value calculated according to the jump relation between the two webpages.
In step 104, after the search webpages are obtained, they may be screened according to the jump probability value between every two search webpages to obtain the final search result; the screening process is described in detail below. Screening by jump probability value applies a more elaborate criterion to the search webpages and improves the accuracy of information search.
It can be seen that, by implementing the information searching method described in FIG. 1, the basic keyword is translated, both the basic keyword and the translation keywords are searched, and the resulting search webpages are screened by the jump probability value between every two of them. This achieves multi-language search for the basic keyword: the final search result can contain multiple languages, which improves search breadth, and the more elaborate screening criterion improves search accuracy.
In an optional embodiment, screening out target webpages from all the search webpages based on the jump probability value between every two search webpages includes:
screening out intermediate search webpages from all the search webpages according to the occurrence times of the basic keywords and the translation keywords in the search webpages;
and screening out a target webpage from all the intermediate search webpages based on the jump probability value between every two intermediate search webpages.
In this optional embodiment, the search webpages are first screened according to the number of times the basic keywords and the translation keywords appear in them, and a second screening based on the jump probability value is then performed on the result of the first screening to obtain the target webpages. These two rounds of screening select the target webpages from the search webpages more accurately as the search result.
Therefore, by implementing this optional embodiment, the accuracy of information search is improved.
In an optional embodiment, screening out intermediate search webpages from all the search webpages according to the number of times the basic keywords and the translation keywords appear in the search webpages includes:
calculating a matching value of each of the search webpages by the following formula:
[Formula image not reproduced in the source text.]
where $N_i$ is the number of times the target keyword appears in the search webpage, $N_d$ is the number of search webpages, among all the search webpages, that contain the target keyword, and totalDocCount is the total number of search webpages; the target keyword is the basic keyword or a translation keyword;
and screening out, according to the matching value of each search webpage, the search webpages whose matching values satisfy a preset condition from all the search webpages as intermediate search webpages.
In this optional embodiment, $N_i$ is the number of times the target keyword appears in the search webpage: the more often the target keyword appears in a search webpage, the higher that webpage's matching value. Likewise, the higher the proportion of search webpages containing the target keyword among all the search webpages, the higher the matching value of the search webpages that contain it.
Therefore, by implementing this optional embodiment, the matching value of each search webpage is calculated by a preset formula containing the number of times the basic keywords and the translation keywords appear in it, and the intermediate search webpages are then screened out according to these matching values, which improves the accuracy of information search.
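The formula image is not reproduced in the source, so the exact form of the matching value cannot be recovered here. The variables $N_i$, $N_d$ and totalDocCount are, however, the usual ingredients of a TF-IDF weight, so the sketch below computes them exactly as defined and combines them through a pluggable `score_fn`. The default `tfidf_stand_in` (a smoothed TF-IDF) and the summation over keywords are assumptions, not the patented formula. A standard IDF term falls as $N_d$ grows, which does not match the trend described in the paragraph above, so `score_fn` should be replaced once the real formula is known.

```python
import math
from typing import Callable


def keyword_stats(pages: list[str], keyword: str) -> tuple[list[int], int, int]:
    """Return (N_i for every page, N_d, totalDocCount) for one target keyword."""
    n_i = [page.count(keyword) for page in pages]   # non-overlapping, case-sensitive counts
    n_d = sum(1 for count in n_i if count > 0)      # pages that contain the keyword
    return n_i, n_d, len(pages)


def tfidf_stand_in(n_i: int, n_d: int, total_doc_count: int) -> float:
    """Hypothetical stand-in for the unreproduced formula: smoothed TF-IDF."""
    return n_i * math.log((total_doc_count + 1) / (n_d + 1))


def matching_values(
    pages: list[str],
    keywords: list[str],
    score_fn: Callable[[int, int, int], float] = tfidf_stand_in,
) -> list[float]:
    """Sum the per-keyword score over the basic keyword and all translation keywords."""
    scores = [0.0] * len(pages)
    for keyword in keywords:
        n_i, n_d, total = keyword_stats(pages, keyword)
        for idx, count in enumerate(n_i):
            scores[idx] += score_fn(count, n_d, total)
    return scores
```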
In an optional embodiment, screening out, according to the matching value of each search webpage, the search webpages that satisfy a preset matching value condition from all the search webpages as intermediate search webpages includes:
sorting all the search webpages according to their matching values;
and taking the search webpages within a preset ranking range as the intermediate search webpages; or,
acquiring a preset matching value threshold;
and taking the search webpages whose matching values are greater than the matching value threshold as the intermediate search webpages.
In this optional embodiment, the search webpages may be ranked from high to low by matching value, and those within a preset ranking range (e.g., the top 50 or top 100) used as intermediate search webpages. Alternatively, a matching value threshold may be preset, and any search webpage whose calculated matching value exceeds the threshold is used as an intermediate search webpage.
Therefore, by implementing this optional embodiment, the search webpages are either ranked by matching value with those in a preset ranking range taken as intermediate search webpages, or compared against a preset matching value threshold with those exceeding it taken as intermediate search webpages. Appropriate intermediate search webpages can thus be screened out from all the search webpages, which improves the accuracy of information search.
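The two selection rules (a preset ranking range or a preset threshold) can be sketched as one helper; the function name and parameters are hypothetical. The same rule applies unchanged to the relevance values used in the embodiment that follows.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_pages(
    pages: Sequence[T],
    scores: Sequence[float],
    top_k: Optional[int] = None,
    threshold: Optional[float] = None,
) -> list[T]:
    """Keep either the top_k highest-scoring pages or those scoring above threshold."""
    if (top_k is None) == (threshold is None):
        raise ValueError("specify exactly one of top_k or threshold")
    if top_k is not None:
        # Rank from high to low; sorted() stays stable with reverse=True,
        # so pages with equal scores keep their input order.
        order = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
        return [pages[i] for i in order[:top_k]]
    # Strictly greater than the threshold, as described in the text.
    return [page for page, score in zip(pages, scores) if score > threshold]
```

A preset ranking range such as "top 50" maps to `top_k=50`; a matching value threshold maps to `threshold=...`.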
In an optional embodiment, screening out the target webpages from all the intermediate search webpages based on the jump probability value between every two intermediate search webpages comprises:
calculating a relevance value of each of the intermediate search webpages by the following formula:
$B_{pr} = (1-\alpha)M^{T} + \alpha E^{T}$
where $B_{pr}$ is a vector of length $n$ whose $i$-th element is the relevance value of the $i$-th intermediate search webpage; $n$ is the total number of intermediate search webpages; $M$ is an $n \times n$ matrix whose elements are the jump probability values between every two intermediate search webpages; $E$ is also an $n \times n$ matrix, each of whose elements equals $1/n$; and $\alpha$ is a preset webpage jump coefficient;
and screening out, according to the relevance value of each intermediate search webpage, the intermediate search webpages that satisfy a preset relevance value condition from all the intermediate search webpages as target webpages.
In this optional embodiment, each element of $B_{pr}$ corresponds to the PageRank score of an intermediate search webpage (i.e., its relevance value). $\alpha$ is a preset webpage jump coefficient that is usually set empirically, typically to 0.15. $E$ is a random jump matrix expressing the randomness of webpage jumps, so each of its elements is set to $1/n$.
Therefore, by implementing this optional embodiment, the relevance value of each intermediate search webpage is calculated by a preset formula containing the jump probability values between every two intermediate search webpages, and the target webpages are then screened out according to these relevance values. The target webpages are thus screened according to the jump probability values between the intermediate search webpages, which improves the accuracy of information search.
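Since $B_{pr}$ is described as PageRank scores, one plausible reading of the formula is the standard PageRank fixed point $B_{pr} = \big((1-\alpha)M^{T} + \alpha E^{T}\big)B_{pr}$; this reading is an assumption, because the printed formula has no $B_{pr}$ on its right-hand side. The sketch below solves it by power iteration in plain Python. The jump probability divides by all links in a page, including links to pages outside the intermediate set, so rows of $M$ may sum to less than 1; the sketch renormalizes $B_{pr}$ after every step, which is also an assumption.

```python
def relevance_values(
    m: list[list[float]],
    alpha: float = 0.15,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> list[float]:
    """Power iteration for B_pr = ((1 - alpha) M^T + alpha E^T) B_pr.

    m[i][j] is the jump probability from intermediate page i to page j.
    Every entry of E is 1/n, so every entry of E^T B_pr equals sum(B_pr) / n.
    """
    n = len(m)
    if n == 0:
        return []
    b = [1.0 / n] * n                        # start from a uniform distribution
    for _ in range(max_iter):
        # (M^T b)_j = sum over i of m[i][j] * b[i]
        mt_b = [sum(m[i][j] * b[i] for i in range(n)) for j in range(n)]
        random_jump = sum(b) / n             # each entry of E^T b
        new_b = [(1 - alpha) * x + alpha * random_jump for x in mt_b]
        total = sum(new_b)
        new_b = [x / total for x in new_b]   # renormalize (assumption, see above)
        if max(abs(x - y) for x, y in zip(new_b, b)) < tol:
            return new_b
        b = new_b
    return b
```

With $\alpha = 0.15$ the iteration contracts by a factor of about 0.85 per step, so it converges quickly for the page counts mentioned above (e.g., the top 50 or 100 intermediate pages).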
In an optional embodiment, screening out, according to the relevance value of each intermediate search webpage, the intermediate search webpages that satisfy a preset relevance value condition from all the intermediate search webpages as target webpages includes:
sorting all the intermediate search webpages according to their relevance values;
and taking the intermediate search webpages within a preset ranking range as the target webpages; or,
acquiring a preset relevance value threshold;
and taking the intermediate search webpages whose relevance values are greater than the relevance value threshold as the target webpages.
In this optional embodiment, the intermediate search webpages may be ranked from high to low by relevance value, and those within a preset ranking range (e.g., the top 50 or top 100) used as target webpages. Alternatively, a relevance value threshold may be preset, and any intermediate search webpage whose calculated relevance value exceeds the threshold is used as a target webpage.
Therefore, by implementing this optional embodiment, the intermediate search webpages are either ranked by relevance value with those in a preset ranking range taken as target webpages, or compared against a preset relevance value threshold with those exceeding it taken as target webpages. Appropriate target webpages can thus be screened out from all the intermediate search webpages, which improves the accuracy of information search.
In an optional embodiment, the jump probability value between two webpages is calculated by the following formula:
$p = \dfrac{b}{c}$
where $p$ is the jump probability value of the first webpage of the two webpages jumping to the second webpage, $c$ is the number of all jump links in the first webpage, and $b$ is the number of jump links in the first webpage that jump from the first webpage to the second webpage.
In this optional embodiment, the jump probability value between two webpages indicates how likely the first webpage is to jump to the second, so it can be calculated from the number of all jump links in the first webpage and the number of those links that point to the second webpage. For example, if webpage a contains three jump links and only one of them jumps to webpage b, the jump probability value from webpage a to webpage b is 1/3.
Therefore, by implementing this optional embodiment, the jump probability value between two webpages is calculated from the number of all jump links in the first webpage and the number of those links that point to the second webpage, so the target webpages can be screened according to the jump relations between webpages, which improves the accuracy of information search.
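A minimal sketch of the $b/c$ calculation and of assembling the matrix $M$ used in the relevance formula. It assumes, hypothetically, that each page's jump links are available as a list of target page identifiers, with duplicate links counted separately. Links to pages outside the set still count toward $c$, so a row of $M$ may sum to less than 1.

```python
def jump_probability(outlinks: list[str], second_page: str) -> float:
    """b / c: share of the first page's jump links that point to the second page."""
    c = len(outlinks)                                             # all jump links in the first page
    if c == 0:
        return 0.0                                                # no links: probability taken as 0 (assumption)
    b = sum(1 for target in outlinks if target == second_page)   # links to the second page
    return b / c


def jump_matrix(pages: list[str], links: dict[str, list[str]]) -> list[list[float]]:
    """M[i][j] = jump probability from pages[i] to pages[j]."""
    return [[jump_probability(links.get(src, []), dst) for dst in pages] for src in pages]


# The example from the text: webpage a has three jump links, one of which leads to b.
print(jump_probability(["b", "x", "y"], "b"))  # 0.3333333333333333
```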
Optionally, the search information generated by the information searching method may also be uploaded to a blockchain.
Specifically, the search information is obtained by running the information searching method and records how the search was carried out, such as the acquired basic keywords, the translation keywords obtained by translation, the search webpages found, and the target webpages screened out. Uploading the search information to the blockchain ensures its security and makes it fair and transparent to the user. The user can download the search information from the blockchain to verify whether it has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing information about a batch of network transactions, used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
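The tamper check described above relies on each block carrying a cryptographic hash of the previous one. The sketch below shows only that hash-linking idea as a local, in-memory log using Python's standard `hashlib` and `json`; it has no distribution, consensus or real blockchain platform, and the record field names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def block_hash(record: dict, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the record and the previous hash."""
    payload = json.dumps({"record": record, "prev_hash": prev_hash},
                         sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: list[dict], record: dict) -> None:
    """Append one block of search information, linked to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"record": record, "prev_hash": prev_hash,
                  "hash": block_hash(record, prev_hash)})


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; editing any earlier record breaks the chain."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block_hash(block["record"], block["prev_hash"]) != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True
```

A user who downloads the records can rerun `verify_chain` to detect whether any stored search information was altered after it was appended.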
Therefore, by implementing the embodiment of the invention, the basic keyword is obtained firstly, then the basic keyword is translated into the corresponding translation keyword, then the search is carried out according to the basic keyword and the translation keyword to obtain the search web page corresponding to the basic keyword, and finally the target web page is screened out from all the search web pages to be used as the search result based on the jump probability value between every two search web pages, so that the multi-language search of the basic keyword can be realized, the finally obtained search result can contain multiple languages, the search breadth is improved, the search web pages are screened according to the jump probability value between every two search web pages to obtain the search result, the search web pages can be screened by using relatively complex standards, and the search accuracy is improved. And firstly, screening the search web pages for the first time according to the times of the basic keywords and the translation keywords appearing in the search web pages, then screening for the second time based on the skip probability value on the basis of the first screening result, and more accurately screening the target web pages from the search web pages to serve as the search result, so that the accuracy of information search is improved. And the matching value of each search webpage is calculated through a preset formula containing the times of the basic keywords and the translation keywords appearing in the search webpages, and then the intermediate search webpages are screened according to the matching value of each search webpage, so that the intermediate search webpages can be screened according to the times of the basic keywords and the translation keywords appearing in the search webpages, and the accuracy of information search is improved. 
The search webpages are either ranked by their matching values, with the search webpages within a preset ranking range taken as the intermediate search webpages, or compared with a preset matching value threshold, with the search webpages whose matching values exceed the threshold taken as the intermediate search webpages; either way, suitable intermediate search webpages are selected from all the search webpages. The relevance value of each intermediate search webpage is calculated by a preset formula involving the jump probability values between every two intermediate search webpages, and the target webpages are selected according to these relevance values, so that the selection reflects the jump relationships among the intermediate search webpages. Likewise, the intermediate search webpages are either ranked by their relevance values, with those within a preset ranking range taken as the target webpages, or compared with a preset relevance value threshold, with those whose relevance values exceed the threshold taken as the target webpages. Finally, the jump probability value between two webpages is calculated from the total number of jump links in the first webpage and the number of those jump links that point from the first webpage to the second webpage, so the target webpages can be screened according to the jump relationships between webpages, which improves the accuracy of the information search.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of an information searching apparatus according to an embodiment of the present invention. As shown in fig. 2, the information searching apparatus may include:
an obtaining module 201, configured to obtain a basic keyword;
a translation module 202, configured to perform a preset automatic translation operation on the basic keyword to obtain translation keywords corresponding to the basic keyword;
a search module 203, configured to perform a preset search operation according to the basic keyword and the translation keywords to obtain search webpages corresponding to the basic keyword; and
a screening module 204, configured to screen out target webpages from all the search webpages, as the search result of the basic keyword, based on the jump probability value between every two search webpages, where the jump probability value is calculated from the jump relationship between two webpages.
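The four modules can be read as a simple pipeline. The sketch below is illustrative only: the `translate`, `search` and `screen` callables and the function name `search_information` are hypothetical placeholders, since the text does not specify the translation service, the search engine or any API, and `screen` is where the jump-probability-based screening would plug in.

```python
from typing import Callable, Dict, List


def search_information(
    basic_keywords: List[str],
    translate: Callable[[str], List[str]],
    search: Callable[[List[str]], List[str]],
    screen: Callable[[List[str], List[str]], List[str]],
) -> Dict[str, List[str]]:
    # Translate each basic keyword into zero or more translation keywords.
    translation_keywords = [t for kw in basic_keywords for t in translate(kw)]
    # Search with the basic keywords and the translation keywords together.
    all_keywords = basic_keywords + translation_keywords
    search_webpages = search(all_keywords)
    # Screen the search webpages; `screen` stands in for the jump-probability-based step.
    target_webpages = screen(search_webpages, all_keywords)
    return {
        "translation_keywords": translation_keywords,
        "search_webpages": search_webpages,
        "target_webpages": target_webpages,
    }


# Example usage with toy, hypothetical stand-ins for a translator, an index and a screener.
toy_dictionary = {"保险": ["insurance", "assurance"]}
toy_index = {"保险": ["p2", "p4"], "insurance": ["p1", "p2"], "assurance": ["p3"]}
result = search_information(
    ["保险"],
    translate=lambda kw: toy_dictionary.get(kw, []),
    search=lambda kws: sorted({p for kw in kws for p in toy_index.get(kw, [])}),
    screen=lambda pages, kws: pages[:2],
)
print(result["target_webpages"])  # ['p1', 'p2']
```

Passing the three steps in as functions keeps the translation and search back ends swappable, which matches the text leaving them as "preset" operations.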
In an optional embodiment, the screening module 204 screens out the target webpages from all the search webpages, based on the jump probability value between every two search webpages, as follows:
screening out intermediate search webpages from all the search webpages according to the number of times the basic keyword and the translation keywords appear in the search webpages; and
screening out the target webpages from all the intermediate search webpages based on the jump probability value between every two intermediate search webpages.
In an optional embodiment, the screening module 204 screens out the intermediate search webpages from all the search webpages, according to the number of times the basic keyword and the translation keywords appear in the search webpages, as follows:
calculating a matching value for each of the search webpages by the following formula:
[Formula (matching value) shown as an image in the original publication; not reproduced here]
where $N_i$ is the number of times the target keyword appears in the search webpage, $N_d$ is the number of search webpages, among all the search webpages, that contain the target keyword, and totalDocCount is the total number of search webpages; the target keyword is the basic keyword or a translation keyword; and
screening out, according to the matching value of each search webpage, the search webpages whose matching values meet a preset condition as the intermediate search webpages.
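The matching-value formula is published only as an image, so its exact form is not recoverable here. Its variables (N_i, N_d and totalDocCount) are the usual ingredients of TF-IDF, so the sketch below assumes a TF-IDF-style score: for each page, the sum over target keywords of N_i × log(totalDocCount / N_d). Treat this as an assumption, not the patented formula; counting occurrences with `str.count` (case-sensitive, no tokenization) is also a simplification, and `matching_values` is a hypothetical name.

```python
import math
from typing import Dict, List


def matching_values(pages: Dict[str, str], target_keywords: List[str]) -> Dict[str, float]:
    # ASSUMED TF-IDF-style form: sum over keywords of N_i * log(totalDocCount / N_d).
    total_doc_count = len(pages)  # totalDocCount
    scores = {page_id: 0.0 for page_id in pages}
    for kw in target_keywords:
        # N_d: number of search webpages that contain the keyword at least once.
        n_d = sum(1 for text in pages.values() if kw in text)
        if n_d == 0:
            continue  # keyword appears nowhere, so it contributes nothing
        weight = math.log(total_doc_count / n_d)
        for page_id, text in pages.items():
            n_i = text.count(kw)  # N_i: non-overlapping, case-sensitive substring count
            scores[page_id] += n_i * weight
    return scores


pages = {
    "a": "insurance insurance claim",
    "b": "claim form",
    "c": "travel insurance",
}
print(matching_values(pages, ["insurance", "claim"]))
```

Under this assumed form, a keyword that appears in every search webpage gets weight log(1) = 0, so it does not help distinguish pages.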
In an optional embodiment, the screening module 204 screens out, according to the matching value of each search webpage, the search webpages whose matching values meet the preset condition as the intermediate search webpages, specifically as follows:
sorting all the search webpages by their matching values;
taking the search webpages within a preset ranking range as the intermediate search webpages; or,
acquiring a preset matching value threshold; and
taking the search webpages whose matching values are greater than the matching value threshold as the intermediate search webpages.
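Both alternatives for picking the intermediate search webpages (a preset ranking range, or a preset matching value threshold) fit in one helper. This sketch assumes the ranking range means the first `top_k` positions; `select_pages` and its parameters are hypothetical names. The same helper applies unchanged to the relevance values used afterwards to pick the target webpages.

```python
from typing import Dict, List, Optional


def select_pages(
    scores: Dict[str, float],
    top_k: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    # Exactly one of the two alternatives described in the text must be chosen.
    if (top_k is None) == (threshold is None):
        raise ValueError("pass exactly one of top_k or threshold")
    # Rank by score, highest first; Python's sort is stable, so ties keep input order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    if top_k is not None:
        return ranked[:top_k]  # assumed "preset ranking range": the first top_k positions
    return [p for p in ranked if scores[p] > threshold]  # strictly greater, as in the text


scores = {"a": 0.9, "b": 0.2, "c": 0.5}
print(select_pages(scores, top_k=2))        # ['a', 'c']
print(select_pages(scores, threshold=0.5))  # ['a']
```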
In an optional embodiment, the screening module 204 screens out the target webpages from all the intermediate search webpages, based on the jump probability value between every two intermediate search webpages, as follows:
calculating a relevance value of each of the intermediate search webpages by the following formula:
$B_{pr} = (1-\alpha)M^{T} + \alpha E^{T}$
where $B_{pr}$ is a vector of length n whose i-th element is the relevance value of the i-th intermediate search webpage; n is the total number of intermediate search webpages; M is an n × n matrix whose elements are the jump probability values between every two intermediate search webpages; E is also an n × n matrix in which every element is 1/n; and $\alpha$ is a preset webpage jump coefficient; and
screening out, according to the relevance value of each intermediate search webpage, the intermediate search webpages whose relevance values meet a preset condition as the target webpages.
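Read literally, (1 - α)M^T + αE^T is an n × n matrix, while B_pr is defined as a length-n vector. The sketch below therefore assumes a PageRank-style interpretation: B_pr is the stationary vector of that matrix, found by power iteration. The function name, the default α = 0.15 (the text only says α is preset) and the tolerance are assumptions, and rescaling the vector to sum to 1 after each step is a choice made here because rows of M need not sum to 1 when some links leave the set of intermediate search webpages.

```python
from typing import List


def relevance_values(
    M: List[List[float]], alpha: float = 0.15, tol: float = 1e-10, max_iter: int = 1000
) -> List[float]:
    n = len(M)
    # G[i][j] = (1 - alpha) * M^T[i][j] + alpha * E^T[i][j], with every element of E equal to 1/n.
    G = [[(1 - alpha) * M[j][i] + alpha / n for j in range(n)] for i in range(n)]
    b = [1.0 / n] * n  # start from a uniform vector
    for _ in range(max_iter):
        nxt = [sum(G[i][j] * b[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # keep the vector summing to 1
        if max(abs(x - y) for x, y in zip(nxt, b)) < tol:
            return nxt
        b = nxt
    return b


M = [
    [0.0, 0.5, 0.5],  # page 0 links to pages 1 and 2
    [0.0, 0.0, 1.0],  # page 1 links to page 2
    [1.0, 0.0, 0.0],  # page 2 links to page 0
]
print(relevance_values(M))
```

A page collects relevance from the pages that jump to it, weighted by their own relevance, while the α term keeps every page reachable so the iteration converges.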
In an optional embodiment, the screening module 204 screens out, according to the relevance value of each intermediate search webpage, the intermediate search webpages whose relevance values meet the preset condition as the target webpages, specifically as follows:
sorting all the intermediate search webpages by their relevance values;
taking the intermediate search webpages within a preset ranking range as the target webpages; or,
acquiring a preset relevance value threshold; and
taking the intermediate search webpages whose relevance values are greater than the relevance value threshold as the target webpages.
In an optional embodiment, the jump probability value between two webpages is calculated by the following formula:
[Formula (jump probability value) shown as an image in the original publication; not reproduced here]
where the formula gives the jump probability value of the first webpage of the two webpages jumping to the second webpage, c is the number of all jump links in the first webpage, and b is the number of jump links in the first webpage that jump from the first webpage to the second webpage.
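The jump-probability formula is also only an image in the source. Given the definitions of c and b, the sketch below assumes the simplest reading, P(first → second) = b / c, and builds the matrix M used by the relevance formula. The input format (a dict mapping each page to the list of targets of its jump links) and the function name are hypothetical; c counts all jump links in a page, including links to pages outside the set.

```python
from collections import Counter
from typing import Dict, List


def jump_probability_matrix(pages: List[str], links: Dict[str, List[str]]) -> List[List[float]]:
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    M = [[0.0] * n for _ in range(n)]
    for src, targets in links.items():
        if src not in index:
            continue
        c = len(targets)  # c: all jump links in the first webpage
        if c == 0:
            continue  # a page without jump links keeps an all-zero row
        for dst, b in Counter(targets).items():  # b: links from src that point to dst
            if dst in index:
                M[index[src]][index[dst]] = b / c  # ASSUMED form of the formula: b / c
    return M


M = jump_probability_matrix(
    ["a", "b", "c"],
    {"a": ["b", "b", "c", "x"], "b": ["c"], "c": []},  # "x" lies outside the set
)
print(M)
```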
For a detailed description of the information searching apparatus, refer to the detailed description of the information searching method; to avoid repetition, it is not repeated here.
Example three
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 3, the computer device may include:
a memory 301 storing executable program code;
a processor 302 connected to the memory 301;
the processor 302 calls the executable program code stored in the memory 301 to execute the steps in the information searching method disclosed in the first embodiment of the present invention.
Example four
Referring to fig. 4, an embodiment of the present invention discloses a computer storage medium 401, where the computer storage medium 401 stores computer instructions, and the computer instructions, when called, are used to execute steps in the information searching method disclosed in the embodiment of the present invention.
The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above detailed description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general hardware platform, or by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, where the storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disk memory, magnetic tape memory, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the information searching method, apparatus, computer device and storage medium disclosed in the embodiments of the present invention are only preferred embodiments of the present invention and are used only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for searching information, the method comprising:
acquiring basic keywords;
performing a preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords;
performing a preset search operation according to the basic keywords and the translation keywords to obtain search webpages corresponding to the basic keywords; and
screening out target webpages from all the search webpages, as the search result of the basic keywords, based on the jump probability value between every two search webpages, wherein the jump probability value is a numerical value calculated according to the jump relationship between the two webpages.
2. The information searching method of claim 1, wherein screening out the target webpages from all the search webpages based on the jump probability value between every two search webpages comprises:
screening out intermediate search webpages from all the search webpages according to the number of times the basic keywords and the translation keywords appear in the search webpages; and
screening out the target webpages from all the intermediate search webpages based on the jump probability value between every two intermediate search webpages.
3. The information searching method according to claim 2, wherein screening out the intermediate search webpages from all the search webpages according to the number of times the basic keywords and the translation keywords appear in the search webpages comprises:
calculating a matching value of each of the search webpages by the following formula:
[Formula (matching value) shown as an image in the original publication; not reproduced here]
wherein $N_i$ is the number of times the target keyword appears in the search webpage, $N_d$ is the number of search webpages, among all the search webpages, that contain the target keyword, and totalDocCount is the total number of search webpages, the target keyword being a basic keyword or a translation keyword; and
screening out, according to the matching value of each search webpage, the search webpages whose matching values meet a preset condition from all the search webpages as the intermediate search webpages.
4. The information searching method according to claim 3, wherein screening out, according to the matching value of each search webpage, the search webpages whose matching values meet the preset condition from all the search webpages as the intermediate search webpages comprises:
sorting all the search webpages by their matching values;
taking the search webpages within a preset ranking range as the intermediate search webpages; or,
acquiring a preset matching value threshold; and
taking the search webpages whose matching values are greater than the matching value threshold as the intermediate search webpages.
5. The information searching method of claim 2, wherein screening out the target webpages from all the intermediate search webpages based on the jump probability value between every two intermediate search webpages comprises:
calculating a relevance value of each of the intermediate search webpages by the following formula:
$B_{pr} = (1-\alpha)M^{T} + \alpha E^{T}$
wherein $B_{pr}$ is a vector of length n whose i-th element is the relevance value of the i-th intermediate search webpage, n is the total number of intermediate search webpages, M is an n × n matrix whose elements are the jump probability values between every two intermediate search webpages, E is also an n × n matrix in which every element is 1/n, and $\alpha$ is a preset webpage jump coefficient; and
screening out, according to the relevance value of each intermediate search webpage, the intermediate search webpages whose relevance values meet a preset condition from all the intermediate search webpages as the target webpages.
6. The information searching method according to claim 5, wherein screening out, according to the relevance value of each intermediate search webpage, the intermediate search webpages whose relevance values meet the preset condition from all the intermediate search webpages as the target webpages comprises:
sorting all the intermediate search webpages by their relevance values;
taking the intermediate search webpages within a preset ranking range as the target webpages; or,
acquiring a preset relevance value threshold; and
taking the intermediate search webpages whose relevance values are greater than the relevance value threshold as the target webpages.
7. The information searching method according to any one of claims 1 to 6, wherein the jump probability value between two webpages is calculated by the following formula:
[Formula (jump probability value) shown as an image in the original publication; not reproduced here]
wherein the formula gives the jump probability value of the first webpage of the two webpages jumping to the second webpage, c is the number of all jump links in the first webpage, and b is the number of jump links in the first webpage that jump from the first webpage to the second webpage.
8. An apparatus for searching information, the apparatus comprising:
the acquisition module is used for acquiring basic keywords;
the translation module is used for executing preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords;
the search module is used for executing preset search operation according to the basic keywords and the translation keywords to obtain search webpages corresponding to the basic keywords;
and the screening module is used for screening a target webpage from all the search webpages based on the jump probability value between every two search webpages to serve as the search result of the basic keyword, wherein the jump probability value is a numerical value obtained by calculation according to the jump relation between the two webpages.
9. A computer device, characterized in that the computer device comprises:
a memory storing executable program code;
a processor coupled to the memory;
the processor calls the executable program code stored in the memory to execute the method of searching for information according to any of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements a method of searching for information according to any one of claims 1 to 7.
CN202110844238.2A 2021-07-26 2021-07-26 Information searching method, device, equipment and storage medium Active CN113486246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110844238.2A CN113486246B (en) 2021-07-26 2021-07-26 Information searching method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113486246A 2021-10-08
CN113486246B 2024-07-12

Family

ID=77942612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110844238.2A Active CN113486246B (en) 2021-07-26 2021-07-26 Information searching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113486246B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595142A (en) * 2023-05-19 2023-08-15 大安健康科技(北京)有限公司 Retrieval matching method and system based on medical semantic analysis

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070124425A1 (en) * 2005-11-30 2007-05-31 Gross John N System & Method of Delivering Content Based Advertising
CN103455523A (en) * 2012-06-05 2013-12-18 深圳市世纪光速信息技术有限公司 Method and server for searching information
CN103488648A (en) * 2012-06-13 2014-01-01 阿里巴巴集团控股有限公司 Multilanguage mixed retrieval method and system
CN104217016A (en) * 2014-09-22 2014-12-17 北京国双科技有限公司 Method and device for calculating search keywords of webpage
CN104715063A (en) * 2015-03-31 2015-06-17 百度在线网络技术(北京)有限公司 Search ranking method and search ranking device
CN105404688A (en) * 2015-12-11 2016-03-16 北京奇虎科技有限公司 Searching method and searching device
CN111460810A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Crowd-sourced task spot check method and device, computer equipment and storage medium
CN111708911A (en) * 2020-06-17 2020-09-25 北京字节跳动网络技术有限公司 Search method, search device, electronic equipment and computer-readable storage medium
CN111831885A (en) * 2020-07-14 2020-10-27 深圳市众创达企业咨询策划有限公司 Internet information retrieval system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant