CN113486246B - Information searching method, device, equipment and storage medium - Google Patents

Information searching method, device, equipment and storage medium

Info

Publication number
CN113486246B
Authority
CN
China
Prior art keywords
search
web pages
web page
webpages
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110844238.2A
Other languages
Chinese (zh)
Other versions
CN113486246A (en)
Inventor
卢春曦
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110844238.2A
Publication of CN113486246A
Application granted
Publication of CN113486246B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/49 Data-driven translation using very large corpora, e.g. the web

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information searching method comprising the following steps: obtaining basic keywords; executing a preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords; executing a preset search operation according to the basic keywords and the translation keywords to obtain search web pages corresponding to the basic keywords; and screening target web pages from all the search web pages, based on the jump probability value between every two search web pages, to serve as the search results of the basic keywords, wherein a jump probability value is calculated according to the jump relationship between two web pages. The invention thereby enables multilingual search on the basic keywords, which improves the search breadth, and screens the search web pages according to the jump probability value between every two search web pages, which applies a relatively complex screening criterion and improves the search accuracy. The invention also relates to the technical field of blockchain.

Description

Information searching method, device, equipment and storage medium
Technical Field
The present invention relates to the field of information searching technologies, and in particular, to a method and apparatus for searching information, a computer device, and a storage medium.
Background
With the advent of the information society, information technology has gradually penetrated daily life and brought it great convenience; communication technology, artificial intelligence, the internet, and the internet of things, for example, have created better living conditions. As information technology has spread, the internet has come to host a large and varied set of websites that store a vast amount of information, and people are easily submerged in it. Search engines (e.g., the Baidu search engine, the Google search engine) emerged to help users find the desired information in this mass of data. However, most current information search technologies perform only single-language search: after the user inputs a keyword, only results in the same language as the keyword are returned. For example, if the user inputs a keyword in Chinese, the search results returned are in Chinese. Results in other languages are not returned, so the search range is relatively narrow. Moreover, search results are generally screened according to relatively simple criteria only (for example, the number of times the keyword appears in a web page, or the position at which it appears), so the accuracy of the search results is relatively low. The search breadth and search accuracy of current information search methods therefore still have room for improvement.
Disclosure of Invention
The technical problem to be solved by the invention is that current information searching methods have low search breadth and low search accuracy.
In order to solve the technical problem, a first aspect of the present invention discloses a method for searching information, which includes:
obtaining basic keywords;
executing a preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords;
executing a preset search operation according to the basic keywords and the translation keywords to obtain search web pages corresponding to the basic keywords; and
screening target web pages from all the search web pages, based on the jump probability value between every two search web pages, to serve as the search results of the basic keywords, wherein the jump probability value is a numerical value calculated according to the jump relationship between two web pages.
The second aspect of the present invention discloses an information searching apparatus, the apparatus comprising:
an acquisition module, used for obtaining basic keywords;
a translation module, used for executing a preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords;
a search module, used for executing a preset search operation according to the basic keywords and the translation keywords to obtain search web pages corresponding to the basic keywords; and
a screening module, used for screening target web pages from all the search web pages, based on the jump probability value between every two search web pages, to serve as the search results of the basic keywords, wherein the jump probability value is a numerical value calculated according to the jump relationship between two web pages.
A third aspect of the invention discloses a computer device comprising:
a memory storing executable program code; and
a processor coupled to the memory;
wherein the processor invokes the executable program code stored in the memory to perform some or all of the steps of the information searching method disclosed in the first aspect of the present invention.
A fourth aspect of the present invention discloses a computer storage medium storing computer instructions which, when invoked, are adapted to perform part or all of the steps of the method for searching for information disclosed in the first aspect of the present invention.
According to the embodiments of the invention, the basic keywords are first obtained and translated into the corresponding translation keywords; a search is then carried out according to the basic keywords and the translation keywords to obtain the search web pages corresponding to the basic keywords; finally, target web pages are screened out from all the search web pages, based on the jump probability value between every two search web pages, to serve as the search results. Multilingual search on the basic keywords is thereby realized, so the final search results can contain multiple languages and the search breadth is improved. Because the search web pages are screened according to the jump probability value between every two search web pages, a relatively complex criterion is applied to the screening and the search accuracy is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a method for searching information according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an information searching apparatus according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Fig. 4 is a schematic structural view of a computer storage medium according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first", "second", and the like in the description, in the claims, and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a list of steps or elements is not limited to those listed, but may optionally include other steps or elements that are not listed or that are inherent to such a process, method, product, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The invention discloses an information searching method, an apparatus, a computer device, and a storage medium. The basic keywords are first obtained and translated into the corresponding translation keywords; a search is then carried out according to the basic keywords and the translation keywords to obtain the search web pages corresponding to the basic keywords; finally, target web pages are screened out from all the search web pages, based on the jump probability value between every two search web pages, to serve as the search results. Multilingual search on the basic keywords is thereby realized, so the final search results can contain multiple languages and the search breadth is improved. Because the search web pages are screened according to the jump probability value between every two search web pages, a relatively complex criterion is applied to the screening and the search accuracy is improved. The details are described below.
Example 1
Referring to fig. 1, fig. 1 is a flow chart of a method for searching information according to an embodiment of the present invention. As shown in fig. 1, the search method of information may include the following operations:
101. Obtain basic keywords.
In step 101, the basic keyword may be input by the user; for example, the user may input the English phrase "Homomorphic encryption" as the basic keyword used for the subsequent search. Alternatively, the basic keywords input by the user may include words of multiple languages, for example, both Chinese and English words.
102. Execute a preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords.
In step 102, the basic keywords may be automatically translated by a preset translator (e.g., Baidu Translate, Google Translate) into words of different languages (i.e., the translation keywords). After the user inputs the basic keyword, the basic keyword may be set as keyword k_0 and its language marked as l_0. The user may pre-select the languages into which the basic keyword is to be translated, forming a language list {l_1, ..., l_m}, and the translator then translates the basic keyword into words of the corresponding languages according to this list. For example, suppose the languages in the list are Chinese and German, in that order, and the basic keyword input by the user is the English phrase "Homomorphic encryption". The translator translates "Homomorphic encryption" into the Chinese phrase "同态加密" and the German phrase "Homomorphe Verschlüsselung", which are set as keywords k_1 and k_2, respectively.
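The translation step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the glossary-backed `stub_translate` stands in for a real translator service, and the names `expand_keywords`, `stub_translate`, and the language codes "en", "zh", "de" are assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical offline glossary standing in for a real translator service.
_GLOSSARY: Dict[Tuple[str, str], str] = {
    ("homomorphic encryption", "zh"): "同态加密",
    ("homomorphic encryption", "de"): "Homomorphe Verschlüsselung",
}


def stub_translate(text: str, target_lang: str) -> str:
    """Look the phrase up in the glossary; raises KeyError if it is unknown."""
    return _GLOSSARY[(text.lower(), target_lang)]


def expand_keywords(
    base_keyword: str,
    base_lang: str,
    language_list: List[str],
    translate: Callable[[str, str], str] = stub_translate,
) -> List[Tuple[str, str]]:
    """Return (keyword, language) pairs, with the basic keyword first."""
    keywords = [(base_keyword, base_lang)]
    for lang in language_list:
        if lang == base_lang:
            continue  # the basic keyword already covers its own language
        keywords.append((translate(base_keyword, lang), lang))
    return keywords
```

Passing a different `translate` callable would let the same function wrap whichever translator is actually preset.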
103. Execute a preset search operation according to the basic keywords and the translation keywords to obtain search web pages corresponding to the basic keywords.
In step 103, after the basic keyword and the translation keywords are obtained, the internet may be searched with them to obtain search web pages related to the basic keyword and the translation keywords. For example, for a basic keyword k_0 and translation keywords k_1 and k_2, with languages l_0, l_1, and l_2 respectively, the search condition may be set as AND(TERM(k_0), TERM(k_1), TERM(k_2)), FROM(LAN(l_0), LAN(l_1), LAN(l_2)), and the internet is searched according to this condition to obtain the search web pages. Here, AND represents grabbing a web page on the internet that meets one of the conditions following AND; TERM(k_0) represents that the crawled web page contains keyword k_0, and likewise for TERM(k_1) and TERM(k_2). FROM sets the languages of the web pages to be searched; LAN(l_0) represents that the language of the web pages to be searched is l_0, and likewise for LAN(l_1) and LAN(l_2).
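A search condition of this shape can be assembled mechanically from the keyword and language lists. The sketch below assumes (keyword, language) pairs as input; the function names and the case-insensitive substring test are assumptions for illustration, and the matching rule follows the text's description that a page need only satisfy one of the TERM conditions.

```python
from typing import List, Tuple


def build_search_condition(keywords: List[Tuple[str, str]]) -> str:
    """Render the AND(TERM(...)),FROM(LAN(...)) condition as a string."""
    terms = ",".join(f"TERM({kw})" for kw, _ in keywords)
    langs = ",".join(f"LAN({lang})" for _, lang in keywords)
    return f"AND({terms}),FROM({langs})"


def page_matches(page_text: str, page_lang: str, keywords: List[Tuple[str, str]]) -> bool:
    """True if the page is in a requested language and contains at least one keyword."""
    languages = {lang for _, lang in keywords}
    text = page_text.lower()
    has_term = any(kw.lower() in text for kw, _ in keywords)
    return has_term and page_lang in languages
```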
104. Screen target web pages from all the search web pages, based on the jump probability value between every two search web pages, to serve as the search results of the basic keywords, wherein the jump probability value is a numerical value calculated according to the jump relationship between two web pages.
In step 104, after obtaining the search web pages, the search web pages may be screened according to the jump probability value between every two search web pages, so as to obtain the final search result, and the specific screening process will be described in detail later. The search web pages are screened according to the jump probability value between every two search web pages to obtain search results, so that the search web pages can be screened by using relatively complex standards, and the accuracy of information search is improved.
Therefore, by implementing the information searching method described in fig. 1, the basic keywords are first obtained and translated into the corresponding translation keywords; a search is then carried out according to the basic keywords and the translation keywords to obtain the search web pages corresponding to the basic keywords; finally, target web pages are screened out from all the search web pages, based on the jump probability value between every two search web pages, to serve as the search results. Multilingual search on the basic keywords is thereby realized, so the final search results can contain multiple languages and the search breadth is improved. Because the search web pages are screened according to the jump probability value between every two search web pages, a relatively complex criterion is applied to the screening and the search accuracy is improved.
In an optional embodiment, the screening the target web page from all the search web pages based on the jump probability value between every two search web pages includes:
screening intermediate search web pages from all the search web pages according to the number of occurrences of the basic keywords and the translation keywords in the search web pages; and
screening target web pages from all the intermediate search web pages based on the jump probability value between every two intermediate search web pages.
In this alternative embodiment, when screening the search web page, the first screening may be performed on the search web page according to the number of occurrences of the basic keyword and the translation keyword in the search web page, and then based on the first screening result, the second screening may be performed based on the jump probability value, so as to screen the target web page. Therefore, the target webpage is screened out through twice screening, and the target webpage can be screened out of the search webpages more accurately to serve as a search result, so that the accuracy of information search is improved.
Therefore, according to the implementation of the alternative embodiment, the search web page is firstly screened for the first time according to the occurrence times of the basic keywords and the translation keywords in the search web page, then the second screening is carried out based on the jump probability value on the basis of the first screening result, and the target web page can be screened out from the search web page more accurately to serve as the search result, so that the accuracy of information search is improved.
In an alternative embodiment, the screening the intermediate search web pages from all the search web pages according to the number of occurrences of the basic keyword and the translation keyword in the search web pages includes:
calculating the matching value of each search web page through the following formula:
wherein the formula takes three quantities: the number of times the target keyword appears in the search web page; the number of search web pages, among all the search web pages, that include the target keyword; and the total number of search web pages; the target keyword is the basic keyword or a translation keyword; and
screening out, according to the matching value of each search web page, the search web pages meeting a preset matching value condition from all the search web pages to serve as the intermediate search web pages.
In this alternative embodiment, the first quantity is the number of times the target keyword appears in the search web page, so the more often the target keyword appears in a search web page, the higher that page's matching value. Likewise, the higher the proportion of search web pages containing the target keyword among all the search web pages, the higher the matching value of a search web page containing the target keyword.
Therefore, by implementing this alternative embodiment, the matching value of each search web page is calculated through a preset formula containing the number of occurrences of the basic keywords and the translation keywords in the search web page, and the intermediate search web pages are then screened out according to these matching values, so that the intermediate search web pages are screened according to the number of occurrences of the basic keywords and the translation keywords, and the accuracy of information search is improved.
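The two monotonic properties described above can be illustrated with a short sketch. The formula itself is not reproduced in this text, so the code assumes one form that is merely consistent with those properties: for each target keyword, the occurrence count in a page multiplied by the share of search web pages containing that keyword, summed over the keywords. The product form, the summation over keywords, and the function names are assumptions for illustration, not the patent's formula.

```python
from typing import List


def count_occurrences(text: str, keyword: str) -> int:
    """Case-insensitive, non-overlapping occurrence count of keyword in text."""
    return text.lower().count(keyword.lower())


def matching_values(pages: List[str], keywords: List[str]) -> List[float]:
    """Assumed score: sum over keywords of count_in_page * (pages_with_keyword / total_pages)."""
    total = len(pages)
    if total == 0:
        return []
    scores = [0.0] * total
    for kw in keywords:
        counts = [count_occurrences(page, kw) for page in pages]
        containing = sum(1 for c in counts if c > 0)
        share = containing / total  # proportion of pages that contain this keyword
        for i, c in enumerate(counts):
            scores[i] += c * share
    return scores
```

Under this assumed form, a page's score rises both with the keyword's count in that page and with the keyword's prevalence across the result set.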
In an optional embodiment, the screening, according to the matching value of each search web page, search web pages meeting the preset matching value condition from all the search web pages as intermediate search web pages includes:
sorting all the search web pages according to the matching value of each search web page, and taking the search web pages within a preset ranking range as the intermediate search web pages; or
acquiring a preset matching value threshold, and taking the search web pages whose matching value is greater than the matching value threshold as the intermediate search web pages.
In this alternative embodiment, the search web pages may be ranked from high to low according to the matching value of each search web page, and the search web pages within the preset ranking range (e.g., the top 50 or the top 100) are then used as the intermediate search web pages. Alternatively, a matching value threshold may be preset, and a search web page is used as an intermediate search web page when its calculated matching value is greater than the threshold.
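Both selection rules reduce to a few lines of code. The sketch below is illustrative only; the function names and the parallel-list representation of pages and scores are assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_top_k(items: Sequence[T], scores: Sequence[float], k: int) -> List[T]:
    """Rank items from highest to lowest score and keep the first k."""
    ranked = sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:k]]


def select_above_threshold(items: Sequence[T], scores: Sequence[float], threshold: float) -> List[T]:
    """Keep items whose score is strictly greater than the threshold."""
    return [item for item, score in zip(items, scores) if score > threshold]
```

The same two helpers apply unchanged to any later screening stage that ranks pages by a score.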
It can be seen that, by implementing the alternative embodiment, the search web pages are ranked according to the matching values of the search web pages, then the search web pages in the preset ranking range are used as intermediate search web pages, or a matching value threshold is preset, and then the corresponding search web page with the matching value larger than the matching value threshold is used as the intermediate search web page, so that the appropriate intermediate search web page can be screened from all the search web pages, and the accuracy of information search is improved.
In an optional embodiment, the screening the target web page from all the intermediate search web pages based on the jump probability value between every two intermediate search web pages includes:
calculating the relevance value of each intermediate search web page through the following formula:
wherein the result is a vector of length n whose i-th element is the relevance value of the i-th intermediate search web page; n is the total number of intermediate search web pages; M is an n × n matrix, each element of which is the jump probability value between a pair of intermediate search web pages; E is an n × n matrix, each element of which is 1/n; and α is a preset web page jump coefficient; and
screening out, according to the relevance value of each intermediate search web page, the intermediate search web pages meeting a preset relevance value condition from all the intermediate search web pages to serve as the target web pages.
In this alternative embodiment, the result vector holds the PageRank score (i.e., the relevance value) of each intermediate search web page. α is a preset web page jump coefficient, which is typically set empirically and may typically be 0.15. E is a random jump matrix that represents the randomness of web page jumps, so the value of each element in E is set to 1/n.
Therefore, by implementing this alternative embodiment, the relevance value of each intermediate search web page is calculated through a preset formula containing the jump probability value between every two intermediate search web pages, and the target web pages are then screened out according to these relevance values, so that the target web pages are screened according to the jump probability value between every two intermediate search web pages, and the accuracy of information search is improved.
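The calculation described above resembles the familiar PageRank power iteration. The following is a minimal sketch under stated assumptions, since the formula itself is not reproduced in this text: it assumes the update r ← ((1 − α)M + αE) r, with M[j][i] holding the jump probability value from page i to page j. The orientation of M, the uniform treatment of pages without outgoing links, the renormalization step, and the names `relevance_values`, `iterations`, and `tol` are all assumptions for illustration.

```python
from typing import List


def relevance_values(
    M: List[List[float]],
    alpha: float = 0.15,
    iterations: int = 200,
    tol: float = 1e-12,
) -> List[float]:
    """Power iteration for r = ((1 - alpha) * M + alpha * E) r, with E[j][i] = 1/n."""
    n = len(M)
    if n == 0:
        return []
    # Assumption: a page with no outgoing links jumps uniformly to every page.
    col_sums = [sum(M[j][i] for j in range(n)) for i in range(n)]
    r = [1.0 / n] * n
    for _ in range(iterations):
        random_part = sum(r) / n  # (E r)[j] is the same for every j
        new_r = []
        for j in range(n):
            link_part = 0.0
            for i in range(n):
                p = M[j][i] if col_sums[i] > 0 else 1.0 / n
                link_part += p * r[i]
            new_r.append((1.0 - alpha) * link_part + alpha * random_part)
        # Assumption: renormalize so the scores keep summing to 1 even if
        # some links lead outside the set of intermediate pages.
        total = sum(new_r)
        new_r = [v / total for v in new_r]
        delta = sum(abs(a - b) for a, b in zip(new_r, r))
        r = new_r
        if delta < tol:
            break
    return r
```

With α = 0.15, each iteration shrinks the error by roughly a factor of 0.85, so a few hundred iterations are ample for small n.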
In an optional embodiment, the screening, according to the relevance value of each intermediate search web page, the intermediate search web pages meeting the preset relevance value condition from all the intermediate search web pages as target web pages includes:
sorting all the intermediate search web pages according to the relevance value of each intermediate search web page, and taking the intermediate search web pages within a preset ranking range as the target web pages; or
acquiring a preset relevance value threshold, and taking the intermediate search web pages whose relevance value is greater than the relevance value threshold as the target web pages.
In this alternative embodiment, the intermediate search web pages may be ranked from high to low according to the relevance value of each intermediate search web page, and the intermediate search web pages within the preset ranking range (e.g., the top 50 or the top 100) are then used as the target web pages. Alternatively, a relevance value threshold may be preset, and an intermediate search web page is used as a target web page when its calculated relevance value is greater than the threshold.
It can be seen that, by implementing this alternative embodiment, the intermediate search web pages are ranked according to their relevance values and those within the preset ranking range are used as target web pages, or a relevance value threshold is preset and the intermediate search web pages whose relevance values are greater than the threshold are used as target web pages, so that suitable target web pages can be selected from all the intermediate search web pages, and the accuracy of information search is improved.
In an alternative embodiment, the jump probability value between two web pages is calculated by the following formula:
a = b / c
wherein a is the jump probability value from the first web page to the second web page of the two web pages, c is the number of all jump links in the first web page, and b is the number of jump links in the first web page that jump from the first web page to the second web page.
In this alternative embodiment, the jump probability value between two web pages represents the probability of jumping from the first of the two web pages to the second, and can therefore be calculated from the number of all jump links in the first web page and the number of those links that jump to the second web page. For example, if web page A contains three links in total and only one of them jumps to web page B, the jump probability value from web page A to web page B is 1/3.
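The ratio described above can be computed directly from a page's list of outgoing links. A minimal sketch, assuming links are represented as URL strings; the function name, the example URLs, and the choice to return 0.0 for a page with no links are assumptions for illustration.

```python
from typing import List


def jump_probability(links_in_first: List[str], second_url: str) -> float:
    """Share of the first page's jump links that point to the second page."""
    total_links = len(links_in_first)
    if total_links == 0:
        return 0.0  # assumption: a page without links never jumps anywhere
    links_to_second = sum(1 for url in links_in_first if url == second_url)
    return links_to_second / total_links


# Three links in total, one of which leads to the second page.
example = jump_probability(
    ["https://b.example", "https://x.example", "https://y.example"],
    "https://b.example",
)
```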
It can be seen that, by implementing this alternative embodiment, the jump probability value between the two web pages is calculated according to the number of all the jump links in the first web page and the number of all the jump links from the first web page to the second web page in the first web page, so that screening of the target web page according to the jump relationship between the web pages can be implemented, and the accuracy of information search is improved.
Optionally, the search information of the information searching method may also be uploaded to a blockchain.
Specifically, the search information is obtained by running the information searching method and records the course of the search, such as the acquired basic keywords, the translation keywords obtained by translation, the search web pages found, and the target web pages screened out. Uploading the search information to the blockchain ensures its security and its fairness and transparency to the user. The user can download the search information from the blockchain to verify whether it has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each of which contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
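The tamper-evidence property mentioned above can be illustrated with a toy hash chain. This is only a sketch of how linked hashes expose modification; it is not a blockchain platform and involves no consensus or distribution. The record fields and the function names `make_block` and `verify_chain` are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder hash preceding the first block


def make_block(record: Dict[str, Any], prev_hash: str) -> Dict[str, Any]:
    """Create a block whose hash covers the record and the previous block's hash."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    return {"record": record, "prev_hash": prev_hash, "hash": digest}


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check that each block links to its predecessor."""
    prev = GENESIS_HASH
    for block in chain:
        expected = make_block(block["record"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True
```

Because each hash depends on the previous one, changing any stored search record invalidates that block and every block after it.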
Therefore, by implementing the embodiment of the invention, the basic keywords are first obtained and translated into the corresponding translation keywords; a search is then carried out according to the basic keywords and the translation keywords to obtain the search web pages corresponding to the basic keywords; finally, target web pages are screened out from all the search web pages, based on the jump probability value between every two search web pages, to serve as the search results. Multilingual search on the basic keywords is thereby realized, the final search results can contain multiple languages, and the search breadth is improved; screening by jump probability value applies a relatively complex criterion and improves the search accuracy. In addition, the search web pages are screened a first time according to the number of occurrences of the basic keywords and the translation keywords in them and a second time, on the basis of the first result, according to the jump probability values, so that the target web pages are screened out of the search web pages more accurately to serve as the search results, and the accuracy of information search is improved. The matching value of each search web page is calculated through a preset formula containing the number of occurrences of the basic keywords and the translation keywords in the search web page, and the intermediate search web pages are screened out according to these matching values, so that the intermediate search web pages are screened according to the number of occurrences of the basic keywords and the translation keywords, and the accuracy of information search is improved.
The search web pages are ranked according to their matching values and those within a preset ranking range are taken as intermediate search web pages, or a matching value threshold is preset and the search web pages whose matching values are greater than the threshold are taken as intermediate search web pages, so that suitable intermediate search web pages can be screened out from all the search web pages, and the accuracy of information search is improved. The relevance value of each intermediate search web page is calculated through a preset formula containing the jump probability value between every two intermediate search web pages, and the target web pages are screened out according to these relevance values, so that the target web pages are screened according to the jump probability value between every two intermediate search web pages, and the accuracy of information search is improved. The intermediate search web pages are ranked according to their relevance values and those within a preset ranking range are taken as target web pages, or a relevance value threshold is preset and the intermediate search web pages whose relevance values are greater than the threshold are taken as target web pages, so that suitable target web pages can be screened out from all the intermediate search web pages, and the accuracy of information search is improved. The jump probability value between two web pages is calculated from the number of all jump links in the first web page and the number of those links that jump from the first web page to the second web page, so that the target web pages are screened according to the jump relationship between web pages, and the accuracy of information search is improved.
Example II
Referring to fig. 2, fig. 2 is a schematic structural diagram of an information searching apparatus according to an embodiment of the present invention. As shown in fig. 2, the information searching apparatus may include:
an acquisition module 201, configured to acquire a basic keyword;
a translation module 202, configured to perform a preset automatic translation operation on the basic keyword to obtain a translation keyword corresponding to the basic keyword;
a search module 203, configured to perform a preset search operation according to the basic keyword and the translation keyword to obtain search web pages corresponding to the basic keyword;
and a screening module 204, configured to screen target web pages from all the search web pages, based on a jump probability value between every two search web pages, to serve as the search result of the basic keyword, where the jump probability value is a value calculated according to the jump relationship between two web pages.
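The cooperation of the four modules listed above can be sketched as a simple pipeline. The sketch is illustrative only: the function name `search_information` and the three callables passed to it are hypothetical stand-ins for the translation, search and screening operations, whose concrete implementations the embodiment leaves open.

```python
from typing import Callable, Sequence


def search_information(
    base_keyword: str,
    translate: Callable[[str], Sequence[str]],
    search: Callable[[Sequence[str]], Sequence[str]],
    screen: Callable[[Sequence[str], Sequence[str]], Sequence[str]],
) -> list[str]:
    """Run the acquire -> translate -> search -> screen flow once."""
    # Acquisition module 201: the basic keyword arrives as the argument.
    translated = list(translate(base_keyword))   # translation module 202
    keywords = [base_keyword, *translated]
    pages = list(search(keywords))               # search module 203
    return list(screen(pages, keywords))         # screening module 204
```

Passing the operations in as callables keeps the sketch independent of any particular translation service or search engine.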
In an alternative embodiment, the screening module 204 screens the target web pages from all the search web pages, based on the jump probability value between every two search web pages, in the following specific manner:
screening intermediate search web pages from all the search web pages according to the number of occurrences of the basic keywords and the translation keywords in the search web pages;
and screening target web pages from all the intermediate search web pages based on the jump probability value between every two intermediate search web pages.
In an alternative embodiment, the screening module 204 screens the intermediate search web pages from all the search web pages, according to the number of occurrences of the basic keywords and the translation keywords in the search web pages, in the following specific manner:
calculating the matching value of each search web page through the following formula:
wherein the three quantities in the formula are, respectively, the number of times the target keyword appears in the search web page, the number of search web pages that include the target keyword among all the search web pages, and the total number of search web pages, and the target keyword is the basic keyword or the translation keyword;
and screening out, according to the matching value of each search web page, the search web pages that meet a preset matching value condition from all the search web pages to serve as intermediate search web pages.
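The formula itself is not reproduced in this text, so the following is only a sketch of one scoring rule that uses the three quantities named above. It assumes a TF-IDF style score, the occurrence count multiplied by log(N / df) and summed over the basic and translation keywords. That form, the function name `matching_values` and the plain substring counting are assumptions, not the patented formula.

```python
import math


def matching_values(pages: list[str], keywords: list[str]) -> list[float]:
    """Score each page text with an assumed TF-IDF style sum over keywords."""
    n_pages = len(pages)
    scores = [0.0] * n_pages
    for keyword in keywords:
        # Occurrences of the target keyword in each search web page
        # (simple non-overlapping substring count, an assumption).
        counts = [page.count(keyword) for page in pages]
        # Number of search web pages that contain the target keyword.
        df = sum(1 for count in counts if count > 0)
        if df == 0:
            continue  # keyword appears nowhere, contributes nothing
        idf = math.log(n_pages / df)
        for i, count in enumerate(counts):
            scores[i] += count * idf
    return scores
```

Under this assumed form, a keyword that appears in every search web page contributes zero, because it does not help distinguish one page from another.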
In an alternative embodiment, the screening module 204 screens out, according to the matching value of each search web page, the search web pages that meet the preset matching value condition from all the search web pages to serve as intermediate search web pages in the following specific manner:
sorting all the search web pages according to the matching value of each search web page;
taking the search web pages within a preset sorting range as the intermediate search web pages; or
acquiring a preset matching value threshold;
and taking the search web pages whose matching values are larger than the matching value threshold as the intermediate search web pages.
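The two alternative selection rules above, a preset sorting range or a preset threshold, can be sketched as follows. The function names are hypothetical, and the "preset sorting range" is assumed here to mean the top `top_k` pages when sorted by descending value.

```python
def select_by_rank(items: list[str], scores: list[float], top_k: int) -> list[str]:
    """Sort by descending score and keep the first top_k items."""
    order = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    return [items[i] for i in order[:top_k]]


def select_by_threshold(
    items: list[str], scores: list[float], threshold: float
) -> list[str]:
    """Keep every item whose score is strictly larger than the threshold."""
    return [item for item, score in zip(items, scores) if score > threshold]
```

The same two helpers apply unchanged to the later step that selects target web pages from the intermediate search web pages, with correlation values in place of matching (coincidence) values.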
In an alternative embodiment, the screening module 204 screens the target web pages from all the intermediate search web pages, based on the jump probability value between every two intermediate search web pages, in the following specific manner:
calculating the correlation value of each intermediate search web page through the following formula:
wherein the vector in the formula has length n and its i-th element is the correlation value of the i-th intermediate search web page, n is the total number of intermediate search web pages, M is an n x n matrix in which each element is the jump probability value between a pair of intermediate search web pages, E is an n x n matrix in which every element has the value 1/n, and alpha is a preset web page jump coefficient;
and screening out, according to the correlation value of each intermediate search web page, the intermediate search web pages that meet a preset correlation value condition from all the intermediate search web pages to serve as target web pages.
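The formula is not reproduced in this text. The quantities it names (a length-n vector, a jump probability matrix M, a uniform matrix E with entries 1/n, and a coefficient alpha) match the well-known PageRank iteration R = (alpha * M + (1 - alpha) * E) * R, so the sketch below assumes that form. The function name, the default alpha of 0.85, the fixed iteration count and the convention that `M[i][j]` holds the jump probability from page j to page i are all assumptions.

```python
def relevance_values(
    M: list[list[float]], alpha: float = 0.85, iterations: int = 100
) -> list[float]:
    """Power iteration for R = (alpha * M + (1 - alpha) * E) R, with E[i][j] = 1/n."""
    n = len(M)
    r = [1.0 / n] * n  # start from a uniform vector
    for _ in range(iterations):
        total = sum(r)  # (E r)[i] equals total / n for every i
        r = [
            alpha * sum(M[i][j] * r[j] for j in range(n))
            + (1.0 - alpha) * total / n
            for i in range(n)
        ]
    return r
```

Under these assumptions, when each column of M sums to 1 the vector keeps a total of 1 throughout, and a page that many other pages jump to receives a larger correlation (relevant) value.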
In an alternative embodiment, the screening module 204 screens out, according to the correlation value of each intermediate search web page, the intermediate search web pages that meet the preset correlation value condition from all the intermediate search web pages to serve as target web pages in the following specific manner:
sorting all the intermediate search web pages according to the correlation value of each intermediate search web page;
taking the intermediate search web pages within a preset sorting range as the target web pages; or
acquiring a preset correlation value threshold;
and taking the intermediate search web pages whose correlation values are larger than the correlation value threshold as the target web pages.
In an alternative embodiment, the jump probability value between two web pages is calculated by the following formula:
wherein a is the jump probability value from the first web page to the second web page of the two web pages, c is the total number of jump links in the first web page, and b is the number of jump links in the first web page that lead from the first web page to the second web page.
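The formula is not reproduced in this text, but the definitions of a, b and c suggest the ratio a = b / c, which is what the sketch below assumes. The function name, the representation of jump links as a list of URL strings and the choice to return 0 for a page without links are illustrative assumptions.

```python
def jump_probability(links_in_first_page: list[str], second_page_url: str) -> float:
    """Assumed ratio a = b / c for a jump from the first page to the second."""
    c = len(links_in_first_page)  # all jump links in the first web page
    if c == 0:
        return 0.0  # assumption: a page without links cannot jump anywhere
    # Jump links in the first web page that lead to the second web page.
    b = sum(1 for url in links_in_first_page if url == second_page_url)
    return b / c
```

For example, if the first web page contains four jump links and two of them lead to the second web page, the assumed jump probability value is 0.5.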
For the specific description of the information searching apparatus, reference may be made to the specific description of the information searching method, and for avoiding repetition, a detailed description is omitted herein.
Example III
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the invention. As shown in fig. 3, the computer device may include:
a memory 301 storing executable program code;
a processor 302 connected to the memory 301;
wherein the processor 302 invokes the executable program code stored in the memory 301 to perform the steps of the information searching method disclosed in the first embodiment of the present invention.
Example IV
Referring to fig. 4, an embodiment of the present invention discloses a computer storage medium 401, where the computer storage medium 401 stores computer instructions for executing steps in the information searching method disclosed in the first embodiment of the present invention when the computer instructions are called.
The apparatus embodiments described above are merely illustrative, wherein the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solutions may be embodied, in essence or in part, in the form of a software product that may be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that the information searching method, apparatus, computer device and storage medium disclosed in the embodiments of the invention are only preferred embodiments of the invention, and are used only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the various embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (7)

1. A method for searching information, the method comprising:
Obtaining basic keywords;
executing preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords;
Executing preset search operation according to the basic keywords and the translation keywords to obtain search webpages corresponding to the basic keywords;
screening target webpages from all the search webpages based on the jump probability values between every two search webpages to serve as search results of the basic keywords, wherein the jump probability values are numerical values calculated according to jump relations between the two webpages;
wherein the screening of target web pages from all the search web pages based on the jump probability value between every two search web pages comprises:
screening intermediate search web pages from all the search web pages according to the number of occurrences of the basic keywords and the translation keywords in the search web pages;
screening target web pages from all the intermediate search web pages based on the jump probability value between every two intermediate search web pages;
wherein the screening of intermediate search web pages from all the search web pages according to the number of occurrences of the basic keywords and the translation keywords in the search web pages comprises:
calculating the matching value of each search web page through the following formula:
wherein the three quantities in the formula are, respectively, the number of times the target keyword appears in the search web page, the number of search web pages that include the target keyword among all the search web pages, and the total number of search web pages, and the target keyword is the basic keyword or the translation keyword;
screening out, according to the matching value of each search web page, the search web pages that meet a preset matching value condition from all the search web pages to serve as intermediate search web pages;
wherein the screening of target web pages from all the intermediate search web pages based on the jump probability value between every two intermediate search web pages comprises:
calculating the correlation value of each intermediate search web page through the following formula:
wherein the vector in the formula has length n and its i-th element is the correlation value of the i-th intermediate search web page, n is the total number of intermediate search web pages, M is an n x n matrix in which each element is the jump probability value between a pair of intermediate search web pages, E is an n x n matrix in which every element has the value 1/n, and alpha is a preset web page jump coefficient;
and screening out, according to the correlation value of each intermediate search web page, the intermediate search web pages that meet a preset correlation value condition from all the intermediate search web pages to serve as target web pages.
2. The method according to claim 1, wherein the step of screening out search pages meeting a preset matching condition from among all the search pages as an intermediate search page according to the matching value of each of the search pages comprises:
sorting all the search web pages according to the matching value of each search web page;
taking the search web pages within a preset sorting range as the intermediate search web pages; or
acquiring a preset matching value threshold;
and taking the search web pages whose matching values are larger than the matching value threshold as the intermediate search web pages.
3. The information searching method according to claim 1, wherein the step of screening the intermediate search web pages meeting the preset condition of the correlation value from all the intermediate search web pages as the target web pages according to the correlation value of each of the intermediate search web pages comprises:
sorting all the intermediate search web pages according to the correlation value of each intermediate search web page;
taking the intermediate search web pages within a preset sorting range as the target web pages; or
acquiring a preset correlation value threshold;
and taking the intermediate search web pages whose correlation values are larger than the correlation value threshold as the target web pages.
4. A method of searching for information according to any one of claims 1 to 3, wherein the jump probability value between two web pages is calculated by the following formula:
wherein a is the jump probability value from the first web page to the second web page of the two web pages, c is the total number of jump links in the first web page, and b is the number of jump links in the first web page that lead from the first web page to the second web page.
5. An information searching apparatus for implementing the information searching method according to any one of claims 1 to 4, characterized in that the apparatus comprises:
the acquisition module is used for acquiring basic keywords;
The translation module is used for executing preset automatic translation operation on the basic keywords to obtain translation keywords corresponding to the basic keywords;
The search module is used for executing preset search operation according to the basic keywords and the translation keywords to obtain search webpages corresponding to the basic keywords;
And the screening module is used for screening target webpages from all the search webpages based on the jump probability values between every two search webpages to serve as search results of the basic keywords, wherein the jump probability values are numerical values calculated according to jump relations between the two webpages.
6. A computer device, the computer device comprising:
a memory storing executable program code;
A processor coupled to the memory;
the processor invokes the executable program code stored in the memory to perform the method of searching for information as claimed in any one of claims 1 to 4.
7. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of searching for information according to any of claims 1-4.
CN202110844238.2A 2021-07-26 2021-07-26 Information searching method, device, equipment and storage medium Active CN113486246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110844238.2A CN113486246B (en) 2021-07-26 2021-07-26 Information searching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110844238.2A CN113486246B (en) 2021-07-26 2021-07-26 Information searching method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113486246A CN113486246A (en) 2021-10-08
CN113486246B true CN113486246B (en) 2024-07-12

Family

ID=77942612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110844238.2A Active CN113486246B (en) 2021-07-26 2021-07-26 Information searching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113486246B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595142B (en) * 2023-05-19 2024-07-16 大安健康科技(北京)有限公司 Retrieval matching method and system based on medical semantic analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455523A (en) * 2012-06-05 2013-12-18 深圳市世纪光速信息技术有限公司 Method and server for searching information
CN103488648A (en) * 2012-06-13 2014-01-01 阿里巴巴集团控股有限公司 Multilanguage mixed retrieval method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8924558B2 (en) * 2005-11-30 2014-12-30 John Nicholas and Kristin Gross System and method of delivering content based advertising
CN104217016B (en) * 2014-09-22 2018-02-02 北京国双科技有限公司 Webpage search keyword statistical method and device
CN104715063B (en) * 2015-03-31 2018-11-02 百度在线网络技术(北京)有限公司 search ordering method and device
CN105404688A (en) * 2015-12-11 2016-03-16 北京奇虎科技有限公司 Searching method and searching device
CN111460810A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Crowd-sourced task spot check method and device, computer equipment and storage medium
CN111708911B (en) * 2020-06-17 2022-06-24 北京字节跳动网络技术有限公司 Searching method, searching device, electronic equipment and computer-readable storage medium
CN111831885B (en) * 2020-07-14 2021-03-16 深圳市众创达企业咨询策划有限公司 Internet information retrieval system and method

Also Published As

Publication number Publication date
CN113486246A (en) 2021-10-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant