CN114610973A - Information search matching method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN114610973A
Authority
CN
China
Prior art keywords
information
search
source file
post
web crawler
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210285456.1A
Other languages
Chinese (zh)
Inventor
王明磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202210285456.1A
Publication of CN114610973A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to the field of search engines, and in particular to an information search and matching method and device, computer equipment, and a storage medium. The method comprises the following steps: acquiring a source file of a company homepage through a web crawler; parsing the source file and extracting the recruitment information in it; screening out post information that shares the same post characteristics from the recruitment information according to a preset clustering rule to obtain aggregated post information; storing the aggregated post information in a search engine database in association with its corresponding post characteristics; and, after a search instruction of a search engine is received, matching post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction. By providing information search and matching through a web-crawler-based recruitment system, the application addresses the low efficiency of searching and the large volume of advertisements and false information returned, so that the retrieved information is more accurate and target information is obtained more efficiently.

Description

Information search matching method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of search engines, and in particular, to a method and an apparatus for searching and matching information, a computer device, and a storage medium.
Background
In the age of massive information, finding accurate information is a problem that every network user has to face, and job hunting is no exception. More and more job information is published on the internet, and most users currently rely on search engines to sift through it. Today's search engines are mainly general-purpose search engines. As the amount of network information grows exponentially, their shortcomings become apparent: many of the returned results are not what the searcher needs, and identical queries return identical results regardless of who the user is or what goal the user has. A large amount of effort is therefore wasted on searching, and the efficiency and accuracy of searching for job information in particular are low.
Disclosure of Invention
The main purpose of the present application is to provide a method, an apparatus, a computer device, and a storage medium for searching and matching information, which aim to solve the problem of low efficiency and accuracy of information obtained by searching in a recruitment system.
In order to achieve the above object, the present application provides a method for searching and matching information, the method including:
acquiring a source file of a company homepage through a web crawler;
analyzing the source file and extracting recruitment information in the source file;
screening out post information with the same post characteristics from the recruitment information according to a preset clustering rule to obtain aggregated post information;
storing the aggregated post information in a search engine database in association with its corresponding post characteristics;
and after a search instruction of a search engine is received, matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction.
Further, the obtaining of the source file of the company homepage through the web crawler includes:
acquiring first website information of a company homepage;
acquiring a first source file of a company homepage corresponding to the first website information through a web crawler;
extracting second website information of a recruitment homepage of the company homepage from the first source file;
and acquiring a second source file of the recruitment homepage corresponding to the second website information through a web crawler.
Further, the obtaining, by the web crawler, the first source file of the company homepage corresponding to the first website information includes:
acquiring login information of the first website information;
storing the login information to a user identity file of a browser;
and acquiring a first source file of a company homepage corresponding to the first website information on the browser through a web crawler based on the user identity file.
Further, before the obtaining the source file of the company homepage by the web crawler, the method includes:
acquiring a web crawler protocol of the homepage of the company;
analyzing the web crawler protocol to obtain web crawler types forbidden by the company homepage;
and determining an available web crawler according to the web crawler type so as to acquire a source file of a company homepage based on the available web crawler.
Further, the obtaining of the source file of the company homepage through the web crawler includes:
acquiring a pre-configured time node;
configuring a time interval of the web crawler according to the time node;
and controlling the web crawler to acquire the source file of the homepage of the company based on the time interval.
Further, the parsing the source file and extracting the recruitment information in the source file includes:
analyzing the source file based on a preset element selector, and extracting a plurality of element information of the source file;
and analyzing the element information based on a preset regular expression, and extracting the recruitment information in the element information.
Further, after the post characteristics and the corresponding target post information are matched from the search engine database according to the search characteristics included in the search instruction, the method includes:
acquiring a search characteristic of the search instruction;
acquiring the search utilization rate of the search features;
configuring the weight of the search features according to the search utilization rate;
and arranging the target post information obtained by matching according to the weight to obtain the ordered target post information.
The present application further provides a device for searching and matching information, including:
the file acquisition module is used for acquiring a source file of a company homepage through a web crawler;
the file analyzing module is used for analyzing the source file and extracting the recruitment information in the source file;
the information screening module is used for screening out the post information with the same post characteristics from the recruitment information according to a preset clustering rule to obtain the aggregated post information;
the information storage module is used for storing the aggregated post information in a search engine database in association with its corresponding post characteristics;
and the information matching module is used for matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction after receiving the search instruction of the search engine.
The application further provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the search matching method of the information when executing the computer program.
The present application also provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor implements a search matching method for information according to any one of the above.
The embodiments of the application provide a method by which a web-crawler-based recruitment system searches and matches information. First, a source file of a company homepage is obtained through a web crawler: the addresses of one or more company homepages are configured in the web crawler, the crawler obtains the information displayed on a company homepage and packages it into a source file of that homepage, and then moves on to the next company homepage and repeats the process. The source file is then parsed and the recruitment information in it is extracted. The recruitment information contains a large amount of different post information, including posts of the same company and posts of different companies. Post information with the same post characteristics is then screened out from the recruitment information according to a preset clustering rule to obtain aggregated post information, and the aggregated post information is stored in a search engine database in association with its corresponding post characteristics. The information on the company homepages acquired by the web crawler is thereby converted into post information and stored in the search engine database, so that an information search service for a specific field can be provided. After a search instruction of a search engine is received, the search instruction is parsed to obtain the search characteristics it contains, the post characteristics and corresponding target post information are matched from the search engine database according to those search characteristics, and the target post information is displayed to the user. Because the web-crawler-based recruitment system provides search and matching of information in a specified field through the search engine, it addresses the low efficiency of searching and the large amount of advertisements and false information returned, so that the retrieved information is more accurate and target information is obtained more efficiently.
Drawings
FIG. 1 is a schematic flowchart of an embodiment of a search matching method for information according to the present application;
FIG. 2 is a flowchart illustrating an embodiment of obtaining a source file of a company homepage by a web crawler according to the present application;
FIG. 3 is a flowchart illustrating an embodiment of obtaining a first source file of a company homepage corresponding to the first website information by a web crawler according to the present application;
FIG. 4 is a flowchart illustrating an embodiment of the present application before a web crawler obtains a source file for a company homepage;
FIG. 5 is a flowchart illustrating an embodiment of obtaining a source file of a company homepage by a web crawler according to the present application;
FIG. 6 is a flowchart illustrating an embodiment of parsing the source file and extracting recruitment information from the source file according to the present application;
FIG. 7 is a flowchart illustrating an embodiment of the present application after matching the post characteristics and corresponding target post information from the search engine database according to the search characteristics included in the search instruction;
FIG. 8 is a schematic structural diagram of an embodiment of a search matching apparatus according to the present disclosure;
FIG. 9 is a block diagram illustrating a computer device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a method for searching and matching information, which includes steps S101 to S105, and the steps of the method for searching and matching information are described in detail as follows.
S101, obtaining a source file of a company homepage through a web crawler.
This embodiment is applied to the search and matching scenario of a recruitment system. In this scenario, recruitment information is first acquired from the homepages of a plurality of companies and stored in a database of the recruitment system; a designated search engine is then configured for the recruitment system, and the corresponding recruitment content can be matched quickly by retrieving keywords or characteristics in that search engine. Specifically, a source file of a company homepage is first obtained through a web crawler. In one embodiment, the addresses of one or more company homepages are first configured in the web crawler, for example, www.
S102, analyzing the source file, and extracting recruitment information in the source file.
In this embodiment, after the source file of the company homepage is acquired by the web crawler, the source file is parsed and the recruitment information in it is extracted. To screen the information on the company homepage so that only information for the specific field is retained, the source file is parsed and the recruitment information is then extracted by matching preset keywords, so that the recruitment information on the company homepage is acquired through the web crawler.
S103, screening out the post information with the same post characteristics from the recruitment information according to a preset clustering rule to obtain the aggregated post information.
In this embodiment, after the source file is parsed and the recruitment information is extracted, a single company homepage may list one or more posts, and the web crawler may obtain source files from the homepages of different companies. The extracted recruitment information therefore contains a large amount of different post information, including posts of the same company and posts of different companies. Post information with the same post characteristics is then screened out from the recruitment information according to a preset clustering rule to obtain aggregated post information. In one implementation, a post characteristic is first determined, and all post information having that post characteristic is then screened out of the recruitment information and aggregated. For example, if the post characteristics are "electronic, java and programming", all post information having these characteristics, such as electronic engineer and java engineer, is screened out of the recruitment information.
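The aggregation step can be sketched as follows. This is a minimal illustration only: the patent does not specify the clustering rule, so a simple case-insensitive substring match stands in for it, and the function name `aggregate_by_feature`, the posting fields, and the sample data are all hypothetical.

```python
from collections import defaultdict


def aggregate_by_feature(postings, features):
    """Group postings under every post feature found in their text.

    Assumption: the "preset clustering rule" is approximated by a
    case-insensitive substring match on title and description.
    """
    aggregated = defaultdict(list)
    for posting in postings:
        text = f"{posting['title']} {posting['description']}".lower()
        for feature in features:
            if feature.lower() in text:
                aggregated[feature].append(posting)
    return dict(aggregated)


# Hypothetical recruitment information extracted from two company homepages.
postings = [
    {"company": "A", "title": "Java Engineer", "description": "Backend programming"},
    {"company": "B", "title": "Electronic Engineer", "description": "Circuit design"},
    {"company": "B", "title": "Accountant", "description": "Bookkeeping"},
]
groups = aggregate_by_feature(postings, ["electronic", "java", "programming"])
```

A posting that matches several features appears in several groups, and a posting that matches none (the accountant post here) is left out of the aggregated result.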
S104, storing the aggregated post information in a search engine database in association with its corresponding post characteristics.
In this embodiment, after the aggregated post information is obtained, it is stored in a search engine database in association with its corresponding post characteristics. Specifically, the pieces of post information corresponding to each post characteristic are obtained, the post characteristic is used as the primary key identifier for data storage, and the pieces of post information are associated with that primary key identifier and stored in the search engine database. The information on the company homepage obtained by the web crawler is thereby converted into post information and stored in the search engine database.
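One way to picture the storage step is a table keyed by post characteristic. The sketch below uses an in-memory SQLite database purely as a stand-in for the search engine database; the table name `post_index`, the JSON encoding of the postings, and the helper names are assumptions, not details from the patent.

```python
import json
import sqlite3


def store_aggregated(conn, aggregated):
    """Store each feature as the primary key with its postings as JSON."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS post_index ("
        "feature TEXT PRIMARY KEY, postings TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back on error
        for feature, postings in aggregated.items():
            conn.execute(
                "INSERT OR REPLACE INTO post_index (feature, postings) "
                "VALUES (?, ?)",
                (feature, json.dumps(postings)),
            )


def load_postings(conn, feature):
    """Return the postings stored under a feature, or an empty list."""
    row = conn.execute(
        "SELECT postings FROM post_index WHERE feature = ?", (feature,)
    ).fetchone()
    return json.loads(row[0]) if row else []


conn = sqlite3.connect(":memory:")
store_aggregated(conn, {"java": [{"company": "A", "title": "Java Engineer"}]})
```

Keying on the feature means a lookup by post characteristic is a single primary-key read, which matches the role the text gives to the primary key identifier.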
S105, after receiving a search instruction of a search engine, matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction.
In this embodiment, once the aggregated post information has been stored in the search engine database in association with its corresponding post characteristics, that is, once a large amount of the recruitment information and post information displayed on company homepages has been acquired and stored, an information search service for a specific field can be provided. Specifically, a search instruction of the search engine is received, where the search instruction contains qualifying information for the specific field; for example, "xx, recruitment" is entered in the search engine. After the search instruction is received, it is parsed to obtain the search characteristics it contains. The search characteristics are then compared with the post characteristics in the search engine database, the post characteristics whose similarity to the search characteristics is higher than a preset value are matched, and the post information corresponding to those post characteristics is obtained. In this way the post characteristics and the corresponding target post information are matched from the search engine database according to the search characteristics contained in the search instruction, and the target post information is then displayed to the user. Through the search and matching provided by the search engine, the web-crawler-based recruitment system addresses the low efficiency of searching and the large amount of advertisements and false information returned, so that the retrieved information is more accurate and target information is obtained more efficiently.
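The similarity comparison against a preset value might look like the sketch below. The patent does not name a similarity measure, so the character-based ratio from Python's `difflib.SequenceMatcher` is used as an assumed stand-in; the threshold of 0.8, the dictionary index, and the function names are likewise hypothetical.

```python
from difflib import SequenceMatcher


def match_features(search_features, stored_features, threshold=0.8):
    """Return stored features whose best similarity exceeds the threshold."""
    matched = []
    for stored in stored_features:
        best = max(
            (
                SequenceMatcher(None, s.lower(), stored.lower()).ratio()
                for s in search_features
            ),
            default=0.0,
        )
        if best > threshold:
            matched.append(stored)
    return matched


def search(index, search_features, threshold=0.8):
    """Collect the target postings stored under every matched feature."""
    results = []
    for feature in match_features(search_features, index, threshold):
        results.extend(index[feature])
    return results


# Hypothetical index: post feature -> stored post information.
index = {
    "java": ["Java Engineer at A"],
    "electronic": ["Electronic Engineer at B"],
}
hits = search(index, ["Java"])
```

Raising the threshold makes matching stricter; a production system would more likely rely on the search engine's own scoring than on a character-level ratio.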
This embodiment provides a method by which a web-crawler-based recruitment system searches and matches information. First, a source file of a company homepage is obtained through a web crawler: the addresses of one or more company homepages are configured in the web crawler, the crawler obtains the information displayed on a company homepage and packages it into a source file of that homepage, and then moves on to the next company homepage and repeats the process. The source file is then parsed and the recruitment information in it is extracted. The recruitment information contains a large amount of different post information, including posts of the same company and posts of different companies. Post information with the same post characteristics is then screened out from the recruitment information according to a preset clustering rule to obtain aggregated post information, and the aggregated post information is stored in a search engine database in association with its corresponding post characteristics. The information on the company homepages acquired by the web crawler is thereby converted into post information and stored in the search engine database, so that an information search service for a specific field can be provided. After a search instruction of a search engine is received, the search instruction is parsed to obtain the search characteristics it contains, the post characteristics and corresponding target post information are matched from the search engine database according to those search characteristics, and the target post information is displayed to the user. Because the web-crawler-based recruitment system provides search and matching of information in a specified field through the search engine, it addresses the low efficiency of searching and the large amount of advertisements and false information returned, so that the retrieved information is more accurate and target information is obtained more efficiently.
In one embodiment, as shown in fig. 2, the obtaining of the source file of the company homepage by the web crawler further includes steps S201 to S204:
S201, acquiring first website information of a company homepage;
S202, acquiring a first source file of a company homepage corresponding to the first website information through a web crawler;
S203, extracting second website information of the recruitment homepage of the company homepage from the first source file;
S204, acquiring a second source file of the recruitment homepage corresponding to the second website information through a web crawler.
In this embodiment, in the process of obtaining a source file of a company homepage through the web crawler, the first website information of the company homepage is obtained first. Because the company homepage is a page containing many kinds of information, the first source file of the company homepage corresponding to the first website information is obtained through the web crawler and parsed, and the second website information of the recruitment homepage of the company homepage is extracted from it; that is, the web crawler automatically accesses the recruitment homepage linked from the company homepage. The second source file of the recruitment homepage corresponding to the second website information is then obtained through the web crawler, so that the content of the recruitment homepage is obtained and the recruitment information contained in the source file is extracted more accurately. In one implementation, the first website information www. of a company homepage is obtained, the first source file of the company homepage corresponding to that website information is obtained, the second website information www of the recruitment homepage is extracted from the first source file, and the second source file of the recruitment homepage corresponding to the second website information is obtained.
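Step S203, extracting the recruitment page address from the first source file, can be sketched with the standard library alone. The keyword list, the `example.com` address, and the names `LinkCollector` and `find_recruitment_url` are illustrative assumptions; the patent does not say how the recruitment link is recognized.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_recruitment_url(first_url, first_source,
                         keywords=("career", "job", "recruit")):
    """Return the absolute URL of the first link that looks like a
    recruitment page, or None. The keyword heuristic is an assumption."""
    parser = LinkCollector()
    parser.feed(first_source)
    for href, text in parser.links:
        haystack = f"{href} {text}".lower()
        if any(keyword in haystack for keyword in keywords):
            return urljoin(first_url, href)
    return None


# Hypothetical first source file of a company homepage.
homepage = '<a href="/about">About</a><a href="/careers">Join us</a>'
second_url = find_recruitment_url("https://example.com/", homepage)
```

The returned address plays the role of the second website information; fetching it would yield the second source file of step S204.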
In one embodiment, as shown in fig. 3, the obtaining of the first source file of the company homepage corresponding to the first website information by the web crawler includes steps S301 to S303:
S301, obtaining login information of the first website information;
S302, storing the login information to a user identity file of a browser;
S303, on the browser, acquiring a first source file of a company homepage corresponding to the first website information through a web crawler based on the user identity file.
In this embodiment, while the first source file of the company homepage corresponding to the first website information is being obtained through the web crawler, the browser must be able to access the server normally, and the server must be able to identify the browser before the web crawler can obtain the data of the company homepage. Stable login information is therefore required. The login information for the first website information is first obtained and stored in a user identity file of the browser. When identity authentication is required, the pre-configured login information is filled in from the user identity file so that the server can identify the browser and the browser can access the server normally. The first source file of the company homepage corresponding to the first website information is then obtained on the browser through the web crawler based on the user identity file, thereby improving the success rate of information acquisition on the company homepage.
In one embodiment, as shown in fig. 4, before the obtaining of the source file of the company homepage by the web crawler, steps S401 to S403 are further included:
S401, acquiring a web crawler protocol of the homepage of the company;
S402, analyzing the web crawler protocol, and acquiring the types of web crawlers forbidden by the company homepage;
S403, determining an available web crawler according to the web crawler type, and acquiring a source file of a company homepage based on the available web crawler.
In this embodiment, before the source file of the company homepage is obtained by the web crawler, it is taken into account that different website servers set up anti-crawler protocols. To acquire the content of a company homepage efficiently, the web crawler protocol of the company homepage is first acquired. In one embodiment, the crawler visits the website of the company homepage to view the website's web crawler protocol (robots protocol), for example, visiting www. wang.com/robots.txt to view the robots protocol of a given website. The web crawler protocol is then parsed to obtain the web crawler types prohibited by the company homepage, and an available web crawler, that is, one that does not belong to the prohibited types, is determined so that the source file of the company homepage is acquired based on the available web crawler, thereby improving the success rate of information acquisition on the company homepage.
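Parsing a robots protocol and checking which crawler is permitted can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the crawler names `BadBot` and `FriendlyBot`, and the helper `pick_allowed_agent` are made up for illustration; the rules are parsed from a string so that no network access is needed.

```python
from urllib.robotparser import RobotFileParser


def pick_allowed_agent(robots_txt, url, candidate_agents):
    """Return the first candidate crawler that robots.txt permits for url,
    or None if every candidate is prohibited."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for agent in candidate_agents:
        if parser.can_fetch(agent, url):
            return agent
    return None


# Hypothetical robots protocol: one crawler type is banned, others allowed.
ROBOTS = """\
User-agent: BadBot
Disallow: /

User-agent: *
Allow: /
"""

agent = pick_allowed_agent(
    ROBOTS, "https://example.com/careers", ["BadBot", "FriendlyBot"]
)
```

In a live crawler, `RobotFileParser.set_url` followed by `read` would fetch the file from the site instead of parsing a local string.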
In one embodiment, as shown in fig. 5, the obtaining of the source file of the company homepage by the web crawler includes steps S501-S503:
S501, acquiring a pre-configured time node;
S502, configuring the time interval of the web crawler according to the time node;
S503, controlling the web crawler to acquire the source file of the company homepage based on the time interval.
In this embodiment, while the source file of a company homepage is being obtained through the web crawler, the server's anti-crawling mechanism may judge from the rate at which the browser sends requests whether the requests come from manual operation or from a crawler. To improve the success rate with which the web crawler obtains content from the company homepage, the crawler imitates manual browsing of the homepage as closely as possible. Specifically, a pre-configured time node is first acquired, the time interval of the web crawler is configured according to the time node, and the web crawler is then controlled to acquire the source file of the company homepage based on that time interval. In one embodiment, the shortest interval between the web crawler's requests for company homepages is set to 7 seconds (self.request_sleep = 7), and the interval the program waits before parsing the company homepage information obtained by the web crawler is set to 30 seconds. This makes it difficult for the server to detect that a web crawler is acquiring the content data and keeps the server from blocking the browser's IP address, thereby improving the success rate of information acquisition on the company homepage.
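The request interval can be sketched as a small wrapper that enforces a minimum gap between fetches. Only the attribute name `request_sleep` and the value 7 come from the text; the class `ThrottledCrawler`, the injected `fetch`, `sleep`, and `clock` callables, and the demo data are assumptions made so the example runs without a network or real waiting.

```python
import time


class ThrottledCrawler:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, fetch, request_sleep=7,
                 sleep=time.sleep, clock=time.monotonic):
        self.fetch = fetch                  # callable: url -> source file
        self.request_sleep = request_sleep  # shortest interval, in seconds
        self._sleep = sleep
        self._clock = clock
        self._last_request = None

    def get(self, url):
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            wait = self.request_sleep - elapsed
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        return self.fetch(url)


# Demo with a fake clock and a recording sleep, so nothing really blocks.
slept = []
crawler = ThrottledCrawler(
    fetch=lambda url: f"<html>{url}</html>",
    sleep=slept.append,
    clock=lambda: 0.0,
)
first = crawler.get("https://example.com/a")
second = crawler.get("https://example.com/b")
```

With the default arguments the wrapper calls `time.sleep`, so two back-to-back `get` calls are spaced at least `request_sleep` seconds apart.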
In one embodiment, as shown in fig. 6, the parsing the source file and extracting the recruitment information in the source file includes steps S601-S602:
S601, analyzing the source file based on a preset element selector, and extracting a plurality of element information of the source file;
and S602, analyzing the element information based on a preset regular expression, and extracting the recruitment information in the element information.
In this embodiment, in the process of parsing the source file and extracting the recruitment information in it, the source file is first parsed based on a preset element selector to extract a plurality of pieces of element information from the source file. The element selector includes a CSS selector, and the CSS selector includes a class selector, a tag selector, and an ID selector. The element information is then parsed based on a preset regular expression to extract the recruitment information it contains. Because a regular expression can parse the element information quickly, the efficiency of acquiring content from the company homepage is improved.
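The two-stage extraction (element selector, then regular expression) can be sketched as follows. Python's standard library has no CSS selector engine, so a small `HTMLParser` subclass that selects by tag and class stands in for a selector such as `li.job`; the "title - city - salary" text layout, the regular expression, and all names are hypothetical.

```python
import re
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every <tag> element carrying a given class.

    A tiny stand-in for a CSS tag-plus-class selector; it does not
    handle nested elements of the same tag.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.elements = []
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            self.elements.append("".join(self._buffer).strip())
            self._buffer = None


# Assumed element text layout: "<title> - <city> - <salary>k".
JOB_PATTERN = re.compile(
    r"(?P<title>.+?)\s*-\s*(?P<city>[A-Za-z]+(?: [A-Za-z]+)*)\s*-\s*(?P<salary>\d+)k"
)


def extract_recruitment_info(source, tag="li", css_class="job"):
    """Select elements, then keep those the regular expression parses."""
    extractor = ClassTextExtractor(tag, css_class)
    extractor.feed(source)
    matches = map(JOB_PATTERN.fullmatch, extractor.elements)
    return [m.groupdict() for m in matches if m]


SOURCE = (
    '<ul><li class="job">Java Engineer - Shenzhen - 20k</li>'
    '<li class="news">Company picnic</li>'
    '<li class="job featured">Electronic Engineer - Shanghai - 18k</li></ul>'
)
jobs = extract_recruitment_info(SOURCE)
```

A real crawler would more likely use a library with full CSS selector support; the point here is only the order of the two stages.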
In one embodiment, as shown in fig. 7, after the post features and the corresponding target post information are matched from the search engine database according to the search features included in the search instruction, the method further includes steps S701 to S704:
S701, acquiring the search features of the search instruction;
S702, acquiring the search utilization rate of the search features;
S703, configuring the weight of the search features according to the search utilization rate;
S704, arranging the target post information obtained by matching according to the weight to obtain the sorted target post information.
In this embodiment, after the post features and the corresponding target post information are matched from the search engine database according to the search features included in the search instruction, the target post information needs to be sorted so that it better meets the user's search requirement. Specifically, the search features of the search instruction are obtained first, and then the search utilization rate of each search feature, that is, how frequently the search feature is used, is obtained. When the search instruction contains a plurality of search features, each search feature may match different post information. The weight of each search feature is then configured according to its search utilization rate, and the target post information obtained by matching is arranged according to the weight to obtain the sorted target post information: the larger the weight of a search feature, the earlier the target post information matched by it is placed, and the smaller the weight, the later it is placed. Sorting the target post information in this way makes the results better meet the user's search requirement and improves the accuracy of information search matching.
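A minimal sketch of steps S701 to S704 follows. The text does not specify how the weight is derived from the search utilization rate or how a post matched by several features is scored, so two assumptions are made here: the weight of a feature is its share of the total usage count, and a post is scored by the sum of the weights of the features that matched it. The function names feature_weights and rank_posts are hypothetical.

```python
from typing import Dict, List, Tuple


def feature_weights(usage_counts: Dict[str, int], features: List[str]) -> Dict[str, float]:
    """Assumed rule: weight = the feature's share of the total usage count."""
    total = sum(usage_counts.get(f, 0) for f in features)
    if total == 0:
        # No usage data yet: fall back to equal weights.
        return {f: 1.0 / len(features) for f in features} if features else {}
    return {f: usage_counts.get(f, 0) / total for f in features}


def rank_posts(
    matches: List[Tuple[str, List[str]]],   # (post, search features that matched it)
    weights: Dict[str, float],
) -> List[str]:
    scored = [
        (sum(weights.get(f, 0.0) for f in matched), post)
        for post, matched in matches
    ]
    # sorted() is stable, so posts with equal scores keep their matching order.
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)
    return [post for _, post in ordered]
```

For example, with usage counts of 60, 30 and 10 for three features, a post matched by the first two features is placed ahead of a post matched only by the third.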
Referring to fig. 8, the present application further provides an information search matching apparatus, including:
the file acquisition module 101 is used for acquiring a source file of a company homepage through a web crawler;
the file analyzing module 102 is configured to analyze the source file and extract the recruitment information from the source file;
the information screening module 103 is used for screening out the post information with the same post characteristics from the recruitment information according to a preset clustering rule to obtain the aggregated post information;
the information storage module 104 is used for storing the aggregated post information, in association with the corresponding post characteristics, in a search engine database;
and the information matching module 105 is used for matching, after a search instruction of the search engine is received, the post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction.
As described above, it is understood that the components of the information search matching device proposed in the present application may implement the functions of any one of the information search matching methods described above.
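The cooperation of the information screening, storage and matching modules can be sketched as below. This is only an illustration: the preset clustering rule is not detailed in this passage, so exact equality of a normalized post title is assumed as the post characteristic, and an in-memory dictionary stands in for the search engine database. The names screen_posts and SearchEngineDatabase are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Post = Dict[str, str]


def _normalize(feature: str) -> str:
    # Assumed clustering rule: posts share a characteristic when their
    # titles are equal after trimming whitespace and lowercasing.
    return feature.strip().lower()


def screen_posts(recruitment_info: List[Post], feature_key: str = "title") -> Dict[str, List[Post]]:
    """Information screening module: group posts with the same post characteristic."""
    clusters: Dict[str, List[Post]] = defaultdict(list)
    for post in recruitment_info:
        clusters[_normalize(post[feature_key])].append(post)
    return dict(clusters)


class SearchEngineDatabase:
    """In-memory stand-in for the storage and matching modules."""

    def __init__(self) -> None:
        self._index: Dict[str, List[Post]] = {}

    def store(self, aggregated: Dict[str, List[Post]]) -> None:
        # Store aggregated posts in association with their post characteristic.
        for feature, posts in aggregated.items():
            self._index.setdefault(feature, []).extend(posts)

    def match(self, search_features: List[str]) -> List[Post]:
        # Return the target posts whose characteristic equals a search feature.
        results: List[Post] = []
        for feature in search_features:
            results.extend(self._index.get(_normalize(feature), []))
        return results
```

Keying the store by post characteristic means that a search only touches the posts aggregated under the matching characteristics instead of scanning all recruitment information.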
In one embodiment, the obtaining of the source file of the company homepage by the web crawler includes:
acquiring first website information of a company homepage;
acquiring a first source file of a company homepage corresponding to the first website information through a web crawler;
extracting second website information of a recruitment homepage of the company homepage from the first source file;
and acquiring a second source file of the recruitment homepage corresponding to the second website information through a web crawler.
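The two-stage acquisition listed above can be sketched as follows. The text does not say how the second website information is recognized in the first source file, so a keyword test on each link's address and anchor text is assumed; the keyword list, the function names and the injected fetch function are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

RECRUITMENT_KEYWORDS = ("career", "job", "recruit")  # assumed hints for a recruitment page


class _LinkFinder(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_recruitment_url(first_source: str, first_url: str) -> Optional[str]:
    """Extract the second website information from the first source file."""
    finder = _LinkFinder()
    finder.feed(first_source)
    finder.close()
    for href, text in finder.links:
        haystack = (href + " " + text).lower()
        if any(keyword in haystack for keyword in RECRUITMENT_KEYWORDS):
            return urljoin(first_url, href)   # resolve relative links
    return None


def fetch_recruitment_source(first_url: str, fetch: Callable[[str], str]) -> Optional[str]:
    first_source = fetch(first_url)                       # first source file
    second_url = find_recruitment_url(first_source, first_url)
    return fetch(second_url) if second_url else None      # second source file
```

Passing the downloader in as fetch keeps the link-extraction logic independent of any particular crawler.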
In one embodiment, the obtaining, by the web crawler, a first source file of a company homepage corresponding to the first website information includes:
acquiring login information of the first website information;
storing the login information to a user identity file of a browser;
and acquiring a first source file of a company homepage corresponding to the first website information on the browser through a web crawler based on the user identity file.
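One possible reading of the login steps above is sketched below. It assumes that the login information takes the form of cookies and that the user identity file is a JSON file; neither detail is stated in the text, and a real browser profile stores cookies in its own format. The function names save_identity and build_request are hypothetical.

```python
import json
import urllib.request
from pathlib import Path
from typing import Dict


def save_identity(path: Path, cookies: Dict[str, str]) -> None:
    """Store the login information (assumed to be cookies) in the identity file."""
    path.write_text(json.dumps(cookies), encoding="utf-8")


def build_request(url: str, identity_path: Path) -> urllib.request.Request:
    """Build a request for the company homepage that carries the stored identity."""
    cookies = json.loads(identity_path.read_text(encoding="utf-8"))
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})
```

The returned request could then be passed to urllib.request.urlopen to obtain the first source file as a logged-in user.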
In one embodiment, before obtaining the source file of the company homepage through the web crawler, the method further includes:
acquiring a web crawler protocol of the homepage of the company;
analyzing the web crawler protocol to obtain web crawler types forbidden by the company homepage;
and determining an available web crawler according to the web crawler type so as to acquire a source file of a company homepage based on the available web crawler.
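Assuming that the web crawler protocol refers to the site's robots.txt file, the steps above can be sketched with the Python standard library module urllib.robotparser. The function name available_crawlers and the crawler names in the usage note are hypothetical.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def available_crawlers(robots_txt: str, url: str, candidates: Iterable[str]) -> List[str]:
    """Return the candidate crawler user agents that robots.txt permits for `url`."""
    parser = RobotFileParser()
    # Parse robots.txt content that has already been downloaded.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in candidates if parser.can_fetch(agent, url)]
```

For a robots.txt file that disallows everything for a user agent named BadBot and disallows only /private/ for all other agents, calling the function with the candidates BadBot and JobCrawler for a page outside /private/ leaves JobCrawler as the available crawler.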
In one embodiment, the obtaining of the source file of the company homepage by the web crawler includes:
acquiring a pre-configured time node;
configuring a time interval of the web crawler according to the time node;
and controlling the web crawler to acquire a source file of a company homepage based on the time interval.
In one embodiment, the parsing the source file and extracting recruitment information in the source file comprises:
analyzing the source file based on a preset element selector, and extracting a plurality of element information of the source file;
and analyzing the element information based on a preset regular expression, and extracting the recruitment information in the element information.
In one embodiment, after matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics included in the search instruction, the method includes:
acquiring a search characteristic of the search instruction;
acquiring the search utilization rate of the search features;
configuring the weight of the search features according to the search utilization rate;
and arranging the target post information obtained by matching according to the weight to obtain the ordered target post information.
Referring to fig. 9, an embodiment of the present application further provides a computer device, which may be a mobile terminal and whose internal structure may be as shown in fig. 9. The computer device comprises a processor, a memory, a network interface, a display device and an input device which are connected through a system bus. The network interface of the computer device is used for communicating with an external terminal through a network connection. The display device of the computer device is used for displaying the offline application. The input device of the computer device is used for receiving user input in the offline application. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium. The non-volatile storage medium stores an operating system, a computer program, and a database. The database of the computer device is used for storing the original data. The computer program, when executed by the processor, implements the information search matching method.
The processor executes the above information search matching method, and the method includes: acquiring a source file of a company homepage through a web crawler; analyzing the source file and extracting recruitment information in the source file; screening out post information with the same post characteristics from the recruitment information according to a preset clustering rule to obtain aggregated post information; storing the aggregated post information, in association with the corresponding post characteristics, into a search engine database; and after a search instruction of a search engine is received, matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction.
The computer device provides an information search matching method for a recruitment system based on a web crawler. First, the source file of a company homepage is obtained through the web crawler: one or more company homepage addresses are configured in the web crawler, the web crawler obtains the information displayed on a company homepage and packages it to form the source file of that homepage, and then it moves on to the next company homepage and repeats the process. The source file is then parsed and the recruitment information in it is extracted; the recruitment information contains a large amount of different post information, including post information of the same company and post information of different companies. Next, post information with the same post characteristics is screened out of the recruitment information according to a preset clustering rule to obtain aggregated post information, and the aggregated post information is stored in the search engine database in association with the corresponding post characteristics. In this way, the information that the web crawler obtains from company homepages is converted into post information and stored in the search engine database, so that an information search service oriented to a specific field can be provided. After a search instruction of the search engine is received, the search instruction is parsed to obtain the search characteristics it contains, the post characteristics and the corresponding target post information are matched from the search engine database according to those search characteristics, and the target post information is displayed to the user. By using a web-crawler-based recruitment system to search and match information in a specified field through the search engine, the method effectively addresses problems such as low search efficiency and large amounts of advertisements and false information in search results; the retrieved information is more accurate, and the efficiency of obtaining the target information is improved.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing an information search matching method including the steps of: acquiring a source file of a company homepage through a web crawler; analyzing the source file and extracting recruitment information in the source file; screening out post information with the same post characteristics from the recruitment information according to a preset clustering rule to obtain aggregated post information; storing the aggregated post information, in association with the corresponding post characteristics, into a search engine database; and after a search instruction of a search engine is received, matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction.
The computer-readable storage medium provides an information search matching method for a recruitment system based on a web crawler. First, the source file of a company homepage is obtained through the web crawler: one or more company homepage addresses are configured in the web crawler, the web crawler obtains the information displayed on a company homepage and packages it to form the source file of that homepage, and then it moves on to the next company homepage and repeats the process. The source file is then parsed and the recruitment information in it is extracted; the recruitment information contains a large amount of different post information, including post information of the same company and post information of different companies. Next, post information with the same post characteristics is screened out of the recruitment information according to a preset clustering rule to obtain aggregated post information, and the aggregated post information is stored in the search engine database in association with the corresponding post characteristics. In this way, the information that the web crawler obtains from company homepages is converted into post information and stored in the search engine database, so that an information search service oriented to a specific field can be provided. After a search instruction of the search engine is received, the search instruction is parsed to obtain the search characteristics it contains, the post characteristics and the corresponding target post information are matched from the search engine database according to those search characteristics, and the target post information is displayed to the user. By using a web-crawler-based recruitment system to search and match information in a specified field through the search engine, the method effectively addresses problems such as low search efficiency and large amounts of advertisements and false information in search results; the retrieved information is more accurate, and the efficiency of obtaining the target information is improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for searching and matching information, the method comprising:
acquiring a source file of a company homepage through a web crawler;
analyzing the source file and extracting recruitment information in the source file;
screening out post information with the same post characteristics from the recruitment information according to a preset clustering rule to obtain aggregated post information;
storing the aggregated post information, in association with the corresponding post characteristics, into a search engine database;
and after a search instruction of a search engine is received, matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction.
2. The method for searching and matching information according to claim 1, wherein the obtaining of the source file of the homepage of the company through the web crawler comprises:
acquiring first website information of a company homepage;
acquiring a first source file of a company homepage corresponding to the first website information through a web crawler;
extracting second website information of a recruitment homepage of the company homepage from the first source file;
and acquiring a second source file of the recruitment homepage corresponding to the second website information through a web crawler.
3. The method according to claim 2, wherein the obtaining a first source file of a company homepage corresponding to the first web address information by a web crawler includes:
acquiring login information of the first website information;
storing the login information to a user identity file of a browser;
and acquiring a first source file of a company homepage corresponding to the first website information on the browser through a web crawler based on the user identity file.
4. The method for searching and matching information according to claim 1, wherein before the obtaining of the source file of the company homepage by the web crawler, the method further comprises:
acquiring a web crawler protocol of the homepage of the company;
analyzing the web crawler protocol to obtain web crawler types forbidden by the company homepage;
and determining an available web crawler according to the web crawler type so as to acquire a source file of a company homepage based on the available web crawler.
5. The method for searching and matching information according to claim 1, wherein the obtaining of the source file of the homepage of the company through the web crawler comprises:
acquiring a pre-configured time node;
configuring a time interval of the web crawler according to the time node;
and controlling the web crawler to acquire a source file of a company homepage based on the time interval.
6. The information search matching method according to claim 1, wherein the parsing the source file and extracting recruitment information in the source file comprises:
analyzing the source file based on a preset element selector, and extracting a plurality of element information of the source file;
and analyzing the element information based on a preset regular expression, and extracting the recruitment information in the element information.
7. The method for searching and matching information according to claim 1, wherein after matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics included in the search instruction, the method comprises:
acquiring the search characteristics of the search instruction;
acquiring the search utilization rate of the search features;
configuring the weight of the search features according to the search utilization rate;
and arranging the target post information obtained by matching according to the weight to obtain the ordered target post information.
8. An apparatus for search matching of information, the apparatus comprising:
the file acquisition module is used for acquiring a source file of a company homepage through a web crawler;
the file analyzing module is used for analyzing the source file and extracting the recruitment information in the source file;
the information screening module is used for screening out the post information with the same post characteristics from the recruitment information according to a preset clustering rule to obtain the aggregated post information;
the information storage module is used for storing the aggregated post information, in association with the corresponding post characteristics, into a search engine database;
and the information matching module is used for matching the post characteristics and the corresponding target post information from the search engine database according to the search characteristics contained in the search instruction after receiving the search instruction of the search engine.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements a method of search matching of information according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of search matching of information according to any one of claims 1 to 7.
CN202210285456.1A 2022-03-22 2022-03-22 Information search matching method and device, computer equipment and storage medium Pending CN114610973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210285456.1A CN114610973A (en) 2022-03-22 2022-03-22 Information search matching method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210285456.1A CN114610973A (en) 2022-03-22 2022-03-22 Information search matching method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114610973A true CN114610973A (en) 2022-06-10

Family

ID=81864035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210285456.1A Pending CN114610973A (en) 2022-03-22 2022-03-22 Information search matching method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114610973A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102536000B1 (en) * 2022-07-08 2023-05-26 최명재 Method and device for providing service for recruiting based on commerce

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination