WO2023022649A1 - A system for searching company information and a method thereof
- Publication number
- WO2023022649A1 (PCT/SG2021/050489)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- company
- information
- categories
- search query
- project
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/40—Data acquisition and logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
Definitions
- System 100 may include a merging module 124 configured to merge the additional company information with the company information in the system database 140. Merging module 124 may be used to merge any extracted information to the company information.
- System 100 may be configured to link the additional company information with the company information via the company identifier. By scraping the plurality of websites for company information and extracting additional company information based on one or more company identifiers, it is possible to extract substantial company information of the company.
- System 100 may be configured to generate a plurality of the categories based on the categories in the at least one website. When the system 100 is scraping the at least one website, the system 100 is configured to identify the plurality of categories in each website. Plurality of categories may be from one or more than one website. Based on the categories identified, the system 100 may replicate one or more of the categories to be used to store the company information in the system database 140. Alternatively, the system 100 may receive a plurality of categories input by the system administrator via the peripheral interface modules.
- System 100 may include a categorising module 126 configured to categorise the company information into the plurality of categories.
- System 100 may be configured to receive a search query from a user to search the company information. When the search query is received, the system 100 may be configured to retrieve the company information from the system database 140 based on the search query.
- Search query may include at least one of search terms, keywords and at least one of the plurality of categories.
- System 100 may display the plurality of categories on the display of the user device 160 for the user to choose. User may enter the search query into the user device 160 to be transmitted to the system 100.
- Fig. 2 shows a flow diagram of a method 1000 for searching company information of a company.
- the method includes scraping at least one website relevant to a company to extract company information in block 1020, generating a company identifier of the company in block 1040, identifying a company record of the company in at least one entity database based on the company identifier in block 1060, extracting additional company information of the company from the company record in block 1080, merging the additional company information with the company information in block 1100, generating a plurality of categories from the at least one website in block 1120, categorising the company information into the plurality of categories in block 1140, receiving a search query to search the company information in block 1160, and retrieving the company information based on the search query in block 1180.
- System 100 may include an extraction module configured to search the plurality of websites to extract missing data not extracted during the scraping of the websites. Missing data may include logo, description of company, banner image, social media accounts, etc. Extraction module may be configured to identify the missing data, search the plurality of websites and extract the missing data from the at least one website. Alternatively, the system 100 may receive the missing data via input from the system administrator.
- System 100 may be configured to scrape for goods and services information relevant to the company using the company information. Besides company information, it is advantageous to incorporate information of the goods and services provided by the company so that such information may be readily available to the user for searching. While such information is available on the internet, the information may not be easily found or easily associated or linked to the company. Hence, the system 100 is able to scrape such information using the company information extracted and merge or link such information to the company information. Further, by categorising such information, the system 100 is able to efficiently, accurately and quickly identify the relevant information based on the plurality of categories chosen by the user. Thus, the user is provided with the search results that are required. Some of the goods and services information is described in the following examples.
- System 100 may be configured to scrape the at least one website for project information using the company information, generate a plurality of project categories based on the project information, categorise the project information into the plurality of project categories, merge and/or link the project information to the company information, and retrieve the project information based on the search query.
- the goods of a construction company may be its completed projects, and information of the projects may include length of projects, costs of projects, names of subcontractors, etc.
- the system 100 may retrieve the plurality of project categories from the at least one website or generate the plurality of project categories based on the categories found on the at least one website.
- the system 100 may receive the plurality of project categories from the system administrator.
- System 100 may then categorise the project information according to the plurality of project categories before merging the project information with the company information or linking the project information to the company information via the company identifier or both.
- the user may search based on at least one of the plurality of project categories.
- the search query may include at least one of the plurality of project categories.
- System 100 may be configured to scrape the at least one website for product information, e.g. taps, using the company information.
- System 100 may be configured to retrieve the plurality of product categories from the at least one website or generate a plurality of product categories based on the product information.
- the system 100 may receive the plurality of product categories from the system administrator.
- the system 100 may categorise the product information into the plurality of product categories.
- System 100 may categorise the product information based on the plurality of categories.
- System 100 may be configured to merge the product information with the company information or link the product information to the company information via the company identifier or both.
- when the system 100 receives the search query of the user, the system 100 is able to retrieve the product information based on the search query.
- the search query may include at least one of the plurality of product categories chosen by the user.
- the company may have potential projects in the pipeline.
- System 100 may be configured to scrape the at least one website for tender project information using the company information.
- System 100 may be configured to merge the tender project information with the company information or link the tender project information to the company information via the company identifier or both.
- the system 100 may be configured to retrieve the tender project information based on the search query.
- Company may also provide services apart from the products.
- System 100 may be configured to scrape the at least one website for service information of services provided by the company using the company information.
- System 100 may be configured to merge the service information with the company information or link the service information to the company information via the company identifier or both.
- the system 100 may be configured to retrieve the service information based on the search query.
- Company information may include grading of the company assigned by the authority. The user may be presented with the grading when searching for a company or the company information so that the user is able to assess the suitability of the company based on the grading against the user’s requirement.
- System 100 may be configured to execute multithread scraping to scrape the at least one website.
- Processor 110 of the system 100 may be configured to execute the multithread scraping.
- with multithread scraping, which is a parallel programming concept, the number of websites that can be scraped by the system 100 per unit of time increases. Conversely, the time required to scrape the plurality of websites may be significantly reduced.
- Merging module 124 may be configured to index the company information and other information, e.g. product information, so as to improve searchability of the information when merging the company information and the other information.
- Merging module 124 may be configured to select one or more fields or receive one or more fields from the system administrator for indexing the company information and the other information.
- Merging module 124 may be configured to merge the company information with other information using multithreading.
- Memory 120 may be tuned to optimise the merging performance of the merging module 124.
- System 100 allows the user to comprehensively search for company information of a company.
- the user device 160 may communicate with the system 100 via the internet to transmit the search query to the system 100.
- the user device 160 may receive a plurality of categories, including project categories and product categories, from the system 100 and display the plurality of categories on the user device 160. User may choose one or more of the plurality of categories to be included in the search query.
- the system 100 may access the system database 140 to retrieve the company information, which may include information like the project information, tender project information, product information and/or service information, based on the search query and transmit the search result to the user device 160.
- the search result transmitted to the user provides the user the detailed information that the user requires quickly and efficiently.
- the present invention relates to a system and method for searching company information generally as herein described, with reference to and/or illustrated in the accompanying drawings.
Abstract
A system for searching company information of a company is provided. The system includes a processor, a memory in communication with the processor for storing instructions executable by the processor, such that the processor is configured to scrape at least one website relevant to a company to extract company information, generate a company identifier of the company, identify a company record of the company in at least one entity database based on the company identifier, extract additional company information of the company from the company record, merge the additional company information with the company information, generate a plurality of categories from the at least one website, categorise the company information into the plurality of categories, receive a search query to search the company information, and retrieve the company information based on the search query. A method of the system is provided.
Description
A System For Searching Company Information And A Method Thereof
Technical Field
[0001] The present invention relates to a system for searching company information of a company. Further, the present invention relates to a method for searching company information.
Background
[0002] Typically, a user who wishes to find information of a company, e.g. a construction company, real estate companies, may be able to find basic information, e.g. company address, history, contact details, of the company on the internet, e.g. at the company website. While the user may find the basic information, he or she is unable to find detailed information of the company, e.g. capital, company projects, etc. Although the user may be able to search a number of websites to obtain some of the detailed information, it is time consuming and inefficient, and the user may not be able to gather all the information required.
[0003] Therefore, it is beneficial and advantageous to have a system that is able to provide a fast and efficient search system and method to obtain the detailed information based on the user’s search request.
Summary
[0004] According to various embodiments, a system for searching company information of a company is provided. The system includes a processor, a memory in communication with the processor for storing instructions executable by the processor, such that the processor is configured to scrape at least one website relevant to a company to extract company information, generate a company identifier of the company, identify a company record of the company in at least one entity database based on the company identifier, extract additional company information of the company from the company record, merge the additional company information with the company information, generate a plurality of categories from the at least one website, categorise the company information into the plurality of categories, receive a search query to search the company information, and retrieve the company information based on the search query.
[0005] According to various embodiments, the search query may include at least one of the plurality of categories.
[0006] According to various embodiments, the processor may further be configured to scrape the at least one website for project information using the company information, generate a plurality of project categories based on the project information, categorise the project information into the plurality of project categories, merge and/or link the project information to the company information, and retrieve the project information based on the search query.
[0007] According to various embodiments, the search query may include at least one of the plurality of project categories.
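As an illustration only, the categorising and linking of project information described in [0006] and [0007] might be sketched as follows. The names `CompanyRecord`, `link_projects` and `search_projects`, the `"category"` key and the sample data are hypothetical and do not appear in the specification; scraping is replaced by a hard-coded list.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CompanyRecord:
    """Hypothetical in-memory stand-in for one company's stored information."""
    company_id: str
    name: str
    # Maps a project category to the project entries linked to this company.
    projects: dict = field(default_factory=lambda: defaultdict(list))


def link_projects(company, scraped_projects):
    """Categorise scraped project entries and link them to the company record."""
    for project in scraped_projects:
        # Assumption: each scraped entry already carries a category label.
        category = project.get("category", "uncategorised")
        company.projects[category].append(project)
    # The generated project categories are the keys seen so far.
    return sorted(company.projects)


def search_projects(company, project_category):
    """Retrieve project information for a category named in the search query."""
    return list(company.projects.get(project_category, []))


company = CompanyRecord(company_id="C-001", name="Example Builders")
categories = link_projects(company, [
    {"title": "Bridge A", "category": "civil"},
    {"title": "Tower B", "category": "residential"},
    {"title": "Road C", "category": "civil"},
])
civil = search_projects(company, "civil")
```

Here the project categories are derived from the scraped entries themselves; a deployment could equally take them from a system administrator, as the description allows.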
[0008] According to various embodiments, the processor may further be configured to scrape the at least one website for product information using the company information, generate a plurality of product categories based on the product information, categorise the product information into the plurality of product categories, merge and/or link the product information to the company information, and retrieve the product information based on the search query.
[0009] According to various embodiments, the search query may include at least one of the plurality of product categories.
[0010] According to various embodiments, the processor may be configured to execute multithread scraping to scrape the at least one website.
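One possible reading of multithread scraping in [0010] is a thread pool that processes several websites concurrently. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the function `scrape_site`, the URLs and the returned fields are placeholders assumed for illustration, and no network access is performed.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_site(url):
    """Placeholder for scraping one website (assumption: no real fetch here)."""
    # A real scraper would download and parse the page at `url`.
    return {"url": url, "company_name": "Example Builders"}


def scrape_all(urls, max_workers=4):
    """Scrape several websites concurrently using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(scrape_site, urls))


urls = ["https://example.com/a", "https://example.org/b"]
results = scrape_all(urls)
```

Because scraping is largely bound by network latency, threads can overlap the waiting time of several requests, which is consistent with the throughput benefit the specification attributes to multithreading.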
[0011] According to various embodiments, the processor may be configured to scrape the at least one website for tender project information using the company information, merge and/or link the tender project information to the company information, and retrieve the tender project information based on the search query.
[0012] According to various embodiments, the processor may be configured to scrape the at least one website for service information of services provided by the company using the company information, merge and/or link the service information to the company information, and retrieve the service information based on the search query.
[0013] The present invention relates to a method for searching company information of a company. The method includes scraping at least one website relevant to a company to extract company information, generating a company identifier of the company, identifying a company record of the company in at least one entity database based on the company identifier, extracting additional company information of the company from the company record, merging the additional company information with the company information, generating a plurality of categories from the at least one website, categorising the company information into the plurality of categories, receiving a search query to search the company information, and retrieving the company information based on the search query.
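The sequence of steps in [0013] could be sketched, under simplifying assumptions, as a small pipeline. Everything named below (`ENTITY_DB`, `generate_company_id`, `build_record`, `categorise`, `search`, the field names and the sample values) is hypothetical; the specification does not prescribe how the identifier is derived or how conflicts are resolved when merging.

```python
# Hypothetical stand-in for an entity database keyed by company identifier.
ENTITY_DB = {
    "example-builders": {"grading": "A1", "paid_up_capital": "1,000,000"},
}


def generate_company_id(info):
    """Derive an identifier from the company name (an assumed convention)."""
    return "-".join(info["name"].lower().split())


def build_record(scraped, entity_db):
    """Look up the company record by identifier and merge the additional fields."""
    company_id = generate_company_id(scraped)
    additional = entity_db.get(company_id, {})
    # Assumption: scraped values take precedence over entity database values.
    return {**additional, **scraped, "company_id": company_id}


def categorise(record, categories):
    """Group record fields under category names (category -> field names)."""
    return {
        category: {f: record[f] for f in fields if f in record}
        for category, fields in categories.items()
    }


def search(categorised, category):
    """Retrieve the information stored under a category from the search query."""
    return categorised.get(category, {})


scraped = {"name": "Example Builders", "email": "info@example.com"}
record = build_record(scraped, ENTITY_DB)
categorised = categorise(record, {
    "contact": ["name", "email"],
    "financial": ["grading", "paid_up_capital"],
})
result = search(categorised, "financial")
```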
[0014] According to various embodiments, the search query may include at least one of the plurality of categories.
[0015] According to various embodiments, the method may further include scraping the at least one website for project information using the company information, generating a plurality of project categories based on the project information, categorising the project information into the plurality of project categories, merging and/or linking the project information to the company information, and retrieving the project information based on the search query.
[0016] According to various embodiments, the search query may include at least one of the plurality of project categories.
[0017] According to various embodiments, the method may further include scraping the at least one website for product information using the company information, generating a plurality of product categories based on the product information, categorising the product information into the plurality of product categories, merging and/or linking the product information to the company information, and retrieving the product information based on the search query.
[0018] According to various embodiments, the search query may include at least one of the plurality of product categories.
[0019] According to various embodiments, scraping the at least one website may include executing multithread scraping to scrape the at least one website.
[0020] According to various embodiments, the method may further include scraping the at least one website for tender project information using the company information, merging and/or linking the tender project information to the company information, and retrieving the tender project information based on the search query.
[0021] According to various embodiments, the method may further include scraping the at least one website for service information of services provided by the company using the company information, merging and/or linking the service information to the company information, and retrieving the service information based on the search query.
Brief Description of Drawings
[0022] Fig. 1 shows an exemplary embodiment of a system for searching company information of a company.
[0023] Fig. 2 shows a flow diagram of a method for searching company information of a company.
Detailed Description
[0024] Fig. 1 shows an exemplary embodiment of a system 100 for searching company information of a company. System 100 may include a server. System 100 includes a processor 110, a memory 120 in communication with the processor 110 for storing instructions executable by the processor 110, such that the processor 110 is configured to scrape at least one website relevant to a company to extract company information, generate a company identifier of the company, identify a company record of the company in at least one entity database based on the company identifier, extract additional company information of the company from the company record, merge the additional company information with the company information, generate a plurality of categories from the at least one website, categorise the company information into the plurality of categories, receive a search query to search the company information and retrieve the company information based on the search query. System 100 may include an I/O interface 130 configured to provide an interface between the processor 110 and peripheral interface modules, e.g. keyboard, mouse, touchscreen, etc. System 100 may include a communication module 150 configured to facilitate communication, wired or wireless, between the system 100 and other user devices 160, e.g. mobile devices, laptops, via the internet. System 100 may include a system database 140 configured to store input data, e.g. company information, received from the processor 110.
[0025] System 100 may include a scraping module 122 configured to scrape one or more websites to extract data, e.g. company information, relevant to a company. For example, the system 100 may be configured to scrape the one or more websites to extract company information, e.g. company name, email address, identification number, etc., of the company using a crawler. Websites may include the company website, authority website, government website, etc. System 100 may be configured to generate a company identifier or company ID based on the company information. Alternatively, the system 100 may receive the company identifier from a system administrator or identify the company identifier selected by the system administrator via the peripheral interface modules. Company identifier may include the company name, company identification number, phone number, etc. System 100 may be configured to generate and/or store one or more company identifiers. System 100 may be configured to access at least one entity database that stores company records of a plurality of companies, e.g. an entity database maintained by an authority, e.g. Building and Construction Authority, to extract additional company information. System 100 may access the entity database via the website of the authority. Based on the company identifier, the system 100 may identify the company record of the company, extract the additional company information in the company record and store the additional company information in the system database 140. System 100 may include a merging module 124 configured to merge the additional company information with the company information in the system database 140. Merging module 124 may be used to merge any extracted information with the company information. System 100 may be configured to link the additional company information with the company information via the company identifier. By scraping the plurality of websites for company information and extracting additional company information based on one or more company identifiers, it is possible to extract substantial company information of the company. System 100 may be configured to generate a plurality of categories based on the categories in the at least one website. When the system 100 is scraping the at least one website, the system 100 is configured to identify the plurality of categories in each website. The plurality of categories may be drawn from one or more websites. Based on the categories identified, the system 100 may replicate one or more of the categories to be used to store the company information in the system database 140. Alternatively, the system 100 may receive a plurality of categories input by the system administrator via the peripheral interface modules. System 100 may include a categorising module 126 configured to categorise the company information into the plurality of categories. System 100 may be configured to receive a search query from a user to search the company information. When the search query is received, the system 100 may be configured to retrieve the company information from the system database 140 based on the search query. Search query may include at least one of search terms, keywords and at least one of the plurality of categories. System 100 may display the plurality of categories on the display of the user device 160 for the user to choose. User may enter the search query into the user device 160 to be transmitted to the system 100.
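The identifier-based lookup and merge described above can be sketched as follows. This is an illustration only, not the implementation of system 100: the function names `make_company_id` and `merge_company_info`, the dictionary layout, the name-normalisation rule, and the rule that scraped values take precedence over entity-database values are all assumptions introduced for the example.

```python
import re


def make_company_id(company_info: dict) -> str:
    """Derive a company identifier from scraped fields (hypothetical rule).

    Prefers an identification number when present, else a normalised name.
    """
    reg_no = company_info.get("identification_number")
    if reg_no:
        return reg_no.strip().upper()
    # Lower-case the name and collapse non-alphanumeric runs into hyphens.
    name = company_info["company_name"].lower()
    return re.sub(r"[^a-z0-9]+", "-", name).strip("-")


def merge_company_info(company_info: dict, entity_db: dict) -> dict:
    """Merge the matching entity-database record into the scraped information."""
    company_id = make_company_id(company_info)
    record = entity_db.get(company_id, {})  # no record found: nothing to merge
    # Later keys win, so scraped values override the record (an assumption).
    return {"company_id": company_id, **record, **company_info}


# Illustrative data only; the company, grading value and addresses are made up.
scraped = {"company_name": "Acme Builders Pte Ltd", "email": "info@acme.example"}
entity_db = {"acme-builders-pte-ltd": {"grading": "A1", "email": "old@acme.example"}}
merged = merge_company_info(scraped, entity_db)
```

Keeping the identifier in the merged record is what allows other information, such as project or product information, to be linked to the same company later.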
[0026] Fig. 2 shows a flow diagram of a method 1000 for searching company information of a company. The method includes scraping at least one website relevant to a company to extract company information in block 1020, generating a company identifier of the company in block 1040, identifying a company record of the company in at least one entity database based on the company identifier in block 1060, extracting additional company information of the company from the company record in block 1080, merging the additional company information with the company information in block 1100, generating a plurality of categories from the at least one website in block 1120, categorising the company information into the plurality of categories in block 1140, receiving a search query to search the company information in block 1160, and retrieving the company information based on the search query in block 1180.
[0028] System 100 may include an extraction module configured to search the plurality of websites to extract missing data not extracted during the scraping of the websites. Missing data may include logo, description of company, banner image, social media accounts, etc. Extraction module may be configured to identify the missing data, search the plurality of websites and extract the missing data from the at least one website. Alternatively, the system 100 may receive the missing data via input from the system administrator.
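As a rough sketch of how missing data might be identified and filled, the following checks a company record against a list of expected fields and takes the first available value from other sources. The field names, the `find_missing_fields` and `fill_missing_fields` helpers, and the representation of each website as a dictionary of already-extracted values are assumptions made for illustration.

```python
# Hypothetical field names based on the examples of missing data given above.
EXPECTED_FIELDS = ("logo", "description", "banner_image", "social_media_accounts")


def find_missing_fields(company_info: dict, expected=EXPECTED_FIELDS) -> list:
    """Return the expected fields that are absent or empty in the record."""
    return [name for name in expected if not company_info.get(name)]


def fill_missing_fields(company_info: dict, fallback_sources: list) -> dict:
    """Fill each missing field from the first source that provides a value."""
    filled = dict(company_info)
    for name in find_missing_fields(filled):
        for source in fallback_sources:
            value = source.get(name)
            if value:
                filled[name] = value
                break  # stop at the first source that has this field
    return filled
```

Fields that no source can supply remain missing, which corresponds to the case where the system administrator provides the data instead.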
[0029] System 100 may be configured to scrape for goods and services information relevant to the company using the company information. Besides company information, it is advantageous to incorporate information of the goods and services provided by the company so that such information may be readily available to the user for searching. While such information is available on the internet, the information may not be easily found or easily associated or linked to the company. Hence, the system 100 is able to scrape such information using the company information extracted and merge or link such information to the company information. Further, by categorising such information, the system 100 is able to efficiently, accurately and quickly identify the relevant information based on the plurality of categories chosen by the user. The system 100 thus provides the user with the search results that are required. Examples of the goods and services information are provided below.
[0030] A company may have projects and project information may be obtained on such projects. System 100 may be configured to scrape the at least one website for project information using the company information, generate a plurality of project categories based on the project information, categorise the project information into the plurality of project categories, merge and/or link the project information to the company information, and retrieve the project information based on the search query.
[0031] For example, a construction company may have completed projects, and information on the projects may include length of projects, costs of projects, names of subcontractors, etc. Once the system 100 has scraped the project information, the system 100 may retrieve the plurality of project categories from the at least one website or generate the plurality of project categories based on the categories found on the at least one website. Alternatively, the system 100 may receive the plurality of project categories from the system administrator. System 100 may then categorise the project information according to the plurality of project categories before merging the project information with the company information, linking the project information to the company information via the company identifier, or both. When searching for such information, the user may search based on at least one of the plurality of project categories. Hence, the search query may include at least one of the plurality of project categories.
[0032] In another example, a company that supplies toilet fittings may have participated in the project of the construction company. System 100 may be configured to scrape the at least one website for product information, e.g. taps, using the company information. System 100 may be configured to retrieve the plurality of product categories from the at least one website or generate a plurality of product categories based on the product information. Alternatively, the system 100 may receive the plurality of product categories from the system administrator. Based on the at least one website, the system 100 may categorise the product information into the plurality of product categories. System 100 may be configured to merge the product information with the company information, link the product information to the company information via the company identifier, or both. When the system 100 receives the search query of the user, the system 100 is able to retrieve the product information based on the search query. As mentioned above, the search query may include at least one of the plurality of product categories chosen by the user.
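A minimal sketch of categorising scraped product information and linking it to a company is given below. It assumes each scraped item is a dictionary that may carry a `category` value found on the website; the `categorise` function, the `"uncategorised"` fallback and the sample data are hypothetical.

```python
from collections import defaultdict


def categorise(items, company_id, category_key="category", default="uncategorised"):
    """Group scraped items by category and tag each with the company identifier."""
    by_category = defaultdict(list)
    for item in items:
        category = item.get(category_key) or default
        # Linking: the identifier lets each item be joined back to its company.
        by_category[category].append({**item, "company_id": company_id})
    return dict(by_category)


# Illustrative data only.
products = [
    {"name": "basin tap", "category": "taps"},
    {"name": "mixer tap", "category": "taps"},
    {"name": "towel rail"},  # no category found on the page
]
grouped = categorise(products, company_id="acme-fittings")
product_categories = sorted(grouped)  # the categories generated from the data
```

The same grouping could be applied to project information, with the category list instead supplied by the system administrator where the website provides none.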
[0033] Besides completed projects, the company may have potential projects in the pipeline. For example, in a construction company, there may be tender projects that have been tendered by the company. System 100 may be configured to scrape the at least one website for tender project information using the company information. System 100 may be configured to merge the tender project information with the company information, link the tender project information to the company information via the company identifier, or both. Upon receipt of the search query from the user, the system 100 may be configured to retrieve the tender project information based on the search query.
[0034] Company may also provide services apart from the products. System 100 may be configured to scrape the at least one website for service information of services provided by the company using the company information. System 100 may be configured to merge the service information with the company information or link the service information to the company information via the company identifier or both. Upon receipt of the search query, the system 100 may be configured to retrieve the service information based on the search query.
[0035] Company information may include grading of the company assigned by the authority. The user may be presented the grading when searching for a company or the company information so that the user is able to assess the suitability of the company based on the grading against the user's requirement.
[0036] System 100 may be configured to execute multithread scraping to scrape the at least one website. Processor 110 of the system 100 may be configured to execute the multithread scraping. By using multithread scraping, which is a parallel programming concept, the number of websites that can be scraped by the system 100 increases for each unit of time spent. Conversely, the time required to scrape the plurality of websites may be significantly reduced.
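One way multithread scraping could look in practice is sketched below using a thread pool from the Python standard library. The `fetch` and `scrape_all` names, the worker count and the error-handling policy are assumptions for illustration; the source does not specify how system 100 implements multithreading.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all(urls, fetch_page=fetch, max_workers=8):
    """Fetch several pages concurrently.

    Returns a dict mapping each URL to its page text, or to the exception
    raised while fetching it, so one failing website does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_page, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

Threads suit this workload because each worker spends most of its time waiting on the network, so several downloads can overlap. The `fetch_page` parameter allows a parser or a test stub to be substituted for the real download.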
[0037] Merging module 124 may be configured to index the company information and other information, e.g. product information, so as to improve searchability of the information when merging the company information and the other information. Merging module 124 may be configured to select one or more fields, or receive one or more fields from the system administrator, for indexing the company information and the other information. Merging module 124 may be configured to merge the company information with the other information using multithreading. Memory 120 may be tuned to optimise the merging performance of the merging module 124.
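The benefit of indexing before merging can be illustrated with a simple in-memory sketch: records are first indexed on a chosen field, so each company is matched by a dictionary lookup rather than a scan of every record. The `build_index` and `merge_on_index` helpers and the `company_id` and `products` field names are hypothetical, and an actual system might rely on database indexes instead.

```python
def build_index(records, field_name):
    """Map each value of the chosen field to the records that carry it."""
    index = {}
    for record in records:
        index.setdefault(record[field_name], []).append(record)
    return index


def merge_on_index(companies, others, field_name="company_id", target="products"):
    """Attach to each company the other records that share its indexed field."""
    index = build_index(others, field_name)  # built once, reused per company
    return [
        {**company, target: index.get(company[field_name], [])}
        for company in companies
    ]
```

The field passed as `field_name` plays the role of the field selected by the merging module or received from the system administrator.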
[0038] System 100 allows the user to comprehensively search for company information of a company. When the user enters a search query, e.g. search terms and keywords, into the user device 160, the user device 160 may communicate with the system 100 via the internet to transmit the search query to the system 100. As mentioned, the user device 160 may receive a plurality of categories, including project categories and product categories, from the system 100 and display the plurality of categories on the user device 160. User may choose one or more of the plurality of categories to be included in the search query. Upon receipt of the search query, the system 100 may access the system database 140 to retrieve the company information, which may include information like the project information, tender project information, product information and/or service information, based on the search query and transmit the search result to the user device 160. The search result transmitted to the user provides the user the detailed information that the user requires quickly and efficiently.
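Retrieval based on a search query that combines search terms with chosen categories might be sketched as below. The record layout, the `search` function and the matching rules (every term must appear somewhere in the record, and at least one chosen category must match) are assumptions for illustration rather than behaviour stated in the source.

```python
def search(database, terms=(), categories=()):
    """Return records matching every search term and any chosen category."""
    wanted = set(categories)
    lowered = [term.lower() for term in terms]
    hits = []
    for record in database:
        # Category filter: skip records sharing no category with the query.
        if wanted and not wanted & set(record.get("categories", ())):
            continue
        # Term filter: case-insensitive substring match over all field values.
        text = " ".join(str(value) for value in record.values()).lower()
        if all(term in text for term in lowered):
            hits.append(record)
    return hits
```

A query with no categories falls back to a plain term search, and a query with no terms returns every record in the chosen categories.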
[0039] The present invention relates to a system and method for searching company information generally as herein described, with reference to and/or illustrated in the accompanying drawings.
Claims
1. A system for searching company information of a company, the system comprising: a processor, a memory in communication with the processor for storing instructions executable by the processor, wherein the processor is configured to: scrape at least one website relevant to a company to extract company information; generate a company identifier of the company; identify a company record of the company in at least one entity database based on the company identifier; extract additional company information of the company from the company record; merge the additional company information with the company information; generate a plurality of categories from the at least one website; categorise the company information into the plurality of categories; receive a search query to search the company information; and retrieve the company information based on the search query.
2. A system according to claim 1, wherein the search query comprises at least one of the plurality of categories.
3. A system according to claim 1 or 2, wherein the processor is further configured to: scrape the at least one website for project information using the company information; generate a plurality of project categories based on the project information; categorise the project information into the plurality of project categories; merge and/or link the project information to the company information; and retrieve the project information based on the search query.
4. A system according to claim 3, wherein the search query comprises at least one of the plurality of project categories.
5. A system according to any one of claims 1 to 4, wherein the processor is further configured to: scrape the at least one website for product information using the company information; generate a plurality of product categories based on the product information; categorise the product information into the plurality of product categories; merge and/or link the product information to the company information; and retrieve the product information based on the search query.
6. A system according to claim 5, wherein the search query comprises at least one of the plurality of product categories.
7. A system according to any one of claims 1 to 6, wherein the processor is configured to execute multithread scraping to scrape the at least one website.
8. A system according to any one of claims 1 to 7, wherein the processor is configured to: scrape the at least one website for tender project information using the company information; merge and/or link the tender project information to the company information; and retrieve the tender project information based on the search query.
9. A system according to any one of claims 1 to 8, wherein the processor is configured to: scrape the at least one website for service information of services provided by the company using the company information; merge and/or link the service information to the company information; and retrieve the service information based on the search query.
10. A method for searching company information of a company, the method comprising scraping at least one website relevant to a company to extract company information; generating a company identifier of the company; identifying a company record of the company in at least one entity database based on the company identifier; extracting additional company information of the company from the company record; merging the additional company information with the company information; generating a plurality of categories from the at least one website; categorising the company information into the plurality of categories; receiving a search query to search the company information; and retrieving the company information based on the search query.
11. A method according to claim 10, wherein the search query comprises at least one of the plurality of categories.
12. A method according to claim 10 or 11, further comprising: scraping the at least one website for project information using the company information; generating a plurality of project categories based on the project information; categorising the project information into the plurality of project categories; merging and/or linking the project information to the company information; and retrieving the project information based on the search query.
13. A method according to claim 12, wherein the search query comprises at least one of the plurality of project categories.
14. A method according to any one of claims 10 to 13, further comprising: scraping the at least one website for product information using the company information; generating a plurality of product categories based on the product information; categorising the product information into the plurality of product categories; merging and/or linking the product information to the company information; and retrieving the product information based on the search query.
15. A method according to claim 14, wherein the search query comprises at least one of the plurality of product categories.
16. A method according to any one of claims 10 to 15, wherein scraping the at least one website comprises executing multithread scraping to scrape the at least one website.
17. A method according to any one of claims 10 to 16, further comprising: scraping the at least one website for tender project information using the company information; merging and/or linking the tender project information to the company information; and retrieving the tender project information based on the search query.
18. A method according to any one of claims 10 to 17, further comprising: scraping the at least one website for service information of services provided by the company using the company information; merging and/or linking the service information to the company information; and retrieving the service information based on the search query.
19. A non-transitory computer readable storage medium comprising instructions, wherein the instructions, when executed by a processor in a terminal device, cause the terminal device to: scrape at least one website relevant to a company to extract company information; generate a company identifier of the company; identify a company record of the company in at least one entity database based on the company identifier; extract additional company information of the company from the company record; merge the additional company information with the company information; generate a plurality of categories from the at least one website; categorise the company information into the plurality of categories; receive a search query to search the company information; and retrieve the company information based on the search query.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SG2021/050489 WO2023022649A1 (en) | 2021-08-20 | 2021-08-20 | A system for searching company information and a method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023022649A1 true WO2023022649A1 (en) | 2023-02-23 |
Family
ID=85240900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SG2021/050489 WO2023022649A1 (en) | 2021-08-20 | 2021-08-20 | A system for searching company information and a method thereof |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023022649A1 (en) |
-
2021
- 2021-08-20 WO PCT/SG2021/050489 patent/WO2023022649A1/en active Search and Examination
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21954369; Country of ref document: EP; Kind code of ref document: A1 |
| DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |
| NENP | Non-entry into the national phase | Ref country code: DE |