WO2023022649A1 - A system for searching company information and a method thereof

Publication number
WO2023022649A1
WO2023022649A1 PCT/SG2021/050489 SG2021050489W WO2023022649A1 WO 2023022649 A1 WO2023022649 A1 WO 2023022649A1 SG 2021050489 W SG2021050489 W SG 2021050489W WO 2023022649 A1 WO2023022649 A1 WO 2023022649A1
Authority
WO
WIPO (PCT)
Prior art keywords
company
information
categories
search query
project
Application number
PCT/SG2021/050489
Other languages
French (fr)
Inventor
Wee Kiat Webb POH
Original Assignee
Pwk Holdings Pte. Ltd.
Application filed by Pwk Holdings Pte. Ltd.
Priority to PCT/SG2021/050489
Publication of WO2023022649A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/40 Data acquisition and logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Definitions

  • the present invention relates to a system for searching company information of a company. Further, the present invention relates to a method for searching company information.
  • a user who wishes to find information of a company may be able to find basic information, e.g. company address, history, contact details, of the company on the internet, e.g. at the company website. While the user may find the basic information, he or she is unable to find detailed information of the company, e.g. capital, company projects, etc. Although the user may be able to search a number of websites to obtain some of the detailed information, it is time consuming and inefficient, and the user may not be able to gather all the information required.
  • a system for searching company information of a company includes a processor, a memory in communication with the processor for storing instructions executable by the processor, such that the processor is configured to scrape at least one website relevant to a company to extract company information, generate a company identifier of the company, identify a company record of the company in at least one entity database based on the company identifier, extract additional company information of the company from the company record, merge the additional company information with the company information, generate a plurality of categories from the at least one website, categorise the company information into the plurality of categories, receive a search query to search the company information, and retrieve the company information based on the search query.
  • the search query may include at least one of the plurality of categories.
  • the processor may further be configured to scrape the at least one website for project information using the company information, generate a plurality of project categories based on the project information, categorise the project information into the plurality of project categories, merge and/or link the project information to the company information, and retrieve the project information based on the search query.
  • the search query may include at least one of the plurality of project categories.
  • the processor may further be configured to scrape the at least one website for product information using the company information, generate a plurality of product categories based on the product information, categorise the product information into the plurality of product categories, merge and/or link the product information to the company information, and retrieve the product information based on the search query.
  • the search query may include at least one of the plurality of product categories.
  • the processor may be configured to execute multithread scraping to scrape the at least one website.
  • the processor may be configured to scrape the at least one website for tender project information using the company information, merge and/or link the tender project information to the company information, and retrieve the tender project information based on the search query.
  • the processor may be configured to scrape the at least one website for service information of services provided by the company using the company information, merge and/or link the service information to the company information, and retrieve the service information based on the search query.
  • the present invention relates to a method for searching company information of a company.
  • the method includes scraping at least one website relevant to a company to extract company information, generating a company identifier of the company, identifying a company record of the company in at least one entity database based on the company identifier, extracting additional company information of the company from the company record, merging the additional company information with the company information, generating a plurality of categories from the at least one website, categorising the company information into the plurality of categories, receiving a search query to search the company information, and retrieving the company information based on the search query.
  • the search query may include at least one of the plurality of categories.
  • the method may further include scraping the at least one website for project information using the company information, generating a plurality of project categories based on the project information, categorising the project information into the plurality of project categories, merging and/or linking the project information to the company information, and retrieving the project information based on the search query.
  • the search query may include at least one of the plurality of project categories.
  • the method may further include scraping the at least one website for product information using the company information, generating a plurality of product categories based on the product information, categorising the product information into the plurality of product categories, merging and/or linking the product information to the company information, and retrieving the product information based on the search query.
  • the search query may include at least one of the plurality of product categories.
  • scraping the at least one website may include executing multithread scraping to scrape the at least one website.
  • the method may further include scraping the at least one website for tender project information using the company information, merging and/or linking the tender project information to the company information, and retrieving the tender project information based on the search query.
  • the method may further include scraping the at least one website for service information of services provided by the company using the company information, merging and/or linking the service information to the company information, and retrieving the service information based on the search query.
  • Fig. 1 shows an exemplary embodiment of a system for searching company information of a company.
  • Fig. 2 shows a flow diagram of a method for searching company information of a company.
  • Fig. 1 shows an exemplary embodiment of a system 100 for searching company information of a company.
  • System 100 may include a server.
  • System 100 includes a processor 110, a memory 120 in communication with the processor 110 for storing instructions executable by the processor 110, such that the processor 110 is configured to scrape at least one website relevant to a company to extract company information, generate a company identifier of the company, identify a company record of the company in at least one entity database based on the company identifier, extract additional company information of the company from the company record, merge the additional company information with the company information, generate a plurality of categories from the at least one website, categorise the company information into the plurality of categories, receive a search query to search the company information and retrieve the company information based on the search query.
  • System 100 may include an I/O interface 130 configured to provide an interface between the processor 110 and peripheral interface modules, e.g. keyboard, mouse, touchscreen, etc.
  • System 100 may include a communication module 150 configured to facilitate communication, wired or wirelessly, between the system 100 and other user devices 160, e.g. mobile devices, laptops, via the internet.
  • System 100 may include a system database 140 configured to store input data, e.g. company information, received from the processor 110.
  • System 100 may include a scraping module 122 configured to scrape one or more websites to extract data, e.g. company information, relevant to a company.
  • the system 100 may be configured to scrape the one or more websites to extract company information, e.g. company name, email address, identification number, etc., of the company using a crawler.
  • Websites may include the company website, authority website, government website, etc.
  • System 100 may be configured to generate a company identifier or company ID based on the company information.
  • the system 100 may receive the company identifier from a system administrator or identify the company identifier selected by the system administrator via the peripheral interface modules.
  • Company identifier may include the company name, company identification number, phone number, etc.
  • System 100 may be configured to generate and/or store one or more company identifiers.
  • System 100 may be configured to access at least one entity database that stores company records of a plurality of companies, e.g. an entity database maintained by an authority, e.g. Building and Construction Authority, to extract additional company information.
  • System 100 may access the entity database via the website of the authority.
  • the system 100 may identify the company record of the company and extract the additional company information in the company record and store the additional company information in the system database 140.
  • System 100 may include a merging module 124 configured to merge the additional company information with the company information in the system database 140. Merging module 124 may be used to merge any extracted information to the company information.
  • System 100 may be configured to link the additional company information with the company information via the company identifier. By scraping the plurality of websites for company information and extracting additional company information based on one or more company identifiers, it is possible to extract substantial company information of the company.
  • System 100 may be configured to generate the plurality of categories based on the categories in the at least one website. When the system 100 is scraping the at least one website, the system 100 is configured to identify the plurality of categories in each website. The plurality of categories may come from one or more websites. Based on the categories identified, the system 100 may replicate one or more of the categories to be used to store the company information in the system database 140. Alternatively, the system 100 may receive a plurality of categories input by the system administrator via the peripheral interface modules.
  • System 100 may include a categorising module 126 configured to categorise the company information into the plurality of categories.
  • System 100 may be configured to receive a search query from a user to search the company information. When the search query is received, the system 100 may be configured to retrieve the company information from the system database 140 based on the search query.
  • Search query may include at least one of search terms, keywords and at least one of the plurality of categories.
  • System 100 may display the plurality of categories on the display of the user device 160 for the user to choose. User may enter the search query into the user device 160 to be transmitted to the system 100.
  • Fig. 2 shows a flow diagram of a method 1000 for searching company information of a company.
  • the method includes scraping at least one website relevant to a company to extract company information in block 1020, generating a company identifier of the company in block 1040, identifying a company record of the company in at least one entity database based on the company identifier in block 1060, extracting additional company information of the company from the company record in block 1080, merging the additional company information with the company information in block 1100, generating a plurality of categories from the at least one website in block 1120, categorising the company information into the plurality of categories in block 1140, receiving a search query to search the company information in block 1160, and retrieving the company information based on the search query in block 1180.
  • System 100 may include an extraction module configured to search the plurality of websites to extract missing data not extracted during the scraping of the websites. Missing data may include logo, description of company, banner image, social media accounts, etc. Extraction module may be configured to identify the missing data, search the plurality of websites and extract the missing data from the at least one website. Alternatively, the system 100 may receive the missing data via input from the system administrator.
  • System 100 may be configured to scrape for goods and services information relevant to the company using the company information. Besides company information, it is advantageous to incorporate information on the goods and services provided by the company so that such information is readily available to the user for searching. While such information is available on the internet, it may not be easily found or easily associated with or linked to the company. Hence, the system 100 is able to scrape such information using the extracted company information and merge or link it to the company information. Further, by categorising such information, the system 100 is able to efficiently, accurately and quickly identify the relevant information based on the plurality of categories chosen by the user, thus providing the user with the required search results. Some of the goods and services information is described in the following examples.
  • System 100 may be configured to scrape the at least one website for project information using the company information, generate a plurality of project categories based on the project information, categorise the project information into the plurality of project categories, merge and/or link the project information to the company information, and retrieve the project information based on the search query.
  • the goods of a construction company may be completed projects, and information on the projects may include the length of the projects, the costs of the projects, the names of subcontractors, etc.
  • the system 100 may retrieve the plurality of project categories from the at least one website or generate the plurality of project categories based on the categories found on the at least one website.
  • the system 100 may receive the plurality of project categories from the system administrator.
  • System 100 may then categorise the project information according to the plurality of categories before merging the project information with the company information, linking the project information to the company information via the company identifier, or both.
  • the user may search based on at least one of the plurality of project categories.
  • the search query may include at least one of the plurality of project categories.
  • System 100 may be configured to scrape the at least one website for product information, e.g. taps, using the company information.
  • System 100 may be configured to retrieve the plurality of product categories from the at least one website or generate a plurality of product categories based on the product information.
  • the system 100 may receive the plurality of product categories from the system administrator.
  • the system 100 may categorise the product information into the plurality of product categories.
  • System 100 may categorise the product information based on the plurality of categories.
  • System 100 may be configured to merge the product information with the company information or link the product information to the company information via the company identifier or both.
  • when the system 100 receives the search query of the user, the system 100 is able to retrieve the product information based on the search query.
  • the search query may include at least one of the plurality of product categories chosen by the user.
  • the company may have potential projects in the pipeline.
  • System 100 may be configured to scrape the at least one website for tender project information using the company information.
  • System 100 may be configured to merge the tender project information with the company information or link the tender project information to the company information via the company identifier or both.
  • the system 100 may be configured to retrieve the tender project information based on the search query.
  • The company may also provide services in addition to products.
  • System 100 may be configured to scrape the at least one website for service information of services provided by the company using the company information.
  • System 100 may be configured to merge the service information with the company information or link the service information to the company information via the company identifier or both.
  • the system 100 may be configured to retrieve the service information based on the search query.
  • Company information may include a grading of the company assigned by the authority. The user may be presented with the grading when searching for a company or the company information so that the user is able to assess the suitability of the company based on the grading against the user's requirements.
  • System 100 may be configured to execute multithread scraping to scrape the at least one website.
  • Processor 110 of the system 100 may be configured to execute the multithread scraping.
  • with multithread scraping, which is a parallel programming concept, the number of websites that can be scraped by the system 100 per unit of time increases. Conversely, the time required to scrape the plurality of websites may be significantly reduced.
  • Merging module 124 may be configured to index the company information and other information, e.g. product information, so as to improve the searchability of the information when merging the company information and the other information.
  • Merging module 124 may be configured to select one or more fields, or receive one or more fields from the system administrator, for indexing the company information and the other information.
  • Merging module 124 may be configured to merge the company information with the other information using multithreading.
  • Memory 120 may be tuned to optimise the merging performance of the merging module 124.
  • System 100 allows the user to comprehensively search for company information of a company.
  • a search query, e.g. search terms and keywords
  • the user device 160 may communicate with the system 100 via the internet to transmit the search query to the system 100.
  • the user device 160 may receive a plurality of categories, including project categories and product categories, from the system 100 and display the plurality of categories on the user device 160. The user may choose one or more of the plurality of categories to be included in the search query.
  • the system 100 may access the system database 140 to retrieve the company information, which may include the project information, tender project information, product information and/or service information, based on the search query and transmit the search result to the user device 160.
  • the search result transmitted to the user provides the user the detailed information that the user requires quickly and efficiently.
  • the present invention relates to a system and method for searching company information generally as herein described, with reference to and/or illustrated in the accompanying drawings.
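
The indexing performed by the merging module 124 can be illustrated with a minimal Python sketch. It assumes that the selected field is a shared `company_id` and that records are plain dictionaries; the function names, field names and the `products` key are hypothetical and are not taken from the application.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def build_index(records: List[Record], field: str) -> Dict[Any, List[Record]]:
    """Index records by the value of one selected field."""
    index: Dict[Any, List[Record]] = defaultdict(list)
    for record in records:
        if field in record:
            index[record[field]].append(record)
    return dict(index)


def link_other_info(companies: List[Record],
                    others: List[Record],
                    field: str = "company_id",   # assumed indexing field
                    target: str = "products") -> List[Record]:
    """Attach matching records from `others` to a copy of each company."""
    index = build_index(others, field)
    linked = []
    for company in companies:
        merged = dict(company)  # copy, so the input records stay unchanged
        # One index lookup replaces a scan of `others` for every company.
        merged[target] = index.get(company.get(field), [])
        linked.append(merged)
    return linked
```

Building the index once means each company is matched by a dictionary lookup, which is one plausible reading of how indexing could speed up merging; the application itself does not specify the index structure.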

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system for searching company information of a company is provided. The system includes a processor, a memory in communication with the processor for storing instructions executable by the processor, such that the processor is configured to scrape at least one website relevant to a company to extract company information, generate a company identifier of the company, identify a company record of the company in at least one entity database based on the company identifier, extract additional company information of the company from the company record, merge the additional company information with the company information, generate a plurality of categories from the at least one website, categorise the company information into the plurality of categories, receive a search query to search the company information, and retrieve the company information based on the search query. A method of the system is provided.

Description

A System For Searching Company Information And A Method Thereof
Technical Field
[0001] The present invention relates to a system for searching company information of a company. Further, the present invention relates to a method for searching company information.
Background
[0002] Typically, a user who wishes to find information on a company, e.g. a construction company or a real estate company, may be able to find basic information, e.g. company address, history, contact details, of the company on the internet, e.g. at the company website. While the user may find the basic information, he or she is unable to find detailed information of the company, e.g. capital, company projects, etc. Although the user may be able to search a number of websites to obtain some of the detailed information, doing so is time-consuming and inefficient, and the user may not be able to gather all the information required.
[0003] Therefore, it is beneficial and advantageous to have a system that is able to provide a fast and efficient search system and method to obtain the detailed information based on the user’s search request.
Summary
[0004] According to various embodiments, a system for searching company information of a company is provided. The system includes a processor, a memory in communication with the processor for storing instructions executable by the processor, such that the processor is configured to scrape at least one website relevant to a company to extract company information, generate a company identifier of the company, identify a company record of the company in at least one entity database based on the company identifier, extract additional company information of the company from the company record, merge the additional company information with the company information, generate a plurality of categories from the at least one website, categorise the company information into the plurality of categories, receive a search query to search the company information, and retrieve the company information based on the search query.
[0005] According to various embodiments, the search query may include at least one of the plurality of categories.
[0006] According to various embodiments, the processor may further be configured to scrape the at least one website for project information using the company information, generate a plurality of project categories based on the project information, categorise the project information into the plurality of project categories, merge and/or link the project information to the company information, and retrieve the project information based on the search query.
[0007] According to various embodiments, the search query may include at least one of the plurality of project categories.
[0008] According to various embodiments, the processor may further be configured to scrape the at least one website for product information using the company information, generate a plurality of product categories based on the product information, categorise the product information into the plurality of product categories, merge and/or link the product information to the company information, and retrieve the product information based on the search query.
[0009] According to various embodiments, the search query may include at least one of the plurality of product categories.
[0010] According to various embodiments, the processor may be configured to execute multithread scraping to scrape the at least one website.
[0011] According to various embodiments, the processor may be configured to scrape the at least one website for tender project information using the company information, merge and/or link the tender project information to the company information, and retrieve the tender project information based on the search query.
[0012] According to various embodiments, the processor may be configured to scrape the at least one website for service information of services provided by the company using the company information, merge and/or link the service information to the company information, and retrieve the service information based on the search query.
[0013] The present invention relates to a method for searching company information of a company. The method includes scraping at least one website relevant to a company to extract company information, generating a company identifier of the company, identifying a company record of the company in at least one entity database based on the company identifier, extracting additional company information of the company from the company record, merging the additional company information with the company information, generating a plurality of categories from the at least one website, categorising the company information into the plurality of categories, receiving a search query to search the company information, and retrieving the company information based on the search query.
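
The categorising and retrieving steps of the method can be illustrated with a minimal Python sketch. It assumes that each company record is a dictionary that already carries a `categories` list and a `name` field after scraping and merging. The function names, the field names and the substring keyword match are hypothetical choices, since the application does not specify how records are matched against a query.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional

Company = Dict[str, Any]


def categorise(companies: Iterable[Company]) -> Dict[str, List[Company]]:
    """Group company records under each category in their 'categories' field."""
    by_category: Dict[str, List[Company]] = defaultdict(list)
    for company in companies:
        for category in company.get("categories", []):
            by_category[category].append(company)
    return dict(by_category)


def search(by_category: Dict[str, List[Company]],
           keyword: str = "",
           category: Optional[str] = None) -> List[Company]:
    """Return companies matching an optional category and a name keyword."""
    if category is not None:
        candidates = by_category.get(category, [])
    else:
        # No category chosen: visit every categorised company once,
        # keeping the order in which each was first seen.
        seen: Dict[int, Company] = {}
        for group in by_category.values():
            for company in group:
                seen.setdefault(id(company), company)
        candidates = list(seen.values())
    needle = keyword.lower()
    # Assumed matching rule: case-insensitive substring match on the name.
    return [c for c in candidates if needle in str(c.get("name", "")).lower()]
```

A query that includes a category only scans the records stored under that category, which is one way the categories could narrow a search.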
[0014] According to various embodiments, the search query may include at least one of the plurality of categories.
[0015] According to various embodiments, the method may further include scraping the at least one website for project information using the company information, generating a plurality of project categories based on the project information, categorising the project information into the plurality of project categories, merging and/or linking the project information to the company information, and retrieving the project information based on the search query.
[0016] According to various embodiments, the search query may include at least one of the plurality of project categories.
[0017] According to various embodiments, the method may further include scraping the at least one website for product information using the company information, generating a plurality of product categories based on the product information, categorising the product information into the plurality of product categories, merging and/or linking the product information to the company information, and retrieving the product information based on the search query.
[0018] According to various embodiments, the search query may include at least one of the plurality of product categories.
[0019] According to various embodiments, scraping the at least one website may include executing multithread scraping to scrape the at least one website.
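
Multithread scraping can be illustrated with a minimal sketch that uses `ThreadPoolExecutor` from the Python standard library. The `fetch` callable, the stand-in `fake_fetch`, and the worker count are hypothetical placeholders. The application does not name a threading library or a crawler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def scrape_all(urls: Iterable[str],
               fetch: Callable[[str], str],
               max_workers: int = 4) -> Dict[str, str]:
    """Scrape several websites concurrently and map each URL to its page text.

    `fetch` is a hypothetical callable that downloads one page; it is passed
    in so the sketch does not depend on any particular HTTP client.
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch on worker threads and yields results in input order.
        pages = list(pool.map(fetch, url_list))
    return dict(zip(url_list, pages))


def fake_fetch(url: str) -> str:
    """Stand-in for a real download so the sketch runs without network access."""
    return f"<html>company page at {url}</html>"
```

Because downloading pages is mostly waiting on the network, running several fetches on separate threads lets more websites be scraped per unit of time.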
[0020] According to various embodiments, the method may further include scraping the at least one website for tender project information using the company information, merging and/or linking the tender project information to the company information, and retrieving the tender project information based on the search query.
[0021] According to various embodiments, the method may further include scraping the at least one website for service information of services provided by the company using the company information, merging and/or linking the service information to the company information, and retrieving the service information based on the search query.
Brief Description of Drawings
[0022] Fig. 1 shows an exemplary embodiment of a system for searching company information of a company.
[0023] Fig. 2 shows a flow diagram of a method for searching company information of a company.
Detailed Description
[0024] Fig. 1 shows an exemplary embodiment of a system 100 for searching company information of a company. System 100 may include a server. System 100 includes a processor 110, a memory 120 in communication with the processor 110 for storing instructions executable by the processor 110, such that the processor 110 is configured to scrape at least one website relevant to a company to extract company information, generate a company identifier of the company, identify a company record of the company in at least one entity database based on the company identifier, extract additional company information of the company from the company record, merge the additional company information with the company information, generate a plurality of categories from the at least one website, categorise the company information into the plurality of categories, receive a search query to search the company information and retrieve the company information based on the search query. System 100 may include an I/O interface 130 configured to provide an interface between the processor 110 and peripheral interface modules, e.g. keyboard, mouse, touchscreen, etc. System 100 may include a communication module 150 configured to facilitate communication, wired or wirelessly, between the system 100 and other user devices 160, e.g. mobile devices, laptops, via the internet. System 100 may include a system database 140 configured to store input data, e.g. company information, received from the processor 110.
[0025] System 100 may include a scraping module 122 configured to scrape one or more websites to extract data, e.g. company information, relevant to a company. For example, the system 100 may be configured to scrape the one or more websites to extract company information, e.g. company name, email address, identification number, etc., of the company using a crawler. Websites may include the company website, authority website, government website, etc. System 100 may be configured to generate a company identifier or company ID based on the company information. Alternatively, the system 100 may receive the company identifier from a system administrator or identify the company identifier selected by the system administrator via the peripheral interface modules. Company identifier may include the company name, company identification number, phone number, etc. System 100 may be configured to generate and/or store one or more company identifiers. System 100 may be configured to access at least one entity database that stores company records of a plurality of companies, e.g. an entity database maintained by an authority, e.g. Building and Construction Authority, to extract additional company information. System 100 may access the entity database via the website of the authority. Based on the company identifier, the system 100 may identify the company record of the company and extract the additional company information in the company record and store the additional company information in the system database 140. System 100 may include a merging module 124 configured to merge the additional company information with the company information in the system database 140. Merging module 124 may be used to merge any extracted information to the company information. System 100 may be configured to link the additional company information with the company information via the company identifier. 
By scraping the plurality of websites for company information and extracting additional company information based on one or more company identifiers, it is possible to extract substantial company information of the company. System 100 may be configured to generate a plurality of the categories based on the categories in the at least one website. When the system 100 is scraping the at least one website, the system 100 is configured to identify the plurality of categories in each website. Plurality of categories may be from one or more than one website. Based on the categories identified, the system 100 may replicate one or more of the categories to be used to store the company information in the system database 140. Alternatively, the system 100 may receive a plurality of categories input by the system administrator via the peripheral interface modules. System 100 may include a categorising module 126 configured to categorise the company information into the plurality of categories. System 100 may be configured to receive a search query from a user to search the company information. When the search query is received, the system 100 may be configured to retrieve the company information from the system database 140 based on the search query. Search query may include at least one of search terms, keywords and at least one of the plurality of categories. System 100 may display the plurality of categories on the display of the user device 160 for the user to choose. User may enter the search query into the user device 160 to be transmitted to the system 100.
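The identifier-based lookup and merge described in paragraph [0025] might be sketched as follows. This is a minimal illustration only, not the applicant's implementation: the normalisation rule in `make_company_id`, the in-memory `ENTITY_DB` dictionary, the field names and the sample values are all hypothetical, and the rule that scraped values take precedence over the entity record is an assumption the specification does not state.

```python
import re


def make_company_id(name):
    """Derive a normalised identifier from a company name (hypothetical rule)."""
    # Lowercase, drop punctuation, then join the remaining words with hyphens.
    cleaned = re.sub(r"[^a-z0-9\s]", "", name.lower())
    return "-".join(cleaned.split())


def merge_company_info(scraped, record):
    """Merge an entity-database record into scraped company information.

    Assumption: non-empty scraped values win; the record only fills gaps.
    """
    merged = dict(record)
    merged.update({k: v for k, v in scraped.items() if v not in (None, "")})
    return merged


# Hypothetical in-memory stand-in for an authority's entity database.
ENTITY_DB = {
    "acme-builders-pte-ltd": {
        "grading": "A1",
        "registration_no": "X-0001",
        "email": "",
    },
}

scraped = {"name": "Acme Builders Pte. Ltd.", "email": "info@acme.example"}
company_id = make_company_id(scraped["name"])
record = ENTITY_DB.get(company_id, {})  # empty dict if no record is found
company = merge_company_info(scraped, record)
company["company_id"] = company_id  # keep the identifier for later linking
```

Storing the identifier on the merged record is what allows other information to be linked to the company later without copying it into the same record.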
[0026] Fig. 2 shows a flow diagram of a method 1000 for searching company information of a company. The method includes scraping at least one website relevant to a company to extract company information in block 1020, generating a company identifier of the company in block 1040, identifying a company record of the company in at least one entity database based on the company identifier in block 1060, extracting additional company information of the company from the company record in block 1080, merging the additional company information with the company information in block 1100, generating a plurality of categories from the at least one website in block 1120, categorising the company information into the plurality of categories in block 1140, receiving a search query to search the company information in block 1160, and retrieving the company information based on the search query in block 1180.
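The later blocks of method 1000 (generating categories in block 1120, categorising in block 1140 and retrieving by category in blocks 1160 to 1180) could look roughly like the sketch below. The page and company dictionaries, the `tags` field and the function names are hypothetical, and requiring a company to match every category in the query is an assumed policy rather than something the specification prescribes.

```python
from collections import defaultdict


def generate_categories(pages):
    """Collect the distinct category labels seen across scraped pages."""
    seen = []
    for page in pages:
        for label in page.get("categories", []):
            label = label.strip().lower()
            if label and label not in seen:
                seen.append(label)
    return seen


def categorise(companies, categories):
    """Map each known category to the names of companies tagged with it."""
    by_category = defaultdict(list)
    for company in companies:
        for tag in company.get("tags", []):
            tag = tag.strip().lower()
            if tag in categories:
                by_category[tag].append(company["name"])
    return dict(by_category)


def retrieve(by_category, query_categories):
    """Return companies that match every category in the search query."""
    sets = [set(by_category.get(c.lower(), [])) for c in query_categories]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical scraped pages and company records.
pages = [
    {"categories": ["Plumbing", "Electrical"]},
    {"categories": ["electrical", "Roofing"]},
]
companies = [
    {"name": "Acme Builders", "tags": ["Plumbing", "Roofing"]},
    {"name": "Bolt Electric", "tags": ["Electrical"]},
]

categories = generate_categories(pages)
by_category = categorise(companies, categories)
roofers = retrieve(by_category, ["Roofing"])
```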
[0028] System 100 may include an extraction module configured to search the plurality of websites to extract missing data not extracted during the scraping of the websites. Missing data may include logo, description of company, banner image, social media accounts, etc. Extraction module may be configured to identify the missing data, search the plurality of websites and extract the missing data from the at least one website. Alternatively, the system 100 may receive the missing data via input from the system administrator.
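One way the extraction module of paragraph [0028] might identify and fill missing data is sketched below. The field list `REQUIRED_FIELDS`, the dictionary layout and the "first non-empty value wins" rule are assumptions made for illustration; the specification only names logo, description, banner image and social media accounts as examples of missing data.

```python
# Hypothetical list of fields the system expects every company record to have.
REQUIRED_FIELDS = ("logo", "description", "banner_image", "social_media")


def find_missing_fields(company):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not company.get(field)]


def fill_missing(company, candidate_pages):
    """Fill missing fields from candidate pages; first non-empty value wins."""
    filled = dict(company)
    for field in find_missing_fields(company):
        for page in candidate_pages:
            if page.get(field):
                filled[field] = page[field]
                break
    return filled


company = {"name": "Acme Builders", "logo": "logo.png", "description": ""}
candidate_pages = [
    {"description": "", "banner_image": "banner.jpg"},
    {"description": "General building contractor"},
]

filled = fill_missing(company, candidate_pages)
still_missing = find_missing_fields(filled)
```

Fields that remain in `still_missing` are the ones that would fall to the system administrator to supply, as the paragraph's alternative describes.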
[0029] System 100 may be configured to scrape for goods and services information relevant to the company using the company information. Besides company information, it is advantageous to incorporate information of the goods and services provided by the company so that such information may be readily available to the user for searching. While such information is available on the internet, the information may not be easily found or easily associated or linked to the company. Hence, the system 100 is able to scrape such information using the company information extracted and merge or link such information to the company information. Further, by categorising such information, the system 100 is able to efficiently, accurately and quickly identify the relevant information based on the plurality of categories chosen by the user, thus providing the user with the search results that are required. Examples of the goods and services information are provided below.
[0030] A company may have projects and project information may be obtained on such projects. System 100 may be configured to scrape the at least one website for project information using the company information, generate a plurality of project categories based on the project information, categorise the project information into the plurality of project categories, merge and/or link the project information to the company information, and retrieve the project information based on the search query.
[0031] For example, the goods of a construction company may be completed projects, and information on the projects may include the length of the projects, the costs of the projects, the names of subcontractors, etc. Once the system 100 has scraped the project information, the system 100 may retrieve the plurality of project categories from the at least one website or generate the plurality of project categories based on the categories found on the at least one website. Alternatively, the system 100 may receive the plurality of project categories from the system administrator. System 100 may then categorise the project information according to the plurality of project categories before merging the project information with the company information, linking the project information to the company information via the company identifier, or both. When searching for such information, the user may search based on at least one of the plurality of project categories. Hence, the search query may include at least one of the plurality of project categories.
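Linking project information to a company via the company identifier, as opposed to merging it into the company record, might be sketched like this. The `Project` fields, the category names and the sample data are hypothetical; the point is only that each project carries the company identifier so it can be retrieved by company and, optionally, by project category.

```python
from dataclasses import dataclass


@dataclass
class Project:
    company_id: str  # link back to the company record
    title: str
    category: str    # a project category, e.g. "residential"
    cost: float      # hypothetical field; currency and units unspecified


def projects_for(company_id, projects, categories=None):
    """Return a company's projects, optionally limited to given categories."""
    wanted = {c.lower() for c in categories} if categories else None
    return [
        p for p in projects
        if p.company_id == company_id
        and (wanted is None or p.category.lower() in wanted)
    ]


projects = [
    Project("acme-builders-pte-ltd", "Block A flats", "Residential", 1_200_000.0),
    Project("acme-builders-pte-ltd", "Mall extension", "Commercial", 3_500_000.0),
    Project("bolt-electric", "Substation rewiring", "Infrastructure", 400_000.0),
]

all_acme = projects_for("acme-builders-pte-ltd", projects)
residential = projects_for("acme-builders-pte-ltd", projects, ["residential"])
```

Keeping projects in their own collection keyed by `company_id` means a company record does not need to be rewritten each time a new project is scraped.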
[0032] In another example, a company that supplies toilet fittings may have participated in the project of the construction company. System 100 may be configured to scrape the at least one website for product information, e.g. taps, using the company information. System 100 may be configured to retrieve the plurality of product categories from the at least one website or generate a plurality of product categories based on the product information. Alternatively, the system 100 may receive the plurality of product categories from the system administrator. Based on the at least one website, the system 100 may categorise the product information into the plurality of product categories. System 100 may be configured to merge the product information with the company information, link the product information to the company information via the company identifier, or both. When the system 100 receives the search query of the user, the system 100 is able to retrieve the product information based on the search query. As mentioned above, the search query may include at least one of the plurality of product categories chosen by the user.
[0033] Besides completed projects, the company may have potential projects in the pipeline. For example, in a construction company, there may be tender projects that have been tendered by the company. System 100 may be configured to scrape the at least one website for tender project information using the company information. System 100 may be configured to merge the tender project information with the company information, link the tender project information to the company information via the company identifier, or both. Upon receipt of the search query from the user, the system 100 may be configured to retrieve the tender project information based on the search query.
[0034] Company may also provide services apart from the products. System 100 may be configured to scrape the at least one website for service information of services provided by the company using the company information. System 100 may be configured to merge the service information with the company information or link the service information to the company information via the company identifier or both. Upon receipt of the search query, the system 100 may be configured to retrieve the service information based on the search query.
[0035] Company information may include grading of the company assigned by the authority. The user may be presented the grading when searching for a company or the company information so that the user is able to assess the suitability of the company based on the grading against the user's requirement.
[0036] System 100 may be configured to execute multithread scraping to scrape the at least one website. Processor 110 of the system 100 may be configured to execute the multithread scraping. By using multithread scraping, which is a parallel programming concept, the number of websites that can be scraped by the system 100 increases for each unit of time spent. Conversely, the time required to scrape the plurality of websites may be significantly reduced.
[0037] Merging module 124 may be configured to index the company information and other information, e.g. product information, so as to improve searchability of the information when merging the company information and the other information. Merging module 124 may be configured to select one or more fields or receive one or more fields from the system administrator for indexing the company information and the other information. Merging module 124 may be configured to merge the company information with the other information using multithreading. Memory 120 may be tuned to optimise the merging performance of the merging module 124.
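The multithread scraping of paragraph [0036] could be approximated with a thread pool, as in the sketch below. The specification does not name a threading mechanism, so the use of Python's `concurrent.futures.ThreadPoolExecutor` is an assumption, as are the function names, the worker count and the example URLs. `fake_fetch` is a stand-in that sleeps instead of making a network request; a real fetcher would be passed in its place.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=8):
    """Scrape several URLs concurrently.

    Returns a dict mapping each URL to the fetched result, or to the
    exception raised while fetching it, so one failure does not abort the run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every URL first so the fetches overlap in time.
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results


def fake_fetch(url):
    """Hypothetical fetcher: simulates network latency, returns dummy HTML."""
    if "broken" in url:
        raise ValueError(f"cannot fetch {url}")
    time.sleep(0.05)
    return f"<html>{url}</html>"


urls = ["https://a.example", "https://b.example", "https://broken.example"]
results = scrape_all(urls, fake_fetch)
```

Because scraping is dominated by waiting on network responses, threads can overlap those waits, which is the effect the paragraph describes as more websites scraped per unit of time.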
[0038] System 100 allows the user to comprehensively search for company information of a company. When the user enters a search query, e.g. search terms and keywords, into the user device 160, the user device 160 may communicate with the system 100 via the internet to transmit the search query to the system 100. As mentioned, the user device 160 may receive a plurality of categories, including project categories and product categories, from the system 100 and display the plurality of categories on the user device 160. User may choose one or more of the plurality of categories to be included in the search query. Upon receipt of the search query, the system 100 may access the system database 140 to retrieve the company information, which may include information like the project information, tender project information, product information and/or service information, based on the search query and transmit the search result to the user device 160. The search result transmitted to the user provides the user the detailed information that the user requires quickly and efficiently.
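A search query that combines keywords with user-chosen categories might be served from an index over selected fields, along the lines of the sketch below. The inverted index, the tokeniser, the record layout and the matching policy (all keywords must match, and at least one chosen category must match) are illustrative assumptions; the specification does not describe how the index or retrieval is implemented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records, fields):
    """Build an inverted index: token -> set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for field in fields:
            for token in tokenize(str(rec.get(field, ""))):
                index[token].add(rec_id)
    return index


def search(records, index, terms, categories=()):
    """Return ids of records matching all terms and any chosen category."""
    tokens = tokenize(terms)
    if tokens:
        matched = set.intersection(*(index.get(t, set()) for t in tokens))
    else:
        matched = set(records)  # no keywords: filter by category only
    if categories:
        wanted = {c.lower() for c in categories}
        matched = {
            rec_id for rec_id in matched
            if wanted & {c.lower() for c in records[rec_id].get("categories", [])}
        }
    return sorted(matched)


# Hypothetical records as they might sit in the system database.
records = {
    "c1": {"name": "Acme Builders", "description": "General building contractor",
           "categories": ["construction"]},
    "c2": {"name": "Tapworks", "description": "Supplier of taps and toilet fittings",
           "categories": ["plumbing", "products"]},
}
index = build_index(records, ["name", "description"])
hits = search(records, index, "toilet fittings")
```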
[0039] The present invention relates to a system and method for searching company information generally as herein described, with reference to and/or illustrated in the accompanying drawings.

Claims

1. A system for searching company information of a company, the system comprising: a processor, a memory in communication with the processor for storing instructions executable by the processor, wherein the processor is configured to: scrape at least one website relevant to a company to extract company information; generate a company identifier of the company; identify a company record of the company in at least one entity database based on the company identifier; extract additional company information of the company from the company record; merge the additional company information with the company information; generate a plurality of categories from the at least one website; categorise the company information into the plurality of categories; receive a search query to search the company information; and retrieve the company information based on the search query.
2. A system according to claim 1, wherein the search query comprises at least one of the plurality of categories.
3. A system according to claim 1 or 2, wherein the processor is further configured to: scrape the at least one website for project information using the company information; generate a plurality of project categories based on the project information; categorise the project information into the plurality of project categories; merge and/or link the project information to the company information; and retrieve the project information based on the search query.
4. A system according to claim 3, wherein the search query comprises at least one of the plurality of project categories.
5. A system according to any one of claims 1 to 4, wherein the processor is further configured to: scrape the at least one website for product information using the company information; generate a plurality of product categories based on the product information; categorise the product information into the plurality of product categories; merge and/or link the product information to the company information; and retrieve the product information based on the search query.
6. A system according to claim 5, wherein the search query comprises at least one of the plurality of product categories.
7. A system according to any one of claims 1 to 6, wherein the processor is configured to execute multithread scraping to scrape the at least one website.
8. A system according to any one of claims 1 to 7, wherein the processor is configured to: scrape the at least one website for tender project information using the company information; merge and/or link the tender project information to the company information; and retrieve the tender project information based on the search query.
9. A system according to any one of claims 1 to 8, wherein the processor is configured to: scrape the at least one website for service information of services provided by the company using the company information; merge and/or link the service information to the company information; and retrieve the service information based on the search query.
10. A method for searching company information of a company, the method comprising scraping at least one website relevant to a company to extract company information; generating a company identifier of the company; identifying a company record of the company in at least one entity database based on the company identifier; extracting additional company information of the company from the company record; merging the additional company information with the company information; generating a plurality of categories from the at least one website; categorising the company information into the plurality of categories; receiving a search query to search the company information; and retrieving the company information based on the search query.
11. A method according to claim 10, wherein the search query comprises at least one of the plurality of categories.
12. A method according to claim 10 or 11, further comprising: scraping the at least one website for project information using the company information; generating a plurality of project categories based on the project information; categorising the project information into the plurality of project categories; merging and/or linking the project information to the company information; and retrieving the project information based on the search query.
13. A method according to claim 12, wherein the search query comprises at least one of the plurality of project categories.
14. A method according to any one of claims 10 to 13, further comprising: scraping the at least one website for product information using the company information; generating a plurality of product categories based on the product information; categorising the product information into the plurality of product categories; merging and/or linking the product information to the company information; and retrieving the product information based on the search query.
15. A method according to claim 14, wherein the search query comprises at least one of the plurality of product categories.
16. A method according to any one of claims 10 to 15, wherein scraping the at least one website comprises executing multithread scraping to scrape the at least one website.
17. A method according to any one of claims 10 to 16, further comprising: scraping the at least one website for tender project information using the company information; merging and/or linking the tender project information to the company information; and retrieving the tender project information based on the search query.
18. A method according to any one of claims 10 to 17, further comprising: scraping the at least one website for service information of services provided by the company using the company information; merging and/or linking the service information to the company information; and retrieving the service information based on the search query.
19. A non-transitory computer readable storage medium comprising instructions, wherein the instructions, when executed by a processor in a terminal device, cause the terminal device to: scrape at least one website relevant to a company to extract company information; generate a company identifier of the company; identify a company record of the company in at least one entity database based on the company identifier; extract additional company information of the company from the company record; merge the additional company information with the company information; generate a plurality of categories from the at least one website; categorise the company information into the plurality of categories; receive a search query to search the company information; and retrieve the company information based on the search query.
PCT/SG2021/050489 2021-08-20 2021-08-20 A system for searching company information and a method thereof WO2023022649A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SG2021/050489 WO2023022649A1 (en) 2021-08-20 2021-08-20 A system for searching company information and a method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SG2021/050489 WO2023022649A1 (en) 2021-08-20 2021-08-20 A system for searching company information and a method thereof

Publications (1)

Publication Number Publication Date
WO2023022649A1 true WO2023022649A1 (en) 2023-02-23

Family

ID=85240900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2021/050489 WO2023022649A1 (en) 2021-08-20 2021-08-20 A system for searching company information and a method thereof

Country Status (1)

Country Link
WO (1) WO2023022649A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050120006A1 (en) * 2003-05-30 2005-06-02 Geosign Corporation Systems and methods for enhancing web-based searching
US20090204569A1 (en) * 2008-02-11 2009-08-13 International Business Machines Corporation Method and system for identifying companies with specific business objectives
US7949646B1 (en) * 2005-12-23 2011-05-24 At&T Intellectual Property Ii, L.P. Method and apparatus for building sales tools by mining data from websites
US20120265610A1 (en) * 2011-01-31 2012-10-18 Yaacov Shama Techniques for Generating Business Leads
US20200242170A1 (en) * 2019-01-29 2020-07-30 Salesforce.Com, Inc. Method and system for automatically enriching collected seeds with information extracted from one or more websites

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21954369

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE