WO2015183098A1 - Method and system for collecting, transforming, storing, and presentation of data from multiple data sources. - Google Patents

Method and system for collecting, transforming, storing, and presentation of data from multiple data sources.

Info

Publication number
WO2015183098A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
business
sources
module
company
Prior art date
Application number
PCT/NO2015/050090
Other languages
French (fr)
Inventor
Harald Jellum
Original Assignee
Companybook As
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Companybook As
Priority to EP15799252.0A (EP3149690A4)
Publication of WO2015183098A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Definitions

  • the invention provides a method and system for collecting, transforming, storing and presentation of data from multiple data sources on a common platform, where data from multiple business sources is transformed into the same structure, which makes it possible to summarize, compare, and see differences, statistics, trends and other relations between the sources.
  • FIGURES
  • Fig 1 is a block diagram overview of the system
  • Fig 2 is a block diagram overview of Filtering, Entity extraction and standardized ID's
  • Fig 3 is a flow chart of an embodiment of the method
  • DETAILED DESCRIPTION OF THE INVENTION
  • the invention provides a method and system for providing a complete "world" overview of information related to business information systems, such as company web pages, company databases, private and public registers, search engines and business applications, users, employees, owners, consultancies, business professionals or other relevant business relations, news, forums, blogs, social networks or similar.
  • the invention provides a method and system for viewing information from different sources that originally have different formats.
  • the object of the invention is to provide an information system usable by employees, owners, consultants or business contacts of a company, and optionally any person in need of a complete overview of a topic or entity, irrespective of the type of source the information originates from, whether a company information system such as a company database, financial registers, public or private company registers, company catalogues, other business intelligence systems, company applications, search engines, or other relevant business systems.
  • the invention solves the above mentioned problems of prior art by structuring all the information presented in available data sources, thereby providing a relation between information about, for example, companies, products, services, trends, sentiments, connections between companies or other business related information, and presenting a "world" overview where the relations between information and source become clearer.
  • the invention will also present a combined information overview of an object collected from different sources. A typical use will be to use the invention to aid in the search for businesses, products, persons or business opportunities.
  • the invention can be integrated with other company information solutions.
  • the method and system of the invention transforms data from multiple business sources into the same format which makes it possible to summarize, compare, see differences, statistics, trends and other relations between the information from the various sources and to enable combination of the total volume of information in a way that it represents the complete "world's" combined business information volume.
  • the method and system of the invention continuously read (600) all business sources 700, 710, 720, 730, 740, 750 comprising structured or unstructured information. All information in the search results is then filtered 500, which results in relevant business news. All entities are extracted 400, mapped to «standardized» ID's 300 and stored in a database 200 of the invention. It is thus possible to summarize 100 and to see differences 120, statistics 140, trends 160 or other relations between the sources.
  • the method and system of the invention automatically transforms unstructured and structured business data into the same structure which makes it possible to summarize, compare, see differences, statistics, trends and other relations between the different sources, in other words a method and system which makes it possible to combine all business information together.
  • the method and system of the invention use advanced search technology to crawl 600 business sources 700, 710, 720, 730, 740, 750. The crawled content is then filtered 500, entities are automatically extracted 400 and transformed to standardized ID's 300, which converts the data into a common structure, and the result is stored in a search database 200.
  • the search database 200 offers a number of services where the information can be combined to provide, for example, but not limited to: a weighted sum from all sources 100, differences between sources 120, statistics 140, trends 160, and relations between sources, filtering and content 180. Other combinations of information form the basis for the presentation of the information.
  • the invention may use machine learning, natural language processing, training sets, word vectors, stemming and other relevant techniques combined with synonyms, dictionaries, databases and language translators.
  • the method of the invention transforms data from multiple business and data sources to the same data structure, enabling a user to summarize, compare, and see differences, statistics, trends and other relations between the sources, by using modern search technology 600 together with filtering 500, entity extraction 400 and "mapping" 300 to a common structure 200 in a database which provides the possibility to summarize, compare, and see differences, statistics, trends and other relations between the sources 100-160.
  • crawler (600) modules which automatically read information from different business data sources and the like.
  • These sources can comprise both structured and unstructured information such as one or more of, but not limited to: company web sites 700, company databases, private or public registers 710, search engines, applications within business/company services 720, users, employees, owners, consultancies, business professionals or other relevant business relations 730, news 740 and forums, blogs, social networks or similar 750.
  • the information is passing through a business relevance filter (500) module filtering out information such as, but not limited to: Business related terms and expressions as: Company name, products and services, locations, Language, turnover, result or other financial data, customers, competitors or other business relations, Market, Industry, contact data, sentiments, rules or other Business related content.
  • entity extraction (400) module for identification of entities such as, but not limited to company name, person name and title, industry, product, location, market, financial data or other business related entities.
  • a mapping module 300 will then map entities to standardized ID's.
  • the entities are mapped to a standardized unique ID 300 for each type of entity group in such a way that all information is stored in a common structure in the database 200. From this common structure it is possible to derive relations such as, but not limited to: the weighted sum from all sources 100, the differences between the sources 120, statistics 140 and trends 160 or other relations or operations between the sources.
  • the system is further discussed in figure 2 where it is shown that business data from structured and unstructured sources 700-750 will be checked if they are "business relevant". Examples of techniques used are business name dictionaries, addresses, contact data, persons, industry, catalogues, dictionaries, languages modules, location, rules and learnings from read content.
  • entity extraction module 400 may be further optimized depending on what type of business data that shall be passed on to the transformation process in the entity extraction module 400.
  • entity extraction module 400: entities are extracted from the text of the relevant business information. This is done by, for example, machine learning, natural language processing (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other text recognition technology 405.
  • entities which can be extracted may be one or more of, but not limited to: company name, person name and title 410, industry 415, product 420, location 425, market 430, financial data 435 or other business related entities.
  • the entities are then sent further to a mapping module 300 which standardizes the entities to unique ID's.
  • For example, the entities «smart mobile phones» and «smart telephone» both mean the same, and will thus be mapped to and associated with the same ID.
  • Other techniques may be used, such as industries SIC (Standard Industrial Classification) codes.
  • Stemming, a search technology which identifies the base form of a word, is another method, as is soundex, which identifies the sound picture of a word. Synonyms, NLP, vector representation of expressions, machine learning and known training sets (300) are other examples.
  • FIG. 3: an embodiment of providing a weighted sum of all sources is presented in fig. 3, where the example makes use of data from the database 200 comprising a common structure for all sources.
  • Business information from different sources comprising company name and associated products can be summarized.
  • For each source, a 3-dimensional representation 101 is created with company name ID as one axis and associated product ID as the other axis, and the probability of belonging to a market is indicated by the height 102 of the displayed surface. It is also possible to combine this with the company industry code, location, financial figures or other business related parameters. This is done in the same way for each business source, illustrated as different layers 103-105 in the figure. These can then be summarized 106 to show the sum of all probabilities within a given market. In addition, one can weight the different sources by their respective quality, size, feedback from users etc.
  • the height of the 3-D plane is the probability for a company or product to belong to a given market. It is possible to include other parameters which also have effect on the height, such as: Weight of source (trust score), and Size of source. This principle applies for all other properties and combinations of the information from the company or source.
  • the system may comprise one or more crawler (600) modules, wherein the crawler modules are set up to search and fetch data from structured and unstructured business and data sources such as, but not limited to: company web pages (700), company databases, private and public registers (710), search engines and business applications (720), users, employees, owners, consultancies, business professionals or other relevant business relations (730), news (740) and forums, blogs, social networks (750);
  • a filtering (500) module, where the filter module is set up to filter out from the searched and fetched data from the crawler (600) modules terms and expressions such as, but not limited to:
  • Company name, products and services, locations, language, turnover, result or other financial data, customers, competitors or other business relations, market, industry, contact data, sentiments, rules or other business related content,
  • entity extraction (400) module identifies entities such as, but not limited to: company name, person name and title, industry, product, location, market, financial data or other business related entities,
  • mapping (300) module for mapping entities to standard ID's
  • one or more output (100 - 160) modules for providing relational information between the data sources.
  • the system further comprises a network service and a communication module, for providing communication between the system and a user.
  • the entity extraction module comprises an entity recognizer (405) module for further optimization of the searched and fetched data and recognition of relevant business information by, for example, but not limited to: machine learning, natural language processing (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other text recognition technologies.
  • entity recognizer 405
  • the relational information from the one or more output modules comprises one or more of, but not restricted to: a weighted sum from all sources (100), differences between sources (120), statistics (140) and trends (160) between the sources.
  • the system is comprised in a cloud service.
  • the method provides a common platform for representation of data from multiple data sources using the system defined in any of the previous claims, the method comprising performing the following steps:
  • crawlers from structured and unstructured business and data sources such as, but not limited by: company web pages (700), company data bases, private and public registers (710), search engines and business applications (720), users, employees, owners, consultancies, business professionals or other relevant business relations (730), News (740) and forum, Blogs, Social networks (750);
  • filtering out from the searched and fetched data from the crawler (600) modules in a filtering (500) module, terms and expressions such as, but not limited to: Company name, products and services, locations, Language, turnover, result or other financial data, customers, competitors or other business relations, Market, Industry, contact data, sentiments, rules or other Business related content,
  • entity extraction (400) module identifying and extracting entities in an entity extraction (400) module, wherein the entities may comprise, but not limited to: company name, person name and title, industry, product, location, market, financial data and other business related entities,
  • mapping entities to standard ID's in a mapping (300)
  • the filtering operation is further set up to learn from previously filtered content.
  • the identifying and extracting entities operation is further set up to learn from previously identified and extracted content.
  • the identifying and extraction of entities operation may use techniques such as machine learning, «natural language processing» (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other relevant text recognition technologies (405).
  • NLP natural language processing
  • the mapping of entities to standardized ID's can use techniques such as, but not limited to: other ID standards, own standards, search technology such as stemming, «soundex», synonyms, NLP, vector representation of expressions, machine learning and known training sets (300).
  • the common structure (200) of the sources may be used as a weighted sum of the business information from all sources to give a summarized probability of belonging to a given market based on a set of products (100).
  • a company name ID and product ID are set together in a 3-dimensional plane (101), where the height is the probability of belonging to a given market given a set of products (102) or another relevant combination of a company's properties.
  • the different planes (101) represent corresponding different business sources (103-105), and these may be summarized (106) into one weighted probability for all sources.
  • the different sources may be weighted based on their quality, trust, reputation or other relevant parameters.
  • the probability to belong to a market given a set of products may depend on the company's official industry code, location, financial numbers and other business related parameters.
  • the output of a common structure (200) may show trends over time to develop relationships between companies, products, locations, market, financial strength and other business relations.
  • the output of a common structure (200) may show statistics over most common trends, most popular products and services, most popular companies, industries, locations, megatrends, technology development or other relevant relations.
  • the output for a common structure (200) may show differences between sources as e.g. based on locations, deviation from the normal, normal distributions, standard deviation, derived over time or similar.
  • the solution may be integrated as a part of other systems such as company databases, financial registers, public company registers, company catalogues, other dictionaries for companies, business applications, search engines and other relevant business systems.
  • the method may be integrated with mobile applications, tablets, «phablets» or other communication devices which use the device's information about time, location, user, language, profile etc.
  • the total knowledge from the sources may be shown as different graphs.
  • the entities can be words, known sentences, relations between words or other text relations.
  • the search from different sources may be combined.
  • the output from the output modules (100 - 160) is communicated to a user.
  • the method is implemented as a cloud service.

Abstract

Method and system for collecting, transforming, storing and presentation of data from multiple data sources on a common platform, where data from multiple business sources is transformed into the same structure, which makes it possible to summarize, compare, and see differences, statistics, trends and other relations between the sources.

Description

TITLE:
Method and system for collecting, transforming, storing, and presentation of data from multiple data sources.
PRIOR ART
Business Information databases:
Several business information systems exist which contain information about company names, addresses, contact data, turnover, result and other typical structured data elements. Examples are Yellow pages, Proff.no, D&B, Experian and others. Typically, these are held in a database which is updated manually, or partly automatically through a content management system where the input originates from persons calling the companies in the database to collect information. The update frequency is typically every 1-2 years. One problem with such products and solutions is that the content production is primarily based on input from one or, at best, a few sources. Information of importance may therefore never become part of the system because it does not appear in the input bases used. The other problem is that these systems are very resource demanding to keep updated.
Media monitoring:
Various systems for media monitoring exist which allow the user to manually search for a company name and then see news articles comprising the exactly defined company name. The challenge is that many names, places and things share their name with a company. Examples are Apple the fruit versus the company Apple, persons named Ericsson versus the company Ericsson, or the generic distance term "miles" versus the company Miles. A user searching on such terms will experience a lot of "noise", or unwanted hits. The same applies when a search for products is performed. A problem with these systems is that they offer no possibility of avoiding the "noise".
Search engines:
There is a lot of business information on the internet. This content is typically indexed by search engines so that the user can search by free text to find the best matching web sites. When looking for a company, the user typically gets a list of URLs, often ranked by parameters controlled by the search engine. One problem with a typical search engine is that it does not understand the content. A further problem is that the pages resulting from a search have only one feature in common, namely the search string or parts of it, so completely unrelated articles or pages may be shown as results of the same search.
BRIEF DESCRIPTION OF THE INVENTION:
The invention provides a method and system for collecting, transforming, storing and presentation of data from multiple data sources on a common platform, where data from multiple business sources is transformed into the same structure, which makes it possible to summarize, compare, and see differences, statistics, trends and other relations between the sources.
FIGURES
Fig 1 is a block diagram overview of the system
Fig 2 is a block diagram overview of Filtering, Entity extraction and standardized ID's
Fig 3 is a flow chart of an embodiment of the method
DETAILED DESCRIPTION OF THE INVENTION
The invention provides a method and system for providing a complete "world" overview of information related to business information systems, such as company web pages, company databases, private and public registers, search engines and business applications, users, employees, owners, consultancies, business professionals or other relevant business relations, news, forums, blogs, social networks or similar.
The invention provides a method and system for viewing information from different sources that originally have different formats. The object of the invention is to provide an information system usable by employees, owners, consultants or business contacts of a company, and optionally any person in need of a complete overview of a topic or entity, irrespective of the type of source the information originates from, whether a company information system such as a company database, financial registers, public or private company registers, company catalogues, other business intelligence systems, company applications, search engines, or other relevant business systems.
Further, the invention solves the above mentioned problems of prior art by structuring all the information presented in available data sources, thereby providing a relation between information about, for example, companies, products, services, trends, sentiments, connections between companies or other business related information, and presenting a "world" overview where the relations between information and source become clearer. The invention will also present a combined information overview of an object collected from different sources. A typical use will be to aid in the search for businesses, products, persons or business opportunities.
It is an object of the invention to provide a user with a method and system that improve the ability to create new business opportunities through services built on much better combined business knowledge from various sources. The invention can be integrated with other company information solutions.
In other words, the method and system of the invention transforms data from multiple business sources into the same format which makes it possible to summarize, compare, see differences, statistics, trends and other relations between the information from the various sources and to enable combination of the total volume of information in a way that it represents the complete "world's" combined business information volume.
The method and system of the invention, with reference to fig. 1, continuously read (600) all business sources 700, 710, 720, 730, 740, 750 comprising structured or unstructured information. All information in the search results is then filtered 500, which results in relevant business news. All entities are extracted 400, mapped to «standardized» ID's 300 and stored in a database 200 of the invention. It is thus possible to summarize 100 and to see differences 120, statistics 140, trends 160 or other relations between the sources.
The method and system of the invention automatically transforms unstructured and structured business data into the same structure which makes it possible to summarize, compare, see differences, statistics, trends and other relations between the different sources, in other words a method and system which makes it possible to combine all business information together.
The method and system of the invention use advanced search technology to crawl 600 business sources 700, 710, 720, 730, 740, 750. The crawled content is then filtered 500, entities are automatically extracted 400 and transformed to standardized ID's 300, which converts the data into a common structure, and the result is stored in a search database 200. The search database 200 offers a number of services where the information can be combined to provide, for example, but not limited to: a weighted sum from all sources 100, differences between sources 120, statistics 140, trends 160, and relations between sources, filtering and content 180. Other combinations of information form the basis for the presentation of the information.
The invention may use machine learning, natural language processing, training sets, word vectors, stemming and other relevant techniques combined with synonyms, dictionaries, databases and language translators.
The method of the invention transforms data from multiple business and data sources to the same data structure, enabling a user to summarize, compare, and see differences, statistics, trends and other relations between the sources, by using modern search technology 600 together with filtering 500, entity extraction 400 and "mapping" 300 to a common structure 200 in a database which provides the possibility to summarize, compare, and see differences, statistics, trends and other relations between the sources 100-160.
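The chain of stages described above (crawl 600, filter 500, extract 400, map 300, store 200) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the vocabularies `BUSINESS_TERMS` and `KNOWN_ENTITIES`, the `Record` and `Database` types, the function names, and the in-memory "sources" are hypothetical and are not taken from the application, which leaves the concrete techniques open.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies; a real system would use far larger dictionaries
# and learned models rather than fixed keyword sets.
BUSINESS_TERMS = {"company", "turnover", "product", "market", "customers"}
KNOWN_ENTITIES = {
    "acme as": ("company", "C1"),
    "smart telephone": ("product", "P1"),
    "smart mobile phones": ("product", "P1"),
}

@dataclass
class Record:
    source: str
    entity_type: str
    entity_id: str

@dataclass
class Database:
    """Stands in for the common-structure database (200)."""
    records: list = field(default_factory=list)

def crawl(sources):
    """Crawler (600): here the 'sources' are already held in memory."""
    for name, documents in sources.items():
        for text in documents:
            yield name, text

def is_business_relevant(text):
    """Filter (500): naive keyword test for business relevance."""
    return bool(set(text.lower().split()) & BUSINESS_TERMS)

def extract_entities(text):
    """Entity extraction (400): plain dictionary lookup."""
    lowered = text.lower()
    return [phrase for phrase in KNOWN_ENTITIES if phrase in lowered]

def map_to_id(phrase):
    """Mapping (300): phrase -> (entity type, standardized ID)."""
    return KNOWN_ENTITIES[phrase]

def run_pipeline(sources):
    db = Database()
    for source, text in crawl(sources):
        if not is_business_relevant(text):
            continue  # dropped as "noise"
        for phrase in extract_entities(text):
            entity_type, entity_id = map_to_id(phrase)
            db.records.append(Record(source, entity_type, entity_id))
    return db

sources = {
    "news": ["Acme AS launches a smart telephone for the Nordic market",
             "Apple harvest was good this year"],
    "web": ["Acme AS is a company selling smart mobile phones"],
}
db = run_pipeline(sources)
for record in db.records:
    print(record)
```

In this toy run the sentence about the apple harvest is discarded by the filter, and the two product phrasings from different sources end up under the same product ID, which is what makes later comparison across sources possible.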
One embodiment of the system of the invention is outlined in fig. 1, where the system comprises crawler (600) modules which automatically read information from different business data sources and the like. These sources can comprise both structured and unstructured information such as one or more of, but not limited to: company web sites 700, company databases, private or public registers 710, search engines, applications within business/company services 720, users, employees, owners, consultancies, business professionals or other relevant business relations 730, news 740 and forums, blogs, social networks or similar 750. The information passes through a business relevance filter (500) module filtering out information such as, but not limited to, business related terms and expressions such as: company name, products and services, locations, language, turnover, result or other financial data, customers, competitors or other business relations, market, industry, contact data, sentiments, rules or other business related content. Next is an entity extraction (400) module for identification of entities such as, but not limited to: company name, person name and title, industry, product, location, market, financial data or other business related entities. A mapping module 300 then maps each entity to a standardized unique ID for its type of entity group, in such a way that all information is stored in a common structure in the database 200. From this common structure it is possible to derive relations such as, but not limited to: the weighted sum from all sources 100, the differences between the sources 120, statistics 140 and trends 160 or other relations or operations between the sources.
The system is further discussed in figure 2, where it is shown that business data from structured and unstructured sources 700-750 is checked for whether it is "business relevant". Examples of techniques used are business name dictionaries, addresses, contact data, persons, industry, catalogues, dictionaries, language modules, location, rules and learnings from read content. These may be further optimized depending on what type of business data shall be passed on to the transformation process in the entity extraction module 400. Entities are then extracted from the text of the relevant business information. This is done by, for example, machine learning, natural language processing (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other text recognition technology 405. Examples of entities which can be extracted are one or more of, but not limited to: company name, person name and title 410, industry 415, product 420, location 425, market 430, financial data 435 or other business related entities. The entities are then sent on to a mapping module 300 which standardizes the entities to unique ID's. For example, the entities «smart mobile phones» and «smart telephone» both mean the same, and will thus be mapped to and associated with the same ID. Other techniques may be used, such as industry SIC (Standard Industrial Classification) codes. Stemming, a search technology which identifies the base form of a word, is another method, as is soundex, which identifies the sound picture of a word. Synonyms, NLP, vector representation of expressions, machine learning and known training sets (300) are other examples.
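The mapping of differently worded entities to one standardized ID can be sketched as follows. This is only an illustration under stated assumptions: the synonym table, the deliberately naive plural-stripping `stem` function, the `IdMapper` class and the ID format are all hypothetical, and they cover only the stemming and synonym techniques named in the text, not soundex, SIC codes or machine learning.

```python
import re

# Hypothetical synonym table: each word maps to a canonical word.
# An empty string marks a filler word that is dropped for this illustration.
SYNONYMS = {
    "telephone": "phone",
    "mobile": "",
}

def stem(word):
    """Very naive stemmer that strips a plural 's'; a stand-in for a real one."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def normalise(phrase):
    """Lowercase, tokenise, stem and apply synonyms to get a canonical form."""
    canonical = []
    for word in re.findall(r"[a-z0-9]+", phrase.lower()):
        word = stem(word)
        word = SYNONYMS.get(word, word)
        if word:
            canonical.append(word)
    return " ".join(canonical)

class IdMapper:
    """Assigns one ID per canonical form within each entity group."""

    def __init__(self):
        self._ids = {}

    def map(self, group, phrase):
        key = (group, normalise(phrase))
        if key not in self._ids:
            self._ids[key] = f"{group}-{len(self._ids) + 1}"
        return self._ids[key]

mapper = IdMapper()
id_a = mapper.map("product", "smart mobile phones")
id_b = mapper.map("product", "Smart telephone")
id_c = mapper.map("product", "tablets")
print(id_a, id_b, id_c)
```

Both phrasings of the smart phone normalise to the same canonical form and therefore share an ID, while the unrelated product receives its own. Keying the table on the entity group as well keeps, say, a company and a product with the same name apart.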
An embodiment of providing a weighted sum of all sources is presented in fig. 3, where the example makes use of data from the database 200 comprising a common structure for all sources. Business information from different sources comprising company name and associated products can be summarized. For each source, a 3-dimensional representation 101 is created with company name ID as one axis and associated product ID as the other axis, and the probability of belonging to a market is indicated by the height 102 of the displayed surface. It is also possible to combine this with the company industry code, location, financial figures or other business related parameters. This is done in the same way for each business source, illustrated as different layers 103-105 in the figure. These can then be summarized 106 to show the sum of all probabilities within a given market. In addition, one can weight the different sources by their respective quality, size, feedback from users etc.
In one embodiment the height of the 3-D plane is the probability for a company or product to belong to a given market. It is possible to include other parameters which also affect the height, such as weight of source (trust score) and size of source. This principle applies to all other properties and combinations of the information from the company or source.
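The summation of the per-source layers can be sketched as follows. This is only an illustration under stated assumptions: each layer is represented as a dictionary keyed by (company ID, product ID), the weights are normalized so the result stays between 0 and 1, and a cell missing from a source counts as probability 0. The function name `combine_sources` and the sample figures are hypothetical.

```python
def combine_sources(layers, weights):
    """Weighted sum of per-source probability layers.

    layers:  source name -> {(company_id, product_id): probability}
    weights: source name -> non-negative trust/size weight
    """
    total_weight = sum(weights[source] for source in layers)
    combined = {}
    for source, layer in layers.items():
        share = weights[source] / total_weight  # normalized weight (assumption)
        for cell, probability in layer.items():
            combined[cell] = combined.get(cell, 0.0) + share * probability
    return combined


# Hypothetical data: two sources, the register trusted three times as much.
layers = {
    "web": {(1, 10): 0.8},
    "register": {(1, 10): 0.4, (2, 10): 0.6},
}
weights = {"web": 1.0, "register": 3.0}
combined = combine_sources(layers, weights)
```

Here the cell (1, 10) receives 0.25 × 0.8 + 0.75 × 0.4 = 0.5, and the cell (2, 10), seen only in the register, receives 0.75 × 0.6 = 0.45.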
In a first embodiment of the system of the invention the system may comprise one or more crawler (600) modules, wherein the crawler modules are set up to search and fetch data from structured and unstructured business and data sources such as, but not limited to: company web pages (700), company databases, private and public registers (710), search engines and business applications (720), users, employees, owners, consultancies, business professionals or other relevant business relations (730), news (740) and forums, blogs, social networks (750);
a filtering (500) module, where the filter module is set up to filter out from the searched and fetched data from the crawler (600) modules terms and expressions such as, but not limited to:
Company name, products and services, locations, Language, turnover, result or other financial data, customers, competitors or other business relations, Market, Industry, contact data, sentiments, rules or other Business related content,
an entity extraction (400) module, wherein the entity extraction (400) module identifies entities such as, but not limited to: company name, person name and title, industry, product, location, market, financial data or other business related entities,
a mapping (300) module for mapping entities to standard ID's,
a database (200) for storing the data that is searched and fetched by the crawler (600) modules, filtered in the filtering (500) module, extracted in the extracting (400) module and mapped in the mapping (300) module, in predefined data structures,
one or more output (100 - 160) modules for providing relational information between the data sources.
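The chain of modules listed above can be sketched end to end in Python. This is a toy illustration, not the claimed system: the crawler reads from an in-memory dictionary instead of the network, filtering and extraction use two small hypothetical word lists, companies and products share one ID space, and the only output is a count of how many sources support each (company ID, product ID) pair.

```python
import re
from collections import defaultdict

# Hypothetical dictionaries standing in for real recognition technology.
KNOWN_COMPANIES = {"acme", "globex"}
KNOWN_PRODUCTS = {"smartphone", "tablet"}


def crawl(sources):
    """Crawler (600) stand-in: yield (source name, raw text) pairs."""
    for name, texts in sources.items():
        for text in texts:
            yield name, text


def is_relevant(text):
    """Filter (500) stand-in: keep text naming a known company and product."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & KNOWN_COMPANIES) and bool(words & KNOWN_PRODUCTS)


def extract(text):
    """Entity extraction (400) stand-in: dictionary lookup per word."""
    words = re.findall(r"[a-z]+", text.lower())
    companies = [w for w in words if w in KNOWN_COMPANIES]
    products = [w for w in words if w in KNOWN_PRODUCTS]
    return companies, products


def make_id_mapper():
    """Mapping (300) stand-in: one integer ID per distinct entity."""
    ids = {}

    def to_id(entity):
        return ids.setdefault(entity, len(ids) + 1)

    return to_id


def run_pipeline(sources):
    """Fill the database (200): source name -> {(company ID, product ID)}."""
    to_id = make_id_mapper()
    database = defaultdict(set)
    for source, text in crawl(sources):
        if not is_relevant(text):
            continue
        companies, products = extract(text)
        for company in companies:
            for product in products:
                database[source].add((to_id(company), to_id(product)))
    return database


def sources_per_pair(database):
    """Output stand-in: number of sources supporting each pair."""
    counts = defaultdict(int)
    for pairs in database.values():
        for pair in pairs:
            counts[pair] += 1
    return dict(counts)
```

A call such as `sources_per_pair(run_pipeline({"web": [...], "news": [...]}))` then returns the pairs seen and the number of sources that mention each.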
In a second embodiment of the system of the invention according to the first embodiment of the system of the invention, the system further comprises a network service and a communication module, for providing communication between the system and a user.
In a third embodiment of the system of the invention according to the first or second embodiments of the system of the invention, the entity extraction module comprises an entity recognizer (405) module for further optimization of the searched and fetched data and recognition of relevant business information by for example, but not limited to: machine learning, natural language processing (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other text recognition technologies.
In a fourth embodiment of the system of the invention according to any of the previous embodiments of the system of the invention, the relational information from the one or more output modules comprises one or more of, but not restricted to: a weighted sum from all sources (100), differences between sources (120), statistics (140) and trends (160) between the sources.
In a fifth embodiment of the system of the invention according to any of the previous embodiments of the system of the invention, the system is comprised in a cloud service.
In a first embodiment of the method of the invention the method provides a common platform for representation of data from multiple data sources using the system defined in any of the previous claims, the method comprising performing the following steps:
searching and fetching data using crawlers (600) from structured and unstructured business and data sources such as, but not limited to: company web pages (700), company databases, private and public registers (710), search engines and business applications (720), users, employees, owners, consultancies, business professionals or other relevant business relations (730), news (740) and forums, blogs, social networks (750);
filtering out from the searched and fetched data from the crawler (600) modules, in a filtering (500) module, terms and expressions such as, but not limited to: Company name, products and services, locations, Language, turnover, result or other financial data, customers, competitors or other business relations, Market, Industry, contact data, sentiments, rules or other Business related content,
identifying and extracting entities in an entity extraction (400) module, wherein the entities may comprise, but not limited to: company name, person name and title, industry, product, location, market, financial data and other business related entities,
mapping entities to standard ID's, in a mapping (300) module,
storing the data that is searched and fetched by the crawler (600) modules, filtered in the filtering (500) module, extracted in the extracting (400) module and mapped in the mapping (300) module, in predefined data structures in a database (200), and
outputting relational information between the data sources by one or more output (100 - 160) modules.
In a second embodiment of the method of the invention according to the first embodiment of the method of the invention, the filtering operation is further set up to learn from previously filtered content.
In a third embodiment of the method of the invention according to the first or second embodiments of the method of the invention, the identifying and extracting entities operation is further set up to learn from previously identified and extracted content.
In a fourth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the identifying and extraction of entities operation may use techniques such as machine learning, «natural language processing» (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other relevant text recognition technologies (405).
In a fifth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the operation of mapping entities to standardized ID's can use techniques such as, but not limited to: other ID standards, own standards, search technology such as stemming, «soundex», synonyms, NLP, vector representation of expressions, machine learning and known training sets (300).
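Of the techniques named above, soundex is compact enough to show in full. The sketch below implements the standard American Soundex rules (first letter kept, consonants coded 1 to 6, equal adjacent codes collapsed, «h» and «w» ignored, result padded to four characters); the text does not say which soundex variant is intended, so the choice of this variant is an assumption.

```python
# Consonant groups of American Soundex; vowels and y, h, w carry no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code
    return (result + "000")[:4]


codes = [soundex(name) for name in ("Robert", "Rupert")]
```

Names that sound alike, such as «Robert» and «Rupert», receive the same code and could therefore be mapped to the same candidate ID before further checks.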
In a sixth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the common structure (200) of the sources may be used as a weighted sum of the business information from all sources to give a summarized probability of belonging to a given market based on a set of products (100).
In a seventh embodiment of the method of the invention according to any of the previous embodiments of the method of the invention a company name ID and product ID are set together in a 3-dimensional plane (101), where the height is the probability of belonging to a given market given a set of products (102) or other relevant combination of a company's properties.
In an eighth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the different planes (101) represent corresponding different business sources (103-105) and these may be summarized (106) into one weighted probability for all sources.
In a ninth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the different sources may be weighted based on their quality, trust, reputation or other relevant parameters.
In a tenth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the probability of belonging to a market given a set of products may depend on the company's official industry code, location, financial numbers and other business related parameters.
In an eleventh embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the output of a common structure (200) may show trends over time to develop relationships between companies, products, locations, market, financial strength and other business relations.
In a twelfth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the output of a common structure (200) may show statistics over most common trends, most popular products and services, most popular companies, industries, locations, megatrends, technology development or other relevant relations.
In a thirteenth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the output for a common structure (200) may show differences between sources, e.g. based on locations, deviation from the normal, normal distributions, standard deviation, derived over time or similar.
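One way to express «deviation from the normal» between sources is a standard score per source. The sketch below is an assumption about how this could be computed, not a method stated in the text: for one quantity reported by several sources it takes the mean and the population standard deviation and reports how many standard deviations each source lies from the mean. The name `source_deviations` and the sample figures are hypothetical.

```python
from statistics import mean, pstdev


def source_deviations(values):
    """Return source name -> standard score (z-score) of its reported value.

    values: source name -> value reported by that source for the same quantity.
    """
    average = mean(values.values())
    spread = pstdev(values.values())  # population standard deviation
    if spread == 0:
        # All sources agree, so no source deviates.
        return {source: 0.0 for source in values}
    return {source: (value - average) / spread
            for source, value in values.items()}


# Hypothetical turnover figures for one company from three sources.
deviations = source_deviations({"register": 1.0, "web": 2.0, "news": 3.0})
```

A source whose score exceeds a chosen threshold in absolute value could then be flagged as differing from the others; the threshold itself is left open here.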
In a fourteenth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the solution may be integrated as a part of other systems such as company databases, financial registers, public company registers, company catalogues, other dictionaries for companies, business applications, search engines and other relevant business systems.
In a fifteenth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the filtering (500), entity extraction (400) and mapping of entities to ID's may be enhanced by feedback from users.
In a sixteenth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the method may be integrated with mobile applications, tablets, «phablets» or other communication devices which use the device's information about time, location, user, language, profile etc.
In a seventeenth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the total knowledge from the sources may be shown as different graphs.
In an eighteenth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the entities can be words, known sentences, relations between words or other text relations.
In a nineteenth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the search from different sources may be combined.
In a twentieth embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the companies, persons and news from different sources may be combined.
In a twenty-first embodiment of the method of the invention according to any of the previous embodiments of the method of the invention the output from the output modules (100 - 160) is communicated to a user.
In a twenty-second embodiment of the method of the invention according to any of the previous first to twentieth embodiments of the method of the invention the method is implemented as a cloud service.

Claims

PATENT CLAIMS
1.
System for providing a common platform for representation of data from multiple data sources, the system being characterized by comprising:
one or more crawler (600) modules, wherein the crawler modules are set up to search and fetch data from structured and unstructured business and data sources such as, but not limited to: company web pages (700), company databases, private and public registers (710), search engines and business applications (720), users, employees, owners, consultancies, business professionals or other relevant business relations (730), news (740) and forums, blogs, social networks (750);
a filtering (500) module, where the filter module is set up to filter out from the searched and fetched data from the crawler (600) modules terms and expressions such as, but not limited to: Company name, products and services, locations, Language, turnover, result or other financial data, customers, competitors or other business relations, Market, Industry, contact data, sentiments, rules or other Business related content,
an entity extraction (400) module, wherein the entity extraction (400) module identifies entities such as, but not limited to: company name, person name and title, industry, product, location, market, financial data or other business related entities,
a mapping (300) module for mapping entities to standard ID's,
a database (200) for storing the data that is searched and fetched by the crawler (600) modules, filtered in the filtering (500) module, extracted in the extracting (400) module and mapped in the mapping (300) module, in predefined data structures,
one or more output (100 - 160) modules for providing relational information between the data sources.
2.
System according to claim 1, wherein the system further comprises:
a network service and a communication module, for providing communication between the system and a user.
3.
System according to claim 1 or claim 2, wherein the entity extraction module comprises an entity recognizer (405) module for further optimization of the searched and fetched data and recognition of relevant business information by for example, but not limited to: machine learning, natural language processing (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other text recognition technologies.
4.
System according to any of the previous claims, wherein the relational information from the one or more output modules comprises one or more of, but not restricted to: a weighted sum from all sources (100), differences between sources (120), statistics (140) and trends (160) between the sources.
5.
System according to any of the previous claims, wherein the system is comprised in a cloud service.
6.
Method for providing a common platform for representation of data from multiple data sources using the system defined in any of the previous claims, the method being characterized by comprising:
searching and fetching data using crawlers (600) from structured and unstructured business and data sources such as, but not limited to: company web pages (700), company databases, private and public registers (710), search engines and business applications (720), users, employees, owners, consultancies, business professionals or other relevant business relations (730), news (740) and forums, blogs, social networks (750);
filtering out from the searched and fetched data from the crawler (600) modules, in a filtering (500) module, terms and expressions such as, but not limited to: Company name, products and services, locations, Language, turnover, result or other financial data, customers, competitors or other business relations, Market, Industry, contact data, sentiments, rules or other Business related content,
identifying and extracting entities in an entity extraction (400) module, wherein the entities may comprise, but not limited to: company name, person name and title, industry, product, location, market, financial data and other business related entities,
mapping entities to standard ID's, in a mapping (300) module,
storing the data that is searched and fetched by the crawler (600) modules, filtered in the filtering (500) module, extracted in the extracting (400) module and mapped in the mapping (300) module, in predefined data structures in a database (200), and
outputting relational information between the data sources by one or more output (100 - 160) modules.
7.
Method according to claim 6, wherein the filtering operation is further set up to learn from previously filtered content.
8.
Method according to any of claims 6 to 7, wherein the identifying and extracting entities operation is further set up to learn from previously identified and extracted content.
9.
Method according to any of claims 6 to 8, wherein the identifying and extraction of entities operation may use techniques such as machine learning, «natural language processing» (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other relevant text recognition technologies (405).
10.
Method according to any of claims 6 to 9, wherein the operation of mapping entities to standardized ID's can use techniques such as, but not limited to: other ID standards, own standards, search technology such as stemming, «soundex», synonyms, NLP, vector representation of expressions, machine learning and known training sets (300).
11.
Method according to any of claims 6 to 10, wherein the common structure (200) of the sources may be used as a weighted sum of the business information from all sources to give a summarized probability of belonging to a given market based on a set of products (100).
12.
Method according to any of claims 6 to 11, wherein a company name ID and product ID are set together in a 3-dimensional plane (101), where the height is the probability of belonging to a given market given a set of products (102) or other relevant combination of a company's properties.
13.
Method according to any of claims 6 to 12, wherein different planes (101) represent corresponding different business sources (103-105) and these may be summarized (106) into one weighted probability for all sources.
14.
Method according to any of claims 6 to 13, wherein the different sources may be weighted based on their quality, trust, reputation or other relevant parameters.
15.
Method according to any of claims 6 to 14, wherein the probability of belonging to a market given a set of products may depend on the company's official industry code, location, financial numbers and other business related parameters.
16.
Method according to any of claims 6 to 15, wherein the output of a common structure (200) may show trends over time to develop relationships between companies, products, locations, market, financial strength and other business relations.
17.
Method according to any of claims 6 to 16, wherein the output of a common structure (200) may show statistics over most common trends, most popular products and services, most popular companies, industries, locations, megatrends, technology development or other relevant relations.
18.
Method according to any of claims 6 to 17, wherein the output for a common structure (200) may show differences between sources, e.g. based on locations, deviation from the normal, normal distributions, standard deviation, derived over time or similar.
19.
Method according to any of claims 6 to 18, wherein the solution may be integrated as a part of other systems such as company databases, financial registers, public company registers, company catalogues, other dictionaries for companies, business applications, search engines and other relevant business systems.
20.
Method according to any of claims 6 to 19, wherein the filtering (500), entity extraction (400) and mapping of entities to ID's may be enhanced by feedback from users.
21.
Method according to any of claims 6 to 20, wherein the method may be integrated with mobile applications, tablets, «phablets» or other communication devices which use the device's information about time, location, user, language, profile etc.
22.
Method according to any of claims 6 to 21, wherein the total knowledge from the sources may be shown as different graphs.
23.
Method according to any of claims 6 to 22, wherein entities can be words, known sentences, relations between words or other text relations.
24.
Method according to any of claims 6 to 23, wherein search from different sources may be combined.
25.
Method according to any of claims 6 to 24, wherein companies, persons and news from different sources may be combined.
26.
Method according to any of claims 6 to 25, wherein the output from the output modules (100 - 160) is communicated to a user.
27.
Method according to any of claims 6 to 25, wherein the method is implemented as a cloud service.
PCT/NO2015/050090 2014-05-24 2015-05-26 Method and system for collecting, transforming, storing, and presentation of data from multiple data sources. WO2015183098A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP15799252.0A EP3149690A4 (en) 2014-05-24 2015-05-26 Method and system for collecting, transforming, storing, and presentation of data from multiple data sources.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NO20140649 2014-05-24
NO20140649 2014-05-24

Publications (1)

Publication Number Publication Date
WO2015183098A1 (en) 2015-12-03

Family

ID=54699326

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NO2015/050090 WO2015183098A1 (en) 2014-05-24 2015-05-26 Method and system for collecting, transforming, storing, and presentation of data from multiple data sources.

Country Status (2)

Country Link
EP (1) EP3149690A4 (en)
WO (1) WO2015183098A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101713831B1 (en) * 2016-07-26 2017-03-09 한국과학기술정보연구원 Apparatus for recommending document and method for recommending document
KR101931714B1 (en) 2016-12-20 2018-12-26 주식회사 와이즈넛 System and method for extracting named entity using similar document recommand device
KR101962407B1 (en) * 2018-11-08 2019-03-26 한전케이디엔주식회사 System for Supporting Generation Electrical Approval Document using Artificial Intelligence and Method thereof
US20210358042A1 (en) * 2020-05-13 2021-11-18 Hunan Fumi Information Technology Co., Ltd. Stock recommendation method based on item attribute identification and the system thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110040805A1 (en) * 2009-08-11 2011-02-17 Carter Stephen R Techniques for parallel business intelligence evaluation and management
US20120260209A1 (en) * 2011-04-11 2012-10-11 Credibility Corp. Visualization Tools for Reviewing Credibility and Stateful Hierarchical Access to Credibility

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3149690A4 *

Also Published As

Publication number Publication date
EP3149690A1 (en) 2017-04-05
EP3149690A4 (en) 2017-11-01

Similar Documents

Publication Publication Date Title
TWI664539B (en) System, apparatus and method for monitoring internet media events based on a constructed industry knowledge graph database
US10902468B2 (en) Real-time, stream data information integration and analytics system
CN106250513B (en) Event modeling-based event personalized classification method and system
US9147154B2 (en) Classifying resources using a deep network
KR101605430B1 (en) SYSTEM AND METHOD FOR BUINDING QAs DATABASE AND SEARCH SYSTEM AND METHOD USING THE SAME
US20160162476A1 (en) Methods and systems for modeling complex taxonomies with natural language understanding
CN109145216A (en) Network public-opinion monitoring method, device and storage medium
CN110929125B (en) Search recall method, device, equipment and storage medium thereof
CN109033284A (en) The power information operational system database construction method of knowledge based map
CN110427480B (en) Intelligent personalized text recommendation method and device and computer readable storage medium
CN104392006B (en) A kind of event query processing method and processing device
CN102279894A (en) Method for searching, integrating and providing comment information based on semantics and searching system
US9858332B1 (en) Extracting and leveraging knowledge from unstructured data
CN105843796A (en) Microblog emotional tendency analysis method and device
US9720982B2 (en) Method and apparatus for natural language search for variables
US20180246880A1 (en) System for generating synthetic sentiment using multiple points of reference within a hierarchical head noun structure
US11775767B1 (en) Systems and methods for automated iterative population of responses using artificial intelligence
WO2014127673A1 (en) Method and apparatus for acquiring hot topics
WO2015183098A1 (en) Method and system for collecting, transforming, storing, and presentation of data from multiple data sources.
CN103544321A (en) Data processing method and device for micro-blog emotion information
Kanoje et al. User profiling for university recommender system using automatic information retrieval
CN105630813A (en) Keyword recommendation method and system based on user-defined template
CN110209659A (en) A kind of resume filter method, system and computer readable storage medium
CN109710739A (en) A kind of information processing method and device, storage medium
JP6392042B2 (en) Information providing apparatus, information providing method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 15799252; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
REEP Request for entry into the european phase. Ref document number: 2015799252; Country of ref document: EP
WWE Wipo information: entry into national phase. Ref document number: 2015799252; Country of ref document: EP