EP3149690A1 - Method and system for collecting, transforming, storing and presenting data from multiple data sources - Google Patents


Info

Publication number
EP3149690A1
Authority
EP
European Patent Office
Prior art keywords
data
business
sources
module
company
Prior art date
2014-05-24
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP15799252.0A
Other languages
English (en)
French (fr)
Other versions
EP3149690A4 (de)
Inventor
Harald Jellum
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Companybook As
Original Assignee
Companybook As
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-05-24
Filing date
2015-05-26
Publication date
2017-04-05
Application filed by Companybook As
Publication of EP3149690A1
Publication of EP3149690A4
Withdrawn (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Definitions

  • The invention provides a method and system for collecting, transforming, storing and presenting data from multiple data sources on a common platform, where data from multiple business sources is transformed into the same structure, making it possible to summarize and compare the sources and to see differences, statistics, trends and other relations between them.
  • FIGURES
  • Fig. 1 is a block diagram overview of the system
  • Fig. 2 is a block diagram overview of filtering, entity extraction and standardized IDs
  • Fig. 3 is a flow chart of an embodiment of the method
  • DETAILED DESCRIPTION OF THE INVENTION
  • The invention provides a method and system for giving a complete "world" overview of information related to business information systems, such as company web pages, company databases, private and public registers, search engines and business applications, users, employees, owners, consultancies, business professionals or other relevant business relations, news, forums, blogs, social networks or similar.
  • The invention provides a method and system for viewing information from different sources that originally have different formats.
  • The object of the invention is to provide an information system usable by employees, owners, consultants or business contacts of a company, and optionally by any person who needs a complete overview of a topic or entity, irrespective of the type of source the information originates from, be it a company information system such as a company database, financial registers, public or private company registers, company catalogues, other business intelligence systems, company applications, search engines, or other relevant business systems.
  • The invention solves the above-mentioned problems of the prior art by structuring all the information presented in the available data sources, thereby establishing relations between information about, for example, companies, products, services, trends, sentiments, connections between companies or other business-related information, and thereby presenting a "world" overview in which the relations between information and sources become clearer.
  • The invention also presents a combined overview of the information about an object, collected from different sources. A typical use is to aid the search for businesses, products, persons or business opportunities.
  • The invention can be integrated with other company information solutions.
  • The method and system of the invention transform data from multiple business sources into the same format, which makes it possible to summarize and compare the information from the various sources, to see differences, statistics, trends and other relations between them, and to combine the total volume of information so that it represents the complete "world's" combined business information.
  • The method and system of the invention continuously read (600) all business sources 700, 710, 720, 730, 740, 750, which comprise structured or unstructured information. All information in the search results is then filtered (500), yielding relevant business news. All entities are extracted (400), mapped to "standardized" IDs (300) and stored in a database 200 of the invention. This makes it possible to summarize the sources (100) and to see differences (120), statistics (140), trends (160) or other relations between them.
  • The method and system of the invention automatically transform unstructured and structured business data into the same structure, which makes it possible to summarize and compare the different sources and to see differences, statistics, trends and other relations between them; in other words, the method and system make it possible to combine all business information.
  • The method and system of the invention use advanced search technology to crawl (600) business sources 700, 710, 720, 730, 740, 750. The crawled data is then filtered (500), entities are automatically extracted (400), and the entities are transformed into standardized IDs (300), which converts the data into a common structure that is stored in a search database 200.
  • The search database 200 offers a number of services in which the information can be combined to provide, for example but not limited to: a weighted sum from all sources (100), differences between sources (120), statistics (140), trends (160), and relations between sources, filtering and content (180). Other combinations of information form the basis for how the information is presented.
  • The invention may use machine learning, natural language processing, training sets, word vectors, stemming and other relevant techniques combined with synonyms, dictionaries, databases and language translators.
  • The method of the invention transforms data from multiple business and data sources into the same data structure, enabling a user to summarize and compare the sources and to see differences, statistics, trends and other relations between them (100-160). This is achieved by using modern search technology (600) together with filtering (500), entity extraction (400) and "mapping" (300) to a common structure in a database (200).
  • The system comprises crawler (600) modules which automatically read information from different business data sources and the like.
  • These sources can comprise both structured and unstructured information, such as one or more of, but not limited to: company web sites 700; company databases, private or public registers 710; search engines and applications within business/company services 720; users, employees, owners, consultancies, business professionals or other relevant business relations 730; news 740; and forums, blogs, social networks or similar 750.
  • The information then passes through a business relevance filter (500) module, which filters it for business-related terms and expressions such as, but not limited to: company name, products and services, locations, language, turnover, results or other financial data, customers, competitors or other business relations, market, industry, contact data, sentiments, rules or other business-related content.
  • An entity extraction (400) module identifies entities such as, but not limited to, company name, person name and title, industry, product, location, market, financial data or other business-related entities.
  • A mapping module 300 then maps the entities to standardized IDs.
  • The entities are mapped to a standardized unique ID (300) for each type of entity group, so that all information is stored in a common structure in the database 200. From this common structure it is possible to derive relations such as, but not limited to: the weighted sum from all sources (100), the differences between the sources (120), statistics (140) and trends (160), or other relations or operations between the sources.
  • The system is further illustrated in Fig. 2, which shows that business data from structured and unstructured sources 700-750 is checked for whether it is "business relevant". Examples of the techniques used are business name dictionaries, addresses, contact data, persons, industry, catalogues, dictionaries, language modules, location, rules and learning from previously read content.
  • The entity extraction module 400 may be further optimized depending on the type of business data to be passed on to the transformation process in that module.
  • Entity extraction module 400: entities are extracted from the text of the relevant business information. This is done using, for example, machine learning, natural language processing (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other text recognition technology 405.
  • The entities that can be extracted may be one or more of, but are not limited to: company name, person name and title 410, industry 415, product 420, location 425, market 430, financial data 435 or other business-related entities.
  • The entities are then passed on to a mapping module 300, which standardizes them into unique IDs.
  • For example, the entities "smart mobile phones" and "smart telephone" mean the same thing and are therefore mapped to, and associated with, the same ID.
  • Other techniques may also be used, such as industry SIC (Standard Industrial Classification) codes.
  • Stemming, a search technology that identifies the base form of a word, is another method; yet another is Soundex, which identifies the phonetic representation ("sound picture") of a word. Synonyms, NLP, vector representations of expressions, machine learning and known training sets are further examples (300).
  • Fig. 3: An embodiment providing a weighted sum of all sources is presented in Fig. 3. The example makes use of data from the database 200, which holds a common structure for all sources.
  • Business information from different sources, comprising company names and associated products, can be summarized.
  • For each source, a three-dimensional representation 101 is created with the company name ID on one axis and the associated product ID on the other, and the probability of belonging to a market is indicated by the height 102 of the displayed surface. This can also be combined with the company's industry code, location, financial figures or other business-related parameters. The same is done for each business source, illustrated as layers 103-105 in the figure. These layers can then be summarized 106 to show the sum of all probabilities within a given market. In addition, the different sources can be weighted by their respective quality, size, user feedback, etc.
  • The height of the 3-D plane is the probability that a company or product belongs to a given market. Other parameters that also affect the height can be included, such as the weight of the source (trust score) and the size of the source. This principle applies to all other properties and combinations of the information from the company or source.
  • The system may comprise one or more crawler (600) modules set up to search and fetch data from structured and unstructured business and data sources such as, but not limited to: company web pages (700), company databases, private and public registers (710), search engines and business applications (720), users, employees, owners, consultancies, business professionals or other relevant business relations (730), news (740) and forums, blogs, social networks (750);
  • a filtering (500) module set up to filter the data searched and fetched by the crawler (600) modules for terms and expressions such as, but not limited to:
  • company name, products and services, locations, language, turnover, results or other financial data, customers, competitors or other business relations, market, industry, contact data, sentiments, rules or other business-related content;
  • an entity extraction (400) module that identifies entities such as, but not limited to: company name, person name and title, industry, product, location, market, financial data or other business-related entities;
  • a mapping (300) module for mapping entities to standard IDs;
  • one or more output (100-160) modules for providing relational information between the data sources.
  • The system further comprises a network service and a communication module for providing communication between the system and a user.
  • The entity extraction module comprises an entity recognizer (405) module for further optimization of the searched and fetched data and for recognition of relevant business information by, for example but not limited to, machine learning, natural language processing (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other text recognition technologies.
  • entity recognizer 405
  • The relational information from the one or more output modules comprises one or more of, but is not restricted to: a weighted sum from all sources (100), differences between sources (120), statistics (140) and trends (160) between the sources.
  • The system is comprised in a cloud service.
  • The method provides a common platform for representing data from multiple data sources using the system defined in any of the previous claims, the method comprising the following steps:
  • crawling, with crawler (600) modules, structured and unstructured business and data sources such as, but not limited to: company web pages (700), company databases, private and public registers (710), search engines and business applications (720), users, employees, owners, consultancies, business professionals or other relevant business relations (730), news (740) and forums, blogs, social networks (750);
  • filtering, in a filtering (500) module, the data searched and fetched by the crawler (600) modules for terms and expressions such as, but not limited to: company name, products and services, locations, language, turnover, results or other financial data, customers, competitors or other business relations, market, industry, contact data, sentiments, rules or other business-related content;
  • identifying and extracting entities in an entity extraction (400) module, wherein the entities may comprise, but are not limited to: company name, person name and title, industry, product, location, market, financial data and other business-related entities;
  • mapping entities to standard IDs in a mapping (300) module.
  • The filtering operation is further set up to learn from previously filtered content.
  • The operation of identifying and extracting entities is further set up to learn from previously identified and extracted content.
  • The operation of identifying and extracting entities may use techniques such as machine learning, natural language processing (NLP), training sets, word vectors, stemming and other relevant techniques combined with dictionaries, synonyms, databases, language translators or other relevant text recognition technologies (405).
  • NLP: natural language processing
  • The operation of mapping entities to standardized IDs can use techniques such as, but not limited to: other ID standards, in-house standards, search technology such as stemming, Soundex, synonyms, NLP, vector representations of expressions, machine learning and known training sets (300).
  • The common structure (200) of the sources may be used to form a weighted sum of the business information from all sources, giving a summarized probability of belonging to a given market based on a set of products (100).
  • A company name ID and a product ID are combined in a three-dimensional plane (101), where the height (102) is the probability of belonging to a given market given a set of products, or another relevant combination of company properties.
  • The different planes (101) represent the corresponding business sources (103-105), and these may be summarized (106) into one weighted probability for all sources.
  • The different sources may be weighted based on their quality, trust, reputation or other relevant parameters.
  • The probability of belonging to a market given a set of products may depend on the company's official industry code, location, financial figures and other business-related parameters.
  • The output of the common structure (200) may show trends over time in the development of relationships between companies, products, locations, markets, financial strength and other business relations.
  • The output of the common structure (200) may show statistics on the most common trends, the most popular products and services, the most popular companies, industries and locations, megatrends, technology development or other relevant relations.
  • The output of the common structure (200) may show differences between sources, e.g. based on locations, deviation from the norm, normal distributions, standard deviation, values derived over time or similar.
  • The solution may be integrated as part of other systems such as company databases, financial registers, public company registers, company catalogues, other dictionaries for companies, business applications, search engines and other relevant business systems.
  • The method may be integrated with mobile applications, tablets, "phablets" or other communication devices, using the device's information about time, location, user, language, profile, etc.
  • The total knowledge from the sources may be shown as different graphs.
  • The entities can be words, known sentences, relations between words or other text relations.
  • The searches in different sources may be combined.
  • The output from the output modules (100-160) is communicated to a user.
  • The method is implemented as a cloud service.
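The processing chain described above (crawler modules 600, business relevance filter 500, entity extraction 400, mapping 300, common structure in database 200, output modules 100-160) can be sketched as follows. This is a minimal illustration under assumed data shapes: the `Document` and `CommonStore` types, the plain substring matching and every function name are hypothetical stand-ins, not the patented implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical crawled item; `source` names one of the sources 700-750."""
    source: str
    text: str


@dataclass
class CommonStore:
    """Hypothetical stand-in for database 200: (source, entity_type, standard_id) rows."""
    rows: list = field(default_factory=list)


def crawl(sources):
    """Crawler modules 600 (assumed): yield every text of every source."""
    for name, texts in sources.items():
        for text in texts:
            yield Document(name, text)


def is_business_relevant(doc, terms):
    """Filter module 500 (assumed): keep documents mentioning any business term."""
    lowered = doc.text.lower()
    return any(term in lowered for term in terms)


def extract_entities(doc, gazetteer):
    """Entity extraction module 400 (assumed): dictionary lookup of known phrases."""
    lowered = doc.text.lower()
    return [(etype, phrase) for phrase, etype in gazetteer.items() if phrase in lowered]


def run_pipeline(sources, terms, gazetteer, id_table):
    """Crawl, filter, extract and map (300) every document into the common store."""
    store = CommonStore()
    for doc in crawl(sources):
        if not is_business_relevant(doc, terms):
            continue
        for etype, phrase in extract_entities(doc, gazetteer):
            standard_id = id_table.get(phrase)  # mapping module 300 (assumed lookup)
            if standard_id is not None:
                store.rows.append((doc.source, etype, standard_id))
    return store


def mentions_per_id(store):
    """A simple summarizing output (100): mentions of each standard ID over all sources."""
    return Counter(standard_id for _, _, standard_id in store.rows)
```

Given two sources that describe the same product as "smart mobile phones" and "smart telephone", and an `id_table` mapping both phrases to the same ID, `mentions_per_id` counts that ID once per source, which is the point of storing everything in a common structure.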
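The business relevance check of Fig. 2 (dictionaries plus learning from read content), together with the claim that filtering learns from previously filtered content, could be approximated as below. Assumptions: relevance is a simple count of dictionary hits, and "learning" means promoting tokens that recur in accepted texts into the dictionary. The class name, thresholds, tokenizer and stop-word list are hypothetical; a production filter would more likely use a trained classifier.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\w+")
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}  # assumed, minimal


class RelevanceFilter:
    """Hypothetical dictionary-based business relevance filter (module 500)."""

    def __init__(self, vocabulary, min_hits=2, learn_after=3):
        self.vocabulary = {w.lower() for w in vocabulary}
        self.min_hits = min_hits        # dictionary hits needed to accept a text
        self.learn_after = learn_after  # accepted texts a new token must occur in
        self._seen = Counter()          # token -> number of accepted texts containing it

    def accept(self, text):
        tokens = set(TOKEN.findall(text.lower())) - STOP_WORDS
        if len(tokens & self.vocabulary) < self.min_hits:
            return False
        # Learn from previously filtered content: tokens that keep co-occurring
        # with business terms in accepted texts are promoted into the dictionary.
        self._seen.update(tokens - self.vocabulary)
        for token, n in self._seen.items():
            if n >= self.learn_after:
                self.vocabulary.add(token)
        return True
```

With `learn_after=2`, a text such as "Growth in customers" is rejected at first (one dictionary hit) and accepted once "growth" has appeared in two accepted texts and been promoted.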
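Of the extraction techniques listed for entity extraction module 400 and entity recognizer 405, the dictionary-based one is the simplest to show. The sketch assumes a hypothetical gazetteer mapping lower-case phrases to entity types (the labels echo 410-435 but are illustrative) and uses greedy longest match, so a multi-word product name wins over a shorter phrase it contains. The machine-learning and NLP recognizers mentioned in the text are not shown.

```python
import re


def extract_entities(text, gazetteer, max_len=4):
    """Greedy longest-match lookup of word n-grams in a gazetteer.

    gazetteer maps lower-case phrases to entity types (hypothetical labels such
    as "company", "product", "location"). Returns (surface text, type) pairs.
    """
    words = re.findall(r"\w+", text)
    lowered = [w.lower() for w in words]
    found, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(lowered[i:i + n])
            if phrase in gazetteer:
                found.append((" ".join(words[i:i + n]), gazetteer[phrase]))
                i += n
                break
        else:
            i += 1  # no entity starts here; move on one word
    return found
```

Longest match is a design choice: it prevents "smart" from being reported separately when "smart mobile phones" is a known product, at the cost of ignoring overlapping interpretations.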
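The mapping module 300 is described through the example of "smart mobile phones" and "smart telephone" receiving the same ID by way of stemming and synonyms. The sketch below assumes a deliberately crude suffix-stripping stemmer (not Porter or Snowball), a hypothetical hand-written synonym table, and IDs of the form `"<entity type>:<n>"`, with one counter per entity group as described for the mapping step.

```python
import re

# Hypothetical synonym table over stemmed token sequences.
SYNONYMS = {("mobile", "phone"): ("phone",), ("telephone",): ("phone",)}


def stem(word):
    """Crude plural stripping; a real system would use a proper stemmer."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def canonical_key(phrase):
    """Stem each token, then rewrite synonym sequences (longest match first)."""
    tokens = [stem(t) for t in re.findall(r"\w+", phrase.lower())]
    longest = max(len(k) for k in SYNONYMS)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            replacement = SYNONYMS.get(tuple(tokens[i:i + n]))
            if replacement is not None:
                out.extend(replacement)
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


class IdRegistry:
    """Assigns one standardized ID per canonical key within each entity group."""

    def __init__(self):
        self._ids = {}
        self._counters = {}

    def standard_id(self, entity_type, phrase):
        key = (entity_type, canonical_key(phrase))
        if key not in self._ids:
            n = self._counters.get(entity_type, 0) + 1
            self._counters[entity_type] = n
            self._ids[key] = f"{entity_type}:{n}"
        return self._ids[key]
```

Both example phrases reduce to the key "smart phone", so they share one product ID, while a new product such as "tablets" receives the next ID in the product group.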
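The text names Soundex as one way to map spelling variants (for example of company or person names) to the same ID via the "sound picture" of a word. The sketch below implements the widely used American Soundex variant (first letter plus three digits; h and w do not separate letters with equal codes, vowels do). Which variant the system actually used is not stated.

```python
# Digit codes of American Soundex; vowels, y, h and w are not coded.
CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name):
    """Return the four-character American Soundex code of `name`."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result.append(code)
        if c not in "hw":  # h/w keep the previous code; vowels reset it
            prev = code
    return ("".join(result) + "000")[:4]
```

Because "Robert" and "Rupert" both encode to R163, a mapping step could treat them as candidates for the same ID and confirm the match with other evidence.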
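Fig. 3 stacks one probability surface per source (layers 103-105, company ID against product ID, height 102) and summarizes them into a weighted result 106. A sketch, assuming each surface is stored sparsely as a dict keyed by `(company_id, product_id)` and each source has one non-negative weight, here trust score multiplied by size as a hypothetical reading of "weight of source (trust score)" and "size of source". Dividing by the summed weights keeps the result a probability in [0, 1]; this normalization is also an assumption, since the text only says the layers are summarized.

```python
def source_weight(trust, size):
    """Hypothetical weighting: trust score times source size."""
    return trust * size


def combine_surfaces(surfaces, weights):
    """Weighted mean of per-source market-membership probabilities.

    surfaces: {source: {(company_id, product_id): probability}}
    weights:  {source: non-negative weight}
    Each cell is averaged over the sources that report it.
    """
    totals, norms = {}, {}
    for source, surface in surfaces.items():
        w = weights.get(source, 0.0)
        if w <= 0:
            continue  # sources without a positive weight do not contribute
        for cell, p in surface.items():
            totals[cell] = totals.get(cell, 0.0) + w * p
            norms[cell] = norms.get(cell, 0.0) + w
    return {cell: totals[cell] / norms[cell] for cell in totals}
```

Treating a missing cell as probability 0 and dividing by the total weight of all sources would be an equally defensible reading; it penalizes companies that few sources mention.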
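For the "differences between sources" output (120), the text mentions deviation from the norm and standard deviation. One simple reading, sketched below under that assumption, compares the value each source reports for the same entity attribute (for example a company's turnover after mapping to a common ID) with the cross-source mean, and flags sources whose z-score exceeds a hypothetical threshold.

```python
from statistics import mean, pstdev


def source_deviations(values, threshold=1.5):
    """Compare one attribute value as reported by several sources.

    values: {source: numeric value}, at least one entry.
    Returns (mean, population standard deviation, z-scores, flagged sources).
    """
    data = list(values.values())
    mu = mean(data)
    sigma = pstdev(data)
    z = {s: (0.0 if sigma == 0 else (v - mu) / sigma) for s, v in values.items()}
    flagged = sorted(s for s, score in z.items() if abs(score) > threshold)
    return mu, sigma, z, flagged
```

With only a handful of sources the z-score is a rough signal; a median-based measure would be more robust to a single outlying source.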
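Trends over time (160) are not specified further in the text. As an assumed example, the sketch counts mentions of each standardized ID per period and ranks IDs by the ordinary least-squares slope of those counts; the period granularity, the function names and the `min_slope` cut-off are hypothetical.

```python
def trend_slope(series):
    """Ordinary least-squares slope of counts over equally spaced periods."""
    n = len(series)
    if n < 2:
        return 0.0
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, series))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def rising(mentions, min_slope=0.5):
    """mentions: {standard_id: [count per period, oldest first]}.

    Returns the IDs whose slope is at least min_slope, steepest first.
    """
    slopes = {sid: trend_slope(counts) for sid, counts in mentions.items()}
    return sorted((sid for sid, s in slopes.items() if s >= min_slope),
                  key=lambda sid: -slopes[sid])
```

A linear slope is the simplest trend measure; seasonal business data would call for comparing like periods instead.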

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NO20140649 2014-05-24
PCT/NO2015/050090 WO2015183098A1 (en) 2014-05-24 2015-05-26 Method and system for collecting, transforming, storing, and presentation of data from multiple data sources.

Publications (2)

Publication Number Publication Date
EP3149690A1 EP3149690A1 (de) 2017-04-05
EP3149690A4 EP3149690A4 (de) 2017-11-01

Family

ID=54699326

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15799252.0A Withdrawn EP3149690A4 (de) 2014-05-24 2015-05-26 Method and system for collecting, transforming, storing and presenting data from multiple data sources

Country Status (2)

Country Link
EP (1) EP3149690A4 (de)
WO (1) WO2015183098A1 (de)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
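KR101713831B1 (ko) * 2016-07-26 2017-03-09 한국과학기술정보연구원 Document recommendation apparatus and method
KR101931714B1 (ko) 2016-12-20 2018-12-26 주식회사 와이즈넛 Named entity recognition system and method for extracting named entities from documents using a similar-document recommendation apparatus
KR101962407B1 (ko) * 2018-11-08 2019-03-26 한전케이디엔주식회사 System and method for supporting the drafting of electronic approval documents using artificial intelligence
CN111611484B (zh) * 2020-05-13 2023-08-11 湖南微步信息科技有限责任公司 Stock recommendation method and system based on item attribute recognition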

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9123006B2 (en) * 2009-08-11 2015-09-01 Novell, Inc. Techniques for parallel business intelligence evaluation and management
WO2012142158A2 (en) * 2011-04-11 2012-10-18 Credibility Corp. Visualization tools for reviewing credibility and stateful hierarchical access to credibility

Also Published As

Publication number Publication date
WO2015183098A1 (en) 2015-12-03
EP3149690A4 (de) 2017-11-01

Similar Documents

Publication Publication Date Title
US20210081611A1 (en) Methods and systems for language-agnostic machine learning in natural language processing using feature extraction
US11599714B2 (en) Methods and systems for modeling complex taxonomies with natural language understanding
US10902468B2 (en) Real-time, stream data information integration and analytics system
CN106250513B (zh) Personalized event classification method and system based on event modeling
CN104573054B (zh) Information push method and device
US9449271B2 (en) Classifying resources using a deep network
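KR101605430B1 (ko) System and method for building a question-and-answer database, and search system and method using the same
CN111797210A (zh) User-profile-based information recommendation method, apparatus, device and storage medium
CN109145216A (zh) Online public opinion monitoring method, apparatus and storage medium
CN110427480B (zh) Intelligent personalized text recommendation method, apparatus and computer-readable storage medium
CN102279894A (zh) Semantics-based method for searching for, integrating and providing review information, and search system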
US9720982B2 (en) Method and apparatus for natural language search for variables
CN107092639A (zh) Search engine system
WO2014127673A1 (en) Method and apparatus for acquiring hot topics
EP3149690A1 (de) Method and system for collecting, transforming, storing and presenting data from multiple data sources
CN104462396B (zh) Character string processing method and apparatus
US9245010B1 (en) Extracting and leveraging knowledge from unstructured data
CN109710739A (zh) Information processing method and apparatus, and storage medium
US20130297546A1 (en) Generating synthetic sentiment using multiple transactions and bias criteria
CN112100396A (zh) Data processing method and apparatus
CN110209659A (zh) Résumé filtering method, system and computer-readable storage medium
Zaidi et al. Implementation and comparison of text-based image retrieval schemes
WO2015074493A1 (zh) Method, apparatus, computer program and computer-readable medium for filtering low-frequency clicks
Sharma et al. Tourview: Sentiment based analysis on tourist domain
Kanagasabai et al. Classification of massive mobile web log URLs for customer profiling & analytics

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20161222

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the European patent

Extension state: BA ME

DAV Request for validation of the European patent (deleted)
DAX Request for extension of the European patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20171002

RIC1 Information provided on ipc code assigned before grant

Ipc: G06Q 30/02 20120101AFI20170926BHEP

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20180501