WO2021060967A1

WO2021060967A1 - A system and method for predictive analytics of articles

Info

Publication number: WO2021060967A1
Application number: PCT/MY2020/050056
Authority: WO
Inventors: Mohamed Farid Bin NOOR BATCHA; Duc Nghia PHAM
Original assignee: Mimos Berhad
Priority date: 2019-09-27
Filing date: 2020-07-23
Publication date: 2021-04-01

Abstract

The present invention relates to a system and method for predictive analytics of articles. The system (100) of the present invention comprising at least one Entity of Interest Query Module (102); at least one Corpus Generator Module (104); at least one Crawler Module (106); at least one Articles Analysis Engine (108); at least one Prediction Engine (110); at least one Trend Analysis Engine (112); at least one Suggestion Generation Module (114); and at least one Actual Outcome Module (116). The at least one Prediction Engine (110) for performing predictive analysis of articles for next course of action further comprises (400) at least one Sentiment Correlation Engine (402) for analysing sentiment of statement in articles and weighing the sentiments according to influence of entities through closeness of corpus relationship; and at least one Prediction Correlation Engine (404) for performing prediction by using parameters obtained from at least historical trend, global trend, relevant statements and updated documents in articles. The present invention provides a system and method for a user to extract multiple levels of sentiment from an article, extract the trend of related happenings, and provide a prediction of the next course of action due to an event.

Description

A SYSTEM AND METHOD FOR PREDICTIVE ANALYTICS OF ARTICLES

FIELD OF INVENTION

The present invention relates to a system and method for predictive analytics of articles. In particular, the present invention provides a system and method for a user to extract multiple levels of sentiment from an article, extract the trend of related happenings, and provide a prediction of the next course of action due to an event.

BACKGROUND ART

Availability of abundance of information and data over the internet has caused a huge amount of issues from blogs, forums or social sites on the perception of an entity which includes a person or an organization. It is therefore difficult to create intelligence from the huge amount of data to visualize an overall sentiment of the entities caused by many perspectives. Currently, an automated system for prediction of possible outcome upon processing the article is not available.

United States Patent Application Publication No. US 2013/0030981 A1 entitled “Stock Market Prediction Using Natural Language Processing” (hereinafter referred to as the US 981 A1 Publication) having a filing date of 5 October 2012 (Applicant: Herz Frederick SM) relates to a method of using natural language processing (NLP) techniques to extract information from online news feeds and then using the extracted information to predict changes in stock prices or volatilities. In the invention as disclosed in the US 981 A1 Publication, keywords such as company names are recognized and simple templates describing the actions of the company are automatically filled using pattern matching on words in or around the sentence containing the company name. In the US 981 A1 Publication, prediction is performed based on information available in the given template and based on statistical patterns only. Further, in the US 981 A1 Publication, articles or news are updated by using a weighted textual attribute in order to determine similarities of the present news releases to previous ones.

Chinese Patent Application Publication No. CN 106227802 entitled “Chinese natural language processing and multi-core classifier based multi-information-source stock price prediction method” (hereinafter referred to as the CN 802 Publication) having a filing date of 20 July 2016 (Applicant: UNIV GUANGDONG TECHNOLOGY) relates to data mining, machine learning and artificial intelligence, and more particularly to a keyword-based text analysis on the extracted emotion model score. The invention as disclosed in the CN 802 Publication provides a stock price prediction method based on natural language processing and a multi-core classifier, mainly for the Chinese language, in which a predefined text sentiment dictionary and a predefined keyword-based dictionary for research reports are utilized. Further, in the CN 802 Publication, articles are analyzed based on numerical and text-type variables. The invention as disclosed in the CN 802 Publication further utilizes a Support Vector Machine for prediction analysis, whereby for article trend analysis an evaluation process using the K-fold cross-validation method is introduced to evaluate and further verify the performance of prediction.

Chinese Patent Application Publication No. CN 106384166 entitled “Deep learning stock market prediction method combined with financial news” (hereinafter referred to as the CN 166 Publication) having a filing date of 12 September 2016 (Applicant: SUN YAT-SEN UNIV) relates to a deep learning stock market prediction method combined with financial news.
The prediction method as disclosed in the CN 166 Publication comprises steps of first using web crawling technology for financial news, to crawl for relevant financial information related to stocks from Sina Finance News and Netease Financial News; processing financial news information and conduct news sentiment analysis; using Recurrent Neural Network, RNN deep learning network of historical trained data for prediction; training the feature extraction; and performing model training and prediction. Further, in CN 166 Publication, articles are analyzed based on a number of positive words against a number of negative words whereby web crawler is used to crawl for related data and thereafter processing the information based on sentiment analysis only.

With reference to the above-mentioned disclosures, there is indeed a need for a system and method that is able to automatically predict the outcome after processing an article considering the enormous amount of information on a perception of an entity.

SUMMARY OF INVENTION

One aspect of the invention provides a system (100) for predictive analytics of articles. The system (100) comprising at least one Entity of Interest Query Module (102) for analysing an entity of interest received from a user; at least one Corpus Generator Module (104) for collecting data relating to an entity of interest; at least one Crawler Module (106) for crawling information on the entity of interest on provided sources of information continuously and crawling keywords defined in the at least one Corpus Generator Module (104) for latest updates in articles; at least one Articles Analysis Engine (108) for analysing information in articles received from the at least one Crawler Module (106); at least one Prediction Engine (110) for performing predictive analysis of articles for next course of action; at least one Trend Analysis Engine (112) for analysing articles based on historical trend and global trend; at least one Suggestion Generation Module (114) for suggesting to user on next course of action predicted; and at least one Actual Outcome Module (116) for providing feedback on actual outcome. The at least one Prediction Engine (110) for performing predictive analysis of articles for next course of action further comprises (400) at least one Sentiment Correlation Engine (402) for analysing sentiment of statement in articles and weighing the sentiments according to influence of entities through closeness of corpus relationship; and at least one Prediction Correlation Engine (404) for performing prediction by using parameters obtained from at least historical trend, global trend, relevant statements and updated documents in articles.

Another aspect of the invention provides that the Articles Analysis Engine (108) for analysing information in articles received from the at least one Crawler Module (106) further comprises (200) at least one Statement Extraction Module (202) for extracting statements using machine learning tool; at least one Statement Entity Relation Module (204) for associating a respective entity to a statement; at least one Statement Weightage Module (206) for analysing weightage of statement according to influence of entities through closeness of corpus relationship; at least one Statement Sentiment Module (208) for analysing sentiment of statement according to influence of entities through closeness of corpus relationship; at least one Article Categorisation Module (210) for categorising articles using rule based technique; at least one Topic Sentiment Module (212) for analysing sentiment of each article by grouping of articles to its respective category based on nouns extracted and matching articles to a predefined topic grouping; at least one Duplicate Article Filter Module (214) for filtering duplicate articles; and at least one Update Detection Module (216) for updating articles that have new updates to a recent issue and linking to a previous outcome that was observed.

A further aspect of the invention provides that the at least one Trend Analysis Engine (112) for analysing articles based on historical trend and global trend further comprises (300) at least one Global Analysis Module (302) for extracting global parameters from global corpus related to category of article; at least one Feedback Monitoring Module (304a) for providing feedback of actual outcome; and at least one Trend Monitor Module (304) for applying weightage to trend based on importance of category.

Another aspect of the invention provides a method (500) for predictive analytics of articles. The method comprising steps of determining if input from user is available (502); proceeding to step (514) if input from user is not available; obtaining user query for entity of interest if input from user is available (504, 504a); determining if an entity corpus has been built for entity of interest (506); building and updating the entity corpus with keywords if the entity corpus has yet to be built (508, 510); selecting and updating the entity corpus with keywords if the entity corpus has been built (512); extracting keywords from a predefined or a user defined URL's for corpus generation and crawling of necessary articles related to keywords of corpus (514); extracting metadata of articles (518) upon receipt of articles (516); determining if timestamp on article is current (520); extracting keywords from a predefined or a user defined URL's for corpus generation and crawling of necessary articles related to keywords of corpus (514) if timestamp of article is not current and reiterate step (516) and step (518); performing duplicate record detection filtering (522) if the timestamp on article is current by retrieving article weightage from table depending on article source domain (524) and performing sentiment analysis on article and scaled by article weight (526a, 526b); performing keyword extraction (528a), article categorization (528b), summarizing article (530) and updating detection of article (532); extracting a statement from article (534a); associating respective entity to the statement (534b); analyzing sentiment of the statement and the statement is weighted according to entities of influence based on closeness of corpus relationship (534c); performing trend analysis (536); and performing prediction and suggesting to user on next course of action predicted (538). 
The step for performing prediction and suggesting to user on next course of action predicted (538) further comprises steps of (900) determining if statement is from Master Entity keyword (902); increasing relevance if statement is from Master Entity keyword (904); reducing relevance if statement is not from Master Entity keyword (906); adjusting weightage according to relevance (908) and producing statement positivity weight (916); determining if update is available (910); using previous outcome with high weightage (914) based on recent prediction and outcome database (912); processing decision based aggregation (926) from trend positivity (920), article positivity weight (918), historical trend weight (922) and global trend weight (924); and providing suggested outcome based on prediction (928).

A further aspect of the invention provides that the step for performing duplicate record detection filtering (522) further comprises steps of (600) determining timestamp of article (602); querying a list of historical articles titles from a database of articles (602a) in the last X days (604); performing article comparison on document similarity using known algorithms (606); determining if article is a duplicate by performing thresholding comparison (608); continuing with analysis of article if article is not a duplicate (610); and discarding article if article is found to be a duplicate resulting from thresholding comparison (612).

Yet another aspect of the invention provides that the step for updating detection of article (532) further comprises steps of (700) querying article summary (702) from a database of articles for summarizing similarity of articles (704); comparing percentage of similarity of keyword by determining if percentage of similarity is above threshold (706); confirming article update is false if percentage of similarity of keyword is below threshold (718); performing timestamp comparison if percentage of similarity is above threshold (708); and determining if timestamp is too far (710); and determining if article is of a same category if timestamp is not too far (712) and confirming article update is correct if articles are of the same category with recent predictions and outcome (714) from recent predictions and outcome database (716) else confirming article update is false if articles are not of the same category (718); identifying article as a new article if timestamp is too far (720); and confirming article update is false (718).

Still another aspect of the invention provides that the step for performing trend analysis (536) further comprises steps of (800) extracting global parameters related to article category (804) from global entity parameters database (802); querying article related to global parameters (806); extracting time zone of article from metadata (808); determining if article is earlier in time zone (810); comparing article with previous categories showing similar sentiments (822, 820) upon undergoing relevance filter (812), noun extraction (814), article categorization (816) and sentiment analysis (818); and applying weightage to trend positivity based on category of importance (824).

The present invention consists of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.

BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS

To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by references to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings.

FIG. 1.0 illustrates a general architecture of the system block diagram of the present invention.

FIG. 1.0a illustrates an example of an entity corpus.

FIG. 1.0b illustrates an example of a global corpus.

FIG. 1.0c illustrates an example of a neural network.

FIG. 2.0 illustrates a detailed block diagram of the Article Analysis Engine of the present invention.

FIG. 3.0 illustrates a detailed block diagram of the Trend Analysis Engine of the present invention.

FIG. 4.0 illustrates a detailed block diagram of the Prediction Engine of the present invention.

FIG. 5.0 is a flowchart illustrating the general methodology of the present invention.

FIG. 6.0 is a flowchart illustrating the steps of performing duplicate record detection filtering.

FIG. 7.0 is a flowchart illustrating the steps of updating detection of article.

FIG. 8.0 is a flowchart illustrating the steps of performing trend analysis.

FIG. 9.0 is a flowchart illustrating the steps of performing prediction and suggesting to user on next course of action predicted.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention relates to a system and method for predictive analytics of articles. In particular, the present invention provides a system and method for a user to extract multiple levels of sentiment from an article, extract the trend of related happenings, and provide a prediction of the next course of action due to an event. Hereinafter, this specification will describe the present invention according to the preferred embodiments. It is to be understood that limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention and it is envisioned without departing from the scope of the appended claims.

According to the Cambridge Business English Dictionary by Cambridge University Press, an article is defined as a piece of writing on a particular subject in a newspaper or magazine, or on the internet. An event may be reported in the article. An event is an occurrence happening at a determinable time and place, with or without the participation of human agents. It may be a part of a chain of occurrences, as an effect of a preceding occurrence and as the cause of a succeeding occurrence. Sentiment analysis is used in analytics of articles to obtain an overview of the wider public opinion on certain topics or events that have been discussed.

Reference is first made to FIG. 1.0 which illustrates a general architecture of the system block diagram of the present invention. As illustrated in FIG. 1.0, the system (100) for predictive analytics of articles comprises at least one Entity of Interest Query Module (102) for analysing an entity of interest received from a user; at least one Corpus Generator Module (104) for collecting data relating to an entity of interest; at least one Crawler Module (106) for crawling information on the entity of interest on provided sources of information continuously and crawling keywords defined in the at least one Corpus Generator Module (104) for latest updates in articles; at least one Articles Analysis Engine (108) for analysing information in articles received from the at least one Crawler Module (106); at least one Prediction Engine (110) for performing predictive analysis of articles for next course of action; at least one Trend Analysis Engine (112) for analysing articles based on historical trend and global trend; at least one Suggestion Generation Module (114) for suggesting to user on next course of action predicted; and at least one Actual Outcome Module (116) for providing feedback on actual outcome.

Reference is now made to FIG. 2.0 which illustrates a detailed block diagram of the Article Analysis Engine of the present invention. As illustrated in FIG. 2.0, the Articles Analysis Engine (108) for analysing information in articles received from the at least one Crawler Module (106) further comprises (200) at least one Statement Extraction Module (202) for extracting statements using machine learning tool; at least one Statement Entity Relation Module (204) for associating a respective entity to a statement; at least one Statement Weightage Module (206) for analysing weightage of statement according to influence of entities through closeness of corpus relationship; at least one Statement Sentiment Module (208) for analysing sentiment of statement according to influence of entities through closeness of corpus relationship; at least one Article Categorisation Module (210) for categorising articles using rule based technique; at least one Topic Sentiment Module (212) for analysing sentiment of each article by grouping of articles to its respective category based on nouns extracted and matching articles to a predefined topic grouping; at least one Duplicate Article Filter Module (214) for filtering duplicate articles; and at least one Update Detection Module (216) for updating articles that have new updates to a recent issue and linking to a previous outcome that was observed. At least one Trend Analysis Engine (112) is used for categorising the article based on the output obtained from Article Categorisation Module (210) and Topic Sentiment Module (212). Based on the data obtained from Article Sentiment Module (218), it will determine the predicted output or the next possible outcome for the article by performing prediction by Prediction Engine (110).

Reference is now made to FIG. 3.0 which illustrates a detailed block diagram of the Trend Analysis Engine (112) of the present invention. As illustrated in FIG. 3.0, the at least one Trend Analysis Engine (112) for analysing articles based on historical trend and global trend further comprises (300) at least one Global Analysis Module (302) for extracting global parameters from global corpus related to category of article and for monitoring the impact of foreign nations (prior time zones) to the article; at least one Trend Monitor Module (304) for providing feedback of actual outcome and comparing category and sentiment of the article and at least one weightage of articles by category (306) is applied to the article trend based on importance of category before the actual outcome of the article is produced by Actual Outcome Module (116).

Reference is now made to FIG. 4.0 which illustrates a detailed block diagram of the Prediction Engine of the present invention. As illustrated in FIG. 4.0, the at least one Prediction Engine (110) for performing predictive analysis of articles for next course of action further comprises (400) at least one Sentiment Correlation Engine (402) for analysing sentiment of statement in articles for master entity influential statement’s sentiment and weighing the sentiments of entire articles according to influence of entities through closeness of corpus relationship; and at least one Prediction Correlation Engine (404) for performing prediction by using data from the previous prediction outcome of the article (if the article update is true), categorizing the article based on historical trend and analysing current positivity of the article.

The system of the present invention provides a user interface where a user is able to search for an entity of interest, including a company name, a specific product, a tax policy, or even a fuel price, gather relevant data on the search query and predict the impact on other entities in the future. Reference is now made to FIG. 1.0a, FIG. 1.0b and FIG. 1.0c respectively. FIG. 1.0a illustrates an example of an entity corpus while FIG. 1.0b illustrates an example of a global corpus and FIG. 1.0c illustrates an example of a neural network. FIG. 1.0a illustrates an example of a corpus for Air Asia. As illustrated in FIG. 1.0a, the corpus comprises factors that contribute to the company, which include statements from major shareholders, fuel price hikes, tourism tax, airport tax, GST, recent air incidents involving the company, statements from directors, and political or geographical incidents in the destination countries. Further, booking information could reveal the seat availability for each destination, for monitoring the popularity of the airline in terms of choice. External factors affecting the company, including currency exchange due to government credit ratings as well as weather forecasts, are also important elements in the stock market of the company. Once a related article has been crawled, the metadata of the article is extracted. The timestamp information is checked to ensure that the crawled article is recent. The article will go through two filters, namely a duplicate record detection filter and an article update detection filter. The duplicate record detection filter checks if the article has been processed recently, while the article update detection filter checks if the newly crawled article is an update of another recent article. Various levels of importance are provided to articles from different domains. 
Domains or specific social media accounts set by user or predefined by user or administrator or social media account of users whose names are mentioned in the corpus will be allocated with the highest weightage, and subsequently followed by news sites, or official company sites and finally to random blogs and social media. Each article that has been crawled will be categorized accordingly. The categorization process uses the nouns extracted from the article to fill a set of rules that will match the selected category.
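
The rule-based categorisation described above can be sketched as follows. This is a minimal illustration only: the rule table, every category name other than “FUEL”, the trigger nouns and the function name are assumptions made for the example and are not taken from the specification, and noun extraction is assumed to have been done by an upstream step.

```python
# Hypothetical rule table: a category matches when its trigger nouns
# appear among the nouns extracted from the article.
CATEGORY_RULES = {
    "FUEL": {"fuel", "barrel", "jet", "oil"},
    "TAX": {"tax", "gst", "levy"},
    "WEATHER": {"storm", "typhoon", "forecast"},
}


def categorise(nouns, rules=CATEGORY_RULES, min_hits=1):
    """Return the category sharing the most nouns with the article, or None."""
    found = {n.lower() for n in nouns}
    best, best_hits = None, 0
    for category, triggers in rules.items():
        hits = len(found & triggers)
        if hits >= min_hits and hits > best_hits:
            best, best_hits = category, hits
    return best
```

For instance, an article whose extracted nouns include "fuel" and "barrel" would fall under the category “FUEL” in this sketch, while an article with no matching nouns is left uncategorised.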

Example of an article on Air Asia:

Higher fuel prices to pressure AirAsia’s earnings

“KUALA LUMPUR: CIMB Equities Research expects higher fuel prices to pressure Air Asia Group Bhd’s (AAGB) earnings as it only hedged about 12% of its FY18F jet fuel needs at US$68.55 per barrel” Read more at https://www.thestar.com.my/business/business-news/2018/05/25/higher-fuel-prices-to-pressure-airasia-earnings/#ccFQSAaPWK8RruKX.99

The above segment of the article will be categorized under a category of “FUEL”. The sentiment of the article is extracted and saved into a database of that category. For example: Negative Sentiment on category of “FUEL” with reason of “Price Increase” and, assuming that as a result of this article, the share price dipped 3%.

As described, this type of categorization helps build a historical trend of events, so that if a similar trend occurs in the future, where the fuel price is hiked and the sentiment is negative, the system would predict a dip of around “3% x delta” for ‘X’ days. In general, trend analysis takes two considerations into account: firstly the historical trend and secondly the global trend.
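
A historical-trend lookup of this kind might be sketched as below. The record layout, the function name and the use of a simple average over matching past outcomes are assumptions for illustration; the specification does not define “delta”, so it is treated here as a plain scaling factor supplied by the caller.

```python
from statistics import mean


def predict_from_history(history, category, sentiment, delta=1.0):
    """Estimate an outcome (in percent) from past events with the same
    category and sentiment, scaled by an assumed factor `delta`."""
    matches = [
        record["outcome_pct"]
        for record in history
        if record["category"] == category and record["sentiment"] == sentiment
    ]
    if not matches:
        return None  # no comparable event has been observed yet
    return mean(matches) * delta


# Hypothetical record matching the AirAsia example: negative FUEL news, 3% dip.
history = [{"category": "FUEL", "sentiment": "negative", "outcome_pct": -3.0}]
estimate = predict_from_history(history, "FUEL", "negative", delta=0.5)
```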

Reference is now made to FIG. 1.0b which illustrates an example of a global corpus. The global trend requires the timestamp metadata of crawled data from a global corpus, which requires an additional relevance filter to fit the relevance of the entity corpus. Article summarization is performed in the article update detection filter. Statements in the articles are also given special attention: statements are extracted, analyzed and linked back to the person who made them, and if the person is in the corpus list, the positivity of the statement will carry a higher weightage as it is considered influential. The parameters retrieved from the historical trend, statement weightage, article update detection and sentiment of article will be used in the prediction algorithm to suggest to the user the possible next outcome. The prediction algorithm adjusts the weightages of the parameters and computes the aggregation. The “Decision based Aggregation” could use a simple hard decision, a soft decision or a more complex neural network model that could be updated as the system accumulates more historical data and feedback is received, to improve accuracy over time.

As illustrated in FIG. 1.0c, information from the statement, article update, article positivity, historical trend, and global trend will be fed to a decision based aggregation, which computes a result using a neural network model that could train its weights over time from historical data. The weights in the neural network would be able to update in a smooth manner over time for improved accuracy.
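
The simpler hard-decision and soft-decision forms of the decision based aggregation could look like the sketch below. It is illustrative only: the signal names, the value range (negative to positive positivity scores), the equal weights and the majority-vote rule are assumptions, and the neural network variant is not shown.

```python
def soft_decision(signals, weights):
    """Weighted average of the positivity signals (assumed to be signed scores)."""
    total = sum(weights[name] for name in signals)
    return sum(signals[name] * weights[name] for name in signals) / total


def hard_decision(signals):
    """Majority vote on the sign of each signal."""
    votes = sum(1 if v > 0 else -1 if v < 0 else 0 for v in signals.values())
    if votes > 0:
        return "positive"
    if votes < 0:
        return "negative"
    return "neutral"


# Hypothetical inputs for one article.
signals = {"statement": 0.5, "article": -1.0, "historical": -0.5, "global": 0.0}
weights = {"statement": 1.0, "article": 1.0, "historical": 1.0, "global": 1.0}
score = soft_decision(signals, weights)
label = hard_decision(signals)
```

In a deployment along the lines described, the weights would be adjusted from feedback on actual outcomes rather than fixed by hand as they are here.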

Reference is now made to FIG. 5.0 which is a flowchart illustrating the general methodology of the present invention. As illustrated in FIG. 5.0, the method (500) for predictive analytics of articles of the present invention is first initiated by determining if input from user is available (502), and proceeding to step (514) if input from user is not available. If input from user is available, the user query is obtained for an entity of interest (504, 504a). A user is able to search for an entity of interest and the user is provided with the option to set some predefined URLs and keywords (504a). Thereafter, it is determined if an entity corpus has been built for the entity of interest (506). If the corpus of the entity of interest has been built, the new keywords provided by the user and the new pronouns detected from the URL provided will be selected, extracted and updated into the entity corpus (512). Else, the entity corpus is built and updated with keywords if the entity corpus has yet to be built (508, 510). Subsequently, keywords are extracted from the predefined or user defined URLs for corpus generation and, from the keywords extracted, crawler robots are launched to crawl necessary articles related to keywords of the corpus from various sites including the predefined sites (514). The articles extracted from the predefined sites will have higher importance compared to those that are not from the predefined sites. Metadata of articles is extracted (518) upon receipt of articles (516). Thereafter, it is determined if the timestamp on the article is current (520). If the timestamp of the article is not current, keywords from the predefined or user defined URLs are again extracted for corpus generation and crawling of necessary articles related to keywords of the corpus (514), and step (516) and step (518) are reiterated accordingly. 
Duplicate record detection filtering is performed (522) if the timestamp on the article is current, followed by retrieving the article weightage from a table depending on the article source domain (524) and performing sentiment analysis on the article, scaled by the article weight (526a, 526b). The sentiment of each article is analyzed while grouping the articles into their respective categories based on the nouns extracted and matching to the predefined topic grouping. The sets of data consisting of the sentiment and article category are saved to a database, for example “Negative” sentiment for category “FUEL”. This would indicate that an increase in fuel price has impacted the turnover of the company, whereas a positive sentiment for the same category would indicate a saving for the company due to a drop in fuel price.
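
Retrieving an article weightage by source domain and scaling the sentiment by it could be sketched as follows. The domain names, the numeric weights, the default weight and the function names are placeholders assumed for the example; the specification only states that weightage depends on the source domain.

```python
from urllib.parse import urlparse

# Hypothetical weightage table keyed by source domain.
DOMAIN_WEIGHTS = {
    "example-official.com": 1.0,  # e.g. user-defined or official sources
    "example-news.com": 0.7,      # e.g. news sites
}
DEFAULT_WEIGHT = 0.3              # e.g. random blogs and social media


def article_weight(url):
    """Look up the weightage for the article's source domain."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return DOMAIN_WEIGHTS.get(host, DEFAULT_WEIGHT)


def scaled_sentiment(raw_sentiment, url):
    """Scale a raw sentiment score by the article weightage."""
    return raw_sentiment * article_weight(url)
```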

Simultaneously, keyword extraction (528a) and article categorization (528b) are performed, the article is summarized (530) and update detection of the article is performed (532). A statement is extracted from the article (534a), the respective entity is associated to the statement (534b), and the sentiment of the statement is analyzed, with the statement weighted according to entities of influence based on closeness of corpus relationship (534c). Thereafter, trend analysis (536) and prediction are performed and the user is provided with a suggestion on the next course of action predicted (538).
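
One way to picture the statement weighting is the sketch below, in which a statement's sentiment is boosted when its speaker appears in the entity corpus and damped otherwise. The boost and damping factors, the membership test by name and the function name are assumptions for illustration; the specification does not give a formula for closeness of corpus relationship.

```python
def statement_positivity(sentiment, speaker, corpus_entities, boost=2.0, damp=0.5):
    """Weight a statement's sentiment by whether its speaker is in the corpus."""
    known = {entity.lower() for entity in corpus_entities}
    relevance = boost if speaker.lower() in known else damp
    return sentiment * relevance


# Hypothetical corpus entries for the entity of interest.
corpus = ["Major Shareholder", "Director"]
influential = statement_positivity(0.5, "major shareholder", corpus)
peripheral = statement_positivity(0.5, "Anonymous Blogger", corpus)
```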

Reference is now made to FIG. 6.0 which is a flowchart illustrating the steps of performing duplicate record detection filtering. Duplicate records or articles are filtered in order to reduce processing power, and articles that have new updates to a recent issue will have to be linked to the previous outcomes that were observed. As illustrated in FIG. 6.0, in performing duplicate record detection filtering, the timestamp of the article is first determined (602) and a list of historical article titles from a database of articles (602a) in the last X number of days (604) is queried. Thereafter, article comparison on document similarity is performed using known algorithms such as cosine similarity or Euclidean distance to compare the documents (606). It is further determined if the article is a duplicate by performing thresholding comparison (608), and analysis of the article continues, as represented by the yes option, if the article is not a duplicate (610). Else, the article is discarded, as represented by the no option, if the article is found to be a duplicate resulting from thresholding comparison (612).
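
The thresholded similarity comparison can be illustrated with a bag-of-words cosine similarity, one of the known algorithms named above. This is a sketch under stated assumptions: whitespace tokenisation, the threshold value of 0.9 and the function names are choices made for the example, and the list of recent texts is assumed to have already been restricted to the last X days.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def is_duplicate(new_text, recent_texts, threshold=0.9):
    """Flag the new article as a duplicate if any recent one is similar enough."""
    return any(cosine_similarity(new_text, old) >= threshold for old in recent_texts)
```

An article flagged by `is_duplicate` would be discarded in this sketch, and any other article would continue to the remaining analysis steps.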

Reference is now made to FIG. 7.0, which is a flowchart illustrating the steps of updating detection of article. As illustrated in FIG. 7.0, the article summary (702) is queried from a database of articles, the similarity of the article summaries is computed using, e.g., Euclidean distance (704), and the percentage of keyword similarity is compared against a threshold (706). If the percentage of similarity is below the threshold, it is confirmed that article update is false (718). If the percentage of similarity is above the threshold, timestamp comparison is performed (708) and it is determined whether the timestamp is too far (710). If the timestamps are too far apart, the outcome correlation may vary significantly; the article is therefore identified as a new article (720) and it is confirmed that article update is false (718). If the timestamp is not too far, it is determined whether the article is of the same category (712). Article update is confirmed as correct (714) if the articles are of the same category as the recent predictions and outcome from the recent predictions and outcome database (716). Else, it is confirmed that article update is false if the articles are not of the same category (718).
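The decision flow of FIG. 7.0 may be summarised by the following sketch. The function name, the similarity threshold of 0.7 and the maximum gap of 14 days are hypothetical values chosen for illustration; the description does not state how a similarity exactly equal to the threshold is treated, and it is assumed here to pass.

```python
from datetime import datetime, timedelta


def is_article_update(
    similarity,
    new_time,
    old_time,
    new_category,
    old_category,
    similarity_threshold=0.7,      # assumed value
    max_gap=timedelta(days=14),    # assumed meaning of "too far"
):
    """Return True only if the new article updates the earlier one."""
    # (706) -> (718): similarity below threshold, so not an update.
    if similarity < similarity_threshold:
        return False
    # (708)/(710) -> (720) -> (718): timestamps too far apart, so the
    # article is treated as a new article rather than an update.
    if abs(new_time - old_time) > max_gap:
        return False
    # (712) -> (714) or (718): an update only if the category matches.
    return new_category == old_category
```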

Reference is now made to FIG. 8.0, which is a flowchart illustrating the steps of performing trend analysis. As illustrated in FIG. 8.0, performing trend analysis (536) further comprises steps of (800) first extracting global parameters related to the article category (804) from a global entity parameters database (802). Thereafter, articles related to the global parameters are queried (806) and the time zone of each article is extracted from its metadata (808). The timestamp information of the articles is extracted and checked to see if the time zone is earlier. This is used in scenarios where, for example, the Federal Reserve in the USA has increased interest rates, which will have an impact on the Malaysian stock exchange on the following working day. Because of the difference in time zone, this information can be used to help predict the impact on a selected entity. Subsequently, it is determined if the article is earlier in time zone (810), and the article is compared with previous categories showing similar sentiments (822, 820) after undergoing the relevance filter (812), noun extraction (814), article categorization (816) and sentiment analysis (818). Weightage is then applied to trend positivity based on category of importance (824). The relevance filter (812) is required so that only information related to the entity of interest is extracted. Since the global parameters could cover many topics, the topics that match the entity of interest from the article category are used as one of the criteria for relevance filtering. The global articles extracted are then categorized and sentiment analysis is performed. The information is then stored in a database for historical reference. Historical trends are also retrieved from the database to analyze whether a similar scenario has occurred in the past. For example, if an increase in the price of fuel has reduced the market share of a company by X% in the past Y months, this historical outcome could be used as one of the parameters to predict the future outcome for a similar category.
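One possible reading of the time zone check and the category weightage is sketched below. It is assumed, for illustration only, that an article counts as "earlier" when its publication instant precedes a reference time such as the next opening of the local market, and that trend positivity is a weighted average of article sentiments; the importance weights, the dictionary keys and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical importance weights per article category.
CATEGORY_IMPORTANCE = {"INTEREST_RATE": 0.9, "FUEL": 0.6}


def is_earlier(article_time, reference_time):
    """Compare two timezone-aware datetimes as absolute instants."""
    return article_time < reference_time


def weighted_trend_positivity(articles, reference_time):
    """Importance-weighted mean sentiment of articles that are earlier.

    Each article is a dict with keys "published" (timezone-aware datetime),
    "category" and "sentiment" (a score between -1 and 1).
    """
    total, weight_sum = 0.0, 0.0
    for article in articles:
        if not is_earlier(article["published"], reference_time):
            continue  # published too late to influence the reference time
        weight = CATEGORY_IMPORTANCE.get(article["category"], 0.0)
        total += weight * article["sentiment"]
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

Under these assumptions, an interest rate announcement published in the afternoon in the USA precedes the opening of the Malaysian market on the following working day and would therefore contribute to the trend positivity for that day.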

Reference is now made to FIG. 9.0, which is a flowchart illustrating the steps of performing prediction and suggesting to user on next course of action predicted. As illustrated in FIG. 9.0, performing prediction and suggesting to user on next course of action predicted (538) further comprises steps of (900) first determining if a statement is from a Master Entity keyword (902). Statements extracted from an article usually play a significant role in a company's performance. If a distinct person makes a negative remark about a company, the share price is at risk of being impacted negatively. If a distinct person agrees to award a new contract to a company, this in turn would increase the share price. The relevance of the person is weighted for each statement. Therefore, relevance is increased if the statement is from a Master Entity keyword (904) and reduced if it is not (906). Thereafter, weightage is adjusted according to relevance (908) and a statement positivity weight is produced (916). It is further determined if an update is available (910). If the update is available, the previous outcome is used with high weightage (914) based on the recent prediction and outcome database (912). If the update is not available, decision based aggregation is processed (926) from trend positivity (920), article positivity weight (918), historical trend weight (922) and global trend weight (924), using either a simple hard decision or a neural network model that could train its weights over time from historical data. The weights in the neural network would be able to update in a smooth manner over time to improve accuracy.

Finally, a suggested outcome is provided based on the prediction (928).
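The steps of FIG. 9.0 may be sketched, under stated assumptions, as follows. The "simple hard decision" is assumed here to be the sign of a weighted sum of the four named signals; returning the previous outcome directly when an update is available is a simplification of "using previous outcome with high weightage"; and the base relevance, the adjustment step, the signal names and the function names are all hypothetical. The neural network alternative is not shown.

```python
def statement_positivity_weight(
    sentiment, from_master_entity, base_relevance=0.5, step=0.3
):
    """Scale a statement sentiment by the relevance of its speaker.

    Relevance is increased (904) for a Master Entity keyword and reduced
    (906) otherwise, then clamped to the range 0 to 1 (an assumption).
    """
    if from_master_entity:
        relevance = base_relevance + step
    else:
        relevance = base_relevance - step
    relevance = min(1.0, max(0.0, relevance))
    return sentiment * relevance


def predict(update_available, previous_outcome, signals, weights):
    """Return a suggested outcome: "positive", "negative" or "neutral".

    `signals` and `weights` are dicts sharing the same keys, for example
    trend positivity, article positivity weight, historical trend weight
    and global trend weight.
    """
    if update_available:
        # (914): reuse the outcome recorded for the earlier article.
        return previous_outcome
    # (926): hard decision on the sign of the weighted sum of signals.
    score = sum(weights[name] * value for name, value in signals.items())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```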

The present invention assists a user in extracting multiple levels of sentiment from an article and the trend of related happenings, and provides a system and method in which the next course of action due to an event is predicted.

Unless the context requires otherwise or specifically stated to the contrary, integers, steps or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements. Throughout this specification, unless the context requires otherwise, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers, but not the exclusion of any other step or element or integer or group of steps, elements or integers. Thus, in the context of this specification, the term “comprising” is used in an inclusive sense and thus should be understood as meaning “including principally, but not necessarily solely”.

Claims

1. A system (100) for predictive analytics of articles, the system (100) comprising: at least one Entity of Interest Query Module (102) for analysing an entity of interest received from a user; at least one Corpus Generator Module (104) for collecting data relating to the entity of interest; at least one Crawler Module (106) for crawling information on the entity of interest on provided sources of information continuously and crawling keywords defined in the at least one Corpus Generator Module (104) for latest updates in articles; at least one Articles Analysis Engine (108) for analysing information in articles received from the at least one Crawler Module (106); at least one Prediction Engine (110) for performing predictive analysis of articles for next course of action; at least one Trend Analysis Engine (112) for analysing articles based on historical trend and global trend; at least one Suggestion Generation Module (114) for suggesting to user on next course of action predicted; and at least one Actual Outcome Module (116) for providing feedback on actual outcome, characterized in that the at least one Prediction Engine (110) for performing predictive analysis of articles for next course of action further comprises (400): at least one Sentiment Correlation Engine (402) for analysing sentiment of statement in articles and weighing the sentiments according to influence of entities through closeness of corpus relationship; and at least one Prediction Correlation Engine (404) for performing prediction by using parameters obtained from at least historical trend, global trend, relevant statements and updated documents in articles.

2. The system (100) according to Claim 1, wherein the Articles Analysis Engine (108) for analysing information in articles received from the at least one Crawler Module (106) further comprises (200): at least one Statement Extraction Module (202) for extracting statements using machine learning tool; at least one Statement Entity Relation Module (204) for associating a respective entity to a statement; at least one Statement Weightage Module (206) for analysing weightage of statement according to influence of entities through closeness of corpus relationship; at least one Statement Sentiment Module (208) for analysing sentiment of statement according to influence of entities through closeness of corpus relationship; at least one Article Categorisation Module (210) for categorising articles using rule based technique; at least one Topic Sentiment Module (212) for analysing sentiment of each article by grouping of articles to its respective category based on nouns extracted and matching articles to a predefined topic grouping; at least one Duplicate Article Filter Module (214) for filtering duplicate articles; and at least one Update Detection Module (216) for updating articles that have new updates to a recent issue and linking to a previous outcome that was observed.

3. The system (100) according to Claim 1, wherein the at least one Trend Analysis Engine (112) for analysing articles based on historical trend and global trend further comprises (300): at least one Global Analysis Module (302) for extracting global parameters from global corpus related to category of article; at least one Trend Monitor Module (304) for providing feedback of actual outcome; and at least one weightage of articles by category (306) is applied to the article trend based on importance of category before the actual outcome of the article is produced by Actual Outcome Module (116).

4. A method (500) for predictive analytics of articles, the method comprising steps of: determining if input from user is available (502); proceeding to step (514) if input from user is not available; obtaining user query for entity of interest if input from user is available (504, 504a); determining if an entity corpus has been built for entity of interest (506); building and updating the entity corpus with keywords if the entity corpus has yet to be built (508, 510); selecting and updating the entity corpus with keywords if the entity corpus has been built (512); extracting keywords from a predefined or a user defined URL's for corpus generation and crawling of necessary articles related to keywords of corpus (514); extracting metadata of articles (518) upon receipt of articles (516); determining if timestamp on article is current (520); extracting keywords from a predefined or a user defined URL's for corpus generation and crawling of necessary articles related to keywords of corpus (514) if timestamp of article is not current and reiterate step (516) and step (518); performing duplicate record detection filtering (522) if the timestamp on article is current; retrieving article weightage from table depending on article source domain (524) and performing sentiment analysis on article and scaled by article weight (526a, 526b); performing keyword extraction (528a), article categorization (528b), summarizing article (530) and updating detection of article (532); extracting a statement from article (534a); associating respective entity to the statement (534b); analyzing sentiment of the statement and the statement is weighted according to entities of influence based on closeness of corpus relationship (534c); performing trend analysis (536); and performing prediction and suggesting to user on next course of action predicted (538), characterized in that performing prediction and suggesting to user on next course of action predicted (538) further comprises steps of (900): determining if statement is from Master Entity keyword (902); increasing relevance if statement is from Master Entity keyword (904); reducing relevance if statement is not from Master Entity keyword (906); adjusting weightage according to relevance (908) and producing statement positivity weight (916); determining if update is available (910); using previous outcome with high weightage (914) based on recent prediction and outcome database (912) if the update is available; processing decision based aggregation (926) from trend positivity (920), article positivity weight (918), historical trend weight (922) and global trend weight (924) if the update is not available; and providing suggested outcome based on prediction (928).

5. The method (500) according to Claim 4, wherein performing duplicate record detection filtering (522) further comprises steps of (600): determining timestamp of article (602); querying a list of historical articles titles from a database of articles (602a) in the last X days (604); performing article comparison on document similarity using known algorithms (606); determining if article is a duplicate by performing thresholding comparison (608); continuing with analysis of article, if article is not a duplicate (610); and discarding article, if article is found to be a duplicate resulting from thresholding comparison (612).

6. The method (500) according to Claim 4, wherein updating detection of article (532) further comprises steps of (700): querying article summary (702) from a database of articles for summarizing similarity of articles (704); comparing percentage of similarity of keyword by determining if percentage of similarity is above threshold (706); confirming article update is false if percentage of similarity of keyword is below threshold (718); performing timestamp comparison if percentage of similarity is above threshold (708); determining if timestamp is too far (710); determining if article is of a same category if timestamp is not too far (712); confirming article update is correct (714) if articles are of the same category with recent predictions and outcome from recent predictions and outcome database (716); else confirming article update is false if articles are not of the same category (718); identifying article as a new article if timestamp is too far (720); and confirming article update is false (718).

7. The method (500) according to Claim 4, wherein performing trend analysis (536) further comprises steps of (800): extracting global parameters related to article category (804) from global entity parameters database (802); querying article related to global parameters (806); extracting time zone of article from metadata (808); determining if article is earlier in time zone (810); comparing article with previous categories showing similar sentiments (822, 820) upon undergoing relevance filter (812), noun extraction (814), article categorization (816) and sentiment analysis (818) if the article is earlier in time zone; and applying weightage to trend positivity based on category of importance (824).