CN111433806A - System and method for analyzing a crowdfunding platform

Publication number: CN111433806A
Application number: CN201880078251.8A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: data, company, loan, model, platform
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Inventors: Kim Wales (金·威尔士), Julian Beatty (朱利安·比蒂), Harald Frost (哈拉尔德·弗罗斯特)
Original and current assignee: Claude Biro
Application filed by Claude Biro
Publication of CN111433806A

Classifications

    • G06Q 30/0279: Fundraising management (under G06Q 30/02: Marketing; Price estimation or determination; Fundraising)
    • G06Q 40/03: Credit; Loans; Processing thereof (under G06Q 40/00: Finance; Insurance; Tax strategies)
    • G06F 16/24573: Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata (under G06F 16/00: Information retrieval; Database structures therefor)


Abstract

The invention provides a system and a method for analyzing crowdfunding platforms. The method comprises the following steps: connecting to a plurality of individual loan platforms using an electronic device; retrieving loan book data from each of the individual loan platforms; and storing the loan book data using a memory coupled to the electronic device, wherein the loan book data comprises metadata generated in a structured query language database, and wherein the metadata comprises the name of the platform associated with the loan book data and a list of data attributes. The method further comprises the following steps: converting, using a processor coupled to the electronic device, the loan book data from each platform so that the converted loan book data uses common data; reading, using the processor, the converted loan book data; and recording the destination unified data attribute for each platform/attribute pair.

Description

System and method for analyzing crowd funding platform
Priority declaration
This application is a PCT international non-provisional application and claims priority from U.S. provisional patent application 62/568,105, filed October 4, 2017, which is hereby incorporated by reference in its entirety.
Technical Field
The present invention relates to loan analysis and, more particularly, to analyzing data for peer-to-peer lending and crowdfunding platforms.
Background
From Main Street stores to high-tech startups, small and medium-sized businesses have created two thirds of new jobs in the United States for decades. The ability of individuals to pursue ideas, create companies, and grow businesses is the foundation of the U.S. economy.
The Obama administration sought to benefit all Americans through the Jumpstart Our Business Startups (JOBS) Act of 2012, which allows securities crowdfunding (equity and debt) to be conducted online through intermediaries (broker-dealers or registered funding portals), helping the U.S. economy continue its recovery from the 2008 financial crisis. This measure prompted some 40 other countries to amend their securities laws in response to the crisis. Importantly, consumers and small and medium-sized businesses gain broad access to safe and affordable credit and equity financing. Without access to capital, entrepreneurs cannot turn innovative ideas into action. Without sufficient funds, Americans cannot grow their enterprises to create new jobs and opportunities for the next generation.
Crowdfunding has become very popular, from the first peer-to-peer lending platforms, Zopa in the United Kingdom and Prosper Marketplace in the United States, in 2007, to Kickstarter, the first donation- and reward-based platform in the United States, in 2009. This "democratization of finance" gives entrepreneurs and innovators an opportunity to raise significant funds from individuals and institutions around the world, bypassing the traditional route of raising money from existing relationships with friends, family, and investors. Kickstarter, Indiegogo, and GoFundMe are household names that have channeled billions of dollars in rewards and donations. These crowdfunding platforms are only a small segment of a rapidly growing worldwide business. A person planning a crowdfunding campaign will likely turn first to one of these platforms.
Faculty, alumni, and students of U.S. universities have also begun to use these new mechanisms to fund causes, projects, and ventures through exclusive school-sponsored crowdfunding platforms.
Most crowdfunding platforms can be assigned to one of the four crowdfunding categories listed below, although the business models within these groups sometimes differ considerably; an overview of each group follows. For example, in the crowd-investing category, business models vary greatly depending on which part of the JOBS Act is being used. Note that one or more models may be combined to create a "graduation" model that acts as an incubator throughout the life cycle of a project or business.
Crowdfunding definitions
1. Crowd donation: a donation of funds without a directly quantifiable reward or benefit. Examples include social, charitable, and cultural projects. Crowd donations may also be used to raise funds for political campaigns. For crowd donations to succeed, an emotional tie must be established and maintained between the capital providers and the recipients.
2. Crowd rewards: crowd rewards cover creative, cultural, and sporting projects, though commercial projects may also fall into this category. Through this type of financing, backers receive an additional benefit (i.e., a reward) in the form of a product, artwork, or service. The creativity of the parties seeking funds is unlimited.
3. Crowd investment (equity/debt): the focus here is not on funding a project but on purchasing company equity (shares) or debt (e.g., convertible notes, mini-bonds, etc.). Crowd investment also offers investors otherwise limited opportunities to support startups, small and medium-sized enterprises, or lifestyle businesses. In return, these investors receive shares in the company or interest payments under certain terms. In the case of equity investments, these tend to be silent partnerships in which the investor has no or limited voting rights.
4. Crowd lending / peer-to-peer lending: crowd lending mainly refers to loan (debt) financing of companies or individuals (e.g., lifestyle, student loans, real estate, automobiles, etc.). In return for the loan, investors expect a risk-adjusted return on their investment. As products and business models have evolved, the investor base of online marketplace lenders has expanded to institutional investors, hedge funds, and financial institutions.
Depending on the country, securities-based crowdfunding includes the sale of equity (common shares) and all forms of credit, including but not limited to mini-bonds, peer-to-peer loans, convertible instruments, and the like.
The next section outlines the main business models in online peer-to-peer lending and the structures used to fund the activity.
Companies in the industry have developed three main business models: (1) direct lenders that originate loans to hold in their own portfolios, commonly referred to as balance sheet lenders; (2) platform lenders that partner with an issuing depository institution to originate loans funded by all types of investors and then, in some cases, purchase the loans to sell to investors as whole loans or by issuing securities such as member payment dependent notes; and (3) a third business model combining the above and reflecting the transfer of rights and obligations in securitization.
Direct lenders that do not rely on a depository institution to originate loans often need to obtain a license from each state in which they lend. A direct lender that uses state lending licenses to originate loans directly is not supervised by the federal banking regulators, except to the extent that it may be subject to CFPB regulation.
Disclosure of Invention
According to one aspect of the invention, a method for analyzing crowdfunding platforms is provided. The method comprises the following steps: connecting to a plurality of individual loan platforms using an electronic device; retrieving loan book data from each individual loan platform; and storing the loan book data using a memory coupled to the electronic device, wherein the loan book data comprises metadata generated in a structured query language database, and wherein the metadata comprises the name of the platform associated with the loan book data and a list of data attributes. The method further comprises the following steps: converting, using a processor coupled to the electronic device, the loan book data from each platform so that the converted loan book data uses common data; reading, using the processor, the converted loan book data; and recording the destination unified data attribute for each platform/attribute pair.
It is an object of the present invention to provide a method for analyzing crowdfunding platforms wherein the metadata further includes a timestamp of when the loan book data was received.
It is an object of the present invention to provide a method for analyzing crowdfunding platforms wherein a list of attributes is associated with each borrower listing and loan issuance is associated with the platform.
It is an object of the invention to provide a method for analyzing crowdfunding platforms wherein the common data is selected from the group consisting of: a common language; a common currency; a common time zone; common units; and a common numeric range.
It is an object of the present invention to provide a method for analyzing crowdfunding platforms wherein storing the loan book data further comprises storing the loan book data for each platform in its native state in real time.
It is an object of the present invention to provide a method for analyzing crowdfunding platforms wherein the recording is performed according to a mapping table.
It is an object of the present invention to provide a method for analyzing crowdfunding platforms wherein the method further comprises predicting whether a loan associated with the platform is likely to be repaid.
According to another aspect of the invention, a system for analyzing crowdfunding platforms is provided. The system comprises: an electronic device configured to connect to a plurality of individual loan platforms and retrieve loan book data from each of the individual loan platforms; a memory coupled to the electronic device, the memory configured to store the loan book data, wherein the loan book data comprises metadata generated in a structured query language database, and wherein the metadata comprises the name of the platform associated with the loan book data and a list of data attributes; and a processor coupled to the electronic device, the processor configured to convert the loan book data from each platform so that the converted loan book data uses common data, to read the converted loan book data, and to record the destination unified data attribute for each platform/attribute pair.
It is an object of the present invention to provide a system for analyzing crowdfunding platforms wherein the metadata further includes a timestamp of when the loan book data was received.
It is an object of the present invention to provide a system for analyzing crowdfunding platforms wherein the attribute list is associated with each borrower listing and loan issuance is associated with a host platform that is listed and identified across other platforms.
It is an object of the present invention to provide a system for analyzing crowdfunding platforms wherein the common data is selected from the group consisting of: a common language; a common currency; a common time zone; common units; and a common numeric range.
It is an object of the present invention to provide a system for analyzing crowdfunding platforms wherein the processor is further configured to store the loan book data for each platform in its native state in real time.
It is an object of the invention to provide a system for analyzing crowdfunding platforms wherein the processor is configured to perform the recording according to a mapping table.
It is an object of the present invention to provide a system for analyzing crowdfunding platforms wherein the processor is further configured to predict whether a loan associated with a platform is likely to be repaid.
It is an object of the invention to provide a system for analyzing crowdfunding platforms wherein the electronic device is selected from the group consisting of: a desktop computer; a laptop computer; a tablet computer; and a smartphone.
It is an object of the invention to provide a system for analyzing crowdfunding platforms wherein the system further comprises a graphical user interface, and wherein the memory is further configured to store a digital application configured to enable a user to access the destination unified data attributes using the graphical user interface.
Drawings
Fig. 1 shows a block/flow diagram illustratively depicting a method/system for analyzing crowdfunding platforms, in accordance with an embodiment of the present invention.
Fig. 2 shows a screenshot of a login screen of a digital application for analyzing crowdfunding platforms, according to an embodiment of the invention.
Fig. 3 shows a screenshot of an alert-system configuration screen of a digital application for analyzing borrower capital limits and investor investment limits based on regulatory mandates and platform-specific crowdfunding business models across platforms, using encrypted unique identifiers, according to an embodiment of the invention.
Fig. 4 shows a screenshot of setting up a user account in a digital application for analyzing crowdfunding platforms, according to an embodiment of the invention.
Fig. 5 shows a screenshot of configuring alerts for a particular platform in a digital application for analyzing crowdfunding platforms, according to an embodiment of the invention.
Fig. 6 shows a screenshot of configuring alerts for all platforms in a digital application for analyzing crowdfunding platforms, according to an embodiment of the invention.
Fig. 7 shows a screenshot of a profile in a platform using a digital application for analyzing crowdfunding platforms, according to an embodiment of the invention.
Fig. 8 shows a screenshot of alerts in a platform using a digital application for analyzing crowdfunding platforms, according to an embodiment of the invention.
Detailed Description
Preferred embodiments of the present invention will now be described with reference to the accompanying drawings. Like elements in the various figures are denoted by like reference numerals.
Reference will now be made in detail to the embodiments of the invention. These embodiments are provided by way of explanation, and the invention is not intended to be limited to them. Indeed, various modifications and alterations will become apparent to those skilled in the art upon reading this specification and reviewing the associated drawings.
Recent legislative innovation has enabled U.S. companies to raise needed capital through peer-to-peer marketplace lending and securities (equity and debt (e.g., peer-to-peer loans)). This allows accredited and non-accredited investors to buy and sell securities of small private companies and non-publicly traded funds. The present invention describes an integrated approach to addressing the challenges of this market, including the development of ratings and the creation of an online financial technology platform that provides a transparent framework for investors and creates a mechanism for market participants to comply with rules and benchmark their performance. The design of the rating framework begins with the collection, consolidation, and unification of data from the peer-to-peer marketplace lending and securities (equity and debt) crowdfunding markets.
According to one embodiment, the system has two components.
The first component is the technology stack. According to one embodiment, three subcomponents form the technology stack (system): a first subcomponent that continuously crawls for and produces the data to be collected; a second subcomponent, a cleaning facility that allows the data to be merged; and a third subcomponent, the unification of loan book data from securities crowdfunding platforms, variously called marketplace lenders, peer-to-peer lenders, and crowdfunding platforms (equity and debt). It should be noted, however, that the terminology tends to vary by country of origin.
The second component relates to data collection. According to one embodiment, crawls of peer-to-peer loan book data are collected in a first layer/component in each country's natural language (e.g., Chinese, Hindi, English, and more), computer code, and computer format.
Referring now to fig. 1, a block/flow diagram of a method/system 100 for analyzing crowd-funding platforms is illustratively depicted, in accordance with an embodiment of the present invention.
Around the world, more than 2,500 platforms have begun to deliver consumer personal loans, small and medium-sized business loans, real estate loans (commercial and residential), student loans, farming/agribusiness loans, solar/renewable energy loans, and automobile loans through online lending platforms. Financial loan data is published by each lending platform as each borrower is listed on the platform seeking financing. Marketplace lenders/peer-to-peer lenders update and publish their data at different time intervals, through different media, in different formats, and across different jurisdictions.
Some platforms provide data through real-time WebSocket protocols (essentially pushing new loan data and events to protocol subscribers). Others expose RESTful APIs from which scripts can pull new loan data at predefined time intervals (hourly, every 3 hours, daily, monthly, quarterly, etc.). The output cadence depends on the age of the peer-to-peer lending platform and its business model: some update their loan listings only when a borrower "asks" for a loan amount and publish an update event when investor funding reaches the "ask" amount, and some provide a comma-separated values (CSV) file for download on a public web page once a loan is delivered (e.g., fully funded). Other platforms provide direct application programming interfaces (APIs) for retail and institutional investors and partners.
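By way of illustration, a minimal per-platform collection script might poll such a RESTful API on the platform's publication schedule. This is a sketch only: the endpoint, query parameter, field names, and polling interval are assumptions for illustration, not details from the patent, and archive_raw is the archival step sketched in the data collection discussion below.

    # Minimal sketch of a per-platform collection script for a hypothetical
    # RESTful platform that publishes hourly.
    import json
    import time
    import urllib.request

    PLATFORM_API = "https://api.example-p2p-platform.com/v1/loans"  # hypothetical
    POLL_INTERVAL_SECONDS = 3600  # this platform's publication schedule: hourly

    def fetch_new_loans(since_id: int) -> list[dict]:
        """Pull loan listings published after `since_id` from the platform API."""
        with urllib.request.urlopen(f"{PLATFORM_API}?after={since_id}") as resp:
            return json.loads(resp.read().decode("utf-8"))

    def run_collector() -> None:
        last_seen = 0
        while True:
            for loan in fetch_new_loans(last_seen):
                last_seen = max(last_seen, loan["id"])
                archive_raw(loan)  # archival step, sketched in the next section
            time.sleep(POLL_INTERVAL_SECONDS)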
These peer-to-peer lending platforms provide their data in different formats, including but not limited to JSON, line-delimited JSON, CSV, TSV, Excel, and HTML. Each format may come in different encodings, including but not limited to UTF-8, BIG5, Latin-1, and GBK.
Each platform's data may be in a different language (Chinese, English, Hindi, French, Spanish, etc.). Values may be expressed in different units, in various currencies (e.g., U.S. dollars, RMB, euros, pounds, rupees, etc.), and over different numeric ranges. Numeric ranges may include salaries (e.g., 0-1 million versus 0-1 thousand).
The problem stems from the fact that entities (e.g., regulators, investors), whether automated or manual, wish to understand these data at a unified macro-to-micro level across all platforms of the peer-to-peer lending industry. In this context, "understanding" means generating statistics and allowing a high degree of qualitative and quantitative comparison between platform data.
The complexity described thus far complicates attempts to analyze and manage risk in crowdfunding platforms. A solution involves three layered components working together. Fig. 1 shows the collection, consolidation, and unification solution.
According to one embodiment, the data collection component 105 includes a set of custom scripts that connect to individual loan platforms and retrieve their loan book data. Each script conforms to and follows the peer-to-peer lending platform's data publication schedule, medium, and format 110. Once data is received from each platform, it is stored in its native state (archived) in real time, with metadata generated in the data collection SQL database 115. According to one embodiment, the metadata includes a timestamp of the received data, the name of the platform, and a list of data attributes for each borrower listing and subsequent loan issuance. According to one embodiment, each borrower listing and/or issuance is associated with a primary platform and is listed and identified across other platforms. At this stage, all platform data is saved using the same encoding (e.g., UTF-8) and the same format (e.g., JSON), but each platform retains its unique and verifiable data attribute keys (e.g., loan interest may be expressed as "LoanInterest" on one platform and "Interest" on another).
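A minimal sketch of that archival step follows, using SQLite as a stand-in for the data collection SQL database 115. The table layout and column names are illustrative assumptions, not the patent's schema.

    # Sketch of the archival step: raw loan data is stored unchanged, together
    # with metadata (receipt timestamp, platform name, attribute keys).
    import json
    import sqlite3  # stands in for the data collection SQL database 115
    from datetime import datetime, timezone

    db = sqlite3.connect("collection.db")
    db.execute("""CREATE TABLE IF NOT EXISTS raw_loans (
        received_at TEXT,  -- timestamp of when the data was received
        platform    TEXT,  -- name of the originating platform
        attributes  TEXT,  -- this platform's data attribute keys
        payload     TEXT   -- loan record archived in its native state
    )""")

    def archive_raw(loan: dict, platform: str = "PlatformA") -> None:
        db.execute(
            "INSERT INTO raw_loans VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(),
             platform,
             json.dumps(sorted(loan.keys())),
             json.dumps(loan, ensure_ascii=False)),  # same encoding (UTF-8), same format (JSON)
        )
        db.commit()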
According to one embodiment, the data merge component 120 addresses the need to convert data to use common languages, currencies, time zones, units, and numeric ranges. The data merge component pulls data from the data collection component 105, reads it, and applies various transformations during the data merge process 125, such as the following (a small sketch of these transformations appears after the list):
1. Data in natural language (e.g., loan type/use, interest rate, loan amount, payment terms, etc.) is first captured in the local language and archived for auditing, and then translated 130 into English. Currency-denominated data (such as loan amounts, premiums, and other values) is captured in the local language and retained as such for research reporting and benchmarking, because of currency fluctuations; typically it is not converted 135 to dollars unless needed, in which case both denominations are presented with date/time stamps for back-testing.
2. Time zones are converted 140 to the UTC time zone.
3. Numeric information, such as borrower income, interest rates, etc., is converted 145 to a single floating-point format (e.g., "18K" to "18000.00" and "10%" to "0.1"). At this stage, all data has been converted to a common format, but each platform still retains its original and unique set of data attribute keys.
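A minimal sketch of transforms 140 and 145, under the assumption that numeric strings use suffixes like "K" and "%" as in the examples above (the translation step 130 would call an external service and is omitted here):

    # Sketch of the merge transformations: UTC conversion (140) and
    # normalization of numeric strings to floats (145).
    from datetime import datetime, timezone

    def to_utc(local: datetime) -> datetime:
        """Transform 140: normalize any timestamp to the UTC time zone."""
        return local.astimezone(timezone.utc)

    def to_float(value: str) -> float:
        """Transform 145: '18K' -> 18000.0, '10%' -> 0.1, '1,200' -> 1200.0."""
        v = value.strip().replace(",", "")
        if v.endswith("%"):
            return float(v[:-1]) / 100.0
        if v[-1].upper() == "K":
            return float(v[:-1]) * 1000.0
        return float(v)

    assert to_float("18K") == 18000.0
    assert to_float("10%") == 0.1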
According to one embodiment, all of this data is pushed into and stored in a queue to be consumed by the last component, the data unification component 150.
Based on a mapping table that records, for each distinct platform data attribute pair 155 (e.g., platform A / attribute Y), its destination unified data attribute, the data unification component 150 populates a central structured query language (SQL) database 160 for all platform/attribute pairs 155.
The result is that the central database 160 stores the different platforms' data in a new, unified format, whereby macro-level statistical and comparative analysis can be achieved with an error rate below 1%.
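For illustration, the mapping-table lookup might work as in the following sketch. The (platform, attribute) entries and the unified attribute names are invented examples, not the patent's actual mapping.

    # Sketch of the unification step 150-160: a mapping table keyed by
    # (platform, attribute) pairs rewrites each record into the unified schema.
    MAPPING = {
        ("PlatformA", "Interest"):     "interest_rate",
        ("PlatformB", "LoanInterest"): "interest_rate",
        ("PlatformA", "Amt"):          "loan_amount",
        ("PlatformB", "loan_amount"):  "loan_amount",
    }

    def unify(platform: str, record: dict) -> dict:
        """Rewrite one merged record into the central, unified attribute names."""
        unified = {}
        for attr, value in record.items():
            dest = MAPPING.get((platform, attr))
            if dest is not None:
                unified[dest] = value
        return unified

    # e.g. unify("PlatformB", {"LoanInterest": 0.1}) -> {"interest_rate": 0.1}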
The solution shown in Fig. 1 allows near real-time transparency of loan data at the transaction level, as well as normalization and standardization of the data, enabling cross-platform, cross-jurisdiction, and cross-region comparisons, valuations, pricing activity, and statistics generation industry-wide. See Appendix I for further explanation.
According to one embodiment, the method/system 100 supports, for example, comparing the average interest rate of platform A in jurisdiction Y with the average interest rate of another platform B in jurisdiction Z, and averaging loan default rates across all platforms in an entire jurisdiction or region.
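Over the unified central database, such comparisons reduce to simple aggregate queries. The sketch below assumes a hypothetical unified_loans table with platform, jurisdiction, interest_rate, and status columns; none of these names come from the patent.

    # Sketch of macro-level comparisons over the unified central database 160.
    import sqlite3

    central = sqlite3.connect("central.db")

    # Average interest rate per platform within jurisdictions Y and Z.
    avg_rates = central.execute(
        """SELECT platform, jurisdiction, AVG(interest_rate)
           FROM unified_loans
           WHERE jurisdiction IN ('Y', 'Z')
           GROUP BY platform, jurisdiction"""
    ).fetchall()

    # All-platform default rate (share of charged-off loans).
    default_rate = central.execute(
        """SELECT AVG(CASE WHEN status = 'charged_off' THEN 1.0 ELSE 0.0 END)
           FROM unified_loans"""
    ).fetchone()[0]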
According to one embodiment, the present method/system 100 includes, for example, assessing the feasibility and value of using social media in traditional company-specific (public and private) rating models and in investor-specific ratings.
According to one embodiment, the method/system 100 includes, for example, creating an industry-wide standard weighted credit risk model to underwrite loans and track performance.
According to one embodiment, the present method/system 100 includes the ability, for example, to identify when a borrower exceeds the borrowing limit on one or more platforms.
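One way to implement such a check, assuming unified records carry a hashed borrower identifier (the patent mentions encrypted unique identifiers but does not specify this layout, so the field names and limit below are illustrative assumptions), is sketched here:

    # Sketch: flag borrowers whose summed outstanding loans, aggregated
    # across all platforms, exceed a configured borrowing limit.
    import hashlib

    LEGAL_MAX = 100_000.0  # hypothetical legal maximum borrowing limit

    def borrower_key(national_id: str) -> str:
        """One-way identifier usable for cross-platform matching."""
        return hashlib.sha256(national_id.encode("utf-8")).hexdigest()

    def exceeds_limit(loans: list[dict]) -> set[str]:
        """Return hashed borrower ids whose total outstanding amount is over the limit."""
        totals: dict[str, float] = {}
        for loan in loans:  # unified records from all platforms
            key = loan["borrower_key"]
            totals[key] = totals.get(key, 0.0) + loan["outstanding_amount"]
        return {k for k, total in totals.items() if total > LEGAL_MAX}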
According to one embodiment, the present method/system 100 collects, merges, and unifies data from multiple separate peer-to-peer lending platforms, such as platforms in China, the United States, and Europe, covering consumer loans, real estate, student loans, automobiles, agribusiness, renewable energy/solar, lifestyle, and more.
According to one embodiment, the present invention provides the following:
1. Stable automation of API and/or web-crawling techniques for each platform.
2. Hourly collection/crawling of newly added loans on each platform.
3. Hourly collection/crawling of any loan update events, per platform.
4. For issued loans, tracking of the loan's performance.
5. Differentiation of loans in the following cases:
a. Loan progress below 100%: the loan is in the ask stage; a non-binding agreement between the two parties representing the ask amount on the market.
b. Loan progress equal to 100%: the loan is a valid, binding legal contract between the two parties, representing the amount of the loan/credit delivered on the market.
According to one embodiment, the present method/system 100 incorporates a method for identifying credit risk, with the purpose of identifying explanatory variables for predicting whether a loan is likely to be repaid.
The following data and subsets provide an example of a method for predicting whether a loan is likely to be repaid, according to an embodiment of the invention.
Data: loan data issued by the XYZ platform, comprising all loans issued from January 2010 through September 2016, with the most recent loan status as of the release date. Two subsets of loans were analyzed, both of which had completed their life cycle, with loan status either "fully paid" or "charged off".
Subset 1: three- and five-year loans issued between January 2010 and November 2011 (30,986 loans, 15% default rate).
Subset 2: three-year loans issued between January 2010 and December 2013 (166,267 loans, 12% default rate).
Model: a logistic regression model with loan status as the dependent variable. Different subsets of independent variables were built from the following attributes (shown in Table 1); a fitting sketch follows the table:
[Table 1 is reproduced in the original only as an image; the attribute list is not available in the text.]
TABLE 1
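The patent reports estimates computed with R; purely as an illustration, an equivalent logistic regression could be fitted in Python as sketched below. The file name is hypothetical, the feature list is an assumed subset of the Table 1/Table 2 attributes, and the status labels follow the "fully paid"/"charged off" convention above.

    # Sketch of the model described above: logistic regression with loan
    # status as the dependent variable.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    loans = pd.read_csv("xyz_issued_loans.csv")  # hypothetical data export
    done = loans[loans["status"].isin(["Fully Paid", "Charged Off"])]

    features = ["dti", "open_acc", "revol_util", "delinq_2yrs", "inq_last_6mths"]
    X = done[features]
    y = (done["status"] == "Charged Off").astype(int)  # 1 = default

    model = LogisticRegression(max_iter=1000).fit(X, y)
    probabilities = model.predict_proba(X)[:, 1]  # estimated P(default)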
Results: to date, no subset of attributes has produced a model whose computed default probabilities match the defaults observed in the issued-loan data. These attributes do not appear to have much influence on loan status. To analyze this problem further, we calculated the correlation between loan status and several attributes, such as "dti" (debt-to-income ratio). For example, the correlation between dti and loan status in data subset 2 is only 0.09, which is very low.
Explanation: the XYZ platform already uses these attributes to distinguish "good" loans from "bad" loans, and only "good" loans are issued by the platform; this screens out approximately 90% of loan applications. Thus, the data we analyzed contained only the "top 10%"; for example, the debt-to-income ratio of all issued loans from 2010 to 2013 was below 35%. In the declined-loan data for the same interval, we found more than 200,000 dti values above 40%, reaching as high as 1000%. (The debt-to-income ratio is the only attribute in the declined-loan dataset that can be compared with the issued loans.)
Thus, other attributes appear to be needed to explain defaults on issued loans. For example, these might be indicators related to health or the risk of unemployment.
Another example is provided below:
Estimated parameter values for a subset, using a sample of 5,000 loans (4,300 fully paid, 700 charged off), computed with R (shown in Table 2; a sketch of applying these estimates follows the table):
(Intercept)             -2.64400000
open_acc                 0.01046000
revol_util               0.01275000
revol_bal               -0.05051000
delinq_2yrs              0.16580000
dti                      0.01498000
pub_rec                  0.02689000
pub_rec_bankruptcies     0.50240000
mths_since_last_delinq  -0.00008213
mths_since_last_record  -0.00243300
inq_last_6mths           0.18150000
TABLE 2
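Applying the Table 2 estimates to a loan means taking the logistic function of the linear combination of its attribute values. A small sketch follows (coefficients copied from Table 2, with decimal commas read as decimal points; attribute values are assumed to be scaled as in the training data):

    # Sketch: default probability from the Table 2 logistic regression fit.
    import math

    COEF = {
        "(Intercept)": -2.644, "open_acc": 0.01046, "revol_util": 0.01275,
        "revol_bal": -0.05051, "delinq_2yrs": 0.1658, "dti": 0.01498,
        "pub_rec": 0.02689, "pub_rec_bankruptcies": 0.5024,
        "mths_since_last_delinq": -0.00008213,
        "mths_since_last_record": -0.0024330, "inq_last_6mths": 0.1815,
    }

    def default_probability(loan: dict) -> float:
        """P(default) = logistic(intercept + sum of coefficient * attribute)."""
        z = COEF["(Intercept)"] + sum(
            COEF[name] * loan.get(name, 0.0)
            for name in COEF if name != "(Intercept)"
        )
        return 1.0 / (1.0 + math.exp(-z))  # logistic link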
The resulting default probabilities, divided by quartile, and the corresponding observed defaults, applied to a complete dataset of 30,086 loans (26,636 fully paid / 4,350 charged off) (shown in Table 3):
[Table 3 is reproduced in the original only as an image; the data are not available in the text.]
TABLE 3
Comparison: default probabilities divided by quartile and the corresponding observed defaults from a bank's dataset of 300 loans (255 fully paid / 45 charged off) (shown in Table 4):
[Table 4 is reproduced in the original only as an image; the data are not available in the text.]
TABLE 4
Referring now to Fig. 2, a screenshot of a login screen of a digital application for analyzing crowdfunding platforms is illustratively depicted in accordance with an embodiment of the present invention.
According to one embodiment, one or more of the steps and/or functions illustrated and described in Fig. 1 may be accomplished using a digital application. According to one embodiment, the digital application can run on an electronic device such as, but not limited to, a desktop computer, a laptop computer, a tablet computer, a smartphone, and/or any other suitable electronic device. According to one embodiment, one or more electronic devices are connected via a server through wired and/or wireless connections. According to one embodiment, a memory may be coupled to the electronic device and/or the server for storing data and/or the digital application.
According to one embodiment, the login screen of the digital application enables a user to enter login credentials (e.g., username, password, etc.) and select a particular technology platform.
Referring now to Fig. 3, a screenshot of an alert-system configuration screen of the digital application is illustratively depicted in accordance with an embodiment of the present invention.
According to one embodiment, a user can configure the digital application to send alerts to the user. According to one embodiment, the configuration includes entering information for the platform. This information may include, for example, the address, the region, the amount of outstanding loans, the legal maximum loan limit, the address (digital or physical) to which alerts are sent, and/or any other suitable information.
Referring now to Fig. 4, a screenshot of setting up a user account in the digital application is illustratively depicted, in accordance with an embodiment of the present invention.
According to one embodiment, user account configuration includes entering identifying information such as, for example, a name, login credentials, an email address, and/or any other suitable information. According to one embodiment, more than one user account may be configured.
Referring now to Figs. 5-6, screenshots of configuring alerts in the digital application are illustratively depicted, according to various embodiments of the present invention.
According to one embodiment, the user can configure alerts for a particular platform (Fig. 5) or for all platforms (Fig. 6). According to one embodiment, the configuration includes setting the legal maximum borrowing limit, setting an alert to be received when actual borrowing reaches an amount or percentage of the maximum borrowing amount, setting an alert to be received when potential borrowing reaches an amount or percentage of the maximum borrowing amount, and setting when alerts are received. According to one embodiment, the user may also configure alerts so that the user stops receiving alerts for customers who have already borrowed funds on the platform, until the customer requests a new loan.
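A minimal sketch of this alert logic follows; the configuration fields, messages, and thresholds are illustrative assumptions, not the application's actual data model.

    # Sketch: alert when actual or potential borrowing crosses a configured
    # fraction of the legal maximum borrowing limit.
    from dataclasses import dataclass

    @dataclass
    class AlertConfig:
        legal_max: float           # legal maximum borrowing limit
        actual_threshold: float    # e.g. 0.8 -> alert at 80% of the maximum
        potential_threshold: float

    def check_alerts(cfg: AlertConfig, actual: float, potential: float) -> list[str]:
        alerts = []
        if actual >= cfg.actual_threshold * cfg.legal_max:
            alerts.append(f"actual borrowing {actual:.2f} at/over threshold")
        if potential >= cfg.potential_threshold * cfg.legal_max:
            alerts.append(f"potential borrowing {potential:.2f} at/over threshold")
        return alerts

    # check_alerts(AlertConfig(100_000, 0.8, 0.9), 85_000, 20_000)
    # -> ["actual borrowing 85000.00 at/over threshold"]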
Referring now to Fig. 7, a screenshot of a profile in a platform using the digital application is illustratively depicted, in accordance with an embodiment of the present invention.
According to one embodiment, the profile includes identifying information about the identity associated with the profile, such as name, address, region, amount of outstanding loans, legal maximum loan limit, and the address (digital or physical) to which alerts are to be sent.
Referring now to Fig. 8, a screenshot of alerts in a platform using the digital application is illustratively depicted in accordance with an embodiment of the present invention.
According to one embodiment, alerts are organized and listed by date received, and the borrower's unique identifier is listed in each alert. According to one embodiment, the user can search for alerts within a particular time frame.
System, device and operating system
In general, one or more users (which may be people or groups of users and/or other systems) may participate in an information technology system (e.g., a computer) to facilitate operation and information processing of the system. Further, computers employ processors to process information, and such processors may be referred to as Central Processing Units (CPUs). One form of processor is known as a microprocessor. The CPU uses the communication circuit to pass binary-coded signals used as instructions to implement various operations. These instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory (e.g., registers, cache, random access memory, etc.). Such communication instructions may be stored and/or transmitted as program and/or data components in batches (e.g., batch instructions) to facilitate the desired operations. These stored instruction codes (e.g., programs) may cause CPU circuit components and other motherboard and/or system components to participate in performing desired operations. One type of program is a computer operating system, which may be executed by a CPU on a computer; the operating system enables and facilitates user access and operation of computer information technology and resources. Some of the resources that may be used in an information technology system include: input and output mechanisms by which data may be transferred to and from the computer; a memory store into which data may be saved; and a processor by which information may be processed. These information technology systems can be used to collect data for later retrieval, analysis, and manipulation, which can be facilitated by database programs. These information technology systems provide interfaces that allow users to access and operate the various system components.
In one embodiment, the present invention may be connected to and/or communicate with entities such as, but not limited to: one or more users from user input devices; peripheral devices; an optional cryptographic processor device; and/or a communication network. For example, the present invention may be connected to and/or communicate with users operating client devices, including but not limited to personal computers, servers, and/or various mobile devices, including but not limited to cellular telephones, smartphones (e.g., iPhone®, Android OS-based phones, etc.), tablet computers (e.g., Apple iPad™, HP Slate™, Motorola Xoom™, etc.), e-book readers (e.g., Amazon Kindle™, Barnes & Noble Nook™ e-reader, etc.), laptop computers, notebooks, netbooks, game consoles (e.g., XBOX Live™, Nintendo® DS, Sony PlayStation® Portable, etc.), portable scanners, and/or the like.
It should be noted that the term "server" as used throughout this application refers generally to a computer, other device, program, or combination thereof that processes and responds to requests from remote clients across a communication network.
The present invention may be based on a computer system, which may include, but is not limited to, components such as a computer system coupled to a memory.
Computer system
The computer system may include a clock, a central processing unit ("CPU" and/or "processor"; these terms are used interchangeably throughout this disclosure unless otherwise noted), a memory (e.g., read-only memory (ROM), random access memory (RAM), etc.), and/or an interface bus. Most frequently, although not necessarily, these are all interconnected and/or communicate through a system bus on one or more (mother)boards having conductive and/or otherwise transmissive circuit pathways through which instructions (e.g., binary-coded signals) may travel to effect communication, operations, storage, and the like. Optionally, the computer system may be connected to an internal power source. Optionally, a cryptographic processor and/or a transceiver (e.g., an IC) may be connected to the system bus; in another embodiment, the cryptographic processor and/or transceiver may be connected as internal and/or external peripheral devices via the interface bus I/O. Further, the transceiver may be connected to one or more antennas, thereby effectuating wireless transmission and reception of various communication protocols; for example, a Bluetooth transceiver chip, state sensors, and/or other communication devices may be connected via the interface bus I/O to provide wireless communication links.
The CPU comprises at least one high-speed data processor adequate to execute program components for executing user- and/or system-generated requests. Often, the processors themselves incorporate various specialized processing units, such as, but not limited to: integrated system (bus) controllers, memory management control units, floating point units, and even specialized processing sub-units such as graphics processing units, digital signal processing units, and/or the like. Additionally, processors may include internal fast-access addressable memory, and be capable of mapping and addressing memory beyond the processor itself; internal memory may include, but is not limited to: fast registers, various levels of cache memory (e.g., levels 1, 2, 3, etc.), RAM, and the like. The processor may access this memory through the use of a memory address space that is accessible via instruction addresses, which the processor can construct and decode, allowing it to access a circuit path to a specific memory address space having a memory state. The CPU may be a microprocessor such as: AMD's Athlon, Duron, and/or Opteron; ARM's application, embedded, and secure processors; IBM's and/or Motorola's DragonBall and PowerPC; IBM's and Sony's Cell processor; Intel's Celeron, Core (2) Duo, Itanium, Pentium, Xeon, and/or XScale; and/or the like. The CPU interacts with memory through instructions passing through conductive and/or transportive conduits (e.g., (printed) electronic and/or optic circuits) to execute stored instructions (i.e., program code) according to conventional data processing techniques. Such instruction passing facilitates communication within the present invention and beyond through various interfaces. Should processing requirements dictate a greater amount of speed and/or capacity, distributed processor (e.g., distributed embodiments of the present invention), mainframe, multi-core, parallel, and/or supercomputer architectures may similarly be employed. Alternatively, should deployment requirements dictate greater portability, smaller personal digital assistants (PDAs) may be employed.
Depending on the particular implementation, features of the invention may be achieved by implementing a microcontroller such as CAST's R8051XC2 microcontroller, Intel's MCS 51 (i.e., 8051) microcontroller, and/or the like. Also, to implement certain features of the various embodiments, some feature implementations may rely on embedded components, such as application-specific integrated circuits ("ASICs"), digital signal processing ("DSP"), field programmable gate arrays ("FPGAs"), and/or similar embedded technologies. For example, any of the component collection (distributed or otherwise) and/or features of the invention may be implemented via the microprocessor and/or via embedded components (e.g., via ASICs, coprocessors, DSPs, FPGAs, etc.). Alternatively, some embodiments of the invention may be implemented with embedded components that are configured and used to achieve a variety of features or signal processing.
Depending on the particular implementation, the embedded components may include software solutions, hardware solutions, and/or some combination of both hardware/software solutions. For example, features of the present invention discussed herein may be achieved by implementing an FPGA, which is a semiconductor device containing programmable logic components called "logic blocks" and programmable interconnects, such as the high-performance FPGA Virtex series and/or the low-cost Spartan series manufactured by Xilinx. After an FPGA is manufactured, a customer or designer can program the logic blocks and interconnects to implement any of the features of the present invention. A hierarchy of programmable interconnects allows logic blocks to be interconnected as needed by the system designer/administrator of the invention, somewhat like a one-chip programmable breadboard. An FPGA's logic blocks can be programmed to perform the functions of basic logic gates such as AND and XOR, or more complex combinational functions such as decoders or simple mathematical functions. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory. In some circumstances, the invention may be developed on a regular FPGA and then migrated into a fixed version that more resembles an ASIC implementation. Alternate or coordinating implementations may migrate features of the controller of the present invention to a final ASIC instead of, or in addition to, FPGAs. Depending on the implementation, all of the aforementioned embedded components and microprocessors may be considered the "CPU" and/or "processor" of the present invention.
Power supply
The power supply may be of any standard form for powering small electronic circuit board devices, such as the following batteries: alkaline batteries, lithium hydride batteries, lithium ion batteries, lithium polymer batteries, nickel cadmium batteries, solar cells, and the like. Other types of AC or DC power sources may also be used. In the case of a solar cell, in one embodiment, the housing provides an aperture through which the solar cell can capture photon energy. The power cell is connected to at least one of the interconnected subsequent components of the invention, thereby providing current to all subsequent components. In one example, a power supply is connected to the system bus component. In an alternative embodiment, the external power supply is provided by a connection across the I/O interface. For example, USB and/or IEEE1394 connections carry data and power over the connection and are therefore suitable power sources.
Interface adapter
An interface bus may accept, connect and/or communicate to a plurality of interface adapters, conventionally but not necessarily in the form of adapter cards, such as, but not limited to: input output interfaces (I/O), storage interfaces, network interfaces, and the like. Alternatively, a cryptographic processor interface may similarly be connected to the interface bus. The interface bus provides communication between the interface adapters to each other and to the other components of the computer system. The interface adapter is adapted to a compatible interface bus. The interface adapter is typically connected to the interface bus via a slot fabric. Conventional slot architectures may be employed, such as, but not limited to: accelerated Graphics Port (AGP), card bus, (extended) industry Standard architecture ((E) ISA), Micro Channel Architecture (MCA), NuBus, peripheral component interconnect (extended) (PCI (X)), PCI express, Personal Computer Memory Card International Association (PCMCIA), and the like.
The storage interfaces may accept, communicate, and/or connect to a number of storage devices such as, but not limited to: storage devices, removable disc devices, and/or the like. Storage interfaces may employ connection protocols such as, but not limited to: (Ultra) (Serial) Advanced Technology Attachment (Packet Interface) ((Ultra) (Serial) ATA(PI)); (Enhanced) Integrated Drive Electronics ((E)IDE); Institute of Electrical and Electronics Engineers (IEEE) 1394; Fibre Channel; Small Computer Systems Interface (SCSI); Universal Serial Bus (USB); and/or the like.
The network interfaces may employ connection protocols such as, but not limited to: direct connect, Ethernet (thick, thin, twisted pair 10/100/1000 Base T, and/or the like), Token Ring, wireless connections such as IEEE 802.11a-x, and/or the like. Should processing requirements dictate a greater amount of speed and/or capacity, distributed network controller (e.g., distributed embodiments of the present invention) architectures may similarly be employed to pool, load balance, and/or otherwise increase the communicative bandwidth required by the controller of the present invention.
Input/output interfaces (I/O) may accept, communicate, and/or connect to user input devices, peripheral devices, cryptographic processor devices, and/or the like. I/O may employ connection protocols such as, but not limited to: audio: analog, digital, monaural, RCA, stereo, and/or the like; data: Apple Desktop Bus (ADB), IEEE 1394a-b, serial, Universal Serial Bus (USB); infrared; joystick; keyboard; midi; optical; PC AT; PS/2; parallel; radio; video interface: Apple Desktop Connector (ADC), BNC, coaxial, composite, Digital Visual Interface (DVI), High-Definition Multimedia Interface (HDMI), RCA, RF antenna, S-Video, VGA, and/or the like; wireless transceivers: 802.11a/b/g/n/x; Bluetooth; cellular (e.g., Code Division Multiple Access (CDMA), High Speed Packet Access (HSPA(+)), High Speed Downlink Packet Access (HSDPA), Global System for Mobile communications (GSM), Long Term Evolution (LTE), WiMax, etc.); and/or the like. One typical output device may include a video display, which typically comprises a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) based monitor with an interface that accepts signals from a video interface.
User input devices are often peripheral devices (see below) and may include: card readers, dongles, fingerprint readers, gloves, graphics tablets, joysticks, keyboards, microphones, mice, remote controls, retina readers, touch screens (e.g., capacitive, resistive, etc.), trackballs, trackpads, sensors (e.g., accelerometers, ambient light, GPS, gyroscopes, proximity, etc.), styluses, and/or the like.
Peripheral devices may be external, internal, and/or part of the controller of the present invention. Peripheral devices may also include, for example: antennas, audio devices (e.g., line-in, line-out, microphone input, speakers, etc.), cameras (e.g., still, video, webcam, etc.), drive motors, lighting, video monitors, and/or the like.
Cryptographic units such as, but not limited to, microcontrollers, processors, interfaces, and/or devices may be attached to and/or communicate with the controller of the present invention. An MC68HC16 microcontroller, manufactured by Motorola Inc., may be used for and/or within cryptographic units. The MC68HC16 microcontroller utilizes a 16-bit multiply-and-accumulate instruction in a 16 MHz configuration and requires less than one second to perform a 512-bit RSA private key operation. Cryptographic units support the authentication of communications from interacting agents, as well as allowing for anonymous transactions. Cryptographic units may also be configured as part of the CPU. Equivalent microcontrollers and/or processors may also be used. Other commercially available specialized cryptographic processors include: Broadcom's CryptoNetX and other security processors; nCipher's nShield; SafeNet's Luna PCI (e.g., 7100) series; Semaphore Communications' 40 MHz Roadrunner 184; Sun's Cryptographic Accelerators (e.g., the Accelerator 6000 PCIe board and Accelerator 500 daughtercard); and/or the like.
Memory device
Generally, any mechanization and/or embodiment allowing a processor to affect the storage and/or retrieval of information is regarded as memory. However, memory is a fungible technology and resource; thus, any number of memory embodiments may be employed in lieu of or in concert with one another. It is to be understood that the controller and/or computer system of the present invention may employ various forms of memory. For example, a computer system may be configured in which the functionality of on-chip CPU memory (e.g., registers), RAM, ROM, and any other storage devices is provided by a paper punch tape or paper punch card mechanism; of course, such an embodiment would result in an extremely slow rate of operation. In a typical configuration, memory will include ROM, RAM, and a storage device. A storage device may be any conventional computer system storage. Storage devices may include: a drum; a (fixed and/or removable) magnetic disk drive; a magneto-optical drive; an optical drive (i.e., Blu-ray, CD ROM/RAM/Recordable (R)/Rewritable (RW), DVD R/RW, HD DVD R/RW, etc.); an array of devices (e.g., a Redundant Array of Independent Disks (RAID)); solid state memory devices (USB, solid state drives (SSDs), etc.); and/or other devices of the like.
Component collection
The memory may contain a collection of programs and/or database components and/or data such as, but not limited to: operating system component(s) (operating system); information server component(s) (information server); user interface component(s) (user interface); web browser component(s) (web browser); a database(s); mail server component(s); mail client component(s); the encryption server component(s) (collectively referred to as an encryption server or set). These components may be stored and accessed from a memory device and/or from a memory device accessible via an interface bus. While non-traditional program components, such as those in the component collection, are typically stored in local storage, they may also be loaded into and/or stored in memory, such as peripheral devices, RAM, remote storage devices over a communication network, ROM, various forms of memory, etc.
Operating system
The operating system may be a highly fault-tolerant, scalable, and secure system such as Apple Macintosh OS X (Server), AT&T Plan 9, Be OS, Unix and Unix-like system distributions (such as AT&T's Unix; Berkeley Software Distribution (BSD) variations such as FreeBSD, NetBSD, OpenBSD, and/or the like; Linux distributions such as Red Hat, Ubuntu, and/or the like), and/or like operating systems. However, more limited and/or less secure operating systems may also be employed, such as Apple Macintosh OS, IBM OS/2, Microsoft DOS, Microsoft Windows 2000/2003/3.1/95/98/CE/Millenium/NT/Vista/XP (Server), Palm OS, and/or the like. An operating system may communicate to and/or with other components in a component collection, including itself and/or the like; most frequently, the operating system communicates with other program components, user interfaces, and/or the like. The operating system, once executed by the CPU, may enable interaction with communication networks, data, I/O, peripheral devices, program components, memory, user input devices, and/or the like, and may provide communication protocols that allow the computer system to communicate with other entities through a communication network.
Information server
The information server may be a conventional Internet information server such as, but not limited to, the Apache Software Foundation's Apache, Microsoft's Internet Information Server, and/or the like. The information server may allow for the execution of program components through facilities such as Active Server Pages (ASP), ActiveX, (ANSI) (Objective-) C (++), C# and/or .NET, Common Gateway Interface (CGI) scripts, dynamic (D) hypertext markup language (HTML), FLASH, Java, JavaScript, Practical Extraction Report Language (PERL), Hypertext Pre-Processor (PHP), pipes, Python, Wireless Application Protocol (WAP), WebObjects, and/or the like. The information server may support secure communication protocols such as, but not limited to, File Transfer Protocol (FTP), HyperText Transfer Protocol (HTTP), Secure Hypertext Transfer Protocol (HTTPS), Secure Socket Layer (SSL), messaging protocols (e.g., America Online (AOL) Instant Messenger (AIM), Application Exchange (APEX), ICQ, Internet Relay Chat (IRC), Microsoft Network (MSN) Messenger Service, Presence and Instant Messaging Protocol (PRIM), the IETF's Session Initiation Protocol (SIP), open XML-based Extensible Messaging and Presence Protocol (XMPP), Yahoo! Instant Messenger Service, and/or the like), and/or the like.
Access to the databases of the present invention may be achieved through a number of database bridge mechanisms, such as through the scripting languages enumerated herein (e.g., CGI) and through inter-application communication channels enumerated herein (e.g., CORBA, WebObjects, etc.).
Moreover, an information server can contain, communicate, generate, obtain, and/or provide program components, systems, users, and/or data communications, requests, and/or responses.
User interface
Computer interaction interface elements such as check boxes, cursors, menus, scrollers, and windows (collectively referred to as widgets) similarly facilitate the access, capabilities, operation, and display of data and of computer hardware and operating system resources and status.
The user interface component is a stored program component that is executed by the CPU. The user interface may be a conventional graphical user interface provided by and/or over an operating system and/or operating environment such as those already discussed. The user interface may allow for the display, execution, interaction, manipulation, and/or operation of program components and/or system facilities through textual and/or graphical facilities. The user interface provides a facility through which a user may influence, interact with, and/or operate the computer system. The user interface may communicate with other components in the set of components and/or with other components, including itself and/or the like. Most frequently, the user interface communicates with an operating system, other program components, and the like. The user interface may contain, communicate, generate, obtain, and/or provide program components, systems, users, and/or data communications, requests, and/or responses.
Web browser
A web browser and similar information access tools may be integrated into a PDA, cell phone, and/or other mobile device.
Mail server
The mail server may support communications protocols such as, but not limited to, Internet Message Access Protocol (IMAP), Messaging Application Programming Interface (MAPI)/Microsoft Exchange, Post Office Protocol (POP3), Simple Mail Transfer Protocol (SMTP), and/or the like. The mail server may route, forward, and process incoming and outgoing mail messages that have been sent, relayed, and/or otherwise traversing through and/or to the present invention.
Access to the mail of the present invention may be accomplished through multiple APIs provided by various web server components and/or operating systems.
Moreover, a mail server can contain, communicate, generate, obtain, and/or provide program components, systems, users, and/or data communications, requests, information, and/or responses.
Mail client
The mail client component is a stored program component that is executed by the CPU. The mail client may be a conventional mail viewing application such as Apple Mail, Microsoft Entourage, Microsoft Outlook, Microsoft Outlook Express, Mozilla, Thunderbird, and/or the like. The mail client may support a variety of transfer protocols, such as: IMAP, Microsoft Exchange, POP3, SMTP, and/or the like. The mail client may communicate with other components in the component collection and/or with other components, including itself and/or the like. Most frequently, a mail client communicates with a mail server, operating system, other mail clients, and/or the like; for example, it may contain, communicate, generate, obtain, and/or provide program component, system, user, and/or data communications, requests, information, and/or responses. Generally, the mail client provides the facility to compose and transmit electronic mail messages.
Encryption server
The cryptographic component may employ cryptographic techniques such as, but not limited to: digital certificates (e.g., the X.509 authentication framework), digital signatures, dual signatures, enveloping, password access protection, public key management, and/or the like. The cryptographic component may facilitate numerous (encryption and/or decryption) security protocols such as, but not limited to: checksum, Data Encryption Standard (DES), Elliptical Curve Encryption (ECC), International Data Encryption Algorithm (IDEA), Message Digest 5 (MD5, which is a one-way hash function), passwords, Rivest Cipher (RC5), Rijndael, and RSA (an Internet encryption and authentication system developed by Ron Rivest, Adi Shamir, and Leonard Adleman), and/or the like. The cryptographic component facilitates the secure accessing of resources and may communicate with other components in the component collection, including itself and/or the like, as well as with dedicated cryptographic processor devices.
Database of the invention
The database components of the present invention may be embodied in a database and its stored data. The database is a stored program component that is executed by the CPU; the stored program component portion configures the CPU to process the stored data. The database may be a conventional, fault-tolerant, relational, scalable, secure database such as Oracle or Sybase. Relational databases are an extension of flat files. A relational database consists of a series of related tables; the tables are interconnected via key fields. Use of key fields allows tables to be combined by indexing against the key fields; that is, the key fields act as dimensional pivot points for combining information from various tables. Relationships generally identify links maintained between tables by matching primary keys. Primary keys represent fields that uniquely identify the rows of a table in a relational database. More precisely, they uniquely identify rows of a table on the "one" side of a one-to-many relationship.
The object database may include a plurality of object sets grouped and/or linked together by common attributes, which may be related to other object sets by some common attributes.
In one embodiment, the database component includes a plurality of tables. A user (e.g., operator and physician) table may include fields such as, but not limited to: user_id, ssn, dob, first_name, last_name, age, state, address_first, address_second, zipcode, device_list, contact_info, contact_type, alt_contact_info, alt_contact_type, and the like, to refer to any type of enterable data or selections discussed herein. The user table may support and/or track multiple entity accounts. A client table may include fields such as, but not limited to: user_id, client_ip, client_type, client_model, operating_system, os_version, app_installed_flag, and the like. An apps table may include fields such as, but not limited to: app_id, app_name, app_type, os_compatibilities_list, version, timestamp, developer_id, and the like. A beverage table may include, for example, the heat capacities of different beverages and other useful parameters, which may depend on size, such as: measure_name, measure_size, desired_total, total_time, favorite_driver, number_of_definitions, current_measure_temperature, current_activity_temperature, and the like. A parameter table may include the aforementioned fields or additional fields such as cool_start_time, cool_preset, cool_rate, and the like. A cooling routine table may include a plurality of cooling sequences, which may include, for example and without limitation: sequence_type, sequence_id, flow_rate, avg_water_temp, synchronizing_time, pump_setting, pump_speed, pump_pressure, power_level, temperature_sensor_id_number, temperature_sensor_location, and the like.
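For illustration only, a minimal sketch of how two such tables might be declared and joined on their key fields, assuming SQLite and abbreviated, hypothetical field lists (not the full schemas above):

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, abbreviated versions of the user and client tables above;
# user_id is the primary key on the "one" side of the one-to-many relationship.
cur.execute("""CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, state TEXT, zipcode TEXT)""")
cur.execute("""CREATE TABLE clients (
    client_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),  -- key field linking the tables
    client_type TEXT, operating_system TEXT, os_version TEXT)""")

cur.execute("INSERT INTO users VALUES (1, 'Ada', 'Lovelace', 'NY', '10001')")
cur.execute("INSERT INTO clients VALUES (10, 1, 'mobile', 'iOS', '9.3')")

# Combining information from both tables by indexing against the key field.
for row in cur.execute("""SELECT u.first_name, c.client_type, c.operating_system
                          FROM users u JOIN clients c ON u.user_id = c.user_id"""):
    print(row)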
In one embodiment, the user program may contain various user interface primitives that may be used to update the platform of the present invention. Also, various accounts may require customized database tables depending on the environment and type of client that the system of the present invention may need to service. It should be noted that any unique field may always be designated as a key field. In an alternative embodiment, the tables have been dispersed into their own databases and their respective database controllers (i.e., a separate database controller for each of the tables described above). The database may further be distributed over several computer systems and/or storage devices using standard data processing techniques. Similarly, the configuration of a decentralized database controller may be changed by consolidating and/or distributing the various database components. The system of the present invention may be configured to track various settings, inputs and parameters via a database controller.
When introducing elements of the present disclosure or the embodiments thereof, the articles "a," "an," and "the" are intended to mean that there are one or more of the elements. Similarly, when used to introduce an element, the adjective "another" is intended to mean one or more of the element. The terms "comprising" and "having" are intended to be inclusive such that there may be additional elements other than the listed elements.
Although the present invention has been described with a certain degree of particularity, it is understood that the present disclosure has been made only by way of illustration and that numerous changes in the details of construction and the arrangement of components may be resorted to without departing from the spirit and scope of the invention.
Appendix I
Crowd Bureau, LLC, Intellectual Property Patent Application
Kim Wales
Julien Buty
Harald Frost
Crowd Bureau, LLC
Intellectual Property Patent Application
From Main Street storefronts to high-tech startups, America's small businesses have "created 2 out of every 3 net new jobs over the last 20 years."1 The ability of individuals to pursue ideas, create companies, and grow businesses is the foundation of the U.S. economy.
The Obama administration sought to ensure that the benefits of the continued economic recovery accrue to all Americans through the Jumpstart Our Business Startups (JOBS) Act of 2012, which allows the crowdfunding of securities (equity and debt) over the Internet through intermediaries (broker-dealers or registered funding portals). Importantly, consumers and small businesses must be able to obtain secure and affordable credit and equity financing broadly. Without access to capital, entrepreneurs cannot put innovative ideas into action. Without sufficient funds, Americans cannot grow their enterprises to create new jobs and opportunities for the next generation.
Crowdfunding has become very popular since Kickstarter launched in 2009. This "democratization" of financing enables entrepreneurs and innovators to raise significant funds from strangers around the world, bypassing the traditional route of raising funds from friends, family, and investors. Kickstarter, Indiegogo, and GoFundMe are household names that have channeled billions of dollars in rewards and donations. These crowdfunding platforms are only a small segment of a rapidly growing industry. If someone plans to launch a crowdfunding campaign, they will likely turn first to one of these platforms.
Faculty, alumni, and students of U.S. universities have also begun to use these new mechanisms to fund causes, projects, and ventures through exclusive, school-sponsored crowdfunding platforms.
Four types of crowdfunding
Most crowdfunding platforms can be assigned to the four crowdfunding categories listed below, although the business models within these groups sometimes differ considerably. The following is an overview of each group. For example, within the crowd investing category there are large differences between business models depending on which part of the JOBS Act is being utilized. Note that one or more models may be combined to create a "graduation" model that acts as an incubator across the entire life cycle of a project or business.
Crowdfunding definitions
1. Donation crowdfunding: A donation refers to funding without a directly quantifiable reward or benefit in return. Examples include social, charitable, and cultural projects. Donation crowdfunding may also be used to raise funds for political campaigns. For donation crowdfunding to succeed, an emotional tie must be established and maintained between the capital providers and the recipients.
2. Rewards crowdfunding: Rewards crowdfunding covers creative, cultural, and sporting projects, although commercial projects may also fall into this category. Through this form of financing, backers receive an additional benefit (i.e., a reward) in the form of products, works of art, or services. The creativity of the parties seeking funds is unlimited.
3. Crowd investing (equity/debt): The focus of crowd investing is not funding a project but purchasing company equity (common stock) or debt (e.g., convertible notes, mini-bonds, etc.). Crowd investing also provides investors the opportunity to support the growth of young companies with limited investment amounts; in return, these investors receive shares in the company. These are usually silent partnerships, and the investors have limited voting rights.
4. Crowd lending/peer-to-peer lending: Crowd lending mainly refers to loan (debt) financing of companies or individuals (e.g., lifestyle, student loans, real estate, automobiles, etc.). In return for the loan, lenders expect a risk-adjusted return on their investment. As products and business models have evolved, the investor base of online marketplace lenders has expanded to institutional investors, hedge funds, and financial institutions.
Types of business models
Since securities-based crowdfunding (e.g., equity) involves selling shares (common stock), a platform implemented under the JOBS Act does not itself need to perform the second and fourth operations, although many platforms can simplify and manage the process. This may be an extension (a trading desk) built into the overall platform, as illustrated by the peer-to-peer lending model. This section therefore outlines the primary business models in online peer-to-peer lending and the structures used to fund the activity.
Companies in the industry have developed two main business models: (1) direct lenders that originate loans to hold in their own portfolios, commonly referred to as balance sheet lenders (FIG. 9); and (2) platform lenders that partner with an issuing depository institution to originate loans and then purchase the loans for sale to investors, either as whole loans or by issuing securities such as member payment dependent notes (FIG. 10). A third business model (FIG. 11) illustrates the transfer of rights and obligations in securitization.
Direct lenders that do not rely on a depository institution to originate loans typically need to obtain a license from each state in which they lend. A direct lender that originates loans directly under state lending licenses is not supervised by the federal banking regulators, except to the extent that it may be subject to CFPB regulation.
FIG. 9: direct 'simple' model
This model may be used for donation, rewards, equity, and debt crowdfunding. The platform will be flexible enough to allow more than one model, and to carry results from campaign A to campaign B for the same issuer.
FIG. 10: platform loan model
This model uses a partner bank to originate loans that the platform subsequently purchases.
FIG. 11: transfer of rights and obligations in securitization
This figure is intended only to illustrate the direction of rights and obligations in the program. Many details of the securitization process, such as tranching securities, creating liquidity, etc., are not included.
The basic principle of crowdfunding (debt or equity) is to match borrowers who need capital with investors/lenders who have idle capital, bypassing the intermediation traditionally performed by banks. With these developments, lenders can provide faster credit to consumers (e.g., students) and to emerging growth companies of all types. Over the past decade, online marketplace lending companies have evolved from platforms connecting individual borrowers with individual lenders into complex networks featuring institutional investors, financial institution partnerships, direct lending, and securitization transactions.
One approach is a hybrid of marketplace and balance sheet lending. In our view, companies that purchase loans to hold on their own balance sheet while selling other loans to investors have an incentive to sell the weaker loans and retain the better ones for their balance sheet. Keeping "skin in the game" would also help maintain the integrity and alignment of the platform and the lending parties.
Market opportunity
Current regulatory status
1. Capital markets compliance regimes are being reshaped by actions such as the Jumpstart Our Business Startups (JOBS) Act in the United States.
2. Retail consumer and small and medium-sized enterprise (SME) financing through the buying and selling of securities over the Internet has been revolutionized and re-regulated in 40 countries worldwide (e.g., the 28 EU member states, China).
3. A new type of regulated entity, comprising securities crowdfunding platforms, marketplace lenders, and peer-to-peer platforms, is creating new types of data (e.g., peer-to-peer loan book data) through the online trading of securities.
4. The People's Bank of China requires platforms to supervise borrower and issuer limits and the transfer of funds.
Problems affecting market transparency
1. Investors cannot compare loans across platforms (e.g., interest rates)
2. There is no standard benchmark to evaluate investor or borrower performance
Risk assessment
1. There is no standard rating system to convey risk
2. There is no standard framework for creating structured loan products
Crowd Bureau structure
The Crowd Bureau is a financial technology company that aims to become an alternative ratings agency for peer-to-peer lending and equity crowdfunding.
The team consists of dedicated and highly experienced financial services/banking, operations, technology, and legal professionals, including experts who provide quantitative and qualitative due diligence panels for daily/weekly/monthly/quarterly analysis. Regulatory compliance, benchmarking, and risk models are provided to evaluate loans and portfolios.
We will provide research, asset, and risk management services to clients such as banks, peer-to-peer lending platforms, fundamental investors, and fund managers.
Opportunity for technical development
Around the world, more than 2,500 platforms have begun originating consumer personal loans, small and medium-sized business loans, real estate loans, student loans, agriculture/agribusiness loans, solar/renewable energy loans, and automobile loans through online lending platforms. Each lending platform publishes loan financial data as each borrower seeking financing is listed on the platform. Marketplace lenders/peer-to-peer lenders update and publish their data at different time intervals, through different media, in different formats, and across different jurisdictions.
Some platforms provide data through WebSocket real-time protocols (essentially pushing new loan data and events to protocol subscribers). Others expose RESTful APIs from which scripts can extract new loan data at predefined time intervals (hourly, every 3 hours, daily, monthly, quarterly). The timeliness of the output depends on the age of the peer-to-peer lending platform and on its business model (some update their loan listings only when a borrower "asks" for a loan amount and update events as investor lending reaches the "ask" amount, or when the loan is originated). Some platforms publish on a public web page (e.g., once the loan is fully funded), some provide CSV files for download, and other platforms provide direct Application Programming Interfaces (APIs) for retail and institutional investors and partners.
These peer-to-peer lending platforms provide their data in different formats, including but not limited to JSON, line-delimited JSON, CSV, TSV, Excel, and HTML. Each format may be provided in different encodings, including but not limited to UTF-8, BIG5, Latin-1, and GBK.
Each platform's data may be in a different language (Chinese, English, Hindi, French, Spanish, etc.). Values may be quoted in different units (e.g., currencies such as U.S. dollars, RMB, euros, and pounds, and different time zones) and with different ranges. Numeric ranges may also differ in scale, for example salaries quoted on a 0-1,000,000 scale versus a 0-1,000 scale.
The problem stems from the fact that entities (e.g., regulators, investors), whether automated or manual, wish to understand these data at a unified level, from macro to micro, across all platforms of the peer-to-peer lending industry. "Understand" here means generating statistics and allowing a high degree of qualitative and quantitative comparison between platform data.
Technical solution to the problem
A solution to this problem involves three component layers working together. FIG. 12 shows the collect, consolidate, and unify solution.
Data collection component.
The data collection component comprises a set of custom scripts that connect to individual lending platforms and retrieve their loan book data. Each script conforms to and follows the respective peer-to-peer lending platform's data publication schedule, medium, and format.
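As an illustration, a minimal sketch of one such collector script, assuming a hypothetical platform that publishes new loans through a paginated RESTful endpoint (the URL, field names, and checkpoint value are placeholders, not any real platform's API):

import json
import urllib.request

# Hypothetical endpoint; real platforms differ in medium, schedule, and format.
BASE_URL = "https://api.example-p2p-platform.com/v1/loans?page={page}"

def fetch_new_loans(last_seen_id: str) -> list:
    """Pull loan listings until records we have already stored are reached."""
    loans, page = [], 1
    while True:
        with urllib.request.urlopen(BASE_URL.format(page=page)) as resp:
            batch = json.loads(resp.read().decode("utf-8"))
        if not batch:
            return loans                    # no more pages on the platform
        for loan in batch:
            if loan["loan_id"] == last_seen_id:
                return loans                # reached already-stored records
            loans.append(loan)
        page += 1

if __name__ == "__main__":
    # Run on the platform's publication schedule (e.g., hourly via a scheduler).
    new = fetch_new_loans(last_seen_id="L-000123")  # placeholder checkpoint
    print(f"collected {len(new)} new loan records")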
Data consolidation component. The data consolidation component addresses the need to convert the data to common languages, currencies, time zones, units, and numeric ranges. It extracts data from the data collection component, reads it, and applies various transformations, such as the following exemplary list:
1. Data in natural language (e.g., loan type/purpose, interest rate, loan amount, payment terms, etc.) is first captured in the local language, archived for auditing, and then translated into English. Currency-denominated data (such as loan amounts, premiums, and other data) is captured in the local currency and retained as such for research reporting and benchmarking because of currency fluctuations; it is generally not converted to U.S. dollars unless needed, in which case both denominations are kept with date/time stamps for back-testing.
2. The time zone is converted to a UTC time zone.
3. Numeric information such as borrower income, interest rates, and the like is converted into a single floating point format (for example, "18K" becomes "18000.00" and "10%" becomes "0.1"); a sketch of such transformations follows below. At this stage, all data has been converted to a common format, but each platform still retains its original and unique set of data attribute keys.
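A minimal sketch of these normalization transformations (the helper names and attribute keys are illustrative assumptions, not the production code):

from datetime import datetime, timedelta, timezone

SUFFIXES = {"K": 1_000, "M": 1_000_000}

def to_float(raw: str) -> float:
    """Normalize numeric strings: '18K' -> 18000.0, '10%' -> 0.1."""
    raw = raw.strip().upper().replace(",", "")
    if raw.endswith("%"):
        return float(raw[:-1]) / 100.0
    if raw and raw[-1] in SUFFIXES:
        return float(raw[:-1]) * SUFFIXES[raw[-1]]
    return float(raw)

def to_utc(local_iso: str, utc_offset_hours: int) -> datetime:
    """Re-express a platform-local timestamp in UTC."""
    local = datetime.fromisoformat(local_iso).replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)

# The record keeps the platform's original attribute keys at this stage.
record = {"income": "18K", "rate": "10%", "listed_at": "2016-09-01T09:30:00"}
normalized = {
    "income": to_float(record["income"]),          # 18000.0
    "rate": to_float(record["rate"]),              # 0.1
    "listed_at": to_utc(record["listed_at"], 8),   # e.g., China Standard Time
}
print(normalized)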
All of this data is pushed into a queue and stored for use by the final component.
The unification component populates a central SQL database for all platform/attribute pairs based on a mapping table.
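A minimal sketch of the mapping idea, assuming a hypothetical mapping table that translates each platform's native attribute keys into a single canonical schema:

import sqlite3

# Hypothetical mapping table: (platform, native key) -> canonical column.
ATTRIBUTE_MAP = {
    ("platform_a", "loan_amt"): "loan_amount",
    ("platform_a", "int_rate"): "interest_rate",
    ("platform_b", "amount"): "loan_amount",
    ("platform_b", "rate"): "interest_rate",
}

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE loans (
    platform TEXT, loan_id TEXT, loan_amount REAL, interest_rate REAL)""")

def unify(platform: str, record: dict) -> None:
    """Remap a normalized platform record onto the canonical schema and store it."""
    row = {"platform": platform, "loan_id": record["loan_id"],
           "loan_amount": None, "interest_rate": None}
    for key, value in record.items():
        canonical = ATTRIBUTE_MAP.get((platform, key))
        if canonical:
            row[canonical] = value
    conn.execute(
        "INSERT INTO loans VALUES (:platform, :loan_id, :loan_amount, :interest_rate)",
        row)

unify("platform_a", {"loan_id": "1", "loan_amt": 18000.0, "int_rate": 0.1})
unify("platform_b", {"loan_id": "7", "amount": 5000.0, "rate": 0.08})
print(conn.execute("SELECT * FROM loans").fetchall())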
Define "perfect" -error rate below 1%.
Beneficial effects of the Crowd Bureau solution
This solution allows near-real-time, transparent processing of loan data at the transaction level. Normalization and standardization of the data allow industry-wide comparisons, valuation, pricing activities, and statistics generation across platforms, jurisdictions, and regional environments.
An example with loan data: compare the average interest rate of platform A in jurisdiction Y with the average interest rate of another platform B in jurisdiction Z; average the loan default rates of all platforms across a jurisdiction or region; benchmarks/indices.
An example with equity data: the feasibility and value of using social media in traditional company-specific (public and private) rating models and investor-specific ratings.
An example with credit risk algorithms: establish an industry-wide standard weighted credit risk model to underwrite loans and track performance.
An example of an alert system: the ability to identify when a borrower exceeds borrowing limits on one or more platforms, as sketched below.
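A minimal sketch of such a cross-platform borrower-limit alert, assuming the unified SQL loan store described above and a hypothetical regulatory limit (figures and schema are illustrative only):

import sqlite3

BORROWER_LIMIT = 200_000.0  # hypothetical per-borrower limit across platforms

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (platform TEXT, borrower_id TEXT, loan_amount REAL)")
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", [
    ("platform_a", "b-1", 150_000.0),
    ("platform_b", "b-1", 80_000.0),   # same borrower on a second platform
    ("platform_b", "b-2", 40_000.0),
])

# Flag borrowers whose combined borrowing across all platforms exceeds the limit.
alerts = conn.execute("""SELECT borrower_id, SUM(loan_amount) AS total
                         FROM loans GROUP BY borrower_id
                         HAVING total > ?""", (BORROWER_LIMIT,)).fetchall()
print(alerts)  # [('b-1', 230000.0)]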
Market data
Currently, the Crowd Bureau collects, consolidates, and unifies data from 85 separate peer-to-peer lending platforms in China (83), the United States (2), and Europe (6), covering consumer loans, real estate, student loans, automobiles, integrated agribusiness, renewable/solar energy, and lifestyle.
1. Stable automation of API collection and/or web-crawling techniques for each platform.
2. New loan listings are collected/captured hourly on each platform.
3. All loans are collected/captured hourly and on each platform update event.
4. For originated loans, loan performance can be tracked.
5. Loans are differentiated between the following cases:
a. Loan progress below 100%: the loan is in the ask stage, a non-binding agreement between the two parties that represents the ask amount in the market.
b. Loan progress equal to 100%: the loan is a valid, binding legal contract between two parties that represents the supplied loan/credit amount in the market.
Derived information. We can obtain information about clients and products:
1. Average loan profitability of loans with loan progress < 100% and loans with loan progress = 100%
2. Estimated market size of loans with loan progress < 100% and loans with loan progress = 100%
3. Repayment term information
4. Loan purpose classification
5. We can generate derived data with classification attributes such as loan purpose, loan repayment terms, etc.
Benchmarks
Currently, benchmark data for the trial phase is printed (mailed) quarterly. Daily and monthly benchmark numbers will be introduced in the fourth quarter of 2017.
Benchmark features:
1. Individual P2P platform accounting: all accounts held in escrow (a Chinese regulatory requirement).
2. Purity of the loan method
a. Clustering defined by "use of loan": derived asks, in which people request X currency on a P2P platform for, e.g., automobiles, real estate, etc., within an overall market.
3. Full transparency: all loans, interval events, and originations are listed
4. Total of all platform "asks"
a. A P2P loan has an ask stage before origination. For a typical loan (daily), all amounts that remain unfunded are summed: [nominal loan amount x (1 - percentage funded)]
b. Summing across all platforms and economies tells us how much the market wants to borrow through P2P lending.
5. Performance
a. Interest rates, volumes, values, defaults, charge-offs, etc.
6. Risk controls
a. Daily, monthly, and quarterly quantitative and qualitative reviews.
Credit risk algorithm
Status: estimation of default probabilities based on XYZ platform data.
Purpose: to determine explanatory variables for predicting whether a loan is likely to be repaid.
Data: loan data issued by the XYZ platform, comprising all loans issued between January 2010 and September 2016, with the latest loan status as of the issue date. To date, two groups of loans have been analyzed; all of these loans have completed their life cycle, with a loan status of either "fully paid" or "charged off":
Subset 1: three-year and five-year loans issued between January 2010 and November 2011 (30,986 loans, default rate 15%),
Subset 2: three-year loans issued between January 2010 and December 2013 (166,267 loans, default rate 12%).
Model: a logistic regression model with loan status as the dependent variable. Different subsets of independent variables were constructed from the following attributes:
[Tables of the candidate attributes, including monitoring attributes and descriptive attributes, were rendered as images in the original.]
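For illustration only, a minimal Python sketch of the kind of model described; the original analysis used R, the attribute names (dti, income, interest rate) are assumptions patterned on typical loan books, and the data here are synthetic:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three assumed attributes per loan.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.uniform(0.0, 0.35, n),       # dti (debt-to-income ratio)
    rng.normal(60_000, 15_000, n),   # annual income
    rng.uniform(0.05, 0.25, n),      # interest rate
])
y = rng.binomial(1, 0.15, n)         # 1 = charged off, 0 = fully paid

# Logistic regression with loan status as the dependent variable.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
pd_hat = model.predict_proba(X)[:, 1]   # estimated probabilities of default
print(pd_hat[:5])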
Result: to date, no subset of attributes has produced a model whose calculated probabilities of default match the defaults observed in the originated loan data. These attributes do not appear to have much influence on loan status. To analyze this further, we calculated the correlation between loan status and several attributes, such as "dti" (debt-to-income ratio). For example, the correlation between dti and loan status in data subset 2 is only 0.09, which is very low.
Explanation: the XYZ platform already uses these attributes to distinguish "good" loans from "bad" loans, where a "good" loan is one that the XYZ platform originates; this screening rejects approximately 90% of loan applications. Thus, the data we analyzed contain only the "top 10%"; for example, the debt-to-income ratio of all loans from 2010 to 2013 is below 35%. In the rejected loan data for the same time interval, we find more than 200,000 dti values above 40%, ranging up to 1000%. (The debt-to-income ratio is the only attribute in the rejected loan data set that can be compared with originated loans.)
Thus, other attributes appear to be needed to explain the defaults among originated loans. These might, for example, be indicators related to health or the risk of unemployment.
Remark: the method used in the analysis of the loan club data was tested using another set of credit data (from a bank). With these data, the default probabilities obtained by parameter estimation are a good predictor of observed defaults and non-defaults.
Appendix: examples of the invention
The parameter estimates for a subset of attributes were calculated from a sample of 5,000 loans (4,300 fully paid, 700 charged off), using R:
[R parameter-estimation output rendered as an image in the original.]
The resulting probabilities of default, divided into quartiles, together with the corresponding observed defaults, were then applied to the complete data set of 30,986 loans (26,636 fully paid / 4,350 charged off):
[Quartile comparison table rendered as an image in the original.]
and (3) comparison: probability of default divided by quartile and corresponding default observed from 300 loan data sets at the bank (255 total repayment/45 offsets):
[Bank-data quartile comparison table rendered as an image in the original.]
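A minimal Python sketch of the quartile comparison just described, again with synthetic data standing in for the original R workflow:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 30,986 loans, labels 1 = charged off, 0 = fully paid.
rng = np.random.default_rng(1)
X = rng.normal(size=(30_986, 3))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X[:, 0] - 1.7))))  # ~15% default rate

# Estimate parameters on a 5,000-loan sample, as in the example above.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X[:5000], y[:5000])
pd_hat = model.predict_proba(X)[:, 1]

# Bucket the estimated default probabilities into quartiles and compare the
# mean estimated probability with the observed default rate in each bucket.
quartiles = np.quantile(pd_hat, [0.25, 0.5, 0.75])
bucket = np.digitize(pd_hat, quartiles)
for q in range(4):
    mask = bucket == q
    print(f"Q{q + 1}: estimated {pd_hat[mask].mean():.3f}, observed {y[mask].mean():.3f}")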
valuation model for crowd funding by utilizing social data
This section discusses the feasibility and value of using social media in traditional company-specific (public and private) rating models and in investor-specific ratings, and suggests that certain social media attributes have the potential to predict crowdfunding success. For the company-specific models, our results show that social media data can be used in the solvency, Z-score, and moat models; in addition, the data suggest that social media may provide minor improvements to traditional models and, when used alone, may predict outcomes such as bankruptcy.
General methods and workflows
In assessing the value that social media data may bring to company ratings, it is important first to consider existing methods and approaches for rating a company's financial health and growth. Traditional models based on financial indicators have long been used to predict outcomes for companies ranging from bankruptcy to possessing an economic moat. From an investment perspective, a rating system that can predict these outcomes with high accuracy is particularly valuable for selecting portfolios that maximize return on investment.
Since the beginning of the digital era, we have been able to obtain unprecedented amounts of data relative to earlier eras. Individuals are now more tightly connected than ever before, and information and events can spread around the world in seconds.
In addition, individuals are increasingly turning to social media in order to establish connections with others and quickly share news, data, and ideas. These relationships and ideas, in turn, have the ability to influence the individual's decisions.
With modern technology, a great deal of data can be collected from the connections and communications between individuals on social media, but can these data better predict a company's future health and success? Our general approach takes the economic moat, fair value price, Altman Z-score, solvency score, and earnings-per-share valuation methods as the starting points for analyzing company-specific social media ratings. For investor-oriented ratings, we focus on predicting the probability of crowdfunding success using social media.
After identifying the models that will serve as baselines for the social media overlay, we next determined the mathematical basis for each model and the inputs and dependent variables each requires. Once the variables needed for modeling were determined, we obtained historical financial data points for several companies for each model in order to predict the outcome that each model is intended to predict (e.g., the solvency scoring model is intended to predict bankruptcy). Over time we will conduct a more comprehensive analysis; we limited our acquisition of financial (and social) variables to the period 2007 to 2016, since the start of this period is one year after Twitter was founded. In practice, our models use data from a narrower time window (2009 to 2015), and we rely on QuoteMedia and gurufocus.com. Once the data were obtained, we built baseline models using the financial data and the appropriate mathematical model.
Then we used Crimson Hexagon, the Internet Archive (https://archive.org/index.php), and other resources to obtain social media data. Finally, we combined social media data with traditional financial variables in different combinations to determine whether these data improve the predictive power of the traditional models, and we also evaluated whether social media data alone have any power to predict corporate financial health, earnings, or economic moats. The average accuracy of 100 iterations of each model incorporating social media was compared with the average accuracy of the baseline model and with the accuracy that would be achieved by random guessing, to evaluate the predictive power of social media in ratings.
Quantitative social data prediction of economic moats
Model overview. Companies with economic moats are attractive when considering investments because they tend to be less risky and provide more consistent returns. Our baseline moat model is based on the 2013 Morningstar methodology paper written by Warren Miller2, and we implemented all modeling using the caret package in R3. The moat designations assigned to the companies in our analysis were obtained from the Morningstar website in 2016 (see the "Crowd Bureau Moat Focus Benchmark social data" file for more information about the companies used in our analysis). We assume that these companies held their moat designations through the end of 2015.
We obtained the same 12 financial variables used by Morningstar in its quantitative moat rating (described below) for the period 2013 to 2014, as this provides data for predicting moat type at least one year before a company received its 2015 moat designation. We then applied two different random forest models: one to distinguish companies with an economic moat from companies without one, and one to distinguish companies with narrow moats from companies with wide moats. Our models' predictions are based on 500 regression trees (details about the random forest models can be found in the Morningstar report1).
[Table of the 12 financial variables used in the quantitative moat rating rendered as an image in the original.]
To test the accuracy of each model, we randomly split our data into a training data set (60% of companies) and a test data set (40% of companies) (FIG. 13). After training a model with the 60% of company data, we used it to classify the remaining 40% of companies and calculated its accuracy. Since model accuracy can vary with the random selection of training and test data, we performed the above sequence of steps 100 times and took the mean and standard deviation of the 100 trials as our final accuracy score. To generate the final moat score, we trained each random forest model on all the data in the matrix and then used the probability outputs of the random forest models to evaluate the probability that a company has a moat and that it has a wide moat. The method is the same as that used by Morningstar, and is calculated as follows:
[Moat score equation, combining (1 - probability of no moat) and (probability of wide moat), rendered as an image in the original.]
In the above equation, "(1 - probability of no moat)" and "(probability of wide moat)" can be obtained directly from the caret package within R.
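For illustration, a minimal Python analogue of this scoring step; the original used R's caret, scikit-learn stands in here, and because the exact combining formula was an image in the original, the final combination below is only schematic:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(59, 12))            # 59 companies x 12 financial variables
has_moat = rng.integers(0, 2, 59)        # 1 = narrow or wide moat, 0 = no moat
wide_moat = rng.integers(0, 2, 59)       # 1 = wide moat, 0 = narrow moat

# Two forests, mirroring the "no moat vs. moat" and "narrow vs. wide" models.
m1 = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, has_moat)
m2 = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, wide_moat)

p_moat = m1.predict_proba(X)[:, 1]       # = 1 - P(no moat)
p_wide = m2.predict_proba(X)[:, 1]       # = P(wide moat)

# Schematic combination only; the paper's exact weighting was not recoverable.
moat_score = (p_moat + p_wide) / 2
print(moat_score[:5])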
When overlaying the social media variables, we used the same approach as described above. The main difference between the baseline models and the social media overlay models is the matrix we provide to the random forest model. In total, we constructed 23 different models (model descriptions are provided separately), consisting of the baseline model with different combinations of social media variables, plus several models consisting entirely of social media variables (described below). Because of time constraints, these models are not exhaustive of the combinations that could be created, but they serve as a substantial starting point for the analysis.
The QuoteMedia API and the Morningstar website were used to obtain the financial information needed for the analysis. Crimson Hexagon and company websites were used to obtain all social media variables.
Model variables (financial and social)
In our analysis, we collected data for a total of 17 different variables (12 financial variables and 5 social variables). The descriptions of the financial variables, and how we obtained/calculated them, are as follows:
Return on assets (ROA): we calculate ROA as: net income / total assets.
These data were obtained using QuoteMedia's annual report data.
Earnings yield: we calculate earnings yield as:
basic earnings per share / the company's unadjusted closing price on the report date
The basic earnings-per-share data were obtained using company annual report data provided by QuoteMedia. The unadjusted closing price was also obtained from QuoteMedia.
Book value yield: we calculate book value yield as: 1 / price-to-book ratio.
These data were obtained using QuoteMedia's annual report data.
Sales yield: we calculate sales yield as:
total revenue / (total common shares outstanding x unadjusted closing price on the financial report date)
The data were obtained using company annual report data provided by QuoteMedia. The unadjusted closing price was also obtained from QuoteMedia.
Equity volatility: we calculate equity volatility as follows:
First, we collected a company's unadjusted closing prices for the 365 days up to and including the report date. Next, we calculated, for each day, the difference between that day's closing price and the previous day's closing price, divided by the previous day's closing price (i.e., (close_(i+1) - close_i) / close_i, where i = 0-364). We did this for the 365 days up to the report date and took the standard deviation of these values. In summary, this can be described by the following equation:
equity volatility = standard deviation of ((close_(i+1) - close_i) / close_i), where i = 0-364
The unadjusted closing prices were obtained from QuoteMedia.
Maximum drawdown: we calculate maximum drawdown as follows:
First, we collected a company's unadjusted closing prices for the 365 days up to and including the report date. Then we subtracted the highest closing price from the lowest closing price and divided the difference by the highest closing price. In general:
maximum drawdown = (minimum closing price - maximum closing price) / maximum closing price
The unadjusted closing prices were obtained from QuoteMedia.
Average daily volume: we calculated the average daily volume as the average of the daily unadjusted share volume over the 365 days up to and including the annual report date. The unadjusted share volumes were obtained from QuoteMedia. (A sketch of these three market-data calculations follows the variable list below.)
Total revenue: the total revenue for each company was obtained directly from the QuoteMedia API output, based on each company's annual report data.
Market capitalization: we calculate market capitalization as: total common shares outstanding x unadjusted closing price.
These values were obtained using QuoteMedia for the day on which each company submitted its annual report.
Enterprise value: we calculate enterprise value as:
market capitalization + preferred shares + long-term debt + current debt + minority interest - cash and equivalents
These values were obtained from company annual reports using QuoteMedia.
Enterprise value / market capitalization: we calculate this value by dividing the calculated enterprise value by the calculated market capitalization.
Sector ID: we obtained the sector ID directly from the QuoteMedia API.
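A minimal sketch of the equity volatility, maximum drawdown, and average daily volume calculations defined above, using a synthetic price series rather than the QuoteMedia API:

import numpy as np

rng = np.random.default_rng(3)
closes = 100 + np.cumsum(rng.normal(0, 1, 365))   # 365 unadjusted closing prices
volumes = rng.integers(100_000, 2_000_000, 365)   # daily unadjusted share volume

# Equity volatility: standard deviation of daily returns over the window.
daily_returns = np.diff(closes) / closes[:-1]
equity_volatility = daily_returns.std()

# Maximum drawdown as defined above: (min close - max close) / max close.
max_drawdown = (closes.min() - closes.max()) / closes.max()

# Average daily volume over the same 365-day window.
average_daily_volume = volumes.mean()

print(equity_volatility, max_drawdown, average_daily_volume)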
The descriptions of the social media variables, and how we obtained/calculated them, are as follows:
Identity Score: we calculated the Identity Score as the number of social media website links that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Because of time constraints, we used the number of links on company websites as of February 2016, assuming that companies have not added a large number of social media links to their websites since 2013.
Total Posts: this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("Moat; 2014 Data") on Crimson Hexagon, searching for company cashtag usage on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data is collected for the 12 months prior to the company's annual report date. Under the 7 functional building blocks of social media, this would be classified as belonging to the conversations block.
Total Potential Impressions: this is the total number of potential impressions created by posts that include the company's cashtag within the 12 months prior to the company's annual report date. Data come from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 functional building blocks of social media, this would be classified as belonging to the conversations block.
Posts per Author: we calculated posts per author as the total number of posts in the 12 months prior to the company's annual report date divided by the total number of Twitter authors posting during that period. Data come from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 functional building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 authors in this time frame, we manually set this value to 0 to avoid dividing by 0.
Impressions per Post: we calculated this as follows:
total potential impressions (see description above) / total posts (see description above)
Data come from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 functional building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 posts in this time frame, we manually set this value to 0 to avoid dividing by 0.
Company inclusion criteria
This section outlines our selection process for the companies included in the moat analysis.
Specific information about the companies themselves can be obtained from the "Crowd Bureau Moat Focus Benchmark social data" Word document and the "Moat_2014Data_Master_Matrix" Excel document. In January 2016, we used the Morningstar website to obtain a list of about 120 companies that it identified as having wide, narrow, or no moats. We then used the QuoteMedia API either to obtain the 12 financial variables described above directly (e.g., total revenue) or to obtain the variables needed to compute them (e.g., we obtained total common shares outstanding and the unadjusted closing price, and computed market capitalization from these values). To remain in our analysis, a company had to report all the variables needed to derive the 12 financial input attributes of the moat model from its 2014 annual report. Companies for which we could not obtain all 12 attributes from the 2014 annual report were eliminated from the analysis. After filtering, a total of 59 companies were used in the final analysis. Of these, 23 were listed as having wide moats, 19 as having narrow moats, and 17 as having no moat.
Data acquisition
To obtain the 12 financial variables (or the components that make up these variables), we developed several internal Python scripts that download the data using the QuoteMedia API. These scripts are summarized below. The scripts, in the compressed folder "Model_Code_2_24_16", have been uploaded to Git; within this folder they can be found in a subdirectory named "Moat_Model". All scripts are set to get data from the 10 most recent annual reports for a given company, but a simple modification of the API calls allows one to retrieve more reports when needed. Before running these scripts, Python must be installed on the computer. In theory, Python 2 or Python 3 should work, but the data were obtained on a machine running Python 2.7. In addition, the code relies on importing multiple Python modules; to view the modules required by each script, open the script in a text editor and view the first few lines of code.
The scripts and their purposes are as follows:
getHistoricalROA.py: this script takes a list of company ticker symbols and returns, in tab-separated format, the data used to calculate return on assets, along with the report dates. To run this code for a particular company list, open the script in a text editor and paste in a comma-separated company list at line 27 of the script, for example: ['AMGN', 'BIIB', 'BXLT']. The brackets, apostrophes, and commas are all necessary for the code to run correctly.
Usage: python getHistoricalROA.py > HistoricalROA.txt (note: "HistoricalROA.txt" can be changed to any desired file name)
getHistoricalEarningsYield.py: this script takes a list of company ticker symbols and returns the tickers, earnings per share, unadjusted closing prices, and report dates in tab-separated format. The earnings yield can then be calculated in Excel as described above. To run this code for a particular company list, open the script in a text editor and paste in a comma-separated company list at line 27 of the script.
Usage: python getHistoricalEarningsYield.py > HistoricalEY.txt (note: "HistoricalEY.txt" can be changed to any desired file name)
getHistoricalBookValueYield.py: this script takes a list of company ticker symbols and returns, in tab-separated format, the data used to calculate the book value yield, along with the report dates. To run this code for a particular company list, open the script in a text editor and paste in a comma-separated company list at line 28 of the script.
Usage: python getHistoricalBookValueYield.py > HistoricalBVY.txt (note: "HistoricalBVY.txt" can be changed to any desired file name)
getHistoricalSalesYield.py: this script takes a list of company ticker symbols and returns, in tab-separated format, the data used to calculate the sales yield, along with the report dates. To run this code for a particular company list, open the script in a text editor and paste in a comma-separated company list at line 27 of the script.
Usage: python getHistoricalSalesYield.py > HistoricalSY.txt (note: "HistoricalSY.txt" can be changed to any desired file name)
getHistoricalVolatility_MaximumDrawdown_AverageVolume.py: this script takes a list of company ticker symbols and returns the tickers, equity volatility, maximum drawdown, average volume, and report dates in tab-separated format. To run this code for a particular company list, open the script in a text editor and paste in a comma-separated company list at line 40 of the script.
While this error still requires further troubleshooting, we suspect it is caused by missing data. The simplest solution is to find the company causing the error and remove it from the company list pasted at line 40. Due to time constraints, we cannot provide a fix for this error. Currently, the code is set up to first identify the company that gives the error when the code is run. We suggest first running "python getHistoricalVolatility_MaximumDrawdown_AverageVolume.py", which prints each company and its data to the terminal. If the code exits, one can see which company it was processing before exiting and delete the company that gave the error. Once all the companies that give errors have been deleted, comment markers are added before the code at line 117 and then removed before the code at line 123. The following command can then be used:
python getHistoricalVolatility_MaximumDrawdown_AverageVolume.py > Historical_V_MD_AV.txt (note: "Historical_V_MD_AV.txt" can be changed to any desired file name)
getHistoricalTotalRevenue.py: this script takes a list of company ticker symbols and returns the tickers, total revenues, and report dates in tab-separated format. To run this code for a particular company list, open the script in a text editor and paste in a comma-separated company list at line 27 of the script.
Usage: python getHistoricalTotalRevenue.py > HistoricalTR.txt (note: "HistoricalTR.txt" can be changed to any desired file name)
Note: this is technically redundant with the historical sales yield script.
getHistoricalMarketCap.py: this script takes a list of company ticker symbols and returns, in tab-separated format, the tickers, total common shares outstanding, unadjusted closing prices, and report dates. Market capitalization can be calculated in Excel as described above. To run this code for a particular company list, open the script in a text editor and paste in a comma-separated company list at line 27 of the script, for example: ['AMGN', 'BIIB', 'BXLT']. The brackets, apostrophes, and commas are all necessary for the code to run correctly.
Usage: python getHistoricalMarketCap.py > HistoricalMC.txt (note: "HistoricalMC.txt" can be changed to any desired file name)
Note: this script is redundant with the script for obtaining historical sales yield.
getHistoricalEnterpriseValue.py: this script takes a list of company ticker symbols and returns, in tab-separated format, the tickers, total common shares outstanding, unadjusted closing prices, current debt, long-term debt, cash and equivalents, preferred shares, minority interest, and report dates.
Usage: python getHistoricalEnterpriseValue.py > HistoricalEV.txt (note: "HistoricalEV.txt" can be changed to any desired file name)
getSector.py: this script takes a list of company ticker symbols and returns the tickers and sector IDs in tab-separated format. To run this code for a particular company list, open the script in a text editor and paste in a comma-separated company list at line 29 of the script.
Usage: python getSector.py > HistoricalSector.txt (note: "HistoricalSector.txt" can be changed to any desired file name)
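As an illustration of the pattern these scripts share, a minimal sketch in Python; the endpoint URL and response field names are placeholders, since the QuoteMedia API's actual parameters are not reproduced here:

import json
import urllib.request

# Placeholder endpoint; the real QuoteMedia API parameters are not shown here.
API_URL = "https://api.example-marketdata.com/annual-reports?symbol={symbol}&n=10"

COMPANIES = ['AMGN', 'BIIB', 'BXLT']  # comma-separated list pasted into the script

def fetch_reports(symbol: str) -> list:
    """Download the 10 most recent annual-report records for one ticker."""
    with urllib.request.urlopen(API_URL.format(symbol=symbol)) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Tab-separated output, redirected to a file as in: python script.py > out.txt
    for symbol in COMPANIES:
        for report in fetch_reports(symbol):
            print("\t".join([symbol,
                             str(report["net_income"]),    # placeholder field
                             str(report["total_assets"]),  # placeholder field
                             report["report_date"]]))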
As mentioned in the section "Model variables (financial and social)", we used the following methods in the analysis to obtain the social media variables.
Identity Score: we calculated the Identity Score as the number of social media website links each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Because of time constraints, we used the number of links on each company's website as of February 2016, assuming that companies have not added a large number of social media links to their websites since 2013.
Total posts: this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("Moat; 2014 Data") on Crimson Hexagon, searching for company cashtag usage on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data is collected for the 12 months prior to the company's annual report date. To collect data for a company and a specific time, we created a filter in the Buzz Monitor using the company's cashtag, applied the filter to the monitor, and set the time range to include the year before the company's annual report date. For example, if a company submitted its annual report on December 31, 2014, we obtained the total post volume from December 31, 2013 to December 31, 2014. The total post volume is read directly from the monitor screen online.
Total potential impressions: this is the total number of potential impressions generated by posts that include the company's cashtag within the 12 months prior to the company's annual report date. We used the same "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon, with a cashtag filter and a time range covering the year before the company's annual report date, as described above. For example, if a company submitted its annual report on December 31, 2014, we obtained the total potential impressions from December 31, 2013 to December 31, 2014. We downloaded an Excel file from Crimson Hexagon containing the total potential impressions data shown on the web interface, and in the Excel file we summed the number of potential impressions per day to derive the total potential impressions for the period.
Posts per author: we calculated this as the post volume in the 12 months before the company's annual report date divided by the total number of Twitter authors posting during that period. Data were collected from the same "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon, with a cashtag filter and time range as described above. We downloaded an Excel file from Crimson Hexagon containing the daily total number of Twitter authors and the average number of posts per author. In the Excel file, we first multiplied the number of Twitter authors posting on a given day by the average number of posts per author on that day to obtain the number of posts per day. We then summed the total number of posts over the period and divided it by the total number of Twitter authors over the period to yield the posts per author.
Impressions per post: we calculated this in Excel by dividing the total potential impressions by the total post volume (both obtained as described above).
Model testing and results.
After we obtained all the financial and social media data for the 59 companies in the analysis, we generated the "Moat_2014Data_Master_Matrix" Excel spreadsheet, which can be found on Confluence. The spreadsheet is too large to include in this report, but it contains all the data as well as other details (e.g., cashtags, report dates, social data date ranges, etc.) that provide further information about the companies used in the modeling process. After generating the data matrix, we created a baseline data matrix for the "No Moat versus Moat" random forest model (Table 1) and for the "Narrow Moat versus Wide Moat" random forest model (Table 2).
Table 1. A snapshot of the example matrix of variables input into the "No Moat versus Moat" model. Input variables are abbreviated for brevity. The variable we seek to predict ("Moat") is highlighted in green. Although company names are not shown, each row corresponds to a particular company. ROA = return on assets; EY = earnings yield; SY = sales yield; BVY = book value yield; EqVol = equity volatility; MD = maximum drawdown; AV = average volume; TR = total revenue; MC = market capitalization; EV = enterprise value; EV_MC = enterprise value / market capitalization. TR, MC, and EV are measured in dollars.
Moat ROA EY SY BVY EqVol MD AV TR MC EV EV_MC Sector
Moat 0.06 0.042 0.847 0.166 0.0146 -0.32 793565 1E+10 1.16E+10 1.32E+10 1.13179 Consumer_Cyclical
No_Moat 0 -0.23 0.818 1.053 0.0237 -0.52 1E+07 1E+10 1.25E+10 2.29E+10 1.82922 Basic_Materials
No_Moat 0 0.058 2.049 1.075 0.0251 -0.39 559.11 8E+09 3.84E+09 1.07E+10 2.78551 Industrials
No_Moat 0.06 0.037 0.541 0.485 0.0202 -0.32 639327 6E+08 1.16E+09 1.12E+09 0.96168 Technology
Moat 0.04 0.065 1.867 0.481 0.0138 -0.29 2E+06 6E+10 3.11E+10 3.82E+10 1.23046 Healthcare
Moat 0.08 0.049 0.256 0.285 0.0106 -0.22 23591 5E+10 1.84E+11 2.27E+11 1.23154 Consumer_Defensive
Moat 0.05 0.036 1.133 0.269 0.0192 -0.48 290600 1E+09 9.14E+08 1.06E+09 1.16206 Healthcare
No_Moat 0 0.046 0.151 0.152 0.0189 -0.36 941.27 2E+09 1.31E+10 1.21E+10 0.9253 Healthcare
Moat 0 0.043 0.235 0.154 0.0214 -0.24 500.99 4E+09 1.76E+10 1.93E+10 1.09982 Technology
No_Moat -0.1 -0.2 2.657 0.258 0.0268 -0.47 2E+07 6E+09 2.07E+09 3.48E+09 1.67908 Technology
Table 2. A snapshot of an example matrix of the variables input into the "Narrow Moat versus Wide Moat" model. Input variables are abbreviated for brevity. The variable we sought to predict ("Moat") is highlighted in green. Although company names are not shown, each row corresponds to a particular company. ROA = return on assets; EY = earnings yield; SY = sales yield; BVY = book value yield; EqVol = equity volatility; MD = maximum drawdown; AV = average trading volume; TR = total revenue; MC = market capitalization; EV = enterprise value; EV_MC = enterprise value/market capitalization. TR, MC, and EV are measured in dollars.
Moat ROA EY SY BVY EqVol MD AV TR MC EV EV_MC Sector
Narrow 0.0952 0.05 0.39 0.246 0.0081 -0.195 8E+05 8446000000 2.18E+10 2.4E+10 1.096615 Healthcare
Narrow 0.0814 0.06 0.5 0.476 0.0087 -0.257 8E+05 3563637000 7.19E+09 6.3E+09 0.875755 Technology
Wide 0.0128 0.02 6.97 0.116 0.0088 -0.23 2E+06 1.20E+11 1.72E+10 1.7E+10 1.010908 Healthcare
Wide 0.1471 0.05 0.18 0.044 0.0089 -0.361 7E+06 17945000000 9.74E+10 1.09E+11 1.116742 Consumer_Defensive
Narrow 0.0963 0.06 0.42 0.211 0.0091 -0.148 3E+06 16671000000 3.98E+10 4.6E+10 1.1605 Healthcare
Wide 0.0903 0.05 0.56 0.248 0.0092 -0.109 3E+06 24537000000 4.36E+10 4.6E+10 1.06595 Industrials
Narrow 0.0529 0.04 0.98 0.36 0.0092 -0.205 8E+05 4343500000 4.4E+09 5.7E+09 1.29284 Consumer_Cyclical
Wide 0.0516 0.1 1.22 0.366 0.0094 -0.148 3E+06 36066900000 2.96E+10 6.3E+10 2.122383 Industrials
Narrow 0.0325 0.05 0.45 0.467 0.0095 -0.297 5E+05 3350300000 7.37E+09 1.1E+10 1.552919 Utilities
Wide 0.1598 0.05 0.3 0.155 0.0095 -0.277 3E+06 31821000000 1.04E+11 1.09E+11 1.051577 Industrials
After establishing the baseline matrices, we then ran a random forest model on each baseline matrix to compute the average accuracy of each baseline model. To do this, we developed an R script named "Script_for_Running_Models.R". Although we intend to describe this script in detail separately, we briefly summarize here how it determines the average accuracy and the standard deviation of the model accuracy. The script is uploaded to Git in the "Model_Code_2_24_16" compressed folder and is contained in that archive's "Modeling_Script" subdirectory.
The first step of the script imports the baseline data matrix (see, e.g., Tables 1 and 2). After loading the matrix, the code randomly selects 60% of the data for training and 40% for testing. As an example, if we loaded Table 1 (which has 10 rows of data) into the code, 6 rows would be randomly selected to train the random forest model and 4 rows would be held out for testing. After training, the code predicts the class of each test data point and compares the prediction with the actual class. The model's accuracy is stored in a list, and the steps above are repeated 99 more times, for 100 iterations in total. After 100 iterations, the code prints the mean accuracy and the standard deviation of the accuracy. We plotted and compared the mean accuracy and standard error (standard deviation divided by the square root of the sample size) of the models we tested.
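As a minimal sketch of the loop just described (assuming a data frame baseline whose first column, Moat, is the class label; the file name, column names, and tree count are illustrative, not those of the actual "Script_for_Running_Models.R"), the accuracy estimate could be reproduced in R as:

# Estimate mean accuracy and its standard deviation over 100 random
# 60/40 train/test splits of the baseline matrix.
library(randomForest)

baseline <- read.csv("baseline_matrix.csv", stringsAsFactors = TRUE)
set.seed(1)
accuracies <- numeric(100)
for (i in seq_len(100)) {
  train_idx <- sample(nrow(baseline), floor(0.6 * nrow(baseline)))
  fit  <- randomForest(Moat ~ ., data = baseline[train_idx, ], ntree = 500)
  pred <- predict(fit, newdata = baseline[-train_idx, ])
  accuracies[i] <- mean(pred == baseline$Moat[-train_idx])
}
cat(sprintf("mean accuracy: %.3f, sd: %.3f\n", mean(accuracies), sd(accuracies)))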
After implementing the modeling code above, we found that the average accuracy of the "No Moat versus Moat" baseline model was 83.6% with a standard deviation of 6.7%, while the average accuracy of the "Narrow Moat versus Wide Moat" baseline model was 71.9% with a standard deviation of 11.5%. Note that these are the accuracies from our run of the models; running them again would likely yield highly similar but not identical results, because the training and test data sets are selected at random. The "No Moat versus Moat" model has a no-information rate (NIR; random prediction) of 71.2%, and the "Narrow Moat versus Wide Moat" model has a no-information rate of 54.8%. These are the rates a person would achieve by randomly guessing the nature of a company's moat.
After generating the baseline models, we added the social media variables to the baseline matrices in different combinations (see the "Narrow_v_Wide_Moat_Model_Descriptions" and "No_Moat_v_Moat_Model_Descriptions" documents for details). We generated a total of 23 different overlay models, comprising various combinations of the baseline data plus social media data, or social media data alone. Other combinations of social media and baseline variables exist that we did not test, so the number of models tested is far from exhaustive; nor did we explore combining subsets of the baseline variables with the social media variables. The conclusions below are therefore based on a limited subset of combinations, formed by treating the baseline variables as a single unit combined with the social media variables.
Using the code above, we calculated each model's average accuracy and standard deviation of accuracy and compared them with our baseline matrices. Of our 23 models, model 8 (M8) showed a marginal increase in accuracy (85.5%) over baseline (83.6%) when predicting whether a company is "No Moat" or "Moat". Model 8 comprises the baseline data plus total potential impressions and the identity score. Several models built from social media data alone appeared unable to distinguish "No Moat" companies from "Moat" companies. None of the social media overlay models predicted "Narrow" versus "Wide" moats better than the baseline model; however, several models built from social media data alone appeared to predict "Narrow" versus "Wide" moats better than random. Given these results, and the fact that we tested only a small number of the possible combinations in this analysis, further review of quantitative moat social data prediction by the population bureau is warranted.
Although the baseline and social media overlay models show promise in predicting moat type, their predictions remained separate in our analysis. To combine each model's predictions into a single rating, we trained our "No Moat versus Moat" and "Narrow Moat versus Wide Moat" random forest models on all of the data and then used the models to predict, for each company, the probability of having no moat and the probability of having a wide moat. We then calculated the moat score using the method developed by Morningstar, as follows:
[Equation image not reproduced: the Morningstar moat score formula, which combines each company's no-moat and wide-moat probabilities into a single score.]
Our R modeling script for the accuracy analysis can be modified to generate the probabilities for each company in the equation above. In our analysis, we first generated a moat score for each company based only on the baseline data; these scores can be found in the "population bureau social data moat focus benchmark" file. Because we observed that model 8 (which adds the social media identity score and total potential impressions) slightly improved the prediction accuracy for distinguishing companies with and without an economic moat, we generated each company's no-moat probability using model 8. We then combined these probabilities with the wide-moat probabilities generated by the baseline "Narrow Moat versus Wide Moat" model to obtain each company's moat score. These scores can also be viewed in the "population bureau social data moat focus benchmark" file.
Finally, we asked how well our baseline quantitative moat scores separate wide-moat, narrow-moat, and no-moat companies. Using Excel, we calculated each company's percentile based on its moat score. On this basis, the top 23 companies should have wide moats, the bottom 17 companies should have no moat, and the middle 19 companies should have narrow moats. As shown in Table 3, our baseline model performed well in separating the different moat types. More data and further analysis are needed to determine whether the social overlay models can improve the predictive power of this traditional moat scoring method.
Table 3. Evaluation of the baseline model's ability to separate moat types
[Table 3 is provided as an image and is not reproduced here.]
Quantitative fair value social data prediction model overview.
Knowledge of a company's current and future cash flows is crucial to investors seeking to maximize return on investment. One way to estimate the future value of an investment is to calculate the fair value of its stock. When weighing various stocks, it is advantageous to invest in those that are undervalued; that is, if a stock's current price is below its fair value estimate, it is a good candidate for inclusion in one's portfolio. Due to time and resource limitations, the fair value social data prediction model could not be fully implemented. However, we summarize the work completed so far and indicate the steps needed to advance this model from its present stage.
Our fair value model methodology is based on a 2013 Morningstar methodology paper written by Warren Miller [4]. Had time allowed us to complete the construction of the model, we would have implemented all modeling in R [5] using the caret package. The companies we planned to use in the analysis are the same as those used in the quantitative moat social data prediction. For the 2013 to 2014 period, we obtained the 12 financial variables described in the Morningstar methodology paper [3] (as described below); these are the same inputs as in our quantitative moat social data prediction method. During this time we also obtained social media data (as described below). To predict more recent fair value prices, we also collected social media data from 2014 to 2015 and built code to obtain the 12 most recently available financial input variables from QuoteMedia.
We also constructed a theoretical discounted cash flow (DCF) model that allows us to calculate a company's fair value price. Unfortunately, we could not acquire all of the variables (historical and current) needed to implement the model in the allotted time. As in our quantitative moat social data prediction, we would use a random forest model of 500 regression trees (details about the random forest model can be found in the Morningstar report [3]) to predict the fair value price of each company in our analysis. Specifically, our goal is to predict the fair value price (FVP) from the 12 financial variables, where we calculate FVP as:
FVP = log(0.0001 + DCF-based equity value estimate / current closing price)
The small 0.0001 offset presumably keeps the argument of the logarithm positive when the estimate is near zero.
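As a small worked example in R (the $55 estimate and $50 closing price are hypothetical):

# DCF equity value estimate of $55 per share against a $50 closing price.
fvp <- log(0.0001 + 55 / 50)
fvp  # ~0.095; a positive FVP indicates the stock trades below the estimate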
After obtaining the 12 financial variables and the fair value estimates based on our DCF model, we would construct a matrix similar to the one shown in Table 4.
Table 4. An example baseline data matrix for the fair value price social data prediction model. In this table, "x" is calculated as the equity value estimate of the company's stock (obtained using the DCF model) divided by the company's closing price on its annual report date. The variable we sought to predict ("FVP") is highlighted in green. FVP = fair value price; EY = earnings yield; SY = sales yield; BVY = book value yield; EqVol = equity volatility; MD = maximum drawdown; AV = average trading volume; TR = total revenue; MC = market capitalization; EV = enterprise value; EV_MC = enterprise value/market capitalization. TR, MC, and EV are measured in dollars.
[Table 4 is provided as an image and is not reproduced here.]
To test the accuracy of each model, we would randomly split our data into a training data set (60% of companies) and a test data set (40% of companies) (FIG. 14). After training a model with the data from 60% of the companies, we would use it to calculate fair value prices for the test set, and then take the absolute difference between the model's fair value price estimate and the fair value price generated by our DCF model. Because a model's accuracy can vary with the random selection of training and test data, we would perform this sequence of steps 100 times and take the mean and standard deviation of the differences across the 100 trials. These values would allow us to assess whether the social overlay models' predictions are closer to the DCF-generated values than those of a baseline model containing only financial information. For the final rating, we would report the fair value prices given by our DCF model and the fair value prices generated by our random forest model.
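A hedged sketch of this planned evaluation (assuming a data frame fv whose FVP column holds the DCF-derived target; all names are illustrative) might look like:

# Mean absolute difference between random forest FVP predictions and the
# DCF-derived FVP, over 100 random 60/40 splits.
library(randomForest)

abs_diffs <- numeric(100)
for (i in seq_len(100)) {
  train_idx <- sample(nrow(fv), floor(0.6 * nrow(fv)))
  fit  <- randomForest(FVP ~ ., data = fv[train_idx, ], ntree = 500)
  pred <- predict(fit, newdata = fv[-train_idx, ])
  abs_diffs[i] <- mean(abs(pred - fv$FVP[-train_idx]))
}
c(mean = mean(abs_diffs), sd = sd(abs_diffs))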
The QuoteMedia API and the Morningstar website were used to obtain, respectively, the financial input variables and the company names used in our analysis. We also used the QuoteMedia API to obtain several variables needed to compute the output of our DCF model, and our goal was to use the QuoteMedia API to obtain the remaining variables needed to implement the DCF model. Crimson Hexagon and company websites were used to obtain all social media variables in our analysis.
Description of DCF model
A discounted cash flow (DCF) model is an estimation method for valuing an investment, or in this case a company's equity. DCF analysis projects future free cash flows and discounts them to obtain a present value estimate.
We developed a two-stage DCF. We assume that for the first 5 years from now, a company's cash flow grows at the same rate as its earnings per share grew over the last 3 years; after that, the company's cash flow becomes a perpetuity growing at 3% per year, roughly equivalent to the long-term growth rate of the U.S. economy.
Cash flow
The company's current free cash flow, FCF0 in our model, is calculated by subtracting capital expenditures from current operating cash flow:
FCF0 = cash from operations − CapEx, where CapEx comprises purchases of property, plant, and equipment and purchases of intangible assets
G = basic EPS growth rate
We perform a linear regression on the logarithm of basic earnings per share over the past 3 years (the 3 years preceding the date for which we want to predict fair value), and G is the fitted coefficient. From this growth rate we project cash flows for the next 5 years; after five years, cash flow is assumed to grow at 3% per year in perpetuity.
Discount rate
Here we use the WACC (weighted average cost of capital) as the discount rate; it is the average of the cost of debt and the cost of equity, weighted by the proportions of debt and equity. In general, the cost of debt is calculated by dividing interest expense by the average of total debt for the given year and the previous year, where:
total debt = current debt + long-term debt + commercial paper
Cd = interest expense / [(total debt for the given year + total debt for the previous year) / 2]
The cost of equity is the expected return on the company's stock, calculated with the CAPM model. Here we use 2% as the risk-free rate and 7.5% as the market excess return.
Ce = 2% + beta × 7.5%
WACC = [D / (D + E)] × Cd + [E / (D + E)] × Ce, where D is total debt and E is the market value of equity
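As an illustrative computation under the definitions above (all dollar inputs and the beta are hypothetical), in R:

# Cost of debt from interest expense over average total debt; cost of
# equity from CAPM with a 2% risk-free rate and 7.5% market excess return.
interest_expense <- 4.5e7
total_debt_now   <- 1.0e9
total_debt_prev  <- 8.0e8
equity_mv        <- 3.0e9
beta             <- 1.1

cd   <- interest_expense / ((total_debt_now + total_debt_prev) / 2)  # 0.05
ce   <- 0.02 + beta * 0.075                                          # 0.1025
wd   <- total_debt_now / (total_debt_now + equity_mv)                # debt weight
wacc <- wd * cd + (1 - wd) * ce
wacc  # ~0.089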
Discounting the cash flows
We first calculate the perpetuity value and add it to the year-five value. We then discount the five years of cash flows back to the date of interest.
equity value = Σ (t = 1 to 5) FCFt / (1 + WACC)^t + [FCF5 × (1 + g) / (WACC − g)] / (1 + WACC)^5
where g is the estimated long-term growth rate (here we use an intrinsic growth rate of 3%) and the stage-one cash flows grow from the current free cash flow at the basic EPS growth rate G:
FCFt = FCF0 × (1 + G)^t
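Putting the pieces together, a hedged sketch of the two-stage valuation in R (the function and its inputs are illustrative; the report's own inputs come from QuoteMedia):

# Two-stage DCF: grow FCF0 at G for five years, then a 3% perpetuity;
# discount everything at the WACC.
two_stage_dcf <- function(fcf0, g, wacc, g_term = 0.03, years = 5) {
  fcf <- fcf0 * (1 + g)^(1:years)                  # stage-one cash flows
  pv_stage1 <- sum(fcf / (1 + wacc)^(1:years))
  terminal  <- fcf[years] * (1 + g_term) / (wacc - g_term)
  pv_stage2 <- terminal / (1 + wacc)^years         # discounted perpetuity
  pv_stage1 + pv_stage2
}

two_stage_dcf(fcf0 = 5e8, g = 0.08, wacc = 0.09)  # example: $500M current FCF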
Current stage of development
We have developed code that retrieves most of the input variables needed for the DCF model from QuoteMedia. To complete the DCF model, we still need to fully develop the code that obtains the variables required for the basic EPS growth rate and use those variables to calculate the growth rate G.
Future direction
To complete the DCF model and implement fair value social data prediction, the population bureau needs to calculate the EPS growth rate (G) for each company used in these models. Next, the population bureau needs to build a benchmark fair value price matrix (the DCF output plus the 12 financial input variables). Finally, the population bureau should overlay the social media variables on the baseline matrix to determine whether adding social data improves the accuracy of the baseline model by reducing the absolute difference between the random forest model's fair value price prediction and the DCF model's fair value price.
Fair value model input variables (financial and social)
In our analysis, we collected data for a total of 17 different input variables (12 financial variables and 5 social variables). These inputs are exactly the same as those used in our quantitative moat social data prediction model. The financial variables, and how we obtained/calculated them, are described below:
Return on assets (ROA) - we calculate ROA as: net income / total assets
These data were obtained using QuoteMedia's annual report data.
Earnings yield - we calculate earnings yield as:
basic earnings per share / the company's unadjusted closing price on the reporting date
The basic earnings-per-share data are obtained from the annual report data provided by QuoteMedia. The unadjusted closing price is also obtained from QuoteMedia.
Book value yield - we calculate book value yield as: 1 / price-to-book ratio. These data were obtained using QuoteMedia's annual report data.
Sales yield - we calculate sales yield as:
total revenue / (total common shares outstanding × unadjusted closing price on the financial report date)
Total revenue and shares outstanding are obtained from the annual report data provided by QuoteMedia. The unadjusted closing price is also obtained from QuoteMedia.
Equity volatility - we calculate equity volatility as follows:
First, we collected a company's unadjusted closing prices for the 365 days up to and including the reporting date. Next, we calculated each day's return as the difference between that day's closing price and the previous day's closing price, divided by the previous day's closing price, i.e., (closing price(i+1) − closing price(i)) / closing price(i), where i = 0 to 364. We did this for the 365 days up to the reporting date and took the standard deviation of these values. In summary, this can be described by the following equation:
equity volatility = standard deviation[(closing price(i+1) − closing price(i)) / closing price(i)], where i = 0 to 364
The unadjusted closing prices were obtained from QuoteMedia.
Maximum drawdown - we calculate maximum drawdown as follows:
First, we collected a company's unadjusted closing prices for the 365 days up to and including the reporting date. Then we subtracted the highest closing price from the lowest closing price and divided the difference by the highest closing price. In general:
maximum drawdown = (minimum closing price − maximum closing price) / maximum closing price
The unadjusted closing prices were obtained from QuoteMedia.
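A minimal sketch of the two price-based calculations above, assuming close is a chronological vector of unadjusted closing prices for the 365-day window (a hypothetical input):

# Equity volatility: sd of daily returns (close[i+1] - close[i]) / close[i].
equity_volatility <- function(close) {
  sd(diff(close) / head(close, -1))
}

# Maximum drawdown as defined in this report: the full price range over the
# window (not a running peak-to-trough drawdown).
max_drawdown <- function(close) (min(close) - max(close)) / max(close)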
Average daily trading volume - we calculated this as the average of the daily unadjusted share trading volume over the 365 days up to and including the annual report date. The unadjusted share volumes were obtained from QuoteMedia.
Total revenue - each company's total revenue is obtained directly from the QuoteMedia API output, based on each company's annual report data.
Market capitalization - we calculate market capitalization as: total common shares outstanding × unadjusted closing price.
These values were obtained from QuoteMedia for the day each company filed its annual report.
Enterprise value - we calculate enterprise value as:
market capitalization + preferred shares + long-term debt + current debt + minority interest − cash and equivalents
These values were obtained from the companies' annual reports via QuoteMedia.
Enterprise value/market capitalization - we calculate this by dividing the calculated enterprise value by the calculated market capitalization.
Sector ID - we obtain the sector ID directly from the QuoteMedia API.
The social media variables, and how we obtained/calculated them, are described below:
Identity score - the number of social media links on a company's website. Here, the social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Because of time constraints, we used the number of links on each company's website as of 2016, assuming that companies have not added a large number of social media links to their websites since 2013.
Total posting volume - this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("Moat; 2014 Data") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data are collected from the 12 months prior to the company's annual report date. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Total potential impressions - this is the total number of potential impressions generated by posts that include the company's cashtag within the 12 months prior to the company's annual report date. Data are from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Posts per author - we calculated this as the posting volume in the 12 months before the company's annual report date divided by the total number of Twitter authors who posted during that period. Data are from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 authors in this time frame, we manually set this value to 0 to avoid dividing by 0.
Impressions per post - we calculate this as:
total potential impressions (see above) / total posting volume (see above)
Data are from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 posts in this time frame, we manually set the value to 0 to avoid dividing by 0.
Company inclusion criteria
This section outlines our selection process for the companies included in the fair value price analysis. Specific information about the companies can be found in the "population bureau social data fair value focus benchmark" file. In January 2016, we used the Morningstar website to obtain a list of approximately 120 companies identified by Morningstar as having a wide, narrow, or no moat. We used QuoteMedia to directly obtain the 12 financial input variables (e.g., total revenue) or the inputs needed to compute them (e.g., we obtained total common shares outstanding and the unadjusted closing price, then computed market capitalization from these values). To remain in our analysis, a company had to report all the variables needed to obtain the 12 financial input attributes in its 2014 annual report. Companies for which we could not obtain all 12 attributes from the 2014 annual report were eliminated from our analysis.
After this filtering step, 59 companies remained. We would further filter companies according to whether all variables needed for the DCF analysis can be obtained; companies lacking any variable would be excluded from our analysis. The list of 59 companies may therefore become smaller.
Data acquisition
To obtain the 12 financial input variables (or the components that make them up), we developed several internal Python scripts that download these data using the QuoteMedia API. These scripts are summarized below. They are uploaded to Git in the compressed folder "Model_Code_2_24_16"; within that folder, they can be found in a subdirectory named "Moat_Model". All scripts are set up to get data from the 10 most recent annual reports for a given company, but a simple modification of the API calls will allow the user to get more reports when needed. We also developed a Python script to capture several of the variables required by the DCF model, although if historical data (e.g., data from 2013 to 2014) need to be used, further development of the script is required to capture the historical information; this code, named "getFairValueVars.py", is described below. Before running these scripts, Python must be installed on the computer. In theory, Python 2 or Python 3 should work, but the data were obtained on a machine running Python 2.7. These scripts also rely on importing multiple Python modules; to view the modules required by each script, open it in a text editor and look at the first few lines of code. The scripts and their purposes are as follows:
getFairValueVars.py - this script takes a list of company stock tickers and returns a number of variables in tab-delimited format. Although the first few variables returned by the code are theoretically the most recent financial input variables (i.e., "return on assets", "sales yield", "book value yield", "total revenue", "market capitalization", "enterprise value", "average daily trading volume", "equity volatility", and "maximum drawdown"), it is preferable to use the other scripts listed below to obtain the 12 financial input variables, because this code is incomplete. The same applies to the other variables returned by the code ("free cash flow", "total debt", "tax rate", "cost of equity", "cost of debt"). Code modifications are also needed to capture historical data for these variables. The variables returned by the code are:
"stock code", "Sector _ ID" (note: although this means "department ID", it returns a "template" type for which QuoteMedia is used for its company type and not the actual department, this is where it needs to be corrected in the code), "asset profitability", "sales profitability", "ledger value profitability", "gross profit", "market capitalization", "business value", "daily trading volume", "stock volatility", "maximum draw rate", "free cash flow", "gross debt", "tax rate", "equity cost", and "debt cost".
To run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 43 of the script.
Usage: python getFairValueVars.py (note: "QuoteMedia_FairValue_variables.tsv" on line 64 can be changed to any desired file name)
getHistorcalROA.py - to run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 27 of the script, e.g., ['AMGN', 'BIIB', 'BXLT']; the brackets, apostrophes, and commas are all necessary for the code to run correctly.
Usage: python getHistorcalROA.py > HistorcalROA.txt (note: "HistorcalROA.txt" can be changed to any desired file name)
getHistorcalEarningsYield.py - this script takes a list of company stock tickers and returns, in tab-delimited format, the ticker, earnings per share, unadjusted closing price, and report date. The earnings yield can then be calculated in Excel as described above. To run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 27 of the script.
Usage: python getHistorcalEarningsYield.py > HistorcalEY.txt (note: "HistorcalEY.txt" can be changed to any desired file name)
getHistorcalBookValueYield.py - to run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 28 of the script.
Usage: python getHistorcalBookValueYield.py > HistorcalBVY.txt (note: "HistorcalBVY.txt" can be changed to any desired file name)
getHistorcalSalesYield.py - to run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 27 of the script.
Usage: python getHistorcalSalesYield.py > HistorcalSY.txt (note: "HistorcalSY.txt" can be changed to any desired file name)
getHistorcalVolatility_MaximumDrawDown_AverageVolume.py - this script takes a list of company stock tickers and returns, in tab-delimited format, the ticker, equity volatility, maximum drawdown, average trading volume, and report date. To run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 40 of the script. Note that this script sometimes exits with an error while running.
Although this error still requires further troubleshooting, we suspect it is caused by missing data. The simplest solution is to find the company causing the error and remove it from the list pasted on line 40. Due to time constraints, we could not provide a fix for this error. Currently, the code is set up so that the company giving the error can be identified when the code is run. We first suggest running "python getHistorcalVolatility_MaximumDrawDown_AverageVolume.py", which prints each company and its data to the terminal. If the code exits, the company it was processing can be seen just before the exit, and the company that gave the error can be deleted. Once all the companies that give errors are deleted, they are added before the code on line 117 and then deleted before the code on line 123. The following command can then be used: python getHistorcalVolatility_MaximumDrawDown_AverageVolume.py > HistorcalV_MD_AV.txt (note: "HistorcalV_MD_AV.txt" can be changed to any desired file name)
getHistorcalTotalRevenue.py - to run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 27 of the script.
Usage: python getHistorcalTotalRevenue.py > HistorcalTR.txt (note: "HistorcalTR.txt" can be changed to any desired file name)
Note: this script is technically redundant with the historical sales yield script.
getHistorcalMarketCap.py - this script takes a list of company stock tickers and returns, in tab-delimited format, the ticker, total common shares outstanding, unadjusted closing price, and report date. Market capitalization can be calculated in Excel as described above. To run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 27 of the script, e.g., ['AMGN', 'BIIB', 'BXLT']; the brackets, apostrophes, and commas are all necessary for the code to run correctly.
Usage: python getHistorcalMarketCap.py > HistorcalMC.txt (note: "HistorcalMC.txt" can be changed to any desired file name)
Note: this script is redundant with the script for obtaining historical sales yield.
getHistorcalEnterpriseValue.py - this script takes a list of company stock tickers and returns, in tab-delimited format, the ticker, total common shares outstanding, unadjusted closing price, current debt, long-term debt, cash and equivalents, preferred shares, minority interest, and report date. As described above, enterprise value can be calculated in Excel from these variables.
Usage: python getHistorcalEnterpriseValue.py > HistorcalEV.txt (note: "HistorcalEV.txt" can be changed to any desired file name)
getSector.py - this script takes a list of company stock tickers and returns, in tab-delimited format, the ticker and sector ID. To run this code for a particular company list, open the script in a text editor and paste a comma-separated company list into line 29 of the script.
Usage: python getSector.py > HistorcalSector.txt (note: "HistorcalSector.txt" can be changed to any desired file name)
Although they were already mentioned in the "Model variables (financial and social)" section, we describe below the methods used in the analysis to obtain the social media variables.
Identity score - the number of social media links on a company's website. Here, the social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Because of time constraints, we used the number of links on each company's website as of 2016, assuming that companies have not added a large number of social media links to their websites since 2013.
Total posting volume - this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("Moat; 2014 Data") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data are collected from the 12 months prior to the company's annual report date. To collect data for a specific company and time, we created a filter in the Buzz Monitor using the company's cashtag, applied the filter to the monitor, and set the time range to the year preceding the company's annual report date. For example, if a company filed its annual report on December 31, 2014, we obtained the total posting volume from December 31, 2013 to December 31, 2014. The total posting volume is read directly from the monitor screen online.
Total potential impressions - this is the total number of potential impressions generated by posts that include the company's cashtag within the 12 months prior to the company's annual report date. We created a Buzz Monitor ("Moat; 2014 Data") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data are collected from the 12 months prior to the company's annual report date. To collect data for a specific company and time, we created a filter in the Buzz Monitor using the company's cashtag, applied the filter to the monitor, and set the time range to the year preceding the company's annual report date. For example, if a company filed its annual report on December 31, 2014, we obtained total potential impressions from December 31, 2013 to December 31, 2014. We downloaded an Excel file from Crimson Hexagon containing the total potential impression data shown on the web interface. In the Excel file, we summed the daily potential impression counts to derive the total potential impressions for the period.
Posts per author - we calculated this as the posting volume in the 12 months before the company's annual report date divided by the total number of Twitter authors who posted during that period. We created a Buzz Monitor ("Moat; 2014 Data") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data are collected from the 12 months prior to the company's annual report date. To collect data for a specific company and time, we created a filter in the Buzz Monitor using the company's cashtag, applied the filter to the monitor, and set the time range to the year preceding the company's annual report date. For example, if a company filed its annual report on December 31, 2014, we collected data from December 31, 2013 to December 31, 2014. We downloaded an Excel file from Crimson Hexagon containing the daily number of Twitter authors and the average number of posts per author per day. In the Excel file, we first multiplied the number of Twitter authors posting on a given day by that day's average posts per author to obtain the daily post count. We then summed the daily post counts over the period and divided by the total number of Twitter authors over the period to yield posts per author.
Impressions per post - we calculated this in Excel by dividing the total potential impressions by the total posting volume (both obtained as described above).
Z-model social data prediction
Model overview.
A company's financial condition is of foremost importance when constructing an investment portfolio. One particularly troublesome situation for investors and companies alike is bankruptcy. For investors, poor financial condition and bankruptcy can produce significant losses if the investor expected a company to grow rather than decline in health. In the world of startup companies, investors should pay special attention to the financial condition of their investments, since 55% of startups close within their first 5 years of operation [6].
We used Z-model analysis to determine whether social media data can be used alone, or in combination with existing models, to better predict a company's solvency risk (whether the company will file for bankruptcy). Our method is based on Edward Altman's 1968 study, which used linear discriminant analysis to assess the health of companies [7]. All our model tests were performed using the caret package in R [8]. Although we used Compustat to identify companies that filed for bankruptcy between 2007 and 2014 (see the "population bureau Z model focus benchmark" document for more information about the companies used in our analysis), due to time and resource limitations our final analysis included companies that filed for bankruptcy between 2011 and 2014. Our Z-model analysis included a total of 50 companies (24 bankrupt companies and 26 non-bankrupt companies).
We used the QuoteMedia API in combination with gurufocus.com to obtain the 5 financial ratios that Edward Altman found to be predictive of bankruptcy in his 1968 study (see below for details). Because our goal is to predict bankruptcy, our analysis is limited to data obtained from the annual financial report of the year prior to the calendar year in which a company filed for bankruptcy. In other words, if a company filed for bankruptcy in 2014, we acquired financial and social variables from 2012 to 2013. Data are collected from the 12 months prior to the company's annual report date (e.g., if the company filed its annual report on December 31, 2013, financial and social media data were collected from December 31, 2012 to December 31, 2013).
After collecting the data and organizing it into baseline (financial ratios only), overlay (financial ratios plus social media data), and social media (social media only) matrices, we applied linear discriminant analysis to each model we created (FIG. 15). To test the accuracy of each model, we randomly split our data into a training data set (60% of companies) and a test data set (40% of companies). Once we had trained the linear discriminant model, we used it to classify the remaining 40% of companies and calculated the resulting accuracy of the model's predictions. Because model accuracy can vary with the random selection of training and test data, we performed this sequence of steps 100 times and took the mean and standard deviation across the 100 trials as our final accuracy score. We then trained our discriminant model on all of the data in the baseline matrix and used the coefficients given by the model to generate a Z-score for each company. This calculation can be summarized as:
Z-score = C1 × R1 + C2 × R2 + C3 × R3 + C4 × R4 + C5 × R5
Where "C" corresponds to the coefficient given by our model and "R" corresponds to 1 of the 5 Altman ratios (described later). In the above equation, the coefficients can be obtained directly from the interpolated symbol packet within R.
When overlaying the social media variables, we used the same approach as described above. The main difference between the baseline model and the social media overlay models is the matrix we provide to the linear discriminant analysis function. In total, we constructed 24 different models (model descriptions provided separately), consisting of the baseline model with different combinations of social media variables and several models built entirely from social media variables (described below). Due to time constraints, these models are not exhaustive of the combinations that could be created, but they serve as a substantial starting point for the analysis.
The QuoteMedia API and the gurufocus.com website were used to obtain the financial information needed for our analysis. Crimson Hexagon and the Internet Archive were used to obtain all social media variables.
Model variables (financial and social)
In our analysis, we collected data for a total of 10 different variables (5 financial variables and 5 social variables). The financial variables, and how we obtained/calculated them, are described below:
Working capital / total assets
These data were obtained using QuoteMedia's annual report data.
Retained earnings / total assets
These data were obtained using QuoteMedia's annual report data.
Earnings before interest and taxes (EBIT) / total assets
These data were obtained using QuoteMedia's annual report data. Note: if EBIT is unavailable, our code (described later) attempts to calculate the ratio using earnings before interest, taxes, depreciation, and amortization (EBITDA).
Market value of equity / total liabilities - although we later developed code to download, from annual report data, the variables needed to calculate the market value of equity (i.e., market capitalization = total common shares outstanding × unadjusted closing price on the annual report date; see the "getHistorcalMarketCap.py" code description), our initial and final Altman analyses used the ratios provided by gurufocus.com.
Sales/total assets
These data were obtained using QuoteMedia's annual report data.
The social media variables, and how we obtained/calculated them, are described below:
Identity score - the number of social media links on a company's website. Here, the social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Under the 7 building blocks of social media, this would be classified as belonging to the identity block.
Total posting volume - this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("Solvency and Z") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from May 23, 2008 onward. For a given company, data are collected from the 12 months prior to the company's annual report date. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Total potential impressions - this is the total number of potential impressions generated by posts that include the company's cashtag within the 12 months prior to the company's annual report date. Data are from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Posts per author - we calculated this as the posting volume in the 12 months before the company's annual report date divided by the total number of Twitter authors who posted during that period. Data are from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 authors in this time frame, we manually set this value to 0 to avoid dividing by 0.
Impressions per post - we calculate this as:
total potential impressions (see above) / total posting volume (see above)
Data were obtained from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 posts in this time frame, we manually set the value to 0 to avoid dividing by 0.
Company inclusion criteria
This section outlines our selection process for the companies included in the Z-model analysis. Specific information about the companies themselves can be found in the "population bureau Social Data Z Model Benchmark" Word document and the "Z_Model_MasterMatrix" Excel document. We used Compustat to identify companies that filed for bankruptcy between 2007 and 2014. We then calculated/obtained 4 of the 5 financial variables above (e.g., working capital/total assets) using the QuoteMedia API, and obtained market value of equity/total liabilities exclusively from gurufocus.com. To remain in our analysis, a company had to be classified by QuoteMedia as having template type "N"; this let us filter out financial institutions, to which the Altman Z ratios do not apply. In addition, a company had to have all 5 financial ratios available for the year before bankruptcy to continue in the analysis. Companies for which we could not obtain all the ratios for the year before bankruptcy, or that did not have QuoteMedia API template type "N", were removed from our analysis. To improve efficiency, we used the companies from the quantitative moat social data prediction model (the list of companies obtained through the Morningstar website) as healthy controls. As before, companies had to meet the template and financial ratio requirements to remain in our analysis. After filtering, a total of 50 companies were used in our final analysis: 24 filed for bankruptcy in 2011-2014, and 26 did not go bankrupt.
Data acquisition
To obtain the 5 financial variables (or the components that make them up), we developed several internal Python scripts that download these data using the QuoteMedia API. These scripts are summarized below. They are uploaded to Git in the compressed folder "Model_Code_2_24_16"; within that folder, they can be found in a subdirectory named "Z_Model". All scripts are set up to get data from the 10 most recent annual reports for a given company, but a simple modification of the API calls will allow the user to get more reports when needed. Before running these scripts, Python must be installed on the computer. In theory, Python 2 or Python 3 should work, but the data were obtained on a machine running Python 2.7. These scripts also rely on importing multiple Python modules; to view the modules required by each script, open it in a text editor and look at the first few lines of code.
The script and its purpose are as follows:
get_Altman_WC_TA_RE_TA_EBIT_TA_TotalLiabilities_Sales_TA.py - this script takes a list of company stock tickers and returns the company name, ticker, working capital/total assets, retained earnings/total assets, EBIT/total assets, total liabilities, sales/total assets, and report date.
Usage: python get_Altman_WC_TA_RE_TA_EBIT_TA_TotalLiabilities_Sales_TA.py (note: the default output file name "altman_ratios.tsv" can be changed by adding "-o" followed by the desired output file name)
getHistorcalMarketCap.py - although we used gurufocus.com to obtain the market value of equity/total liabilities ratio, we later developed this script to obtain these data from the QuoteMedia API. The script takes a list of company stock tickers and returns, in tab-delimited format, the ticker, total common shares outstanding, unadjusted closing price, and report date.
Usage: python getHistorcalMarketCap.py > HistorcalMC.txt (note: "HistorcalMC.txt" can be changed to any desired file name)
Although they were already mentioned in the "Model variables (financial and social)" section, we describe below the methods used in the analysis to obtain the social media variables.
Identity score - the number of social media links on a company's website. Here, the social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. For this model, we used the Internet Archive [9] to view historical snapshots of company websites (locating the websites sometimes required secondary research) in order to derive a historical identity score. Specifically, if a company filed for bankruptcy in 2014 and filed its annual report on December 31, 2013, we viewed the snapshot of the company's website as close as possible to December 31, 2013. If we could not find a suitable snapshot for the month in which the company filed its annual report, we moved to a date closer to the present; we did this because the closer one gets to the present, the more snapshots the archive holds. If we could not find links on a company's web pages, or the company had no web pages at any time near the report filing date (within approximately 1-2 years), the company's score is 0. Finally, our search of each website typically covered the "home", "media" (if any), and "contact us" pages, so our search for links was not exhaustive. Under the 7 building blocks of social media, this would be classified as belonging to the identity block.
Total posting volume - this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("Solvency and Z") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from May 23, 2008 to the present.
Due to time and resource constraints, we could not incorporate this new call into the existing code, but the population bureau may wish to explore this possibility for future modeling.
For a given company, data are collected from the 12 months prior to the company's annual report date. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's cashtag (usually two cashtags for bankrupt companies). We applied this filter to the monitor and set the time range to the year preceding the company's annual report date. For example, if a company filed its annual report on December 31, 2013 (bankruptcy in 2014), we obtained the total posting volume from December 31, 2012 to December 31, 2013. The total posting volume is read directly from the monitor screen online.
Total potential impressions - this is the total number of potential impressions generated by posts that include the company's cashtag within the 12 months prior to the company's annual report date. We created a Buzz Monitor ("Solvency and Z") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from May 23, 2008 to the present. For a given company, data are collected from the 12 months prior to the company's annual report date. To collect data for a specific company and time, we created a filter in the Buzz Monitor using the company's cashtag, applied the filter to the monitor, and set the time range to the year preceding the company's annual report date. For example, if a company filed its annual report on December 31, 2013, we obtained total potential impressions from December 31, 2012 to December 31, 2013. We downloaded an Excel file from Crimson Hexagon containing the total potential impression data shown on the web interface. In the Excel file, we summed the daily potential impression counts to derive the total potential impressions for the period.
Posts per author - we calculated this as the posting volume in the 12 months before the company's annual report date divided by the total number of Twitter authors who posted during that period. We created a Buzz Monitor ("Solvency and Z") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from May 23, 2008 to the present. For a given company, data are collected from the 12 months prior to the company's annual report date. To collect data for a specific company and time, we created a filter in the Buzz Monitor using the company's cashtag, applied the filter to the monitor, and set the time range to the year preceding the company's annual report date. For example, if a company filed its annual report on December 31, 2013, we collected data from December 31, 2012 to December 31, 2013. We downloaded an Excel file from Crimson Hexagon containing the daily number of Twitter authors and the average number of posts per author per day. In the Excel file, we first multiplied the number of Twitter authors posting on a given day by that day's average posts per author to obtain the daily post count. We then summed the daily post counts over the period and divided by the total number of Twitter authors over the period to yield posts per author. If a company had 0 authors in the period over which data were collected, the value was set to 0 to avoid dividing by 0.
Impressions per post - we calculated this in Excel by dividing the total potential impressions by the total posting volume (both obtained as described above). If a company had 0 posts in the period over which data were collected, the value was set to 0 to avoid dividing by 0.
Model testing and results
After we obtained all the financial and social media data for the 50 companies in the analysis, we generated a "Z_Model_MasterMatrix" Excel spreadsheet, which can be found on Confluence. The spreadsheet is too large to include in this report, but it contains all of the data as well as other details (e.g., cashtags, report dates, social data date ranges) that help in obtaining further information about the companies used in the modeling process. After generating this data matrix, we created a baseline data matrix for the Z model (Table 5).
Table 5. A snapshot of the Z-model baseline matrix. Input variables are abbreviated for brevity: WC_TA = working capital/total assets; RE_TA = retained earnings/total assets; EBIT_TA = EBIT/total assets; MVE_TL = market value of equity/total liabilities; SA_TA = sales/total assets.
Bankruptcy WC_TA RE_TA EBIT_TA MVE_TL SA_TA
Nonbankrupt 0.375676 0.053668 0.193986 3.393 0.643537
Nonbankrupt 0.742144 0.674134 0.149449 8.0367 0.480601
Nonbankrupt 0.382911 0.97189 0.084966 6.3662 0.700166
Nonbankrupt -0.04897 0.218853 0.152337 1.8667 0.291933
Bankrupt 0.5479 0.269109 0.02366 0.209 2.019867
Bankrupt -0.85481 -90.8125 -6.58448 10.6539 4.214631
Nonbankrupt 0.217 -1.54 -0.2605 0.4943 1.3555
Nonbankrupt 0.293686 -0.11545 0.088726 1.951037725 0.282435
Bankrupt -0.81325 -1.5766 -0.25859 0.0606 0.736392
Bankrupt -1.69524 -3.05019 -0.08692 0 0.804718
Nonbankrupt 0.011849 0.540412 0.083596 2.0823 0.570327
After establishing the baseline matrix, we performed linear discriminant analysis on it to calculate the average accuracy of our baseline model. To do this, we developed an R script named "Script_for_Running_Models.R". Although we intend to describe this script in detail separately, we briefly summarize here how it determines the average accuracy and the standard deviation of the model accuracy. The script is uploaded to Git in the "Model_Code_2_24_16" compressed folder and is contained in that archive's "Modeling_Script" subdirectory.
The first step of the script imports the baseline data matrix (see, e.g., Table 5). After loading the matrix, the code randomly selects 60% of the data for training and 40% for testing. As an example, if we loaded a matrix with 10 rows of data into the code, 6 rows would be randomly selected to train the linear discriminant model and 4 rows would be held out for testing. After training, the code predicts the class of each test data point and compares the prediction with the actual class. The model's accuracy is stored in a list, and the steps above are repeated 99 more times, for 100 iterations in total. After 100 iterations, the code prints the mean accuracy and the standard deviation of the accuracy. We plotted and compared the mean accuracy and standard error (standard deviation divided by the square root of the sample size) of the models we tested.
After implementing the modeling code above, we found that the average accuracy of our baseline model was 84.5% with a standard deviation of 8.4%. Note that these are the accuracies from our run of the model; running it again would likely yield highly similar but not identical results, because the training and test data sets are selected at random. The no-information rate (NIR; random prediction) of the Z model was 52%; this is the rate a person would achieve by randomly guessing whether a company will go bankrupt.
After generating the baseline model, we added the social media variables in different combinations to the baseline matrix (see the "Z_Model_Descriptions" document for details). We generated a total of 24 different overlay models, comprising various combinations of baseline data plus social media data, or social media data alone. Other combinations of social media and baseline variables exist that we did not test, so the number of models tested is far from exhaustive; nor did we explore combining subsets of the baseline variables with the social media variables. Our conclusions below are therefore based on a limited subset of combinations derived from the baseline variables (treated as a single variable) and the social media variables.
Using the above code, we calculated the average accuracy and standard deviation of accuracy for each model and compared them to our baseline matrix. Of our 24 models, model 15 (M15) appeared to have a marginal increase in accuracy (88.2% accuracy; 8.0% standard deviation) relative to baseline (84.5% accuracy) when predicting whether a company would file for bankruptcy a year in advance. Model 15 includes the baseline data plus total potential impressions and total posting volume. Several models built using social media data alone appeared to predict bankruptcy better than random. Given these results, and the fact that we tested only a small fraction of the possible combinations in this analysis, our data suggest that further examination of the Z-model social data predictions by Crowd Bureau is warranted.
While the baseline and social media overlay models show promise in predicting bankruptcy, we also compared our baseline Z model with Edward Altman's model on our dataset. To do this, we first calculated a Z-score for each company using the coefficients produced by our model and the coefficients given by the Altman Z model. Our model gave the coefficients: 0.782 (working capital/total assets), -0.129 (retained earnings/total assets), 2.396 (earnings before interest and taxes/total assets), 0.169 (market value of equity/total liabilities), and 0.0114 (sales/total assets). The Altman Z-model coefficients are: 1.2 (working capital/total assets), 1.4 (retained earnings/total assets), 3.3 (earnings before interest and taxes/total assets), 0.6 (market value of equity/total liabilities), and 1 (sales/total assets).
After calculating the Z-scores, we converted the Z-score for each company to a percentile rank based on Morningstar's method for calculating percentile ranks (high-scoring companies receive lower percentile ranks; low-scoring companies receive higher percentile ranks)10. Briefly:
percentile rank = rounddown((99 × (i − 1)/(n − 1)) + 1),
Where "round down" refers to a function of Microsoft Excel rounding down the values to the nearest integer, "n" is the total number of observations (i.e., the total number of companies in the analysis), "i" is the absolute rank per observation (obtainable by the "rank" function of Excel). After obtaining the percentile rank for each company, we calculate the percentile rank over all the percentile ranksCumulative failure frequencies are accumulated, and a graph of cumulative failure frequencies is plotted against percentile rank. We find our model to be similar to that of Altman. However, the population authority may want to consider calculating the accuracy ratio of each model as described by Warren Miller in Morningstar report of 12 months 2009, in order to make a more quantitative comparison of our baseline Z model with the Altman Z model 1111
Solvency Score social data prediction
Overview of the model
A company's financial health is a key factor in designing portfolios to achieve maximum return. For investors, poor financial health and bankruptcy can result in significant losses, particularly when an investor anticipated that a company would grow rather than decline. The ability to predict whether a company will go bankrupt is therefore a valuable asset to investors, regardless of one's investment strategy. This is particularly relevant in the world of startup companies, about 55% of which fail within the first 5 years of operation12.
In addition to the Z-model analysis, we also built a Solvency Score social data prediction model to determine whether social equity data can be used, alone or in combination with existing models, to better predict a company's solvency risk (i.e., whether the company will file for bankruptcy). Our method is based on Morningstar's December 2009 methodology paper (written by Warren Miller)13, and all of our model testing was performed using the caret package in R14. Although we used Compustat to identify companies filing for bankruptcy during the period from 2007 to 2014 (for more information on the companies used in our analysis, see the "Crowd Bureau Solvency Score Focus Benchmark" document), due to time and resource limitations our final analysis includes companies filing for bankruptcy during 2011 to 2014. Our Solvency Score analysis included a total of 49 companies (23 bankrupt and 26 non-bankrupt). These 49 companies were also used for our Z-model social data prediction.
We derived 3 financial variables from the 4 financial ratios (see below) used by Morningstar in its 2009 Solvency Score methodology12. Because our goal was to predict bankruptcy, our analysis was limited to data obtained from annual financial reports for the year prior to the calendar year in which a company filed for bankruptcy. In other words, if a company filed for bankruptcy in 2014, we acquired financial and social variables from 2012 to 2013. Data were collected for the 12 months prior to the company's annual report date (e.g., if the company submitted its annual report on 12/31/2013, financial and social media data were collected from 12/31/2012 to 12/31/2013).
After collecting the data and organizing it into baseline (financial ratios only), overlay (financial ratios plus social media data), and social media (social media only) matrices, we applied logistic regression analysis to each model we created. To test the accuracy of each model, we randomly separated our data into a training data set (60% of companies) and a test data set (40% of companies). Once we trained our logistic regression model, we used it to classify the remaining 40% of companies and calculated the resulting accuracy of the model's predictions. Since model accuracy can vary due to the random selection of training and test data, we performed the above sequence of steps 100 times and took the mean and standard deviation of the 100 trials as our final accuracy score. We then trained our logistic regression model with all the data in the baseline matrix and used the coefficients given by the model to generate a Solvency Score for each company. This calculation can be summarized as: Solvency Score = C1 × V1 + C2 × V2 + C3 × V3 + Y,
Where "C" corresponds to the coefficient given by our model, "V" corresponds to 1 of the 3 variables derived from the 4 ratios described above (described in detail later) and "Y" corresponds to the Y-intercept. In the above equation, the coefficients can be obtained directly from the interpolated symbol packet within R.
When overlaying the social media variables, we used the same approach as described above. The main difference between the baseline model and the social media overlay models is the matrix we provide to the logistic regression function. In total, we constructed 23 different models (model descriptions provided separately), consisting of the baseline model with different combinations of social media variables, plus several models consisting entirely of social media variables (described below). Due to time constraints, these models are not exhaustive of the combinations that could be created, but they serve as a substantial starting point for the analysis.
The QuoteMedia API was used to obtain the financial information needed for the analysis. Crimson Hexagon and the Internet Archive were used to obtain all social media variables.
Model variables (financial and social)
In our analysis, we collected data for a total of 8 different variables (3 financial variables and 5 social variables). The descriptions of the financial variables, and how we obtained/calculated them, are as follows:
SQRT(TLTAp × EBIEp) - we calculated TLTAp as:
the percentile of the company's total liabilities/total assets (percentile(total liabilities/total assets)).
We calculated EBIEp as:
101 − the percentile of the company's earnings before interest, taxes, depreciation, and amortization/interest expense (101 − percentile(EBITDA/interest expense)). The percentile is calculated as:
percentile rank = rounddown((99 × (i − 1)/(n − 1)) + 1),
where "rounddown" refers to the Microsoft Excel function that rounds values down to the nearest integer, "n" is the total number of observations (i.e., the total number of companies in the analysis), and "i" is the absolute rank of each observation (obtainable via Excel's "RANK" function).
These data are corporate annual report data obtained from QuoteMedia and further processed in Excel.
QRp - we calculated QRp as:
101 − the percentile of the quick ratio.
We calculated the quick ratio as:
quick ratio = (current assets − inventory)/current liabilities.
These data are corporate annual report data obtained from QuoteMedia and further processed in Excel.
ROICp - we calculated ROICp as:
101 − the percentile of return on invested capital.
We calculated return on invested capital as:
return on invested capital = (net income − dividends)/total capitalization.
These data are corporate annual report data obtained from QuoteMedia and further processed in Excel.
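For illustration, the three financial variables above could be computed from raw annual report fields as in the following Python sketch; the input file and column names are assumptions, the ranking direction (higher ratio, higher percentile) is assumed, and the original data were processed in Excel.

import pandas as pd

def pct_rank(series):
    # Excel-style: rank observations ascending, then ROUNDDOWN((99*(i-1)/(n-1))+1).
    i = series.rank(method="first")
    n = len(series)
    return (99 * (i - 1) / (n - 1) + 1).apply(int)  # truncation = floor (values >= 1)

raw = pd.read_csv("solvency_raw_fields.csv")        # hypothetical input file

tlta_p = pct_rank(raw["total_liabilities"] / raw["total_assets"])
ebie_p = 101 - pct_rank(raw["ebitda"] / raw["interest_expense"])
quick = (raw["current_assets"] - raw["inventory"]) / raw["current_liabilities"]
qr_p = 101 - pct_rank(quick)
roic = (raw["net_income"] - raw["dividends"]) / raw["total_capitalization"]
roic_p = 101 - pct_rank(roic)

matrix = pd.DataFrame({
    "SQRT_TLTAxEBIE": (tlta_p * ebie_p) ** 0.5,     # square root of the product
    "QR_p": qr_p,
    "ROIC_p": roic_p,
})
print(matrix.head())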
The descriptions of the social media variables, and how we obtained/calculated them, are as follows:
Identity score - the number of social media links on a company's website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Under the 7 building blocks of social media, this would be classified as belonging to the identity block.
Total posting volume - this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("Solvency and Z") on Crimson Hexagon that searches for the use of company cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present. For a given company, data were collected for the 12 months prior to the company's annual report date. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Total potential impressions - this is the total potential impressions generated by posts including the company's cashtag within the 12 months prior to the company's annual report date. Data are from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Posts per author - we calculated this as the posting volume in the 12 months before the company's annual report date divided by the total number of Twitter authors posting during that period. Data are from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 authors in this time frame, we manually set the value to 0 to avoid dividing by 0.
Impressions per post - we calculated this as: total potential impressions (see description above)/total posting volume (see description above). Data were obtained from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 posts in this time frame, we manually set the value to 0 to avoid dividing by 0.
Company inclusion criteria
This section outlines our selection process for the companies included in this analysis. Specific information about the companies themselves is available in the "Crowd Bureau Solvency Score Focus Benchmark" Word document and the "Solvency_Score_Master_Matrix_Final" document. We used Compustat to identify companies that filed for bankruptcy between 2007 and 2014. We then used the QuoteMedia API to compute/obtain the data for the 3 financial variables described above. Due to time constraints, we decided to use the same companies from the Z-model analysis in the Solvency Score analysis. Therefore, to remain in our analysis, a company had to be classified by QuoteMedia as template type "N". This allowed us to filter out financial institutions, to which the Altman Z-score does not apply. However, Morningstar's December 2009 Solvency Score analysis included financial institutions, and Crowd Bureau should be aware of this (see the link in footnote #12 for more details on the Morningstar method). In addition, a company had to have all 4 financial ratios available through QuoteMedia for the year prior to bankruptcy in order to remain in the analysis. Companies that did not have all ratios available in the year before bankruptcy, or that did not have QuoteMedia API template type "N", were excluded. To improve efficiency, we used the companies from the quantitative moat social data prediction model (a list of companies obtained through the Morningstar website) as healthy controls. As before, these companies had to meet the template and financial ratio requirements to remain in our analysis. After filtering, a total of 49 companies were used in our final analysis. Of these, 23 filed for bankruptcy between 2011 and 2014, and 26 did not go bankrupt.
Data acquisition
To obtain the data needed to build the 3 financial variables (or the components that make up these variables), we developed internal Python scripts that download the data using the QuoteMedia API. The scripts are summarized below. The scripts were uploaded to the "Model_Code_2_24_16" compressed folder on Git; within that folder, they can be found in a subdirectory named "Solvency_Model". All scripts are set up to get data from the 10 most recent annual reports for a given company, but a simple modification of the API calls will allow the user to retrieve more reports when needed. Before running these scripts, Python must be installed on the computer. In theory, either Python 2 or Python 3 should work, but the data were obtained on a machine running Python 2.7. Furthermore, the code relies on importing several Python modules; to view the modules required for each script, open the script in a text editor and view the first few lines of code.
The scripts and their purposes are as follows:
GetHistoricalRawSolvencyScoreVariables.py - this script takes a list of company stock tickers and returns company names, tickers, total liabilities, total assets, EBITDA, interest expense, current assets, inventory, current liabilities, net income, cash dividends, total capitalization, and the report dates on which these data were obtained.
Usage: python GetHistoricalRawSolvencyScoreVariables.py (note: the default output filename "quotemedia_solvency_score_health.tsv" can be changed to any desired filename by adding "-o" followed by the specified output filename.)
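Since the script itself is not reproduced in this report, the following Python sketch only illustrates the general download-and-write pattern it follows; the endpoint URL, query parameters, and field names are placeholders, not the actual QuoteMedia API.

import csv
import requests

API_URL = "https://example.quotemedia.com/getFinancials"   # placeholder endpoint

def fetch_annual_reports(ticker, n_reports=10):
    # Request the n most recent annual reports for one ticker (illustrative call).
    resp = requests.get(API_URL, params={"symbol": ticker, "reports": n_reports})
    resp.raise_for_status()
    return resp.json()  # assumed to be a list of report dictionaries

with open("quotemedia_solvency_score_health.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["ticker", "total_liabilities", "total_assets", "report_date"])
    for ticker in ["AAA", "BBB"]:                          # replace with the real list
        for report in fetch_annual_reports(ticker):
            writer.writerow([ticker, report.get("totalLiabilities"),
                             report.get("totalAssets"), report.get("reportDate")])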
As mentioned in the "Model variables (financial and social)" section, we used the following methods in the analysis to obtain the social media variables.
Identity score - the number of social media links on a company's website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. For this model, we used the Internet Archive15 to view historical snapshots of company websites (finding the websites through secondary research) in order to determine a historical identity score. Specifically, if a company filed for bankruptcy in 2014 and submitted its annual report on 12/31/2013, we tried to find a snapshot of the company's website as close as possible to 12/31/2013. If we could not find enough snapshots for the month in which the company submitted its annual report, we moved to a date closer to the present; we did this because the closer one gets to the present, the more snapshots exist in the archive. If we could not find any links on a company's web pages, or the company had no web pages anywhere near the report submission date (within approximately 1-2 years), the company received a score of 0. Finally, our search of each website typically included the "home", "media" (if any), and "contact us" pages, so our search for links was not exhaustive. Under the 7 building blocks of social media, this would be classified as belonging to the identity block.
Total posting volume - this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("Solvency and Z") on Crimson Hexagon that searches for the use of company cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present.
For a given company, data were collected for the 12 months prior to the company's annual report date. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's cashtag (usually two cashtags for bankrupt companies). We applied this filter to the monitor and set the time range to cover the year before the company's annual report date. For example, if a company submitted its annual report on 12/31/2013 (bankruptcy in 2014), we obtained the total posting volume from 12/31/2012 to 12/31/2013. The total posting volume was read directly from the monitor screen online.
Total potential impressions - this is the total potential impressions generated by posts including the company's cashtag within the 12 months prior to the company's annual report date. We created a Buzz Monitor ("Solvency and Z") on Crimson Hexagon that searches for the use of company cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present. For a given company, data were collected for the 12 months prior to the company's annual report date. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's cashtag, applied it to the monitor, and set the time range to cover the year before the company's annual report date. For example, if a company submitted its annual report on 12/31/2013, we obtained the total potential impressions from 12/31/2012 to 12/31/2013. We downloaded an Excel file from Crimson Hexagon containing the total potential impressions data shown on the web interface, and in the Excel file we summed the number of potential impressions per day to derive the total potential impressions for the period.
Posts per author - we calculated this as the posting volume in the 12 months before the company's annual report date divided by the total number of Twitter authors posting during that period. We created a Buzz Monitor ("Solvency and Z") on Crimson Hexagon that searches for the use of company cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present. For a given company, data were collected for the 12 months prior to the company's annual report date. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's cashtag, applied it to the monitor, and set the time range to cover the year before the company's annual report date. For example, if a company submitted its annual report on 12/31/2013, we collected data from 12/31/2012 to 12/31/2013. We downloaded an Excel file from Crimson Hexagon containing the daily totals of Twitter authors and the average number of posts per author per day. In the Excel file, we first multiplied the number of Twitter authors posting on a given day by the average number of posts per author on that day to obtain the posts per day. We then summed the total number of posts over the period and divided it by the total number of Twitter authors over the period to yield the posts per author. If a company had 0 authors in the data-collection period, the value was set to 0 to avoid dividing by 0.
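For illustration, the same posts-per-author arithmetic could be performed outside Excel as in the following Python sketch; the export file and column names are assumptions about the daily Crimson Hexagon download.

import pandas as pd

daily = pd.read_csv("crimson_hexagon_export.csv")   # hypothetical daily export

# Posts per day = authors that day * average posts per author that day.
daily["posts"] = daily["authors"] * daily["avg_posts_per_author"]

total_posts = daily["posts"].sum()
total_authors = daily["authors"].sum()

# Set to 0 when a company has no authors in the window, as described above.
posts_per_author = total_posts / total_authors if total_authors else 0
print(posts_per_author)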
Impressions per post - we calculated this in Excel by dividing the total potential impressions by the total posting volume (both obtained as described above). If a company had 0 posts within the data-collection period, the value was set to 0 to avoid dividing by 0.
Model testing and results
After we obtained all the financial and social media data for the 49 companies described above, we generated a "Solvency_Score_Model_MasterMatrix" Excel spreadsheet, which can be found on Confluence. The spreadsheet is too large to include in this report, but it contains all the data as well as other details (e.g., cashtags, report dates, social data date ranges, etc.) that help to obtain further information about the companies used in the modeling process. After generating this data matrix, we created a baseline matrix for the Solvency Score model (Table 6).
Table 6. Snapshot of the Solvency Score model baseline data matrix. Input variables (TLTAp, EBIEp, QRp, and ROICp) are abbreviated for brevity.
Bankruptcy SQRT(TLTAp x EBIEp) QRp ROICp
Bankrupt 3.605551275 22 100
Bankrupt 6.708203932 6 59
Bankrupt 8.124038405 12 96
Bankrupt 9.486832981 4 86
Bankrupt 12 8 88
Nonbankrupt 90.33271833 76 47
Nonbankrupt 90.48756821 63 70
Nonbankrupt 93.49866309 88 41
Nonbankrupt 96.48834126 96 55
Nonbankrupt 100 80 49
After establishing the baseline matrix, we performed logistic regression analysis on it to calculate the average accuracy of the baseline model. To do this, we used the R script named "Script_for_Running_Models.R". While we intend to describe this script in detail separately, we will briefly summarize how it determines the average accuracy and the standard deviation of the model accuracy. This script was uploaded to the "Model_Code_2_24_16" compressed folder on Git and is contained in that archive's "Modeling_Script" subdirectory.
The first step of this script involves importing a baseline data matrix (see, e.g., Table 6). After loading the matrix, the code randomly selects 60% of the data for training and 40% for testing. As an example, if we loaded Table 6 (which has 10 rows of data) into the code, 6 rows would be randomly selected to train the logistic regression model and 4 rows would be randomly selected for testing. After training, the code predicts to which class each data point in the test data belongs, and then compares the predictions to the actual class of each data point. The accuracy of the model is stored in a list, and the above steps are repeated 99 more times, for a total of 100 iterations. After 100 iterations, the code prints the average accuracy and the standard deviation of the accuracy. We plotted and compared the average accuracy and standard error (the standard deviation divided by the square root of the sample size) of the models we tested.
After implementing the above modeling code, we found that the average accuracy of our baseline model was 90.5%, with a standard deviation of 6.4%. Note that these are the accuracies from our run of the model; if it is run again, highly similar but not identical results are likely to be obtained due to the random selection of training and testing data sets. The no-information rate (NIR; random prediction) of the Solvency Score model was 53.1%; this is the rate one would obtain by randomly guessing whether a company went bankrupt.
After generating the baseline model, we added the social media variables in different combinations to the baseline matrix (see the "Solvency_Score_Model_Descriptions" document for details). We generated a total of 23 different overlay models, comprising various combinations of baseline data plus social media data, or social media data alone. Other combinations of social media and baseline variables exist that we did not test, so the number of models tested is far from exhaustive; nor did we explore combining subsets of the baseline variables with the social media variables. Our conclusions below are therefore based on a limited subset of combinations derived from the baseline variables (treated as a single variable) and the social media variables.
Using the above code, we calculated the average accuracy and standard deviation of accuracy for each model and compared them to our baseline matrix. Of our 23 models, model 4 (M4) appeared to have a marginal increase in accuracy (92.5% accuracy; 6.3% standard deviation) relative to baseline (90.5% accuracy) when predicting whether a company would file for bankruptcy a year in advance. This model includes the baseline data plus total potential impressions. Several models built using social media data alone appeared to predict bankruptcy better than random. Given these results, and the fact that we tested only a small fraction of the possible combinations in this analysis, our data suggest that further examination of the Solvency Score social data predictions by Crowd Bureau is warranted.
Although the baseline and social media overlay models show promise in predicting bankruptcy, we also compared our baseline Solvency Score model with Morningstar's Solvency Score model on our dataset. To do this, we first calculated the Solvency Score for each company using the coefficients given by our model and the coefficients of the Morningstar model. Our model's coefficients were … (SQRT(TLTAp × EBIEp)), 0.02793 (QRp), 0.02786 (ROICp), and a y-intercept of −9.19726; the Morningstar Solvency Score coefficients are 5 (SQRT(TLTAp × EBIEp)), 4 (QRp), and 1.5 (ROICp). After calculating the Solvency Scores, we converted each company's score to a percentile rank according to Morningstar's method for calculating percentile ranks (high-scoring companies receive lower percentile ranks, while low-scoring companies receive higher percentile ranks)16. Briefly:
percentile rank = rounddown((99 × (i − 1)/(n − 1)) + 1),
where "round down" refers to a function of Microsoft Excel rounding down the values to the nearest integer, "n" is the total number of observations (i.e., the total number of companies in the analysis), "i" is the absolute rank per observation (obtainable by the "rank" function of Excel). After obtaining the percentile rank for each company, we calculate the cumulative failure frequency over all percentile ranks and plot the cumulative failure frequency against the percentile rank. We found that our model is similar to Morningstar's repayment ability scoring model in predicting bankruptcy. However, the population authority may want to consider calculating the accuracy ratio of each model as described by Warren Miller in Morningstar report 12 months 2009 to more quantitatively compare our baseline Z model with the Morningstar reimbursement capability scoring model17
Earnings per share social data prediction
Overview of the model
Profitability is a key factor to keep in mind when deciding in which companies to invest; generally, highly profitable companies tend to be good investments. One common indicator of a company's profitability is earnings per share. We asked whether social media data alone can predict an increase or decrease in diluted earnings per share from one year to the next better than random prediction.
To answer this question, we constructed several random forest models using 5 social media data points as input variables. We also obtained the diluted earnings per share for 58 companies in 2013 and 2014. To determine whether a company's diluted earnings per share increased or decreased, we compared each company's annual diluted earnings per share (DEPS) for 2014 with its annual DEPS for 2013. We then obtained social equity data (described later) from 2012 to 2013 to predict the DEPS change from 2013 to 2014.
After obtaining the social media variables and determining the DEPS changes of the companies in our analysis, we used these data to build a master data matrix. We then applied a random forest model to several different variations of the matrix in order to distinguish companies with increased DEPS from companies with decreased DEPS. Model prediction is based on 500 regression trees, and we used the caret package in R for all modeling18.
To test the accuracy of each model, we randomly separated our data into a training data set (60% of companies) and a test data set (40% of companies) (Fig. 16). After training a model with the 60% of company data, we used it to classify the remaining 40% of companies and calculated the accuracy. Since model accuracy can vary due to the random selection of training and test data, we performed the above sequence of steps 100 times and took the mean and standard deviation of the 100 trials as our final accuracy score. Although we did not generate a final quantitative score for each company, since we found that our models did not predict better than random, the probability of a DEPS increase can be obtained directly from the caret package within R.
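For illustration, an equivalent accuracy-testing loop in Python; the original used the caret package in R with 500 trees, scikit-learn's RandomForestClassifier stands in here, and the input file name is hypothetical.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("deps_social_matrix.csv")   # hypothetical export of Table 7
X = df.drop(columns=["Change"]).values       # the five social media variables
y = df["Change"].values                      # "Increase" / "Decrease"

accuracies = []
for _ in range(100):                         # 100 random 60/40 splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6)
    rf = RandomForestClassifier(n_estimators=500).fit(X_tr, y_tr)
    accuracies.append(rf.score(X_te, y_te))
    # rf.predict_proba(X_te) yields the class probabilities mentioned above.

print("mean accuracy:", np.mean(accuracies))
print("standard deviation:", np.std(accuracies, ddof=1))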
In total, we constructed 23 different models (model descriptions provided separately), consisting of different combinations of social media variables (described below). Due to time constraints, these models are not exhaustive of the combinations that could be created, but they serve as a substantial starting point for the analysis. The QuoteMedia API and the Morningstar website were used to obtain the financial information needed for the analysis. Crimson Hexagon and company websites were used to obtain all social media variables.
Model variables (financial and social)
In our analysis, we collected data for a total of 6 different variables (1 financial variable and 5 social variables). The description of the financial variable, and how we obtained/calculated it, is as follows:
Change in diluted earnings per share - we obtained the annual diluted earnings per share of the companies for 2013 and 2014 directly from the QuoteMedia API. We then compared the 2014 DEPS with the 2013 DEPS to determine whether DEPS increased or decreased.
The descriptions of the social media variables, and how we obtained/calculated them, are as follows:
Identity score - the number of social media links on a company's website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on each company's website as of 2016, assuming that companies have not added a large number of social media links to their websites since 2013.
Total posting volume - this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("EPS; 2014 year change data") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from 1/2012 to 12/31/2013. For a given company, data were collected for the 12 months prior to the company's annual report date. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Total potential impressions - this is the total potential impressions generated by posts including the company's cashtag within the 12 months prior to the company's annual report date. Data are from the "EPS; 2014 year change data" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Posts per author - we calculated this as the posting volume in the 12 months before the company's annual report date divided by the total number of Twitter authors posting during that period. Data are from the "EPS; 2014 year change data" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 authors in this time frame, we manually set the value to 0 to avoid dividing by 0.
Impressions per post - we calculated this as: total potential impressions (see description above)/total posting volume (see description above). Data are from the "EPS; 2014 year change data" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 posts in this time frame, we manually set the value to 0 to avoid dividing by 0.
Company inclusion criteria
This section outlines our selection process for the companies included in the moat analysis. Specific information about the companies themselves can be obtained from the "Crowd Bureau Social Data Earnings per Share Focus Benchmark" Word document and the "EPS_changes_2013_to_2014_Master_Matrix" Excel document. In January 2016, we obtained from the Morningstar website a list of about 120 companies identified as having wide, narrow, or no moats. We then used the QuoteMedia API either to obtain the 12 financial variables described above (e.g., total revenue) directly, or to obtain the inputs needed to compute those attributes (e.g., we obtained total common shares issued, unadjusted, and then computed market value from those values). To remain in our analysis, a company had to report all the variables needed to obtain the 12 financial input attributes of the moat model from its 2014 annual report. Companies for which we could not obtain all 12 attributes from the 2014 annual report were eliminated from our analysis. After filtering, a total of 59 companies remained. To improve efficiency, we obtained the DEPS of these 59 companies. After obtaining the DEPS and calculating the DEPS changes, we removed companies whose DEPS did not change, because such companies rarely appeared (1 of 59 companies) and were too few to model. Our final analysis included 58 companies.
Data acquisition
To obtain the diluted earnings per share of the companies in our analysis, we developed a Python script that downloads the data using the QuoteMedia API. The script is summarized below. The script was uploaded to the "Model_Code_2_24_16" compressed folder on Git; within that folder, it can be found in a subdirectory named "EPS_Model". The script is set up to get data from the 10 most recent annual reports for a given company, but a simple modification to the API call will allow the user to retrieve more reports when needed. Before running the script, Python must be installed on the computer. In theory, either Python 2 or Python 3 should work, but the data were obtained on a machine running Python 2.7. Furthermore, the code relies on importing several Python modules; to view the modules the code needs, open the script in a text editor and view the first few lines of code.
The script and its purpose are as follows:
GetHistoricalEPS.py - this script takes a list of company stock tickers and returns, in tab-separated format, the tickers, annual diluted earnings per share, and the report dates on which the data were obtained. To run this code for a particular company list, open the script using a text editor and paste in the comma-separated company list at line 27 of the script.
Usage: python GetHistoricalEPS.py > HistoricalEPS.txt (note: "HistoricalEPS.txt" may be changed to any desired filename.) As mentioned in the "Model variables (financial and social)" section, we used the following methods in the analysis to obtain the social media variables.
Identity score - the number of social media links on a company's website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on each company's website as of 2016, assuming that companies have not added a large number of social media links to their websites since 2013.
Total posting volume - this is the total number of posts that include the company's cashtag (e.g., $AMGN is Amgen's cashtag). We created a Buzz Monitor ("EPS; 2014 year change data") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from 1/2012 to 12/31/2013. For a given company, data were collected for the 12 months prior to the company's annual report date. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's cashtag, applied it to the monitor, and set the time range to cover the year before the company's annual report date. For example, if a company submitted its annual report on 12/31/2013, we obtained the total posting volume from 12/31/2012 to 12/31/2013. The total posting volume was read directly from the monitor screen online.
Total potential impressions - this is the total potential impressions generated by posts including the company's cashtag within the 12 months prior to the company's annual report date. We created a Buzz Monitor ("EPS; 2014 year change data") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from 1/2012 to 12/31/2013. For a given company, data were collected for the 12 months prior to the company's annual report date. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's cashtag, applied it to the monitor, and set the time range to cover the year before the company's annual report date. For example, if a company submitted its annual report on 12/31/2013, we obtained the total potential impressions from 12/31/2012 to 12/31/2013. We downloaded an Excel file from Crimson Hexagon containing the total potential impressions data shown on the web interface, and in the Excel file we summed the number of potential impressions per day to derive the total potential impressions for the period.
Posts per author - we calculated this as the posting volume in the 12 months before the company's annual report date divided by the total number of Twitter authors posting during that period. We created a Buzz Monitor ("EPS; 2014 year change data") on Crimson Hexagon that searches for company cashtag usage on Twitter, Facebook, and Tumblr from 1/2012 to 12/31/2013. For a given company, data were collected for the 12 months prior to the company's annual report date. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's cashtag, applied it to the monitor, and set the time range to cover the year before the company's annual report date. For example, if a company submitted its annual report on 12/31/2013, we collected data from 12/31/2012 to 12/31/2013. We downloaded an Excel file from Crimson Hexagon containing the daily totals of Twitter authors and the average number of posts per author per day. In the Excel file, we first multiplied the number of Twitter authors posting on a given day by the average number of posts per author on that day to obtain the posts per day. We then summed the total number of posts over the period and divided it by the total number of Twitter authors over the period to yield the posts per author.
Impressions per post - we calculated this in Excel by dividing the total potential impressions by the total posting volume (both obtained as described above).
Model testing and results
After we obtained all the financial and social media data for the 58 companies in the analysis, we generated an "EPS_changes_2013_to_2014_Master_Matrix" Excel spreadsheet, which can be found on Confluence. The spreadsheet is too large to include in this report, but it contains all the data as well as other details (e.g., cashtags, report dates, social data date ranges, etc.) that help to obtain further information about the companies used in the modeling process. After generating this master data matrix, a data matrix was created for our random forest model (Table 7).
Table 7. A snapshot of the matrix of variables input into the DEPS model. Input variables are abbreviated for brevity. The variable we are trying to predict ("Change") is highlighted in green. Although company names are not shown, each row corresponds to a particular company.
After establishing the matrices, we ran a random forest model on each matrix to compute the average accuracy of our social equity models in predicting DEPS changes. To do this, we used the R script named "Script_for_Running_Models.R". While we intend to describe this script in detail separately, we will briefly summarize how it determines the average accuracy and the standard deviation of the model accuracy. This script was uploaded to the "Model_Code_2_24_16" compressed folder on Git and is contained in that archive's "Modeling_Script" subdirectory.
The first step of this script involves importing the data matrix (see Table 7).
Change Total Posts Total Potential Impressions Posts per Author Impressions per Post Identity Score
Decrease 1791 8473215 1.121856868 4730.99665 3
Decrease 662 1701948 1.167548503 2570.918429 3
Decrease 29 238645 1.26086957 8229.137931 3
Decrease 2134 16244398 1.277043268 7612.182755 3
Decrease 566 840597 1.292906179 1485.15371 3
Increase 1557 2974115 1.169078449 1910.157354 0
Increase 5 7830 1.25 1566 0
Increase 1133 5398097 1.252502779 4764.428067 0
Increase 1324 6758186 1.29480901 5104.370091 0
Increase 9087 88106058 1.340127005 9695.835589 0
After loading the matrix, the code randomly selects 60% of the data for training and 40% for testing. As an example, if we loaded Table 7 (which has 10 rows of data) into the code, 6 rows would be randomly selected to train the random forest model and 4 rows would be randomly selected for testing. After training, the code predicts to which class each data point in the test data belongs, and then compares the predictions to the actual class of each data point. The accuracy of the model is stored in a list, and the above steps are repeated 99 more times, for a total of 100 iterations. After 100 iterations, the code prints the average accuracy and the standard deviation of the accuracy. We plotted and compared the average accuracy and standard error (the standard deviation divided by the square root of the sample size) of the models we tested.
We generated 23 different models based on social media data only (see the "Earnings_per_Share_Changes_Model_Descriptions" document for details). The number of models tested is far from exhaustive; our conclusions below are therefore based on a limited subset of the possible social media variable combinations. After implementing the above modeling code, we found that none of the models we constructed predicted DEPS changes better than random; in fact, our models often performed worse than random. The no-information rate (NIR; random prediction) of the DEPS models was 63.8%; this is the rate one would obtain by randomly guessing, without information, whether a company's DEPS increased or decreased.
Given these results, our data suggest that Crowd Bureau should deprioritize efforts to predict changes in diluted earnings per share using social equity data alone.
Investor-specific crowdfunding social data prediction
Overview of the model
The passage of the JOBS Act enables companies in the United States to raise needed funds through crowdfunding, and enables non-accredited investors to invest in small private companies and non-publicly traded funds. While this new capital-raising model is an exciting way to connect the general population to investment in new businesses, it also carries many risks for participating investors and requires new infrastructure to deliver information and comply with new regulations. One of the risks comes from the "all or nothing" financing scheme, under which a company must fully achieve its financing goal in order to receive the funds raised. It would therefore be very valuable for businesses to be able to predict in advance the likelihood of fully achieving a financing goal, particularly if they are not on track to achieve it and still have time to change their campaign strategy.
This model analyzes whether social media data has predictive power for whether a company will fully achieve its financing goal, using data from the first quarter of the company's financing period and using data from the entire financing period. Using the Crowdfunder.com website, we identified 21 companies that either fully achieved their financing goal during the allotted financing period (n = 11 companies) or did not (n = 10). We then constructed several random forest models, using as input variables different combinations of 5 social equity data points (described in detail later) collected during the first quarter of each company's financing period and during the entire financing period.
After obtaining the social media variables and determining which companies in our analysis fully reached their financing goals, we built a master data matrix with these data. We then applied a random forest model to several different variants of the matrix in order to distinguish fully funded companies from companies that were not fully funded. Model prediction is based on 500 regression trees, and we used the caret package in R for all modeling19.
To test the accuracy of each model, we randomly separated our data into a training data set (60% of companies) and a test data set (40% of companies) (Fig. 17). After training a model with the 60% of company data, we used it to classify the remaining 40% of companies and calculated the accuracy. Since model accuracy can vary due to the random selection of training and test data, we performed the above sequence of steps 100 times and took the mean and standard deviation of the 100 trials as our final accuracy score. Although we did not generate a final quantitative score for each company in this model, the probability that a given company obtains full funding can be obtained directly from the caret package within R.
In total, we constructed 23 different models (model descriptions provided separately) that consisted of different combinations of social media variables (described below).
Due to time constraints, these models are not exhaustive of the combinations that could be created, but they serve as a substantial starting point for the analysis. The Crowdfunder.com website, the Internet Archive (http://archive.org/index.php), and other secondary research sources were used to obtain the financial information (i.e., funding status) needed for our analysis. Crimson Hexagon and company websites were used to obtain all social media variables.
Model variables (financial and social)
In our analysis, we collected data for a total of 6 different variables (1 financial variable and 5 social variables). The description of the financial variable, and how we obtained/calculated it, is as follows:
Funding status - from the Crowdfunder.com website, we collected information about each company, including the financing start date, financing end date, financing goal, and reservations/funds raised prior to and during the financing period. Companies that met or exceeded their financing goal during the financing period were considered "fully funded" companies, and companies that did not meet their financing goal during the financing period were classified as "not fully funded" companies.
The descriptions of the social media variables, and how we obtained/calculated them, are as follows:
Identity score - the number of social media links on a company's website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on each company's website as of February 2016, assuming that these companies have not recently added or deleted a large number of social media links on their websites.
Total posting volume - this is the total number of posts that include the company's Twitter handle (e.g., @Trust is Trustfy's Twitter handle). We created a Buzz Monitor ("CrowdFunder Companies") on Crimson Hexagon that searches for the use of company Twitter handles on Twitter, Facebook, and Tumblr from December 31, 2013 to the present. For a given company, data were collected during either the first quarter of its financing period (e.g., the first 25 days of a 100-day financing period) or its entire financing period. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Total potential impressions - this is the total potential impressions generated by posts including the company's Twitter handle during the first quarter of its financing period, or during the entire financing period. Data are from the "CrowdFunder Companies" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block.
Posts per author - we calculated this as: the total number of posts during the first quarter of the funding period, or the entire funding period, divided by the total number of Twitter authors posting in the same period. Data are from the "CrowdFunder Companies" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 authors in this time frame, we manually set the value to 0 to avoid dividing by 0.
Impressions per post - we calculated this as: total potential impressions (see description above)/total posting volume (see description above). Data were obtained from the "CrowdFunder Companies" Buzz Monitor on Crimson Hexagon. Under the 7 building blocks of social media, this would be classified as belonging to the conversations block. Note: if a company had 0 posts in this time frame, we manually set the value to 0 to avoid dividing by 0.
Company inclusion criteria
This section outlines our selection process for the companies included in this analysis. Specific information about the companies themselves can be obtained from the "Investor-Specific Crowd Funding Focus Benchmark" Word document as well as the "Crowdfunder_Data_MasterMatrix_First_Quarter_Funding" and "Crowdfunder_Data_MasterMatrix_Full_Funding" Excel documents. The Crowdfunder.com website was used as the primary source for obtaining company-specific financing data. We excluded companies whose financing periods had not ended by February 2016, except for those that had already met or exceeded their financing goal by that time (e.g., Company A's financing end date may be June 2016, but we would include Company A in the analysis if its financing goal had already been reached or exceeded by February 2016).
Data acquisition
As of February 2016, most of the financial information used in the investor-oriented analysis was obtained directly from the Crowdfunder.com website. However, we occasionally used the Internet Archive and other resources (e.g., Google searches, press releases, etc.) to determine when the financing period for some companies had ended, because such information was not always available on the website. In our analysis, we employed the following methods to obtain the social media variables.
Identity score - the number of social media links on a company's website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on each company's website as of February 2016, assuming that the companies have not added or removed many social media links since 2013.
Total posting volume - this is the total number of posts that contain the company's Twitter handle. We created a Buzz Monitor ("CrowdFunder Companies") on Crimson Hexagon that searches for the use of company Twitter handles on Twitter, Facebook, and Tumblr from December 31, 2013 to the present. For a given company, data were collected during the first quarter of the company's crowdfunding period (e.g., the first 25 days of a financing period lasting 100 days) or during the company's full crowdfunding period. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's Twitter handle, applied it to the monitor, and set the time range to the required window. The total posting volume was read directly from the monitor screen online.
Total potential impressions - this is the total potential impressions generated by posts including the company's Twitter handle during the first quarter of the crowdfunding period, or during the entire funding period. We created a Buzz Monitor ("CrowdFunder Companies") on Crimson Hexagon that searches for the use of company Twitter handles on Twitter, Facebook, and Tumblr from December 31, 2013 to the present. For a given company, data were collected during the first quarter of the financing period or during the entire financing period. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's Twitter handle, applied it to the monitor, and set the time range to the desired data window. We downloaded an Excel file from Crimson Hexagon containing the total potential impressions data shown on the web interface, and in the Excel file we summed the number of potential impressions per day to derive the total potential impressions for the period.
Posts per author - we calculated this as the total number of posts in the first quarter of the funding period, or the entire funding period, divided by the total number of Twitter authors posting during that period. We created a Buzz Monitor ("CrowdFunder Companies") on Crimson Hexagon that searches for company Twitter handles on Twitter, Facebook, and Tumblr from December 31, 2013 to the present. To collect company- and time-specific data, we created a filter in the Buzz Monitor using the company's Twitter handle, applied it to the monitor, and set the time range to the required dates. We downloaded an Excel file from Crimson Hexagon containing the daily totals of Twitter authors and the average number of posts per author per day. In the Excel file, we first multiplied the number of Twitter authors posting on a given day by the average number of posts per author on that day to obtain the posts per day. We then summed the total number of posts over the period and divided it by the total number of Twitter authors over the period to yield the posts per author.
Impressions per post - we calculated this in Excel by dividing the total potential impressions by the total posting volume (both obtained as described above).
Model testing and results
Once we obtained all the financial and social media data for the 21 companies described above, we generated the "Crowdfunder_Data_MasterMatrix_Full_Funding_Period" and "Crowdfunder_Data_MasterMatrix_First_Quarter_Funding" Excel spreadsheets, which can be found on Confluence. These spreadsheets contain all the data, as well as other details (e.g., Twitter handles, report dates, social data date ranges) useful for obtaining further information about the companies used in the modeling process. After generating the master data matrices, we created data matrices for the first quarter of the funding period and for the entire funding period for our random forest models (see Table 8 for an example view of a model matrix).
Table 8. A snapshot of an example matrix of variables input into the investor-specific crowdfunding social data prediction model. Input variables are abbreviated for brevity. The variable we are trying to predict ("Funding") is highlighted in green. The data below are from the first quarter of each company's financing period. Although company names are not shown, each row corresponds to a company.
Funding TotalPosts Total PotentialImpressions Posts per Author Impressions per Post Identity
Fully_Funded 181 470244 1.448 2598.033149 2
Fully_Funded 11 2287 1.1 207.9090909 3
Fully_Funded 152 303919 1.1111 1999.467105 2
Fully_Funded 27 8128 1.8 301.037037 3
Fully_Funded 77 379999 1.4808 4935.051948 0
Not_Fully_Funded 75 125391 1.2931 1671.88 3
Not_Fully_Funded 14 134652 1.2727 9618 3
Not_Fully_Funded 15 305815 1 20387.66667 0
Not_Fully_Funded 34 586026 1.0303 17236.05882 2
Not_Fully_Funded 4 11848 1 2962 0
After building the matrices, we ran a random forest model on each matrix to compute the average accuracy of our social equity models in predicting funding success. To do this, we used the R script named "Script_for_Running_Models.R". While we intend to describe this script in detail separately, we will briefly summarize how it determines the average accuracy and the standard deviation of the model accuracy. This script was uploaded to the "Model_Code_2_24_16" compressed folder on Git and is contained in that archive's "Modeling_Script" subdirectory.
The first step of the script is to import the baseline data matrix (see Table 8). After loading the matrix, the code randomly selects 60% of the data for training and 40% for testing. As an example, if we loaded Table 8 (which has 10 rows of data) into the code, 6 rows would be randomly selected to train the random forest model and the remaining 4 rows would be held out for testing. After training, the code predicts the class of each data point in the test set and compares each prediction to the data point's actual class. The resulting model accuracy is stored in a list, and the above steps are repeated 99 more times, for a total of 100 iterations. After 100 iterations, the code prints the average accuracy and the standard deviation of the accuracy. We plotted and compared the average accuracy and the standard error (the standard deviation divided by the square root of the sample size) of the models we tested.
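The following R sketch reproduces the evaluation loop just described. It is not the authors' actual Script_for_Running_models.r; the input file name and the use of the randomForest package with its default settings are assumptions.

# Minimal sketch of the described 60/40 split evaluated over 100
# iterations; the file name and modeling defaults are assumptions.
library(randomForest)

data_matrix <- read.csv("first_quarter_matrix.csv")  # e.g., Table 8
data_matrix$Funding <- as.factor(data_matrix$Funding)

set.seed(1)
accuracies <- numeric(100)
for (i in 1:100) {
  # Randomly split the rows 60% training / 40% testing.
  train_rows <- sample(nrow(data_matrix), round(0.6 * nrow(data_matrix)))
  train <- data_matrix[train_rows, ]
  test  <- data_matrix[-train_rows, ]

  # Train the random forest and classify the held-out rows.
  fit <- randomForest(Funding ~ ., data = train)
  preds <- predict(fit, newdata = test)

  # Store this iteration's accuracy.
  accuracies[i] <- mean(preds == test$Funding)
}

cat("Mean accuracy:", mean(accuracies),
    "SD:", sd(accuracies),
    "SE:", sd(accuracies) / sqrt(length(accuracies)), "\n")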
Based on social media data alone, we generated 23 different models (see the "Investor_Specific_First_Quarter_Model_Descriptions" and "Investor_Specific_Full_Funding_Period_Model_Descriptions" documents for details). The number of models tested is far from exhaustive; the conclusions below are therefore based on a limited subset of the possible combinations of the baseline variables (treated together as a single variable) and the social media variables.
After running the modeling code described above, we found that several of the models built using data from either the first quarter of the funding period or the entire funding period predicted the probability of a company being fully funded better than a random model.
In fact, the most accurate model built on first-quarter funding data achieved almost 80% accuracy (Model 5; mean accuracy 79.6%, standard deviation 6.5%), and the most accurate model built on full-funding-period data (Model 15) averaged 81.1% accuracy (standard deviation 13.9%). Both values exceed the no-information rate (NIR) of 52.4%: a person randomly guessing whether a company would be fully funded would achieve 52.4% accuracy. Model 5 consists of the Identity score and posts per author; Model 15 consists of total potential impressions, impressions per post, and posts per author. Given these results, and given that we tested only a small fraction of all the models that could be built (using only 5 social media variables), these data strongly suggest that social media has predictive power for crowdfunding success, and that we should continue to use social equity data to develop ratings for investors.
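To make the comparison concrete, the short R sketch below plots the two reported mean accuracies with standard-error bars against the no-information rate. Only the accuracy figures, standard deviations, and the 52.4% NIR come from the text; the error-bar construction (standard deviation divided by the square root of the 100 iterations) and the plot layout are assumptions.

# Sketch of the model comparison; numbers are from the text above.
models <- c("Model 5 (Q1 data)", "Model 15 (full period)")
acc    <- c(0.796, 0.811)
se     <- c(0.065, 0.139) / sqrt(100)  # SD over sqrt(iterations)

mids <- barplot(acc, names.arg = models, ylim = c(0, 1),
                ylab = "Mean prediction accuracy")
arrows(mids, acc - se, mids, acc + se,
       angle = 90, code = 3, length = 0.05)  # error bars
abline(h = 0.524, lty = 2)  # no-information rate (52.4%)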

Claims (16)

1. A method for analyzing crowd-funding platforms, the method comprising:
connecting to a plurality of individual loan platforms using an electronic device;
retrieving loan book data from each of the individual loan platforms;
storing the loan book data using a memory coupled to the electronic device,
wherein the loan book data comprises metadata generated in a structured query language database, and
wherein the metadata comprises a list of data attributes and names of platforms associated with the loan book data;
converting, using a processor coupled to the electronic device, the loan book data from each platform such that the converted loan book data uses common data;
reading, using the processor, the converted loan book data; and
documenting destination unified data attributes for each pair of platform and attribute.
2. The method of claim 1, wherein the metadata further comprises a timestamp indicating when the loan book data was received.
3. The method of claim 1, wherein the list of attributes is associated with a loan origination and with each borrower listing associated with the platform.
4. The method of claim 1, wherein the common data is selected from the group consisting of: a common language; a common currency; a common time zone; a common unit; and a common numerical range.
5. The method of claim 1, wherein storing the loan book data further comprises storing the loan book data for each platform in real time in its native state.
6. The method of claim 1, wherein the documenting is performed according to a mapping table.
7. The method of claim 1, further comprising predicting whether a loan associated with the platform is likely to be repaid.
8. A system for analyzing crowd-funding platforms, the system comprising:
an electronic device configured to:
connect to a plurality of individual loan platforms; and
retrieve loan book data from each of the individual loan platforms;
a memory coupled to the electronic device, the memory configured to store the loan book data,
wherein the loan book data comprises metadata generated in a structured query language database, and
wherein the metadata comprises a list of data attributes and names of platforms associated with the loan book data; and
a processor coupled to the electronic device and configured to:
convert the loan book data from each platform such that the converted loan book data uses common data;
read the converted loan book data; and
document destination unified data attributes for each pair of platform and attribute.
9. The system of claim 8, wherein the metadata further comprises a timestamp indicating when the loan book data was received.
10. The system of claim 8, wherein the list of attributes is associated with each borrower listing, and loan issuance is associated with a host platform identified and listed across other platforms.
11. The system of claim 8, wherein the common data is selected from the group consisting of: a common language; a common currency; a common time zone; a common unit; and a common numerical range.
12. The system of claim 8, wherein the memory is further configured to store the loan book data for each platform in its native state in real time.
13. The system of claim 8, wherein the processor is configured to perform the documentation according to a mapping table.
14. The system of claim 8, wherein the processor is further configured to predict whether a loan associated with the platform is likely to be repaid.
15. The system of claim 8, wherein the electronic device is selected from the group consisting of: a desktop computer; a laptop computer; a tablet computer; and a smart phone.
16. The system of claim 8, further comprising a graphical user interface, and wherein the memory is further configured to store a digital application configured to enable a user to access the destination unified data attribute using the graphical user interface.
CN201880078251.8A 2017-10-04 2018-10-03 System and method for analyzing crowd funding platform Pending CN111433806A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201762568105P 2017-10-04 2017-10-04
US62/568,105 2017-10-04
PCT/IB2018/001260 WO2019069138A1 (en) 2017-10-04 2018-10-03 System and method for analyzing crowdfunding platforms

Publications (1)

Publication Number Publication Date
CN111433806A true CN111433806A (en) 2020-07-17

Family

ID=65896682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880078251.8A Pending CN111433806A (en) 2017-10-04 2018-10-03 System and method for analyzing crowd funding platform

Country Status (5)

Country Link
US (1) US20190102836A1 (en)
EP (1) EP3692451A4 (en)
CN (1) CN111433806A (en)
GB (1) GB2581696A (en)
WO (1) WO2019069138A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102076A (en) * 2020-11-09 2020-12-18 成都数联铭品科技有限公司 Comprehensive risk early warning system of platform

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138621B2 (en) 2019-05-23 2021-10-05 Capital One Services, Llc Normalization grid
CN111930717B (en) * 2020-08-07 2024-06-07 暨南大学 Crowd-sourced database construction method and device based on blockchain and natural language processing
US20220327624A1 (en) * 2021-04-07 2022-10-13 Kingscrowd, Inc. System and method for rating equity crowdfunding capital raises
CN117992241B (en) * 2024-04-03 2024-06-04 深圳市元睿城市智能发展有限公司 Scientific and technological type middle and small enterprise bank-enterprise docking service system and method based on big data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030018558A1 (en) * 1998-12-31 2003-01-23 Heffner Reid R. System, method and computer program product for online financial products trading
US20030046670A1 (en) * 2001-06-15 2003-03-06 Marlow Mark J. Binary object system for automated data translation
US20030101120A1 (en) * 2001-11-29 2003-05-29 Lynn Tilton Method of securitizing a portfolio of at least 30% distressed commercial loans
US8117099B1 (en) * 2006-05-15 2012-02-14 Sprint Communications Company L.P. Billing systems conversions
CN106846145A (en) * 2017-01-19 2017-06-13 上海冰鉴信息科技有限公司 It is a kind of to build and verify the metavariable method for designing during credit scoring equation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7849049B2 (en) * 2005-07-05 2010-12-07 Clarabridge, Inc. Schema and ETL tools for structured and unstructured data
US8589798B2 (en) * 2009-03-31 2013-11-19 Commvault Systems, Inc. Information management systems and methods for heterogeneous data sources
WO2012097171A2 (en) * 2011-01-13 2012-07-19 Jeffrey Stewart Systems and methods for using online social footprint for affecting lending performance and credit scoring
US20130185228A1 (en) * 2012-01-18 2013-07-18 Steven Dresner System and Method of Data Collection, Analysis and Distribution
US8874551B2 (en) * 2012-05-09 2014-10-28 Sap Se Data relations and queries across distributed data sources


Also Published As

Publication number Publication date
WO2019069138A1 (en) 2019-04-11
EP3692451A4 (en) 2021-07-14
GB202006540D0 (en) 2020-06-17
EP3692451A1 (en) 2020-08-12
GB2581696A (en) 2020-08-26
US20190102836A1 (en) 2019-04-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40029119

Country of ref document: HK

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200717