This is the third blog in a series showing how to use Intrinio financial data in R or R Studio to create quant models. The tools at Intrinio are built to make modeling financial data straightforward. The first blog shows the basics of making an API call for financial data in R. The second blog shows how to write two functions, one to pull in historical stock prices and another to pull in historical fundamentals data.
This blog takes both of those blogs a step further, creating a single function that will pull in historical stock prices as well as historical fundamentals for many companies and many metrics at once. The function code as well as an explanation of what is going on under the hood is included, enabling R developers to quickly create a data frame for analysis with exactly the data they want.
Update 05/22/16: Check out this blog showing how to create a for loop in R to get multiple pages of data via API. That example shows the best way (known to the author) to parse JSON from an API in R.
Update 11/30/2017: Feel free to skip ahead to this recently released package that does the hard work for you.
Creating a data frame with the desired tickers, date ranges, and financial data
On line 32 I am using the “allTickerIndexData” function, seen later in this blog, to pull in the data I specified on lines 16-24. On line 16 I specified SPX to use as a control for market fluctuation. On line 18, I specified a list of tickers I am interested in. On line 20, I created a list of the financial metrics I want to see for those companies, and on lines 22 and 23 I specified the date ranges I am looking for. In this case, I will pull data from the last quarter of 2009 through today.
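The full script appears as an image in the original post; a minimal sketch of the setup it describes might look like the following. The `allTickerIndexData` function is shown later in this blog, and the tickers, metric tags, and argument order here are placeholder assumptions, not the script's exact values:

```r
# Sketch of the setup on lines 16-24 of the blog's script.
# Enter your own Intrinio API keys as explained in earlier posts.
username <- "YOUR_API_USERNAME"
password <- "YOUR_API_PASSWORD"

index      <- "SPX"                              # control for market fluctuation
tickers    <- c("AAPL", "BA", "CAT")             # companies of interest (examples)
metrics    <- c("ebitdamargin", "totalrevenue")  # fundamentals tags (examples)
start_date <- "2009-10-01"                       # last quarter of 2009
end_date   <- Sys.Date()                         # through today

# allTickerIndexData is defined later in the blog; its signature is assumed here
combined <- allTickerIndexData(index, tickers, metrics, start_date, end_date)
```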
Remember, you need to enter your own API keys as explained in earlier posts.
The result is a data frame that looks something like this:
The data frame includes the daily high, open, low, close, volume, and adjusted values for all of the tickers I specified as well as the SPX index and the quarterly values I specified such as EBITDA margin and total revenue.
What's under the hood of this function:
Less experienced R developers and analysts can feel free to skip ahead to the next section of code that shows how to subset, graph, and start modeling with the data frame.
More advanced R users might want to dig into the functions. You will notice, for example, on line 109, that the “na.locf” function from the zoo package has been used to fill forward monthly and quarterly data so that it lines up with the stock prices, which are released daily.
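As a small illustration of that fill-forward step (synthetic data here; the real script applies it to the merged price/fundamentals frame):

```r
library(zoo)

# Quarterly metrics only change on report dates, so daily rows are NA between
# reports; na.locf carries the last observation forward so the fundamentals
# line up with daily prices.
daily <- data.frame(
  date    = as.Date("2017-01-01") + 0:5,
  revenue = c(100, NA, NA, 110, NA, NA)
)
daily$revenue <- na.locf(daily$revenue, na.rm = FALSE)
daily$revenue
# 100 100 100 110 110 110
```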
You may also notice, on line 41, that the “content” function from the httr package is being used to parse the API response so that, on line 44, a matrix can be used to transform the vector into a data frame.
Care has been taken on line 111 to convert the appropriate data to date format, with the same being done for numeric data on line 82. Note how column names are applied across the functions with lines 78 and 79 providing good examples.
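A rough sketch of that parse-and-clean sequence, assuming an endpoint and response shape like the blog's price calls (the URL, the `data` field, the column order, and the column names below are all assumptions for illustration):

```r
library(httr)

username <- "YOUR_API_USERNAME"   # enter your own Intrinio API keys
password <- "YOUR_API_PASSWORD"

resp <- GET("https://api.intrinio.com/prices",
            query = list(identifier = "AAPL"),
            authenticate(username, password))

parsed  <- content(resp, as = "parsed")  # httr parses the JSON body into a list
records <- parsed$data                   # assumed: one list element per trading day

# Unlist each record, reshape with matrix(), then coerce to a data frame
prices <- as.data.frame(
  matrix(unlist(records), nrow = length(records), byrow = TRUE),
  stringsAsFactors = FALSE
)

# Apply readable column names, then convert types: everything arrives as character
colnames(prices) <- c("date", "open", "high", "low", "close", "volume", "adj_close")
prices$date <- as.Date(prices$date)
num_cols <- c("open", "high", "low", "close", "volume", "adj_close")
prices[num_cols] <- lapply(prices[num_cols], as.numeric)
```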
Subsetting, Modeling Financial Data, and Graphing
This next bit of code represents a very basic example of how to start working with the data now that you have it in a data frame. It's time to start modeling financial data! It is the continuation of the script shown earlier in this blog and starts on line 35:
On line 41 I'm removing historical price data, which goes back to the 1980s, to make the data set more manageable. Then, on line 43, I create separate data frames for each of the stocks I am interested in. On lines 56-63 I run a simple correlation to determine how strongly related my variables, which I intend to use to predict stock prices, are to each other.
Line 71 is designed to show how easy it now is to create a model. I cannot emphasize strongly enough that this is for demonstration purposes only: a linear model is almost certainly inappropriate here. The purpose of this code is to show that we are now ready to begin quant modeling.
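In the same spirit, here is a tiny self-contained stand-in for that correlate-then-model sequence. The column names and values are invented; a real run would use the ticker frames built from the API data:

```r
# Synthetic stand-in for one ticker's merged price/fundamentals frame
ba <- data.frame(
  date         = as.Date("2010-01-01") + 0:9,
  close        = c(55, 56, 57, 55, 58, 60, 59, 61, 62, 63),
  ebitdamargin = rep(c(0.12, 0.13), each = 5),
  totalrevenue = seq(1.6e10, 1.7e10, length.out = 10)
)

# How strongly related are the candidate predictors (and the target)?
round(cor(ba[, c("close", "ebitdamargin", "totalrevenue")]), 2)

# Demonstration only: a naive linear model of price on fundamentals
fit <- lm(close ~ ebitdamargin + totalrevenue, data = ba)
summary(fit)
```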
Finally, on lines 82-86, I generate a graph with a scaled version of the SPX plotted against BA's stock price. Again, this may or may not be of interest; the point is to demonstrate how easy this data is to work with and how quickly you can start modeling financial data.
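The scaling trick can be sketched with synthetic series (real code would plot the BA and SPX columns of the combined frame):

```r
set.seed(1)
dates     <- as.Date("2010-01-01") + 0:99
ba_close  <- 60 + cumsum(rnorm(100, 0, 0.5))     # stand-in for BA's closes
spx_close <- 1100 + cumsum(rnorm(100, 0, 5))     # stand-in for the SPX

# Rescale the index into BA's price range so both fit one set of axes
spx_scaled <- spx_close * max(ba_close) / max(spx_close)

plot(dates, ba_close, type = "l", col = "blue",
     xlab = "Date", ylab = "Price", main = "BA vs. scaled SPX")
lines(dates, spx_scaled, col = "red")
legend("topleft", legend = c("BA", "scaled SPX"),
       col = c("blue", "red"), lty = 1)
```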
My team and I will now be exploring random forests, neural networks, logistic regression, and k-nearest neighbors (KNN) while adding, subtracting, and swapping predictor variables in an attempt to model how stock prices relate to financial metrics.
The ease with which we can perform these calculations is a testament to Intrinio's platform and the flexibility of the APIs Intrinio provides, as well as the nifty functions demonstrated in this blog. Remember: just because we've made it easy to produce quantitative models doesn't mean it's easy to produce good quantitative models. My team will be using log transformations, variable scaling, back testing, and a hearty dose of skepticism with a dash of common sense interpretation to create our models.
Many thanks to business analytics graduate students Andrew Carpenter, Shrutika Troup and Kathryn Lescinski, as well as Professor Dan Zhang of the University of Colorado Boulder for their help in creating the code and content of this blog.