# Historical Financial Data in R for Stocks

This blog is a follow-up to an earlier blog explaining how to pull Intrinio financial data into R and RStudio. In that blog I showed the basics of how to get the data flowing. In this blog I take it one step further and provide custom functions that let you pull historical data into R efficiently. I plan to build quant models that predict historical prices from historical metrics for a stock, and to use a subset of the historical data to backtest those models. This blog explains how to get the data for such an analysis.

Update 05/22/16: Check out this blog as well, which shows how to create a for loop in R to get multiple pages of data via the API. That example shows the best way (known to the author) to parse JSON from an API in R.

Update 11/30/2017: Feel free to skip ahead to this recently released package that does the hard work for you.

The first function I created pulls the entire daily price history for a stock. It creates a data frame with many thousands of rows, providing pricing data such as open, high, low, and close for every trading day from the 1980s to today.

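#Requires the httr package: install.packages("httr")
library(httr)

prices <- function(ticker){
price_base <- "https://api.intrinio.com/prices?identifier="
#Replace these placeholders with your own Intrinio API keys
username <- "Your_API_Username"
password <- "Your_API_Password"

price <- paste(price_base,ticker,sep="")
#Make the API call with basic authentication
tp <- GET(price, authenticate(username, password, type = "basic"))
z <- unlist(content(tp,"parsed"))

n=length(z)
b=as.data.frame(matrix(z[1:(n-5)],(n-5)/13, byrow = T))
names(b)=names(z)[1:13]

return(b)
}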

If you call the function and assign the result to an object, using your own API keys for the "username" and "password" objects within the function, you will have your data frame:

t <- prices("AAPL")

The really nice part about this function is that you can now create data frames quickly by swapping out AAPL for any tickers you are interested in analyzing. Graphing prices becomes very easy, but my goal is quant modeling.
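One practical detail before graphing or modeling: because the function flattens the response with unlist(), every column arrives as text rather than as numbers or dates. A minimal sketch of the conversion, using a small mock data frame instead of a live API call; the column names `data.date` and `data.close` and the values are assumptions for illustration, so check `names()` on your own result:

```r
# Mock rows shaped like the output of prices(); names and values are made up.
b <- data.frame(
  data.date  = c("2016-10-03", "2016-09-30"),
  data.close = c("112.52", "113.05"),
  stringsAsFactors = FALSE
)

# as.character() first guards against factor columns in older R versions.
b$data.date  <- as.Date(as.character(b$data.date))
b$data.close <- as.numeric(as.character(b$data.close))

# Inspect the resulting column types.
str(b)
```

After this step, functions such as plot() or lm() can treat the columns as dates and numbers.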

For that, I need financial metrics that I expect to be correlated with price. This second function returns daily historical data over the period of your choice for the metric of your choice.

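history <- function(ticker, item, start_date, end_date){
history_base <- "https://api.intrinio.com/historical_data?ticker="
#Replace these placeholders with your own Intrinio API keys
username <- "Your_API_Username"
password <- "Your_API_Password"

historical <- paste(history_base, ticker, "&item=", item, "&start_date=", start_date, "&end_date=", end_date, sep="")
#Make the API call with basic authentication (requires the httr package)
tp <- GET(historical, authenticate(username, password, type = "basic"))
z <- unlist(content(tp,"parsed"))

n=length(z)
b=as.data.frame(matrix(z[1:(n-5)],(n-5)/2, byrow = T))
names(b)=names(z)[1:2]
return(b)
}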

This time, the function needs more inputs because we are looking at a specific financial metric over a certain period. You could, for example, call the function with these inputs to get the daily market cap for AAPL from January 1, 2010 through October 1, 2016.

app_hist <- history("AAPL","marketcap", "2010-01-01","2016-10-01")
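To see what the function sends to the server, it can help to build the URL on its own and print it. The sketch below repeats the paste() call from the function without making any network request; the helper name `build_history_url` is hypothetical and exists only for this illustration:

```r
# Same base URL as in the history() function.
history_base <- "https://api.intrinio.com/historical_data?ticker="

# Hypothetical helper: assembles the request URL exactly as history() does,
# but returns the string instead of calling the API.
build_history_url <- function(ticker, item, start_date, end_date) {
  paste(history_base, ticker, "&item=", item,
        "&start_date=", start_date, "&end_date=", end_date, sep = "")
}

url <- build_history_url("AAPL", "marketcap", "2010-01-01", "2016-10-01")
print(url)
```

Printing the URL this way is a quick check that the ticker, item tag, and dates land in the right query parameters before you spend an API call on it.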

The applications of this type of function are fantastic if you are an analyst or a developer working in R. Using Intrinio's data in R means you won't spend so much time gathering and organizing data, the bane of our existence. Instead, you can quickly get the data you need to start modeling, testing assumptions, and making decisions.
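As one possible next step toward the modeling goal, the price and metric data frames can be joined on date and split into a modeling set and a holdout set for backtesting. This is only a sketch under stated assumptions: the data frames are mock stand-ins for the output of prices() and history(), and the column names, values, and cutoff date are all made up:

```r
# Mock inputs; in practice these would come from prices() and history()
# after converting the columns to Date and numeric types.
price_df <- data.frame(
  date  = as.Date(c("2016-09-28", "2016-09-29", "2016-09-30", "2016-10-03")),
  close = c(113.95, 112.18, 113.05, 112.52)
)
metric_df <- data.frame(
  date      = as.Date(c("2016-10-03", "2016-09-30", "2016-09-29", "2016-09-28")),
  marketcap = c(606.3, 609.1, 604.5, 614.0)
)

# Inner join on date; merge() sorts the result by the key column.
combined <- merge(price_df, metric_df, by = "date")

# Keep the most recent rows aside for backtesting (cutoff is arbitrary here).
cutoff  <- as.Date("2016-09-30")
train   <- combined[combined$date <  cutoff, ]
holdout <- combined[combined$date >= cutoff, ]
```

Splitting by date rather than at random keeps the holdout rows strictly later than the rows used to fit a model.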

If you are interested in looking up the tags and API syntax that Intrinio uses, we have a blog explaining how to get started with the Intrinio API.

I will break down the functions from this blog to make your life easier when you replicate them with other Intrinio data.

## Function 1 breakdown:

prices <- function(ticker){

#price_base, below, is the base syntax of our API call since we are looking at prices. Other kinds of data use a different base syntax.

price_base <- "https://api.intrinio.com/prices?identifier="

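#The username and password objects hold your Intrinio API keys. Replace the placeholders with your own.

username <- "Your_API_Username"
password <- "Your_API_Password"

#The price object we create next completes the API call by pasting the ticker on the end of the base API syntax.

price <- paste(price_base,ticker,sep="")

#tp uses the httr function GET to make our API call, passing our API username and password to Intrinio's server to authenticate.

tp <- GET(price, authenticate(username, password, type = "basic"))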

#z parses the response and unlists it. content() is another function from the httr package; it takes the raw return values of our API call and parses them. Base R's unlist() then flattens the parsed result into one long named vector.

z <- unlist(content(tp,"parsed"))

#These next three lines are the hardest to understand. Their purpose is to convert the character vector we have into a data frame to make it easier to analyze. Starting with n, we get the length of z, our output.

n=length(z)

#b takes our output and creates a matrix. Right now, we have one long vector that repeats the 13 values we are interested in over and over. Additionally, there are 5 values on the end that we don't need. Those values tell us the status of our API call and other information from Intrinio's servers. The code drops those 5 values, then turns every 13 remaining values into a row of the matrix.

b=as.data.frame(matrix(z[1:(n-5)],(n-5)/13, byrow = T))

#Finally, we give our columns the same names as the first 13 values in our vector.

names(b)=names(z)[1:13]

return(b)
}
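The reshaping step is easier to follow on a toy input. The sketch below applies the same unlist, matrix, and names steps to a hand-built list standing in for the parsed response, so it runs without an API key. The field names, values, and the names of the five trailing elements are made up, and it uses 3 fields per record instead of 13:

```r
# Mock parsed response: two records of three fields each, followed by
# five trailing metadata values (all names and values are illustrative).
parsed <- list(
  data = list(
    list(date = "2016-10-03", open = "112.71", close = "112.52"),
    list(date = "2016-09-30", open = "112.46", close = "113.05")
  ),
  result_count = 2, page_size = 100, current_page = 1,
  total_pages = 1, api_call_credits = 1
)

z <- unlist(parsed)   # one long named character vector
n <- length(z)
fields <- 3           # values per record (13 in the prices function)
meta   <- 5           # trailing values to drop

# Fail early if the vector does not divide evenly into records.
stopifnot((n - meta) %% fields == 0)

b <- as.data.frame(matrix(z[1:(n - meta)], ncol = fields, byrow = TRUE),
                   stringsAsFactors = FALSE)
names(b) <- names(z)[1:fields]
print(b)
```

The divisibility check is an addition that the original function does not have. It is a cheap guard, because the hard-coded 13 and 5 give a misshapen data frame if the response layout ever differs from what the function expects.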

## Function 2 breakdown:

#You'll notice that the history function, which calls the historical_data endpoint, needs more inputs than prices.

history <- function(ticker, item, start_date, end_date){

#Notice the difference in base API syntax

history_base <- "https://api.intrinio.com/historical_data?ticker="

#Notice the extra data in our API call to inform the server of what we want

historical <- paste(history_base, ticker, "&item=", item, "&start_date=", start_date, "&end_date=", end_date, sep="")