Historical Financial Data in R for Stocks

This blog is a follow-up to an earlier blog explaining how to pull Intrinio financial data into R and RStudio. In that blog I showed the basics of how to get the data flowing. In this one I take it a step further and provide custom functions that let you pull historical data into R with a single call. I plan to build quant models that predict historical prices from historical metrics for a stock, and to use a subset of the historical data to backtest those models. This blog explains how to get the data for such an analysis.

Update 05/22/16: Also check out this blog showing how to create a for loop in R to get multiple pages of data via the API. That example shows the best way (known to the author) to parse JSON from an API in R.

If you read the first blog in this series, you already know how to get your API keys for free at intrinio.com/login. I'm using example API keys in this blog; you need to substitute your own. You will also need to have the httr package installed and loaded. As a disclaimer, I am a noob to R and I would love to hear how you would improve on this code.

The first function I created pulls the entire daily price history for a stock, creating a data frame with many thousands of rows and providing pricing data like open, high, low, and close for every trading day from the 1980s to today.

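library(httr)  # provides GET(), authenticate(), and content()

prices <- function(ticker){
  # Base URL for the prices endpoint
  price_base <- "https://api.intrinio.com/prices?identifier="
  # Example keys; substitute your own
  username <- "a543b029ec930ab0c7add95bfa1ea3ac"
  password <- "991d8ca925d74ecfbe7f78b4784d88b0"

  price <- paste(price_base, ticker, sep="")
  tp <- GET(price, authenticate(username, password, type = "basic"))
  z <- unlist(content(tp, "parsed"))

  # Drop the 5 trailing status values, then turn every 13 values into a row
  n <- length(z)
  b <- as.data.frame(matrix(z[1:(n-5)], (n-5)/13, byrow = TRUE))
  names(b) <- names(z)[1:13]

  return(b)
}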

If you call the function and assign the result to an object, having put your own API keys in the "username" and "password" objects within the function, you will have your data frame:

t <- prices("AAPL")

The really nice part about this function is that you can now create data frames quickly by swapping out AAPL for any tickers you are interested in analyzing. Graphing prices becomes very easy, but my goal is quant modeling.
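One practical note before graphing: because unlist() flattens the response into a single character vector, every column of the resulting data frame comes back as text (or as factors on older versions of R). Here is a minimal sketch of converting the columns to proper types. It runs on a small mock data frame with made-up numbers, and the column names date and close are my assumptions about what the API returns, so adjust them to match your own output.

```r
# Mock stand-in for the data frame returned by prices(); the numbers are
# made up and the column names (date, close) are assumed
mock_prices <- data.frame(
  date  = c("2016-10-03", "2016-09-30", "2016-09-29"),
  close = c("112.52", "113.05", "112.18"),
  stringsAsFactors = FALSE
)

# Convert text columns to proper types; as.character() guards against factors
to_typed <- function(df) {
  df$date  <- as.Date(as.character(df$date))
  df$close <- as.numeric(as.character(df$close))
  df[order(df$date), ]  # oldest first, which is convenient for plotting
}

typed <- to_typed(mock_prices)
str(typed)
```

From there, something like plot(typed$date, typed$close, type = "l") should draw a simple line chart of closing prices.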

For that, I need financial metrics that I expect to be correlated with price. This second function returns daily historical data over the period of your choice for the metric of your choice.

history <- function(ticker, item, start_date, end_date){
  # Base URL for the historical_data endpoint
  history_base <- "https://api.intrinio.com/historical_data?ticker="
  # Example keys; substitute your own
  username <- "a543b029ec930ab0c7add95bfa1ea3ac"
  password <- "991d8ca925d74ecfbe7f78b4784d88b0"

  # Append the metric and the date range to the base URL
  historical <- paste(history_base, ticker, "&item=", item, "&start_date=", start_date, "&end_date=", end_date, sep="")
  tp <- GET(historical, authenticate(username, password, type = "basic"))
  z <- unlist(content(tp, "parsed"))

  # Drop the 5 trailing status values, then turn every 2 values (date and metric) into a row
  n <- length(z)
  b <- as.data.frame(matrix(z[1:(n-5)], (n-5)/2, byrow = TRUE))
  names(b) <- names(z)[1:2]
  return(b)
}

This time, the function needs more inputs because we are looking at a certain period for a specific financial metric. You could, for example, call the function with these inputs to get the daily market cap for AAPL from January 1, 2010 through October 1, 2016.

app_hist <- history("AAPL","marketcap", "2010-01-01","2016-10-01")
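To line a metric up against prices for modeling, the two data frames can be joined on their date column. The sketch below is only an illustration under assumptions: it uses mock data frames with made-up numbers in place of the real API results, and the column names date, close, and value are my guesses at what the two functions return.

```r
# Mock stand-ins for prices("AAPL") and history("AAPL", "marketcap", ...);
# the numbers are made up and the column names are assumed
mock_prices <- data.frame(
  date  = c("2016-09-28", "2016-09-29", "2016-09-30"),
  close = c("113.95", "112.18", "113.05"),
  stringsAsFactors = FALSE
)
mock_hist <- data.frame(
  date  = c("2016-09-28", "2016-09-29", "2016-09-30"),
  value = c("614000000000", "604500000000", "609200000000"),
  stringsAsFactors = FALSE
)

# Join on date, then convert the text columns to numbers
combined <- merge(mock_prices, mock_hist, by = "date")
combined$close <- as.numeric(combined$close)
combined$value <- as.numeric(combined$value)

# A first look at how the metric moves with the closing price
metric_cor <- cor(combined$close, combined$value)
```

merge() keeps only the dates present in both data frames, which is usually what you want when a metric and the price series cover different ranges.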

The applications of this type of function are fantastic if you are an analyst or a developer working in R. Using Intrinio's data in R means you won't spend so much time gathering and organizing data, the bane of our existence. Instead, you can quickly get the data you need to start modeling, testing assumptions, and making decisions.

If you are interested in looking up the tags and API syntax that Intrinio uses, we have a blog explaining how to get started with the Intrinio API.

I will break down the functions from this blog to make your life easier when you replicate them with other Intrinio data.

Function 1 breakdown:

prices <- function(ticker){

#price_base, below, is the base URL of our API call since we are looking at prices. Other endpoints have their own base URLs.

price_base <- "https://api.intrinio.com/prices?identifier="

#Username and password here are from a test account; you can get yours at intrinio.com/login. They are free up to a certain amount of data daily.

username <- "a543b029ec930ab0c7add95bfa1ea3ac"
password <- "991d8ca925d74ecfbe7f78b4784d88b0"

#The price object we create next completes the API call by pasting the ticker on the end of the base API syntax.

price <- paste(price_base,ticker,sep="")

#tp uses httr's GET function to make our API call, passing our API username and password to Intrinio's server to authenticate.

tp <- GET(price, authenticate(username, password, type = "basic"))

#z parses the response and unlists it. content() is another function of the httr package; it takes the raw return value of our API call and sorts it out for us. unlist() (base R) then flattens the result into one long named vector.

z <- unlist(content(tp,"parsed"))

#These next three lines are the hardest to understand. Their purpose is to convert the character vector we have into a data frame to make it easier to analyze. Starting with n, we get the length of z, our output.

n=length(z)

#b takes our output and creates a matrix. Right now, we have one long vector that repeats the 13 values we are interested in over and over. Additionally, there are 5 values on the end that we don't need; they tell us the status of our API call and other information from Intrinio's servers. We drop those values, then turn every 13 values into a row.

b=as.data.frame(matrix(z[1:(n-5)],(n-5)/13, byrow = T))

#Finally, we give our columns the same names as the first 13 values in our vector.

names(b)=names(z)[1:13]

return(b)
}
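Since those three lines are the hardest part, here is a small sketch that runs the same reshape on a mock vector so you can see it work without an API call. To keep it short, I use 3 fields per record instead of 13. The values are made up, and the field and status names are assumptions about what the flattened response looks like.

```r
# Mock of the flattened response: 3 fields per record instead of 13,
# followed by 5 trailing status values (all names here are assumed)
z <- c(data.date = "2016-10-03", data.open = "112.71", data.close = "112.52",
       data.date = "2016-09-30", data.open = "112.46", data.close = "113.05",
       result_count = "2", page_size = "100", current_page = "1",
       total_pages = "1", api_call_credits = "1")

fields   <- 3  # 13 for the real prices call
trailing <- 5  # status values on the end that we drop
n <- length(z)

# Fill the matrix row by row so each record becomes one row
b <- as.data.frame(
  matrix(z[1:(n - trailing)], (n - trailing) / fields, byrow = TRUE),
  stringsAsFactors = FALSE
)
names(b) <- names(z)[1:fields]
b
```

Note that this approach depends on every record having exactly the same number of fields and on the count of trailing values staying at 5. If either changes, the rows will no longer line up.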

 

Function 2 breakdown:

#You'll notice that the history function needs more inputs than prices.

history <- function(ticker, item, start_date, end_date){

#Notice the difference in base API syntax

history_base <- "https://api.intrinio.com/historical_data?ticker="
username <- "a543b029ec930ab0c7add95bfa1ea3ac"
password <- "991d8ca925d74ecfbe7f78b4784d88b0"

#Notice the extra data in our API call to inform the server of what we want

historical <- paste(history_base, ticker, "&item=", item, "&start_date=", start_date, "&end_date=", end_date, sep="")
tp <- GET(historical, authenticate(username, password, type = "basic"))
z <- unlist(content(tp,"parsed"))

n=length(z)

#In this case, notice how we divide by 2 instead of 13. This API call only returns two data points, the date and the marketcap, whereas the last API call returned 13.

b=as.data.frame(matrix(z[1:(n-5)],(n-5)/2, byrow = T))
names(b)=names(z)[1:2]
return(b)
}
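As the number of query parameters grows, the paste() call gets harder to read. One possible alternative, sketched here with base R only, is to build the query string from a named vector. The helper name build_history_url is hypothetical (it is not part of httr or of Intrinio's tooling), and the parameter names are simply the ones used in the function above.

```r
# Hypothetical helper: assemble the historical_data URL from named parameters
build_history_url <- function(ticker, item, start_date, end_date) {
  base <- "https://api.intrinio.com/historical_data?"
  params <- c(ticker = ticker, item = item,
              start_date = start_date, end_date = end_date)

  # URL-encode each value so special characters cannot break the query
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)

  # Join as name=value pairs separated by "&"
  paste0(base, paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

url <- build_history_url("AAPL", "marketcap", "2010-01-01", "2016-10-01")
url
```

The resulting string can be passed to GET() exactly like the historical object in the function above.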

Update 10/24/16: A third blog in this series showing more advanced functions is available here.

  • Christopher Flach

    Hello, I’m trying to do something that, in my head, seems very simple. But I just can’t find the solution. I just want to pull the EOD data for all currently trading stocks back to a specific time (2016-01-01 for example). Any recommendations on how to do that?

  • Rachel Carpenter

    Hey Christopher! Can you do me a favor, and message us on the chat support? We can help you much faster and easier on there. Just click the button in the bottom right hand corner and copy your request. Thanks!

  • Khosro Heydari

    This is a better way in R to use your API:
    history <- function(ticker, item, freq){
    history_base <- "https://api.intrinio.com/historical_data?identifier="
    username <- "whatever"
    password <- "whatever "

    historical <- paste(history_base, ticker, "&item=", item, "&frequency=", freq, sep="")
    tp <- GET(historical, authenticate(username, password, type = "basic"))
    do.call(rbind, content(tp,"parsed")$data)
    }

    Instead of your last few lines (that made no sense to me but it works somehow), I just do.call rbind on data part of the content of the call return.

    I hope it helps others.

    PS: I like your API but the limit (500 calls) is too restricted. Please increase it to 10,000 calls or so per day.