U.S. Unemployment

I am tired of Bloomberg News quoting single numbers and percentages in their articles on unemployment, it gives me an incomplete picture. So I decided to graph the relevant series myself. I sourced the data from the St.Louis Fed and started graphing.


The gray areas are the recessions as reported by the FED. The current recession has apparently ended, but even the FED has the latest date as a placeholder only.

As you can see Roubini and Bernanke are "right". This recovery is jobless. Never before has the duration of unemployment been so long. Basically, people who lost their jobs are still unemployed. The "official" unemployment is at 10%, but as you can see in the second graph the labour force has decreased in the last year, which is strange since the labour force has a general uptrend. Also the participation rate is down, which means that the FED is hiding unemployment behind their definition of "people who have not looked for a job in the past 4 weeks are not part of the labour force".

I created the graph using Python with matplotlib. For a while now I've been using the two to prototype studies at work. Matplotlib is great, and I can see that it's constantly improving. A year ago it could handle plotting only a day or two of high-frequency data, now I can plot a week. The widgets (sliders, radio buttons, etc) included in the package make prototyping extremely easy. I am also learning more and more about Python and I like what I am seeing, generally.

It was fairly easy to put together the graph, but I found out that another package which I like R with quantmod can do the work as well. Choices, choices... maybe I'll end up combining Python with R :)

Now, on to graphing more of the economy ...

VIX vs. S&P500 Historical Vol

A while back I read at article about a correlation between the VIX and the SPX 20-day historical volatility. Since I am trying to do more work with quantmod, I decided to try and put it together - shouldn't take too long. I've included the code below and as you can see from the screen capture the guys at Goldman Sachs were right :)


## VIX discounted to SPX historical vol?
##
## A concern that I voiced last week has carried over into this week, as the CBOE
## Market Volatility Index (VIX – 54.28) is once again trading at a discount to
## the SPX's 20-day historical volatility. During the past several months, major
## market rallies have tended to occur after the VIX traded at a premium to the
## SPX's 20-day historical volatility. Moreover, the market's darkest moments
## during the past couple of months have occurred when the VIX trades at a steep
## discount to the SPX's 20-day historical volatility reading, which is the case now.
## -- http://www.forbes.com/2008/12/15/goldman-morgan-stanley-pf-ii-in_jh_1215outlook_inl.html
##
## output a graph and a file
## Author: Peter Michaels, (c) 2009
##
##

library(TTR)
library(quantmod)

SnP500_tick <- "^GSPC" #yahoo symbol for the S&P500 index
VIX_tick <- "^VIX" #yahoo symbolc for the VIX index
histVolPeriod <- 20

snpIndexData <- getSymbols(SnP500_tick,from=Sys.Date() - 365,to=Sys.Date(),auto.assign = FALSE)

vixIndexData <- getSymbols(VIX_tick, from=Sys.Date() - 365,to=Sys.Date(),auto.assign = FALSE) # calculate the 20-day historical volatility of S&P500

snpHistVol <- rollapply(snpIndexData, histVolPeriod, sd, by = 1,ascending = TRUE,by.column = TRUE, na.pad = TRUE) # graph the VIX + S&P500 20-day historical vol

chartSeries(Cl(vixIndexData), name="VIX vs. S&P500 Hist Vol", col='green', TA=NULL)
addTA(Cl(snpHistVol), col ='red', on=1) # graph S&P500 + SMA to detect rallies

addTA(Cl(snpIndexData), col='blue')
addSMA(n=10, overlay=TRUE, col='red', on=2)



The picture above shows:
top chart, the VIX (green) and the 20-day historical volatility of the S&P500 (red) - ooohh the gap is narrowing once again, perhaps it's time to take profits
bottom chart, daily S&P500 closing (blue) and a 10-day SMA to better display rallies

So far I am loving quantmod, my only complaints are inability to label graphs properly and automate saving to a file. Perhaps I just haven't discovered how to yet.

Intraday Data Loading Using Python

I find myself using Python for most of my scripting needs, although sometimes I revert to Bash.

A recent trading idea required intraday data. I found it difficult to find historical intraday data for free; all the news sites offer only historical daily close data. Google and MSN offer real-time quotes, but not historical, creating a page scrapper and running it for a few days seemed like overkill.
I found a small shareware application called "Historical Quotes Downloader" (it uses data from prophet.net) and used it to get 5 days of minute-by-minute intraday data for the S&P 500. That left me with 500 files that needed to be loaded in MySQL...so I wrote a little Python script to do just that. After loading the data, I plan to use quantmod for R to do the research.

Here is the quick Python script for loading files from Historical Quotes Downloader/prophet.net into MySQL, each tick is in a separate table.


# import all of the intra-day data in the table
# create a table for each of the files, using the name before the first period
# and load all of the intraday data
#
# this script could be implemented using fileinput, should try that next time

import sys, datetime, time
import os, fnmatch
import MySQLdb

try:
conn = MySQLdb.connect(host = "localhost",
user = "username",
passwd = "password",
db = "intraday")
except MySQLdb.Error, e:
print "Error %d: %s" % (e.args[0], e.args[1])
sys.exit(1)

if len(sys.argv) < 2:
print "Proper Command: "+sys.argv[0]+" ..."
sys.exit(0)

# expand the regular expression to all the matching files
fileList = []
for fileName in os.listdir(os.path.dirname(sys.argv[1])):
if fnmatch.fnmatch(fileName, os.path.basename(sys.argv[1])):
fileList.append(fileName)

# loop though each of the files passed in
for tickFileName in fileList:

tickName = os.path.basename(tickFileName)
tableName = tickName[0:tickName.index(".")]
print "creating table:%s:\n" %(tableName)

cursor = None
# create the table
try:
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE `"""+tableName+"""` (
`datetime` DATETIME NOT NULL DEFAULT 0,
`open` DOUBLE NOT NULL DEFAULT 0,
`high` DOUBLE NOT NULL DEFAULT 0,
`low` DOUBLE NOT NULL DEFAULT 0,
`close` DOUBLE NOT NULL DEFAULT 0,
`volume` INTEGER UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY(`datetime`)
)ENGINE = MYISAM;""")
cursor.close()
conn.commit()
except MySQLdb.Error, e:
print "Error %d: %s" %(e.args[0], e.args[1])
try:
conn.rollback ()
except:
continue

#open the file and load
tickFile = None
try:
tickFile = open(os.path.join(os.path.dirname(sys.argv[1]), tickFileName), "r")
cursor = conn.cursor()

# Symbol (optional), Date (optional), Time (optional), Open, High, Low, Close, Volume
tickFile.readline() # remove the header
tickLines = tickFile.readlines()
for line in tickLines:
# detect the end of the ticks in the file
if line.find("barsofar") != -1:
break
tokens = line.split(",")
try:
sqlDate = time.strptime(tokens[0], "%m/%d/%Y %H:%M:%S")
except (ValueError, IndexError):
print "Unexpected format " + str(sys.exc_info()[0])
break #the file might be empty so proceed with next one

try:
cursor.execute("""INSERT INTO `"""+tableName+"""` (datetime, open, high, low, close, volume) VALUES(%s, %s, %s, %s, %s, %s)""",
(time.strftime("%Y-%m-%d %H:%M:%S", sqlDate),tokens[1],tokens[2],tokens[3],tokens[4],tokens[5]))
except MySQLdb.Error, e:
print "Error %d: %s" %(e.args[0], e.args[1])
break

cursor.close()
conn.commit()
except MySQLdb.Error, e:
print "Error %d: %s" %(e.args[0], e.args[1])
try:
conn.rollback ()
except:
pass
finally:
if tickFile != None:
tickFile.close()

# end of for loop over files

conn.close()



The files I created are in the format MMM.intraday and I call the script "python intraday_db_load.py *.intrday".

If the experiments are a success and a trading pattern is discovered, the plan is to use the Interactive Brokers API for real-time data. I am hoping that opentick.com comes back and I can make use of their API too.

Helper Scripts

At work (FX Trading company) I created a number of scripts to analyze our logs.
I wanted to look at the spreads (ask - bid) and how they change during the day, average price comparison between liquidity providers, as well as graphing the delay of producing a tick in our system.

The first few scripts were written in bash and then gnuplot was used to graph the results. However, recently I discovered SciPy an extension to Python, along with Matplotlib for graphing. I have converted most of my scripts to Python - it's a lot cleaner, and easier to work with. The only problem I've encountered is having too many data points - it crashes Matplotlib, so then I have to fall back to gnuplot.

I am planning on learning more of Python and SciPy - I wonder if finance packages already exist.

Pairs Trading

After auditing a course on Time Series stats and reading a few books, I put together some R code to do pairs trading. It's a fairly old strategy that some people claim doesn't work particularly well at the moment. Regardless, I wanted to try it out and indeed it is possible to generate profits. The hardest part is discovering pairs.

Pairs trading involves picking two stocks whoes price spread over time is cointegrated, thus forming a stationary spread. The spread must be stationary and according to the mean reversion principle any deviation from the mean of the spread will be corrected over time. It is these deviations from the mean that can be exploited for profit. A spread can be formed using StockX and StockY in the following way StockX - r * StockY. If the current spread is above the mean, short the spread (sell StockX and buy r of StockY) and wait for mean reversion. If the current spread is below the mean, go long on the spread and wait for mean reversion. From personal observations, it seems that the mean level of a spread lasts about 200 days, at that point a new mean is established. It is recommend to use this strategy during highly volatile markets or during earnings/news releases.

Strategy Basics:
1. Find correlated stock pairs on which to run search (best pairs come from the same industry) - you can run a correlation search, although not the best way.
2. Test to see if the individual price series are stationary, if not test the residuals of the spread using the Phillips-Perron test. You want to establish that the spread is stationary, if it's not move any further calcs are meaningless.
3. Create the spread (StockX - r * StockY) of the two stocks using linear regression (not the only way).
4. Check for deviations from the mean above a certain level and use those as triggers.
Bonus:
5. Create an ARMA model of the spread so that you can use standard stats tools like standard deviation and confidence intervals. In my experience most spreads can be mapped using an AR(2) process.
6. Once ARMA model is complete use 95-99% confidence interval as a trading trigger. You can also try to do forecasting using the model.


I might be willing to share the R source for this, but you must have something to offer in return :)

In addition I create some R code to do backtesting, comes in useful when exploring the possibilities for generating profits.

Stock Price Updater

I created a program for updating a mySql database with stock quotes from Yahoo finance. It can be run as a daemon so that it downloads all the data for a specified number of years the first time it's run, and updaes all the quotes an the end of every day after that. You may find it useful, if you do write back to let me know :)

I am offering the Eclipse project (just import it) - it will help if you know some Java. If there is enough demand I'll put together a binary for ease-of-use.

stockSaver.zip
- to configure the application look at the stocksaver.properties file, I've included the sql to create a basic table for all the data, the symbols for the S&P500 and some sql code to run correlations on all the pairs in the table - enjoy

P.S. to run it as a deamon check out the apache-daemon project

Evaluating Risk using the Merton Model

s mentioned previously, for my AMP466 Assignment, I'll be using the Merton Model to evaluate credit risk of corporate debt i.e. calculate the probability of default for a given company.

You can learn more about the Merton Model here.

Basically, the model uses a single debt and equity issue to represent the financing. The assets of the company is the total market capitalization and the debt is the market capitalization multiplied by the debt/equity ratio. Adding the two numbers gives the total value of the company. The model is used to calculate numbers for one year.

It is mostly an academic tool, since there are problems with its application in real world scenarios, a few short comings are:

  • the model can be used only for companies who have stock
  • governments cannot be evaluated
To compensate for these limitations a company, KMV, was created to commercialize Merton's work, it was later bought by Moodys.

I choose to evaluate First Solar (FSLR):

First you need to calculate the historical volatility of the stock, here and here is some more info on that. First Solar has been public since Nov 17th, 2006, so I used all of the available daily data.

logarithmic return: is the stock value on day i

sample variance: , is the mean of daily returns.

annualized volatility: multiply by 252 trading days since I use daily data

I calculated the historical volatility to be 70

Whenever necessary, I used the LIBOR risk free interest rate of 2.52%, it changes so look it up.

Credit Spread
S = Market Cap = 21,864.65 million
B = Current Debt = S * Debt/Equity = 21864.65e6 * 0.0986 = 2155.85 million
V = B + S = 2155.85e6 + 21864.65e6 = 24020.5 million

Finding Future Debt
K = 0.0986 * 2155.85e6 * e0.0252 = 2210.87 million this is the value of the debt discounted to one year into the future.

Using the Black-Scholes formula I find the volatility of the assets. John Hull has a B-S calculator on his website, I recommend using it or another of the B-S calculators floating around.
Treat the Future Debt as the strike price, and the Total Assets as the value of the equity. Use the historical volatility that was calculated previously.

Updating the volatility on each iteration, using the result is σv = 71.52% Asset Volatility

ln(K/B) = ln(2210.87/2155.85) = 0.0252  yield
2.52% - (risk free rate = 2.52%) = 0%  Credit Spread

Annual Risk Neutral Probability of Default

to be con't

Expected Credit Loss

to be con't


All the beautiful equations that you see are care of LATEX for blogger.