This post is a brief summary of how I found myself trying to learn how to model the term structure of the yield curve and the struggles that I faced.

If Google brought you to this page in the late moments of your despair and agonising confusion with this model, I recommend you scroll straight to the bottom. The last six paragraphs might shed some light on the problems you are facing.

As I mentioned in my post about doing my PhD, I’m researching bond markets. When I was still working in fixed income and I was trying to figure out what investors wanted, one of the things I came across was some studies of the effect that certain investors had on the yield curve. Two articles in particular caught my eye as being quite interesting:

- Tracking Global Demand for Advanced Economy Sovereign Debt, by Arslanalp and Tsuda
- Government Bonds and Their Investors: What Are the Facts and Do They Matter, by Andritzky

However, while both were appealing because they actually looked at what investors were doing, they also appeared to me to suffer from the same deficiency: they only looked at one maturity of bonds. Admittedly, by focusing on the most liquid and abundant maturity (10 years) they picked the best of the lot, but it seemed to me while I was working that choosing a maturity of issuance was a very big deal if you were a troubled issuer about to struggle to meet an avalanche of redemptions. So it seemed like their framework might be missing some dynamics hidden by the term structure of the yield curve, i.e. the fact that debts with different maturities offer different returns to investors.

I figured I should invest some time into this, which, as I was quickly made aware, was easier said than done. There are at least two conceptual problems with trying to capture all of the maturities of debt outstanding, which motivated the authors above to just focus on one maturity:

- First, there are a ton of maturities. Literally anywhere between 7 and 42. Am I going to regress each one of them against my explanatory variables and any controls that I may come across?
- Second, even if I do, won’t I also have to include the other maturities when regressing them against each other? Clearly there might be transfers of debt from one maturity to another that could explain some of the effects.

So I had to find a way around this problem. I needed a way to look at the yield curve, not all the maturities that made it up. So I dug around for articles until I found this very nice ECB Working Paper by Afonso and Martins (2010) who regress the latent factors of the yield curve against a bunch of macro and financial explanatory variables.

What are the latent factors of the yield curve, you might ask? (I did…) Well, they are 3 or 4 time-varying coefficients that can be estimated from the yields of existing maturities to delineate a mathematical function that describes the whole yield curve for any potential maturity at any given point in time. They are useful for pricing new bonds with maturities that sit between the ones already in the market and for which there is therefore no real-world price yet. They are part of a tradition of parametric estimation of the yield curve that dates back to the static model of Nelson and Siegel (1987), and whose time-varying version is known as the Dynamic Nelson-Siegel model, or DNS for short. While other models exist, such as spline methods or the models in the Vasicek (1977) tradition such as the one by Ang and Piazzesi (2003), the DNS has the advantage that the 3 or 4 latent factors can be interpreted in a logical way as describing the level, slope and curvature (as well as the convexity for the 4 latent factor models of Svensson 1994) of the yield curve. Using this model thus allows me to go from having a gazillion dependent variables to only having 3 (as it turns out, using 4 was problematic because there was a lot of correlation between the curvature and the convexity, which came back to bite me in the form of multicollinearity when I started running regressions). So that’s great, right?

Sure, except that now I needed to actually understand and estimate all this stuff, which was not easy. I didn’t really have a formal academic modelling background in finance, so as I moved from Afonso and Martins (2010)’s familiar world to the more abstract modelling world of the DNS model, I was quickly overwhelmed. There was algebra I had never seen before everywhere, and it seemed like the Kalman filter just decided it was not going to go away. Oh, and I had no idea how to use the data or really how much data I needed. The DNS world was complete gibberish to me when I started out. I did have some data for the USA, because Afonso and Martins (2010) mention they took their data from Gurkaynak, Sack and Wright (2006). But at this time I was so confused I hadn’t realised I did not need to estimate all the maturities.

As I said, the DNS model is neat because it allows you to take some maturities and plot what the yield curve would look like for all other maturities. Using the model is normally done in 2 steps. First you estimate the DNS model and the latent factors, and then you take this model, plug in the maturities whose hypothetical yield you are interested in, and compute. However, this is not what I wanted to do. All I needed to do was get the latent factors. After step one, I was golden. Surprisingly, it took me a while to get that…

Anyway, that’s not the end of it. As it turns out the model is a bit tricky. It looks like this:
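For reference, the textbook Dynamic Nelson-Siegel specification, written with the symbols used in this post, is given below. I am assuming this standard form is the one meant here.

$$
y_t(T) = L_t + S_t\left(\frac{1 - e^{-\lambda T}}{\lambda T}\right) + C_t\left(\frac{1 - e^{-\lambda T}}{\lambda T} - e^{-\lambda T}\right)
$$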

Where yt(T) is the yield of a specific maturity T at time t, Lt is the level latent factor, St is the slope latent factor and Ct is the curvature latent factor. Also, as you can see, the model follows an exponential form, which adds a little problem: the loadings on the slope and curvature factors depend on the rate of exponential decay, lambda, so you need a value for it before you can compute them for every maturity T in every period t. Now, if you assume the problem away, say by taking the estimate some other article found, estimation of the latent factors is very easy. All you need is the maturities whose yields you’ve got, and you estimate the latent factors for every period t through OLS. The problem is that ideally you want to estimate the lambda simultaneously with the latent factors, which is much more difficult than just OLSing away. I was using MATLAB to code the research, so I went online and tried to find an explanation. Unfortunately, MathWorks only has a page about how you can use one of their toolboxes to estimate the whole yield curve through DNS, not just the step where you get the latent factors.
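To make the "fix lambda and OLS away" step concrete, here is a minimal sketch in plain Python (not the MATLAB code I actually used). It assumes the standard Nelson-Siegel loadings, and everything in it is illustrative: the function names (`ns_loadings`, `solve_3x3`, `fit_factors`, `ns_yield`) are made up for this example, the yields are synthetic, and the value 0.0609 is the lambda that Diebold and Li fix for maturities measured in months, not the Afonso and Martins estimate.

```python
import math

def ns_loadings(maturity, lam):
    """Return the (level, slope, curvature) loadings for one maturity > 0."""
    x = lam * maturity
    slope = (1.0 - math.exp(-x)) / x
    curvature = slope - math.exp(-x)
    return (1.0, slope, curvature)

def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_factors(maturities, yields, lam):
    """OLS of one date's yields on the loadings; returns [L_t, S_t, C_t]."""
    X = [ns_loadings(T, lam) for T in maturities]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(X, yields)) for i in range(3)]
    return solve_3x3(xtx, xty)  # normal equations (X'X) beta = X'y

def ns_yield(maturity, factors, lam):
    """Step two: the model-implied yield at any maturity, given the factors."""
    return sum(f * b for f, b in zip(factors, ns_loadings(maturity, lam)))

# Synthetic example for a single date t (assumed numbers, maturities in months).
LAMBDA = 0.0609
maturities = [3, 6, 12, 24, 36, 60, 84, 120]
true_factors = [5.0, -2.0, 1.5]
observed = [ns_yield(T, true_factors, LAMBDA) for T in maturities]

estimated = fit_factors(maturities, observed, LAMBDA)
print("estimated level, slope, curvature:", estimated)
print("implied 48-month yield:", ns_yield(48, estimated, LAMBDA))
```

In practice you would call `fit_factors` once per date to build the three factor time series, which is all that step one requires. With real, noisy yields the fit will not be exact as it is with these synthetic ones.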

Fortunately, Diebold and Rudebusch (2013) came out just in time. There, the authors argue that there’s very little value added in bothering a lot with the lambda. So I took them at their word, also because, you know, I had the rest of the research to focus on and this little detail was dragging the whole process down. I just took Afonso and Martins’ estimates of lambda and went with the simpler estimation process.

Honestly, I’m still working on it. I do want to figure out how to estimate the lambda myself. As it turns out, the trick seems to lie with using a Kalman filter, but I haven’t had the time to investigate it further. The irony, of course, is that apparently there’s a simple way of doing it in Excel, but I’m still trying to figure out what the programme is doing in the background.
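For anyone who wants to pursue the Kalman filter route, the DNS model is usually cast in state-space form along the following lines. This is a sketch of the common setup in the literature, not something I have estimated myself. The symbols Λ(λ), f_t, μ, A, ε_t and η_t are notation introduced here, and N is the number of observed maturities T_1, …, T_N.

$$
y_t = \Lambda(\lambda)\, f_t + \varepsilon_t, \qquad f_t = (L_t,\ S_t,\ C_t)'
$$

$$
f_t - \mu = A\,(f_{t-1} - \mu) + \eta_t
$$

$$
\Lambda(\lambda) =
\begin{pmatrix}
1 & \dfrac{1 - e^{-\lambda T_1}}{\lambda T_1} & \dfrac{1 - e^{-\lambda T_1}}{\lambda T_1} - e^{-\lambda T_1} \\
\vdots & \vdots & \vdots \\
1 & \dfrac{1 - e^{-\lambda T_N}}{\lambda T_N} & \dfrac{1 - e^{-\lambda T_N}}{\lambda T_N} - e^{-\lambda T_N}
\end{pmatrix}
$$

The first equation stacks the yields of all N maturities at time t, and the second lets the three factors follow a first-order vector autoregression around their mean μ. For any given lambda, A, μ and error variances, the Kalman filter delivers the likelihood of the observed yields, so lambda can in principle be estimated jointly with the factors by maximising that likelihood numerically.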