The idea that we can use distributions to describe political systems really stuck with me. Sure, it’s not the only one way to do so, but it seems to offer a particularly interesting and unbiased approach to measuring the survival of political systems. Unfortunately, when I discussed this fact with an example of the Roman Empire, I got it wrong. Not the general conclusion. Just the detail. The way I did it was wrong… which is embarrassing. But there you go, now there’s going to be a record of how I did it wrong and how you do it right, which was kind of what I wanted to do with this blog in the first place.
The rest of this post is organised under the following headings:
- How did I figure I was wrong?
- I counted duration in approximate (integer) years not in days
- I made a mistake calculating the Kolmogorov-Smirnov Test
- Getting accurate data
- My Final Say – Data, Procedure and Results
If you are just interested on how it gets done, scroll through to the end, where I’ve actually posted the excel file with the correct solution to the problem. I hope this helps and somewhat redeems my mistake.
How did I figure I was wrong?
Well, I wanted to go back and bask in the glory of my achievement, and suddenly couldn’t find the excel file where I had done it all. Unfortunately, because I was still pretty fascinated with this stuff, I was obsessed about it enough I just decided to start over, which really isn’t that difficult. Until you start coming across problems, which is when things began to unravel and I realised that for some reason, I wasn’t 100% on what I was doing. Here are the mistakes I made:
I counted duration in approximate (integer) years not in days
Why is this a problem? Because if the distribution is an exponential distribution and for a duration of zero the value of the Exponential CDF is always
CDF(Xi) = 1 – exp(-lambda*Xi),
- Xi is the value of the ith observation of the variable X I am interested in.
- Lambda is the exponential decay factor given by
- lambda = n/ Sum(Xi) (pretty much the inverse of the mean)
for Xi=0 => CDF (0) = 0.
Why is this a problem? Well, because in my simple count of the duration of the reign of an emperor I had counted everyone whose reign had lasted less than 1 year as a 0. From Octavian to Carinus, there were 16 such emperors, which was approximately 30.8% of my entire sample. So you can see that the way I counted things was incompatible with what I was trying to do. I am embarrassed to admit this took me longer to figure out than it should have. But moving onwards…
I made a mistake calculating the Kolmogorov-Smirnov Test
The question you are asking yourself is probably the one I was asking myself at that stage: “How did I manage to do it when I was doing the calculations for that post?” Another way of asking this which is more accurate is: How did I manage to support my conclusion that the empirical distribution was an exponential distribution?” Yet another way of asking this is “How did the Kolmogorov-Smirnov Test allow me not to reject the hypothesis that the two distributions were the same?”
The anwer is that I didn’t. I just screwed up the test. Now, this was probably the most frustrating part of this entire exercise because I had no idea where I had put the original file, which meant I had called it something pointless like “Book2” so for all my efforts I couldn’t find it, which was bugging the crap out of me. But eventually, I did find a file named “Book2” in a suspicious folder opened it and “tadaaa“, there it was in all its glory for me to see: What I had done wrong:
When you want to check whether two samples are taken from the same population you can apply the K-S Test. It is a fascinatingly simple statistical test, which requires you to calculate the difference between the two distributions and then check if that difference would statistically occur by chance if it was taken from a Kolmogorov distribution. So all you need is:
- the two distributions (at least one of which must come from a sample),
- the critical value of your test (corresponding to a level of confidence)
- and the size of sample.
The K-S Test then requires you to compare the maximum difference between the two distributions and checking whether the value is higher than the value of the ratio of the critical value (corresponding to the level of confidence you have chosen) divided by the squared root of the size of the sample. If the maximum difference is larger than the critical value ratio, then you can reject the null hypothesis that the two distributions are the same (with the level confidence that corresponds to the critical value used) and the distributions are probably different. If the maximum different is lower than the critical value ratio, then we cannot reject the null hypothesis that the two distributions are the same.
In the latter case, for a level of confidence of XX%, the maximum difference will actually XX% of the time, but not necessarily XX% +1% of the time. So I can find that two distributions are not the same if I use a critical value corresponding to a level confidence of 90% but then find that at the 99% level of confidence the hypothesis cannot be rejected. This might seem counter-intuitive, but it makes sense? Why, because increasing the level of confidence increases the amount of if I include another 9% of samples, more extreme values are likely to show up in the Kolmogorov distribution.
It seems counter-intuitive to me because I am used to think of significance testing a la economiste, where my null hypothesis is that there is no relationship between the variables, so as I increase the confidence interval I actually restrict the number of values that I can possibly accept. However, in this case the null hypothesis is that they are actually related, so increasing the level of confidence for rejecting the null hypothesis expands my choices.
Anyway, I’m just rambling on because I want to post-pone the inevitable embarrassment of admitting what I did wrong.
You know what I did wrong? I did all of this but instead of dividing it by the squared root of the sample size I divided it by the square root of the standard error, which in this case was smaller, so it increased my critical value giving me more room to not reject the null hypothesis. Once I fixed this it was clear that measuring duration in years wouldn’t cut it. So I started over, which was a head-ache…
Getting accurate data
Why was it a head-ache you ask? For several reasons:
- In the best case scenario, where we have accurate information for the beginning and end of the reign of a Roman emperor, actually calculating the duration in days is a mess. Why? – Because all of this stuff happened over 1500 years ago. Excel is not programmed with this type of dates in mind. As a matter of fact it starts freaking out once you give it dates prior to 1900. This is because excel doesn’t actually understand dates. It calls January 2st 1900 “1” and every day from there is just an addition. So December 31st 1889 would be “0” for MS Excel. Of course it can’t make sense of this fact, freaks out and suddenly you can no longer calculate the difference between two dates through subtraction of the corresponding cells. So even in a perfect world calculating the duration in years is going to be a headache. How do I do this? – Count the days one-by-one? Good luck with that! It’s gonna take ages and as far as I could tell there is no calendar online that you can use. – Approximate? Octavian ruled for approximately 41 years. Just do 41*365? Sure but what about the ones who ruled between less clear periods? Well, what I did was to subtract years and months and multiply by 365 and by 30, respectively, crossed my fingers and hoped that the errors this approximation was introducing were not too problematic.
- The really big problem however is that the (end of) reigns of some emperors was so messy that accurate dates are not recorded accurately. This is particularly problematic when they died in campaign and the troops were on the move. Remember these people did not have GPS and internet. Their time keeping was imperfect and often they were more busy trying to avoid getting killed than keeping time. On top of this in add memoriam damnatio, data censoring and historical revisions and it gets really tricky. Often we are told that someone died between this and that month or in a season of a certain year. To overcome this problem I followed the logic described in Khamaladze, Brownrigg and Haywood (2005) [KBH(2005)] :
When several possible days or even several possible years are suggested, if the dates were not too far apart we selected mid-points or even ‘average dates’. For example, dates “June, 251 – August, 253” for Trebonianus Gallus we interpreted as “15 June, 251 – 15 August, 253” when calculating duration of reign in days. Relative to the total of 729 days of his reign the possible error does not look large. Similarly, “early 244 – September or October 249” as reign dates for Philippus we interpreted as “30 January, 244 – 30 September, 249”, which, again, does not seem to involve a big error relative to the total of 2070 days, etc. As historical statements concerning individual rulers these mid-points or averages would be an impractical solution, but as far as we consider the whole collection of rule lengths as a “statistical ensemble”, replacement of two or more uncertain points by a mid-point will deform the pattern very little.
My Final Say – Data, Procedure and Results
There are some differences between my data set of reign duration in years, in days and the data used by KBH(2005). The latest emperor they include is Theodosius, which bothers me cause the empires final collapse only took place about 170 years after he passed away, so stopping there ignores a lot of data. Their preference to stop there is motivated by the fact that he was the last emperor who ruled all the empire as one. I dislike this because if unity of rule was the rule then we would only include 2 emperors for the Tetrarchy. I am not criticising their choise of who to include and who not to include. More intelligently than myself they defer that choice to historians and use 3 different data sets, all of which give them similar results. This data set just humbly hoped to overcome that gap in the three data sets they used to include the end of the Roman empire. It includes the following remaining emperors: Honorius, Constantine III, Constantius III, Johannes, Valentinian III, Petronius Maximus, Avitus, Majorian, Libius Severus, Anthemius, Olybrius, Glycerius, Julius Nepos and Romulus Augustulus. Feel free to wikipedia them. They belong to one of the most fascinating periods in the history of the Western world, of which they were central players in the events that led to the end of what would these days be called a united Europe.
As an aside, I must admit that seeing their names always makes me a bit nostalgic for a time I never experienced, that I know must have been terribly unpleasant. I know that the middle ages were important for the technological development of Europe and that the enforced unity of the Roman Empire was unusual in that it was only possible to acquire in the presence of overwhelming technological and organisational advantages and that it was possible to maintain in the absence of foreign attacks and internal disruption. Nevertheless, I grew up reading comic books and films that glorified that period of time and for me it holds the same appeal as the Tokugawa period of Japan and its Samurai. Sure it was developmentally stagnant, brutally violent and in addition institutionally unstable, but it was elegant, which is its own sort of appeal. I know, only a minority of those who lived through it would prefer it for what we have now. But when we think about ourselves in an alternative reality, we don’t imagine ourselves as the losers. But I have digressed…
Obtaining the ECDF
Anyway, that’s the data. How do you do the analysis? This is one of those process questions everyone asks and no answer is readily available. Even the wonderfully complete article by KBH(2005) assumes you know how to get the empirical CDF (ECDF) and how to calculate the theoretical exponential CDF. It’s not complicated, but you have to know what you are doing. So here is a step by step process of how to do this:
The first thing to do is to calculate the ECDF. How do you do that? Organise the data set by ascending order in one column.
Next, in the column next to it create a series that follows the span of the observations and increases by one unit every time it goes to the next observation. So next to your ascending observations of X (the Xi) you should have a series that starts with 1, 2, 3, 4, …, N, where N is the total number of observations in the sample.
Then, create a third variable whose value is the ratio that divides the values in the second column by the sample size. You are almost there. You just need to consolidate the data in case some observation is repeated. In this case, we have to make sure that not two (or more) emperors ruled for the same time. If we don’t do this we’ll have the same X measured twice.
Notice that when looking at continuous data it is not very common to find repetitions but it does happen. In my case, there were 2 sets of emperors whose duration of rule coincided so that 2 different durations appeared twice each. What do I mean? Both Pupienus and Balbinus ruled for 104 days in 238; both Gordian I and his son Gordian II ruled for 20 days (or 21?) in 238. Yes it was a messy year in the history of the empire. Right after the end of the Severan dynasty last stable dynasty before the Constantines (which can’t really be called stable given that they all killed each other like maniacs… but again, I digress!)
Now, each observation of duration (our “Xi”, in this instance) is observed once, except for the two values described above. Copy the first column with the Xi into another sheet and eliminate any X that may be repeated. Then take the third column and copy it next to the consolidated Xi in the new sheet and make sure that it accounts for the repeted durations. Done?
Congratulations! This column is your ECDF! To see it, graph the Xi (on the horizontal axis) against the ECDF calculated in the third column and aggregated in a another sheet.
If you’ve done this, you’ll notice that it doesn’t look like an exponential CDF. It’s just a straight, upwards slopping line. If someone showed you this with no further information than that it was a CDF, your best guess would be that it is the CDF of a uniform distribution. But you’d be wrong. Why? Because you are ignoring the scale. In this case, the horizontal axis does not follow any scale. It just jumps between the values observed for the rule duration of the Roman Emperors. How do you overcome this? Adjust the scale of X. Instead of jumping values, create a series in another sheet that is continuous (no need to use decimals just make a data set that has incremental 1-unit increases in its values from 0 to the the longest duration. If you map the ECDF values to this, repeating the next value is observed in the sample, it will begin to look like a CDF that is not a linear function. This is nice for illustration purposes, but you don’t really need it to check whether you variable follows a specific distribution.
The Theoretical CDF
To make the theoretical CDF all you need are the required parameters. If you are checking it against a normal distribution, these are the average and the standard deviation. For the exponential distribution it is the lambda described previously. So just calculate it once and then copy its value on a column on the side of the consolidated Xi series and ECDF. To calculate it you can use the formula above or use the MS Excel formula for the Exponential distribution. Either way, the results should be the same and you should get two pretty curves as below (they are scaled):
By the way, why do we do this? Because we are trying to find out how the variable is distributed. We have an ECDF and want to find out whether it matches a specific distribution better than another.
The K-S Test
Now that you have the two distributions, calculate the difference between the ECDF and theoretical CDF for all Xi.
Then find the maximum difference, the difference from the one that was calculated that is the largest.
Now do the K-S test: Chose a confidence level, somewhere between 90% and 99% and use the relevant critical value. The null hypothesis is that the two distributions are the same. If the difference is larger than the critical value, we can reject the null hypothesis at the chosen level of confidence. If it is smaller, then we cannot reject the null hypothesis with that level of confidence and must thus accept that the variable is distributed according to the same distribution as the one used to calculate the ECDF.
I did this for the exponential distribution and obtained results not dissimilar to those of KBH(2005). For clarity, the figure below shows the maximum difference I found and the critical values that this needs to be compared with, together with the relevant levels of confidence.
My conclusion is that I cannot reject with 99% confidence that the two distributions are the same. Note however that I can reject this hypothesis with 97.5%, 95% and 90% confidence, which is in line with what I had said before. Normally, people use the critical value of 1.36, which corresponds to the 95% confidence level, so this choice is not the most orthodox approach. By pushing the confidence level to 99%, I’m really giving myself a good chance of finding that the two distributions are the same. This isn’t wrong, but it’s important to be conscious of it. Normally people are satisfied with 90% levels of confidence, when the null hypothesis is that the two variables are not related, so it’s important to be consistent. If the rule is that we require results to be accepted with 99% confidence once, that may become the standard and that’s a pretty hefty standard. Or at least there is no reason why the standard should change.
Anyway, if after all this it is not clear, just check out this MS Excel file ( format): Roman Empire duration – Final
It’s all there. I hope it helps!
2 thoughts on “Correcting the Probability of Roman Emperors – Data, Empirical CDF, Exponentional Functions, Kolmogorov-Smirnov Test, Confidence Levels and Critical Values”