Lund and Reeves: detection of changepoints

The paper and its abstract are:

Lund R, Reeves J, Detection of undocumented changepoints: a revision of the two-phase regression model, JOURNAL OF CLIMATE, 15 (17): 2547-2554 SEP 2002

Abstract:
Changepoints (inhomogeneities) are present in many climatic time series. Changepoints are physically plausible whenever a station location is moved, a recording instrument is changed, a new method of data collection is employed, an observer changes, etc. If the time of the changepoint is known, it is usually a straightforward task to adjust the series for the inhomogeneity. However, an undocumented changepoint time greatly complicates the analysis. This paper examines detection and adjustment of climatic series for undocumented changepoint times, primarily from single site data. The two-phase regression model techniques currently used are demonstrated to be biased toward the conclusion of an excessive number of unobserved changepoint times. A simple and easily applicable revision of this statistical method is introduced.

A comment on the paper appeared as Wang, J Climate, v16, 3383-5, 2003, covering how things change if you know the two slopes are the same and the series just has a jump. This is coded up via the /wang option.
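
For orientation, the scan that both versions perform can be sketched as follows. This is a minimal illustration in Python, not the IDL in lund.pro; the names sse_line and changepoint_scan, and the min_seg and common_slope parameters, are made up here (common_slope=True stands in for /wang). It assumes the usual form of the statistic: for each candidate break, compare the residual sum of squares of a single straight line with that of the broken fit, and keep the largest F.

```python
import numpy as np

def sse_line(t, y):
    """Residual sum of squares of a least-squares straight line through (t, y)."""
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def changepoint_scan(t, y, common_slope=False, min_seg=3):
    """Return (c, f_max), where c is the index of the first point after the break."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    sse_red = sse_line(t, y)              # reduced model: one line, no break
    best_c, best_f = None, -np.inf
    for c in range(min_seg, n - min_seg + 1):
        if common_slope:
            # One slope plus a step at c: 3 parameters, 1 more than the reduced model
            step = (np.arange(n) >= c).astype(float)
            design = np.column_stack([np.ones(n), t, step])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ coef
            sse_full = float(resid @ resid)
            f = (sse_red - sse_full) / (sse_full / (n - 3))
        else:
            # A separate line each side of c: 4 parameters, 2 more than the reduced model
            sse_full = sse_line(t[:c], y[:c]) + sse_line(t[c:], y[c:])
            f = ((sse_red - sse_full) / 2.0) / (sse_full / (n - 4))
        if f > best_f:
            best_c, best_f = c, f
    return best_c, best_f

# Synthetic check: unit-variance noise with a jump of 5 at index 100
rng = np.random.default_rng(0)
t = np.arange(200.0)
y = rng.standard_normal(200)
y[100:] += 5.0
print(changepoint_scan(t, y))
print(changepoint_scan(t, y, common_slope=True))
```

A large maximum F is not by itself evidence of a break: because it is the largest of many candidate values, it has to be judged against percentiles of the maximum, not of an ordinary F distribution.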

IDL code implementation (CVS as ,v)

The main code is lund.pro (,v), and a "driver" routine is lund_plot.pro (,v). There is an auxiliary routine, lund_sig (,v), to report significance at the 95% level. It takes no account of autocorrelation.

The code *should* be internally documented so I won't repeat myself here.

The code implements the Lund and Reeves paper. It additionally allows for missing data, and (necessarily) for irregularly spaced time points. This produces correct-looking results, but I'm not sure whether or not it affects the validity of the statistics.
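
One way to handle gaps, sketched in Python under the assumption that missing values are stored as NaN (the IDL code may flag them differently): drop the missing points but keep each remaining point's real time, so that the regressions see the true, irregular spacing. The helper name drop_missing is hypothetical.

```python
import numpy as np

def drop_missing(t, y):
    """Keep only the (t, y) pairs where both values are finite."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(t) & np.isfinite(y)
    return t[keep], y[keep]

# Monthly time axis with two missing values (made-up numbers)
t = 1958 + np.arange(6) / 12.0
y = np.array([0.3, np.nan, -0.1, 0.4, np.nan, 0.2])
t_ok, y_ok = drop_missing(t, y)
print(t_ok)   # the surviving times are no longer evenly spaced
print(y_ok)
```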

I've tested the whole thing somewhat, and I believe it to be correct. But be cautious...

Significance

Significance testing is included, via lund_sig at the 95% level (see above).
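
To see why the maximum needs its own critical values (the bias the abstract describes), here is a hedged Python sketch; it is not the lund_sig routine. It simulates white-noise series with no break, takes the largest two-phase F over all candidate breaks, and compares the simulated 95th percentile with the ordinary F(2, n-4) 95% point. The function names, the series length of 60 and the 200 simulations are arbitrary choices for illustration. Like lund_sig, it takes no account of autocorrelation.

```python
import numpy as np

def sse_line(t, y):
    """Residual sum of squares of a least-squares straight line through (t, y)."""
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def f_max(t, y, min_seg=3):
    """Largest two-phase regression F statistic over all candidate breaks."""
    n = len(y)
    sse_red = sse_line(t, y)
    best = -np.inf
    for c in range(min_seg, n - min_seg + 1):
        sse_full = sse_line(t[:c], y[:c]) + sse_line(t[c:], y[c:])
        best = max(best, ((sse_red - sse_full) / 2.0) / (sse_full / (n - 4)))
    return best

def f2_quantile(p, d):
    """Exact quantile of an F distribution with (2, d) degrees of freedom."""
    return (d / 2.0) * ((1.0 - p) ** (-2.0 / d) - 1.0)

rng = np.random.default_rng(1)
n, n_sim = 60, 200
t = np.arange(float(n))
sims = np.array([f_max(t, rng.standard_normal(n)) for _ in range(n_sim)])

naive_crit = f2_quantile(0.95, n - 4)      # what a single fixed break would need
fmax_crit = np.percentile(sims, 95)        # what the maximum actually needs
false_alarm = np.mean(sims > naive_crit)   # break-free series flagged by the naive test
print("ordinary F 95% point:", naive_crit)
print("simulated F_max 95% point:", fmax_crit)
print("false-alarm rate using the ordinary F point:", false_alarm)
```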

Examples

Example 1. South Pole data

@ex1

Only the break found in MAM is "significant". But do we believe it?

Example 1a. South Pole data, redone with /wang

lund_plot,deseasonalise((readfromfile('$WMCDATA/a-s.gjm.dat'))(indgen(12)+1,*)),tis=1958+indgen((2002-1958+1)*12)/12.,title='A-S',/wang
gettwogifs,out='pole-wang'

The break isn't significant, and doesn't look it either.

Example 2. Olenk (Siberia)

@ex2

This time the breakpoint *is* considered significant, and it looks to fit too.

Example 3. Orcadas [2003/11/18]

Looking at Orcadas (data from Gareth's web page), with and without the "wang" option (which forces the same slope on the two segments).
lund_plot,deseasonalise((readfromfile('$WMCDATA/orcadas.gjm.dat'))(indgen(12)+1,*)),/wang
gettwogifs,out='orc'

lund_plot,deseasonalise((readfromfile('$WMCDATA/orcadas.gjm.dat'))(indgen(12)+1,*))
gettwogifs,out='orc1'

LHS: break detection (using /wang).
RHS: break-and-slope (using the original method).

Using /wang, a significant jump is detected at about point 240, ie 20 years in, ie about 1924. A secondary maximum (nearly as high) is detected at about point 560, ie 47 years in, ie about 1951. Did anything happen in those two years?
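
The arithmetic behind those dates, as a small Python sketch. The series is monthly, so point i is i/12 years in. The start year of 1904 is an assumption inferred from "20 years in, ie about 1924", not read from the data file, and index_to_year is a made-up name.

```python
def index_to_year(i, start_year=1904.0, per_year=12):
    """Decimal year of point i in a regularly sampled series (start_year is assumed)."""
    return start_year + i / per_year

for i in (240, 560):
    print(i, round(i / 12, 1), "years in ->", round(index_to_year(i), 1))
```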

Using the original method, the maximum jump is an implausible one near the end. But there is still a secondary maximum near 560.

Example 4. Faraday [2003/11/18]

Again, from Gareth's web page.
!p.multi=[0,2,2]
lund_plot,deseasonalise((readfromfile('$WMCDATA/faraday.gjm.dat'))(indgen(12)+1,*)),/wang,/nop,tis=1946+indgen((2002-1946+1)*12)/12.
lund_plot,deseasonalise((readfromfile('$WMCDATA/faraday.gjm.dat'))(indgen(12)+1,*)),/nop,tis=1946+indgen((2002-1946+1)*12)/12.,title='Faraday'
gettwogifs,out='far'

Both methods pick up a jump in 1976. If removed, this would *increase* the trend. But... is this plausible? The jump is about 1 °C. Sadly the metadata says not: the screen moved in 1986 and PRTs were used from 1984. And in fact the jump doesn't really look like a jump. That's statistics for you.

Test - when breaks are not significant

(BTW, if you run lund_test, it ends by drawing you a piccy: a histogram of where the breaks were, with all breaks in black and the significant ones in green.) Testing on 250 series, each of length 1000 (500,break,500):
lund_test,l=500,n=250,jump=0.0
Prints: 0.0440000 fraction of breaks were considered significant

Ie, on series with no breaks, about 5% are reported to have them, as it should be.

Using the /wang option:

lund_test,l=500,n=250,jump=0.0,/wang
Prints: 0.0520000 fraction of breaks were considered significant, which is marginally better (closer to the nominal 5%), as it should be.
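
How much weight can 0.044 versus 0.052 bear? A rough check in Python, assuming the 250 series are independent and the true rate is the nominal 5% (the helper name is hypothetical): the standard error of an estimated proportion is sqrt(p(1-p)/n).

```python
import math

def proportion_se(p, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

se = proportion_se(0.05, 250)
print("standard error:", round(se, 4))
print("0.044 is", round((0.044 - 0.05) / se, 2), "standard errors from 0.05")
print("0.052 is", round((0.052 - 0.05) / se, 2), "standard errors from 0.05")
```

Both results sit within one standard error of 5%, so with 250 series the two rates cannot be told apart from each other or from the nominal level.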

On a series of length 500 with unit variance and a unit jump in the middle, 100% are detected as having a break, most in the right place.

Adding a bit more info:

lund_test,l=250,n=100,jump=1.0,/wang

      1.00000 fraction of breaks were considered significant
     0.990000 fraction of breaks were found in the right place (+/-25)
     0.990000 fraction of breaks were sig and found in the right place (+/-25)
     0.990000 fraction of sig breaks were found in the right place (+/-25)
So that's good.

But for a shorter series (25,25):

lund_test,l=25,n=100,jump=1.0,/wang 
     0.120000 fraction of breaks were considered significant
     0.140000 fraction of breaks were found in the right place (+/-2)
    0.0200000 fraction of breaks were sig and found in the right place (+/-2)
     0.166667 fraction of sig breaks were found in the right place (+/-2)
Ah well. It's hard on short series.

For an intermediate length (100,100):

lund_test,l=100,n=100,jump=1.0,/wang
     0.670000 fraction of breaks were considered significant
     0.810000 fraction of breaks were found in the right place (+/-10)
     0.570000 fraction of breaks were sig and found in the right place (+/-10)
     0.850746 fraction of sig breaks were found in the right place (+/-10)
So a series of (25,25) is too short for reliable detection of real breaks of one standard deviation (a jump of 1 on unit-variance noise); (250,250) is OK; (100,100) is about 2/3 OK. If you increase the jump to 2 (ie two standard deviations) then (100,100) is 100% OK, but (25,25) still isn't.
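
The four fractions printed by lund_test are related: the last is the third divided by the first. A minimal Python sketch of that bookkeeping follows. It is not the lund_test code, the names are invented, and the tolerance of l/10 (integer division) is only inferred from the +/-25, +/-10 and +/-2 shown above.

```python
import numpy as np

def summarise(pos, sig, l):
    """pos: detected break index per series; sig: was it significant; l: true break index."""
    pos = np.asarray(pos)
    sig = np.asarray(sig, dtype=bool)
    tol = l // 10                          # assumed rule: 250 -> 25, 100 -> 10, 25 -> 2
    right = np.abs(pos - l) <= tol
    frac_sig = sig.mean()                  # breaks considered significant
    frac_right = right.mean()              # breaks found in the right place
    frac_both = (sig & right).mean()       # significant and in the right place
    frac_sig_right = frac_both / frac_sig if frac_sig > 0 else float("nan")
    return frac_sig, frac_right, frac_both, frac_sig_right

# Made-up outcomes arranged to match the counts of the (25,25) run:
# 12 of 100 significant, 14 in the right place, 2 both
pos = np.array([25] * 14 + [5] * 86)
sig = np.zeros(100, dtype=bool)
sig[:2] = True        # significant and in the right place
sig[14:24] = True     # significant but in the wrong place
print(summarise(pos, sig, l=25))
```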

Let's see a piccy of the histogram. Remember: black is all breaks, green is those that are significant, the thick dashed line is where they should be, and the thin dashed lines are the limits within which breaks are considered to be in about the right place:

[Figure: histogram of detected break positions]

Now, consider the case when there really is a break in the trend. Suppose we set the trend keyword (abbreviated to tre below), which adds a trend of magnitude "trend" after the break, with zero mean.

lund_test,l=100,n=100,jump=0.0,tre=2,cc=cc,cn=cn,ssig=ssig
     0.980000 fraction of breaks were considered significant
     0.820000 fraction of breaks were found in the right place (+/-10)
     0.820000 fraction of breaks were sig and found in the right place (+/-10)
     0.836735 fraction of sig breaks were found in the right place (+/-10)

lund_test,l=100,n=100,jump=0.0,tre=2,cc=cc,cn=cn,ssig=ssig,/wang
     0.980000 fraction of breaks were considered significant
     0.140000 fraction of breaks were found in the right place (+/-10)
     0.140000 fraction of breaks were sig and found in the right place (+/-10)
     0.142857 fraction of sig breaks were found in the right place (+/-10)
So, with (100,100+trend) the original Lund method is good. But if you accidentally use the (inappropriate, in this case) Wang modification, it persistently gets the break in the wrong place (no great surprise, of course).
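
For reference, the kind of test series described above, (l,break,l) with an optional jump and trend after the break, can be generated along these lines. This is a Python sketch under stated assumptions, not the lund_test code: the noise is unit-variance white noise, and "a trend of magnitude trend ... with zero mean" is read as a ramp that rises by trend in total over the second half and averages zero. The name make_series is made up.

```python
import numpy as np

def make_series(l, jump=0.0, trend=0.0, rng=None):
    """Series of length 2*l: white noise, plus a jump and a zero-mean ramp after index l."""
    rng = np.random.default_rng() if rng is None else rng
    y = rng.standard_normal(2 * l)
    y[l:] += jump
    # Assumed reading of "trend": total rise of `trend` over the second half, mean zero
    y[l:] += trend * np.linspace(-0.5, 0.5, l)
    return y

y = make_series(100, jump=0.0, trend=2.0, rng=np.random.default_rng(0))
print(y[:100].mean(), y[100:].mean())
```

With jump=0 the two halves have the same expected mean, so a test that looks only for a shift in level has nothing to find at the true break; that is consistent with /wang placing it elsewhere.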


Well, there you go. Comments to wmc@bas.ac.uk

Page last modified: 2/3/2004

© Copyright Natural Environment Research Council - British Antarctic Survey 2002