Notes on processing the SPACE sonic data

 


 

 

Last updated 23rd January, 2007. Phil Anderson

 


Overview.

 

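The generation of processed "sonic data" involves two stages: the raw files are first re-written as binary files, and the processed files are then generated from the binary files.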

 

 

Raw data are written with reversed byte order (due to the LabView logger programme) and in a mixture of int16 and int32. To allow rapid searching for time boundaries (see below), the files are re-written in double. In addition, binary files start and end exactly at 00:00, whereas raw files have data that bleed into the next day (due to buffering).

 

Once binary files are generated, the tilt analysis is performed on the binary files (see the document "Summary of Tilt Analysis" in this directory).

 

Once the tilt parameters are calculated, processed files are generated, which are the means, cross- and co-variances calculated over 1 minute periods.

 


Data types and name conventions

 

 

raw file name convention

 

csYYMMDD.00N

 

where YY is the year number (03 = 2003 etc), MM is the month number (01 = January etc) and DD is the day of the month. N is the identifier of the sonic:

0          4 m sonic on the 30 m mast

1          16 m sonic on the 30 m mast

2          32 m sonic on the 30 m mast

3          4 m sonic to the south of the 30 m mast (10 m horizontal separation)

4          4 m sonic to the north of the 30 m mast (100 m horizontal separation)

 

Binary file name convention

 

The binary files use the extensions .b00, .b01 etc instead of .000, .001, but the root name is identical to that of the raw data file.

 

xp file name convention

 

The binary (-mat) cross product files ("xp" files) use .x00, .x01 etc

 

ASCII file name convention

 

The ASCII cross product files use .a00, .a01 etc
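
The four name conventions above can be sketched as a small helper. This is an illustration in Python, not one of the Matlab programmes; the function name sonic_file_name and the PREFIX table are hypothetical, and zero padded two digit YY, MM and DD fields are assumed.

```python
from datetime import date

# First character of the extension for each file type, per the conventions above.
PREFIX = {"raw": "0", "binary": "b", "xp": "x", "ascii": "a"}

def sonic_file_name(day, sonic, kind="raw"):
    """Build a file name such as cs030101.002 (hypothetical helper)."""
    if sonic not in range(5):
        raise ValueError("sonic identifier must be 0 to 4")
    root = "cs" + day.strftime("%y%m%d")      # csYYMMDD, assumed zero padded
    return f"{root}.{PREFIX[kind]}0{sonic}"

print(sonic_file_name(date(2003, 1, 1), 2))            # cs030101.002
print(sonic_file_name(date(2003, 1, 1), 2, "binary"))  # cs030101.b02
```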

 


Binary file generation

 

The 40 Hz binary files may be viewed as a pre-processing stage for the cross product generation, but are also used for tilt analysis and internal wave studies, hence they are held on the SAN along with raw and fully processed data.

 

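Each daily binary data file differs from the raw file in the ways described in the Overview: the values are re-written as doubles in Intel byte order, and the file starts and ends exactly at 00:00.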

 

 

Programmes

 

raw_to_binary.m calls

            f_raw_to_binary.m

                        f_read_rec3.m

or

            f_raw_to_binary_local.m

                        f_read_rec3.m

 

 

raw_to_binary.m provides the start date and number of days (plus some flags) to f_raw_to_binary.m. f_raw_to_binary.m generates an input directory (where the raw data are held) and an output directory for the binary files. f_raw_to_binary.m calls f_read_rec3.m to read in the raw data.

 

f_raw_to_binary_local.m copies the u:drive raw files to a local (PC) folder to increase speed of operation.

 

Definitive copies of these files are held in the SAN directory

 

//space/sonics_processing/m_files/

 

raw_to_binary.m can stand alone to generate a block of binary data, or f_raw_to_binary.m can be called as a function.

 

Binary files have the added advantage that file pointer arithmetic can be carried out using fseek.m and ftell.m (internal Matlab functions).

Processed file generation

 

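One minute means and cross products are generated from the binary files. The programmes perform the following steps, each described below: record block selection, de-spiking, de-tilting and rotation, and cross product calculation.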

 

 

Spikes are ignored in the data processing, but the percentage of spike data is recorded for later QC.

 

The data are rotated and de-tilted according to the analysis given in "Summary of Tilt Analysis".

 

Record Block Selection

 

Because Matlab handles loops with many iterations poorly, the averaging period is selected using file pointer arithmetic based on the expected length of the one minute data block (other averaging periods are possible). Three "jump and re-calc" steps are taken, because the initial pointer jump will hit a record whose time stamp may be a few fractions of a second away from the exact end of the minute. This is due to the slight variability in the sonic anemometer logging rate: there will not necessarily be exactly 24 x 3600 x 40 records in every day.

 

The time of the record at the new file pointer position is recovered, and the difference from the target time (typically < 1 s) is used to nudge the file pointer. Three iterations were found to be sufficient. The whole block of data, from the initial file position to the next file position, is then loaded and forms the basis of the data set for the cross product calculations.
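
The jump and re-calc selection can be sketched as follows. This is a minimal illustration in Python rather than the Matlab code itself; the names time_at and find_block_end are hypothetical, the 40 byte record layout is the one given under the file formats, and the example file is synthetic (logged at an assumed 39.9 Hz to mimic a slightly slow sonic).

```python
import io
import struct

REC = struct.Struct("<5d")   # TMatlab, u, v, w, T as little endian doubles
RATE = 40.0                  # nominal logging rate in Hz
DAY = 86400.0                # seconds per day

def time_at(f, rec_no):
    """Return the time stamp (in days) of record number rec_no."""
    f.seek(rec_no * REC.size)
    return REC.unpack(f.read(REC.size))[0]

def find_block_end(f, start_rec, t_target, n_recs, period=60.0, jumps=3):
    """Locate the record nearest t_target by an initial jump and repeated nudges."""
    pos = min(start_rec + int(period * RATE), n_recs - 1)   # initial jump
    for _ in range(jumps):
        err = (t_target - time_at(f, pos)) * DAY            # seconds short of target
        pos = max(0, min(n_recs - 1, pos + round(err * RATE)))
    return pos

# Synthetic three minute file, logged slightly slow, starting at midnight.
t0 = 731582.0
n = 3 * 60 * 40
f = io.BytesIO(b"".join(
    REC.pack(t0 + i / (39.9 * DAY), 0.0, 0.0, 0.0, 0.0) for i in range(n)))

end = find_block_end(f, 0, t0 + 60.0 / DAY, n)
f.seek(0)
block = f.read(end * REC.size)        # the whole one minute block in one read
miss = abs(time_at(f, end) - (t0 + 60.0 / DAY)) * DAY
print(end, len(block) // REC.size, miss)
```

No loop over individual records is needed: each block costs a handful of seeks and a single read, which is the point of the pointer arithmetic.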

 

De-spiking

 

For each [u,v,w,T] vector, the data block was indexed to find points with an abnormal deviation from the mean; that is, (value - mean value) outside the expected range. These ranges were +/- 50 m/s for the wind components and -20 to +20 °C for temperature. Occasionally the data were in such poor condition that the whole one minute block was affected by erroneous readings. In these cases the de-spike algorithm failed, because the mean was skewed sufficiently to bring all the data within the limit range. Such data are obvious from the mean wind / temperature plots.

 

Spikes were eliminated by interpolating across the spike data using Matlab interp1. Cubic interpolation was used to resolve rare interpolation failures when data at the end of the record were spiked (linear interpolation requires straddle data). The effect on the means and cross products of using spline vs. linear interpolation was not detectable.
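
The de-spiking step might look roughly like the sketch below. It is written in Python for illustration and deliberately swaps in a simpler scheme than the one described: linear interpolation between the nearest good neighbours, holding the nearest good value when the spike is at either end of the block, instead of interp1 with cubic interpolation. The function name despike is hypothetical.

```python
def despike(x, limit):
    """Replace points further than limit from the block mean.

    Returns the cleaned list and the fraction of points flagged as spikes.
    """
    mean = sum(x) / len(x)
    good = [i for i, v in enumerate(x) if abs(v - mean) <= limit]
    if not good:
        raise ValueError("no valid data in block")
    out = list(x)
    for i in range(len(x)):
        if abs(x[i] - mean) <= limit:
            continue
        lo = max((g for g in good if g < i), default=None)
        hi = min((g for g in good if g > i), default=None)
        if lo is None:        # spike at the start: hold the first good value
            out[i] = x[hi]
        elif hi is None:      # spike at the end: hold the last good value
            out[i] = x[lo]
        else:                 # linear interpolation between good neighbours
            out[i] = x[lo] + (x[hi] - x[lo]) * (i - lo) / (hi - lo)
    return out, 1.0 - len(good) / len(x)

# 100 samples with one spike mid-record and one at the very end.
u = [float(i % 5) for i in range(100)]
u[40] = 500.0
u[99] = -400.0
cleaned, ratio = despike(u, 50.0)
print(cleaned[40], cleaned[99], ratio)
```

Because the test is made against the block mean, a block dominated by bad readings would pass unflagged here too, which is the failure mode noted above.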

 

De-tilting and rotation

De-tilting is carried out using a separate set of tilt coefficients for each year, corresponding to the possible re-deployment of the instrument on the mast. Re-deployment is necessary for the lower instruments to account for snow accumulation at the site.

 

The mean tilt correction coefficients are extracted from the data according to the technique described in "Summary of Tilt Analysis".

 

The wind vector set [u,v,w] is de-tilted to give, effectively, a mean w = 0 for the year; this correction is applied to [u,v,w] by generating a 3 x 3 rotation matrix based on the wind direction. The [u,v] pair is then rotated about a vertical axis, by generating a 2 x 2 rotation matrix, to give mean v = 0.
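
The second rotation, about the vertical axis, can be illustrated with a short Python sketch. The function name rotate_to_mean_wind is hypothetical, and the sign convention of the matrix is an assumption; the 3 x 3 de-tilt matrix depends on the coefficients in "Summary of Tilt Analysis" and is not reproduced here.

```python
import math

def rotate_to_mean_wind(u, v):
    """Rotate [u, v] about the vertical axis so that the mean of v is zero."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    theta = math.atan2(mv, mu)                 # direction of the mean wind
    c, s = math.cos(theta), math.sin(theta)
    # 2 x 2 rotation matrix [[c, s], [-s, c]] applied to every sample
    ur = [c * a + s * b for a, b in zip(u, v)]
    vr = [-s * a + c * b for a, b in zip(u, v)]
    return ur, vr, theta

u = [3.0, 3.5, 2.5]
v = [4.0, 4.5, 3.5]
ur, vr, theta = rotate_to_mean_wind(u, v)
print(sum(ur) / len(ur), sum(vr) / len(vr))
```

After the rotation the mean of the first component equals the mean horizontal wind speed, and the mean of the second vanishes to rounding error.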

 

Cross Product Calculations

 

Following de-spiking, the one minute long vectors are de-trended to remove both trend and mean, and then multiplied to generate the ten cross products in order:

uu,uv,uw,uT, vv,vw,vT, ww,wT, TT.
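
A minimal Python sketch of this step is given below, assuming that each cross product is the average over the block of the product of two de-trended series, and that de-trending removes a least squares straight line. The names detrend and cross_products are hypothetical.

```python
from itertools import combinations_with_replacement

def detrend(x):
    """Remove the least squares straight line (trend and mean) from x."""
    n = len(x)
    tbar = (n - 1) / 2.0
    xbar = sum(x) / n
    sxx = sum((i - tbar) ** 2 for i in range(n))
    slope = sum((i - tbar) * (xi - xbar) for i, xi in enumerate(x)) / sxx
    return [xi - xbar - slope * (i - tbar) for i, xi in enumerate(x)]

def cross_products(u, v, w, T):
    """Return the ten products in the order uu,uv,uw,uT,vv,vw,vT,ww,wT,TT."""
    series = [detrend(s) for s in (u, v, w, T)]
    n = len(u)
    return [sum(a * b for a, b in zip(series[i], series[j])) / n
            for i, j in combinations_with_replacement(range(4), 2)]

# Four sample block: a fluctuation a on top of a linear trend.
a = [1.0, -1.0, -1.0, 1.0]
u = [2.0 + 0.5 * i + a[i] for i in range(4)]
v = [1.0 + 0.1 * i for i in range(4)]            # pure trend, no fluctuation
w = [3.0 * a[i] for i in range(4)]
T = [-20.0 - a[i] for i in range(4)]
xp = cross_products(u, v, w, T)
print(xp)
```

The pair ordering produced by combinations_with_replacement matches the order of the ten products listed above.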

 

The one minute means are calculated on the un-rotated and non-de-tilted data. This is to allow later analysis of tilt and provide wind direction information.


Raw, binary, xp and ASCII file formats

 

 

Raw data are read as unsigned 8 bit integers ("uint8"), in LabView byte order (big endian). Each record is 13 bytes long:

 

Number of byte    Variable               Comments

1                 date-time stamp, T1    time is held in 5 bytes
2                 date-time stamp, T2
3                 date-time stamp, T3
4                 date-time stamp, T4
5                 100th of second, T5
6                 u1
7                 u2
8                 v1
9                 v2
10                w1
11                w2
12                T1
13                T2

 

 

LabView time, TLabview = T1 x 2^24 + T2 x 2^16 + T3 x 2^8 + T4 + T5/100

 

Note that the data must be handled as unsigned. These operations are carried out using bitwise shifts. The result is in units of seconds from LabView's zero time date. The date-time stamps in the binary data are in Matlab time:

 

TMatlab = TLabview / (24 x 60 x 60) + 695422

 

TMatlab is the number of days since the zeroth of January, 0000; hence 1.5 is midday on Jan 1st, 0000, whilst midday on Jan 1st, 2003 is 731582.5.

 

Wind and temperature data are byte swapped and interpreted as 2's complement, to ensure negatives are held correctly. Data are divided by 100 to give SI units: m/s and °C.
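
The decoding of one raw record can be sketched in Python as below. This is an illustration, not f_read_rec3.m; the function name decode_raw_record is hypothetical, and it is assumed that each 16 bit wind or temperature value is stored high byte first, like the time stamp.

```python
import struct

LABVIEW_ZERO = 695422          # Matlab day number of LabView's zero time date

def decode_raw_record(rec):
    """Convert one 13 byte raw record to (TMatlab, u, v, w, T)."""
    if len(rec) != 13:
        raise ValueError("raw records are 13 bytes long")
    t1, t2, t3, t4, t5 = rec[0:5]                  # unsigned bytes
    t_labview = (t1 << 24) + (t2 << 16) + (t3 << 8) + t4 + t5 / 100.0
    t_matlab = t_labview / (24 * 60 * 60) + LABVIEW_ZERO
    # Four signed (2's complement) 16 bit values, assumed big endian.
    u, v, w, temp = struct.unpack(">4h", rec[5:13])
    return (t_matlab, u / 100.0, v / 100.0, w / 100.0, temp / 100.0)

# Synthetic record: midday on Jan 1st, 2003, u = 1.23, v = -4.56, w = 0.07, T = -21.5
rec = struct.pack(">IB4h", 3124267200, 0, 123, -456, 7, -2150)
decoded = decode_raw_record(rec)
print(decoded)
```

The synthetic time stamp of 3124267200 s is 36160.5 days after LabView's zero time, which reproduces the value 731582.5 quoted above for midday on Jan 1st, 2003.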

 

Binary data are written in double precision floating point (8 byte) Intel format (little endian). Records are five doubles wide = 40 bytes:

 

TMatlab    u (m/s)    v (m/s)    w (m/s)    T (°C)
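
Because every record has the same width, the record count and the first and last time stamps of a binary file follow directly from the file size. The Python sketch below illustrates this on an in-memory file; the names write_records and day_summary are hypothetical.

```python
import io
import struct

REC = struct.Struct("<5d")     # TMatlab, u, v, w, T as little endian doubles

def write_records(f, records):
    """Append (TMatlab, u, v, w, T) tuples to f as 40 byte records."""
    for rec in records:
        f.write(REC.pack(*rec))

def day_summary(f):
    """Return (number of records, first time stamp, last time stamp)."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size == 0 or size % REC.size:
        raise ValueError("file is empty or not a whole number of records")
    n = size // REC.size
    f.seek(0)
    first = REC.unpack(f.read(REC.size))[0]
    f.seek((n - 1) * REC.size)
    last = REC.unpack(f.read(REC.size))[0]
    return n, first, last

f = io.BytesIO()
write_records(f, [(731582.0 + i / (40 * 86400.0), 1.0, 2.0, 0.1, -20.0)
                  for i in range(4)])
summary = day_summary(f)
print(REC.size, summary)
```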

 

The xp cross product data are held as Matlab savesets (-mat format), to be loaded using the load.m intrinsic function. On load, the following variables are available for a complete day:

 

Name      Size             Bytes     Contents

desp_h    1440 vector      11520     de-spike ratio
m_h       1440x4 array     46080     [u,v,w,T]
t_h       1440 vector      11520     TMatlab
xps_h     1440x10 array    115200    [uu,uv,uw,uT,vv,vw,vT,ww,wT,TT]

 

ASCII Cross product data

 

Columns    Value                 Comments

1-6        yy,mm,dd,hh,min,ss    mean time of 1 minute data record: ss ~30
7-10       mean u,v,w,T
11-21      cross products        as for xp file [uu,...,TT]
22         de-spike ratio

 

Columns are all floating point, 4 decimal place x.xxxxEsxxx, where x is a digit and s a possible sign. Columns are separated by spaces, and lines are terminated by <c/r> <l/f>.