TIBCO Spotfire: A comprehensive primer
Insights - Data Preparation

On this page, I provide insight into how I prepared some of the data for the examples in the book.

Baseball Team Data
The task to transform the baseball player data table into a baseball team data table was very straightforward.

I used the pivot tool in Spotfire on the BaseballPlayerData table, selecting [Team] and [League] as row identifiers and then aggregating the other columns.
The result is a new table with all the players removed and the statistics now representing each team (summed in the case of At Bats, although you could choose to average or even take the minimum or maximum).
Money Market Interest Rates
I had a bit more to do to transform the money market interest rates data that I downloaded from Eurostat into the form I wanted for the line chart example. I faced three issues: a) the dates were in an unusable string form, there were more countries and geographical entities than I needed, and there were multiple types of interest rate (1-month, 12-month, etc. I made all the manipulations using Spotfire.
The first task was to filter for just three countries: Czech Republic (not shown on the screenshot), Sweden, and United Kingdom. While I was at it, I also filtered to include just the 1-month rate.
Then it was time to tackle the date, and I converted it using a calculated column in Spotfire:
Date(Integer(left([TIME], 4)), Integer(right([TIME], 2)), 1)
The Date function looks for year, month, day. The left([TIME]), 4) operation simply gets the leftmost four characters of the value, giving us the year. The right([TIME], 2) operation gets the two final digits from the value, giving us the month. The integer conversion is required by the Date function. Finally, I chose the first of the each month to complete the date.
The final task was to change the column names and export the data to a csv or Microsoft Excel file.
Hourly Temperature Data (NY)
The hourly temperature data set for New York's JFK airport that I downloaded from the United States' National Climatic Data Center required similar manipulation to get it into shape. Again, I made all the manipulations using Spotfire. The raw data set included a datetime column formatted in a form that wasn't directly usable and a temperature column formatted as an integer, rather than a decimal.
I formatted the date using the following expression:
DateTime( Integer(left([DATE],4)), Integer(Mid([DATE],5,2)), Integer(Mid([DATE],7,2)), Integer(Mid([DATE],10,2)), Integer(right([DATE],2)),0,0)
The idea was to parse out the individual year, month, day, hour, and minute elements in the string in the raw data and convert to a datetime.
The temperature string was a bit easier to deal with using a simple calculated column, which included a Fahrenheit to Centigrade conversion:
(([HLY-TEMP-NORMAL] / 10) - 32) * 5 / 9
Hourly Temperature Data (U.S. states and territories)
The hourly temperature data set for U.S. states and territories that I downloaded from the United States' National Climatic Data Center was a little more challenging. There was a station reference, but I had no idea where each station was located. There was a column for month and day and then a column for each hour in the 24 h clock—in other words a wide table. I wanted a narrow table. Finally, the temperatures were in Fahrenheit and expressed as strings with qualifying flags. I was able to make all the manipulations I needed to make using Spotfire.
I downloaded a station inventory file to get the station location, but that file provided a state abbreviation and the "state" included unfamiliar (to me) U.S. overseas territories. It did give me elevation, longitude, and latitude, which proved useful when I looked at the map chart.
I downloaded a U.S. postal code data file to translate the state abbreviations into state names.
The next step was to unpivot the data set into a narrow format, ie, with a single datetime column and a single temperature column. The unpivot was done using Spotfire's unpivot transformation.
The unpivot transformation put the data into the structure I wanted, but there were still a few things to do. I had to convert the month, day, and hour columns into a datetime column. I created a hiertarchy:
[Hour] AS [Hour]

The temperature column was similar to the one I found in the NY data set except I had to deal with those flags (that "C" has nothing to do with Centigrade by the way) and the special case of -9999, which was used to indicate the absence of a recorded temperature. I used the following calculated column to translate the column into a digital form:
when [Value]="-9999" then null
else Real(Integer(Mid([Value],1,Len([Value]) - 1)) / 10)

Finally, I wanted the temperature in Centigrade: ([Temperature F] - 32) * 5 / 9
These changes gave me a month>day>hour hierarchy against which I could analyze temperature.