Sunrise sunset

Some 10 or so years ago, I sought some data showing sunrise and sunset times throughout the year for New York and London. I was interested in how the day lengths and sun times differed between the two cities, having lived in both. Below is the result.

It was hardly rocket science. But it showed how New York’s winter days were longer than London’s, while London enjoyed longer summer days. The spikes represented the slightly different times at which our respective clocks changed to summer time.

Roll on ten years, and on a whim I decided to recreate this in Tableau. Before doing so, I didn’t revisit the original work. (“Work” here is a word used in its loosest form.) I vaguely remembered what it had looked like, but in hindsight I didn’t fully appreciate how truly ugly it had been.

So the other evening I set about cobbling together the sunrise and sunset times again. This time, picturing how my Tableau viz might look, I augmented it with some other cities:

  • Glasgow: to see how northerliness within the UK affects things
  • Reykjavík: to see what happens when you go even further north
  • Sydney: to add a representative from the Southern Hemisphere
  • Singapore: as it’s almost exactly on the Equator
  • Rio de Janeiro: because Sydney was looking lonely in the Southern Hemisphere

The first chart was straightforward to create.

A dual axis plot of the sunrise and sunset times by day, coloured by city; and day length below it. Things were complicated slightly by the fact that in summer, Reykjavík’s sun sets after midnight. Only by three minutes or so, but that necessitated a little Excel pre-prep jiggery-pokery.

This is when I got a little excited. The curves were beautiful. I was mesmerised by the fact that the equinoxes saw 12-hour days for the whole world, with the lines all meeting beautifully. I should have known this, but it was nonetheless beautiful to see, something that wasn’t apparent in the ugly Excel viz.

Here, I added a couple of little tricks. I allowed the user to remove the daylight saving element, thus smoothing the lines. I did this by using a time field that I had to convert back to the non-summer equivalent. And I also gave the option of artificially flipping the cities in the Southern Hemisphere to the Northern Hemisphere. This I did by artificially shifting the dates in the first half of the calendar to the second half and vice versa. So viewers could now compare Sydney to London in a meaningful way.

I then decided to throw in an additional viz that showed the height of the sun at its peak, using a full circuit from 1 January at the top, moving clockwise through the year to 1 July at the base and then back to the top. This involved intricate trigonometry and many iterations of buggy code. But it was utterly satisfying when it all came together. I found Singapore (green) carved a fascinating pattern.

And finally, more as an intellectual challenge than something overtly tangible, I decided to add the direction of the sunrise and sunset to the tooltips, not as degrees clockwise from north as the data was pulled, but instead as traditional compass bearings. So instead of giving the user 71°, they are presented with the delectable E 19° N.

The full viz can be found here. Hope you like it.

Battle of the Tipsters: Post-match analysis

I recently filled out a form putting myself forward to participate in the Battle of the Tipsters at April’s #LondonTUG. I wasn’t too worried. After all, I didn’t expect to be selected.

Imagine my surprise a couple of weeks later when David Pires asked me whether I’d like to take up the challenge. I’d filled out a form. How could I now say no?

That’s when the fear kicked in. I’d be up against Lorna Eden. She of Data School fame, herself a champion, not once but twice!

Now I love faffing around in Tableau as much as the next person, assuming I’m sat next to someone who loves faffing around in Tableau. But now I had to come up with five tips. Five tips that stood a chance of beating Lorna’s. But also five tips that would pass muster when presented to a friendly crowd of Tableau Rockstars, future and present.

Were I to lose, I couldn’t go out with a whimper.

So I got my thinking cap on. And I came up with a list. It wasn’t a shortlist, as that would suggest I had drawers full of tips ready to bring out at the drop of a hat. No, it was simply a list. A list of things that most people would know how to do, but that hopefully few people would have thought of doing.

I put some life to my tips by giving them musical names:

  • S’Express: Making your tooltips grammatically correct with a “pluraliser” field
  • Goodie Two Choose: Limiting the selections available within a field
  • Let’s Do The Time Warp Again: Letting the user select the time unit used for reporting
  • Break On Through (To The Other Side): Sorting a field but forcing the “Other” category to the end
  • Total Eclipse of the Chart: Putting summary totals onto a stacked bar chart.

(Videos of each tip can be accessed at the bottom of this post.)

I even created my own UK version of the Superstore data, featuring Asda, Lidl and Waitrose, to make the tips a little more accessible to my local audience.

Then the day itself came. And I was a little nervous, but I quelled the nerves with a beer downstairs beforehand with Ravi Mistry and Mark Edwards.

And then the fun began. And I have to say, the experience was somewhat of a blur. And not because of the beer. For me, presenting the tips required quite deep levels of concentration. At home you can do them in your sleep, and a slip-up is unimportant. Presenting to the London TUG, my main worry was that a slip-up would unravel the tip, making me hit the 1m30s time limit with nothing to show but groans and regrets. My step-by-step instructions sheet was my lifeline.

Lorna’s tips were impressive, but were similarly blurry, as I composed myself for the next round. After each pair of tips, there was audience participation: a cheer and a whoop for each of us to determine that round’s victor. As I was positioned off to one side of the room, it was difficult to tell who won each round, so I had no idea how we stood come the end. As I understand it after the event, it was pretty much a draw and it came down to a tweet-off, which Lorna narrowly won. That said, I think I remember speaking to Liam Butcher from Bose over a drink after the event. Perhaps he could commission some more calibrated and accurate whoop-analysis?

All in all, it was great fun and I pushed my comfort zone. There’s even talk of me figuratively wearing Lorna’s crown to the next London TUG as she may be retiring undefeated. To do that, I’d need five more tips. Yikes!

 

 

 

 

CNTD and Running Totals: A word of caution

I have a business operation in which people call my call centre. The same person can call the call centre more than once, but we can identify the caller from their details using a field called Caller ID.

I need to know how many unique people called the call centre each month (based on Call Date And Time), but I wanted the cumulative number at the end. I.e. What is the total number of people that have called my call centre.

I tried to do this using probably the most obvious way:

  • Drag Call Date And Time to the columns, and select Month. It appears as a blue pill
  • Right-click and drag Caller ID to the rows and select CNTD (count distinct). It appears as a green pill
  • Apply a quick table calculation to the Caller ID pill, making this a Running Sum.

The line goes up nicely, and if I use the same running sum as a label, I can see that the total number of callers to the call centre at the end of the last month is 34,567.

Except that it’s not. It’s somewhat lower than that.

You see, the 34,567 that appears against the last month is the total number of unique callers in month 1, plus the total number of unique callers in month 2, plus … plus the total number of unique callers in the final month.

If Jane Bloggs called up in month 1 and again in month 2, she contributes twice to the 34,567 total. In other words, Tableau sums the distinct counts for each month rather than providing a total unique count to date.

To get around this, we need to create an LOD calculation to create a new field called First Call:

IF {FIXED [Caller Id] : MIN([Call Date And Time])} = [Call Date And Time] THEN 1
ELSE 0
END

This basically marks the first call for each Caller ID with a 1.

Now, instead of using the CNTD(Caller ID) in our rows, we can use SUM(First Call). Again we can apply a table calculation to give ourselves the Running Sum.

This time, the total will be lower than the 34,567, and will reflect the true number of unique callers. And each month’s total will only count the *new* folks that called during that month.

Shipping containers: How and why

A few hours ago, Andy Kriebel posted his latest #MakeoverMonday dataset, information about the largest shipping companies in the world. I expect this was off the back of the bankruptcy of Hanjin, a situation that leaves over 500,000 containers stranded at sea.

I was in need of cheering up. Big time. And so I reached to Tableau a couple of hours ago to see what could be done. My mood was one seeking fun over integrity, and the result clearly showed this.

My first thought was to create some form of bar chart where the bars were containers. After a brief foray into this visualisation, it was clear that this would look rubbish.

I clicked on the dropdown in the Marks card to see what else might be of use. I saw the “Square” option, and got to work.

You see, shipping containers are rectangular in shape. (Technically, the Square mark should be called Rectangle. But that’s another post.) I imagined my visualisation instantly, and got on with creating it.

You can find the visualisation at the bottom of this post. Or else, click here.

There are three images in the background, which in hindsight could have been combined into a single image. (My image manipulation skills are woefully lacking, as evidenced by the use of PowerPoint as my image-editing-tool of choice.)

There is a PNG of the boat. There’s another PNG of the background (sea and sky). And at the top is a single-colour bar to go behind the parameters. This ensures that the shading of the parameters doesn’t look odd atop a vari-shade background.

Each of these is floating, which is quite apt given the subject matter. The boat is literally floating atop the water, in both senses.

And the visualisation itself is as simple as they get. A “Squares” chart, the size of each square driven by the quantity being displayed. A parameter allows the user to select what to chart; a calculated field uses this parameter to drive its own value:

CASE [Chart by]
WHEN “Cargo capacity (total TEU)” then [Total TEU]
WHEN “Cargo capacity (chartered TEU)” then [Chartered TEU]
WHEN “Cargo capacity (owned TEU)” then [Owned TEU]
WHEN “Ships (total)” then [Total Ships]
WHEN “Ships (chartered)” then [Chartered Ships]
WHEN “Ships (owned)” then [Owned Ships]
END

A second parameter allows the user to select how many companies to display, and this drives a “Top X” filter on the Company field.

Like the images, the worksheet floats, and it sits in front of all the rest to ensure that it hides the background. Placing it just above the ship gives the impression of shipping containers. Possibly.

I had a lot of fun with this one. The ability to select the Top X companies makes the visualisation statistically woeful. (The area chart should be a percentage of the total, not of some arbitrary subset.) I knew this, but this was kindly reinforced by Andy Cotgreave 😉

The Tableau community cheered me up no end this evening, something I so dearly needed. Nineteen likes on Twitter gave me a much needed boost. Thank you all! Hope you like it.

Adding the polish

At the weekend, I suffered a serious bout of viz envy. It came at the hands of Jeremy Poole in the delightful viz below, an early response to this week’s Makeover Monday. I’ve embedded it at the end of this post.

The visualisation itself is delightful. I would like to think I’d have come up with a similar concept in terms of its visualisation. (I’d already started putting together a graph on similar lines when I saw his viz.) But beyond this, what strikes me most is the way in which the entire view sits together so beautifully as a single body of work.

The fonts, the callouts, the hover actions, the backdrop, they are all beautiful.

As I develop my Tableau knowledge, I think I’m plugging the gaps in terms of my ability to create evermore involved dashboards that add richness to information. I look back on old dashboards with a slight sense of shame, which I think is a good thing. If I didn’t, I wouldn’t be learning and progressing.

However my prowess when it comes to this level of polishing needs work. I think my brother inherited the vast majority of the family’s creative genes. I am, I feel, more methodical, more analytical and certainly less creative.

With visualisations, I think I do well in getting the right objects on the page, and for them to interact well with one another. However I lack in making them look beautiful, making them sing. I would love to learn how to become better in this.

If anyone knows of any resources that might be able to help in this, please let me know. I can see what’s good and what’s not, so it’s not about learning how to spot goodness. It’s about learning the principles that will allow me to create things that are good in the first place.

St. Totteringham’s Day

Yesterday, Arsenal’s ten men came back to draw against Spurs in arguable the most important North London derby in living memory. Before the match my good friend, who is a Leicester fan, suggested an Arsenal victory would have been an ideal result, as it would have left Leicester three points clear of both North London teams, with a game in hand. I argued that a draw was ideal, as it left fewer serious competitors for the title race.

In looking into the numbers, I discovered the concept of St. Totteringham’s Day. This is the day upon which Tottenham mathematically are unable to beat Arsenal in the Premier League. Given that Arsenal have beaten Spurs for the last 20 seasons, there has been such a day in each of those years.

In these years, the earliest it took place was in 2007/08, when Spurs were unable to surpass their rivals on 8 March 2008. (Spurs finished 11th in the table, to Arsenal’s third.) The latest was on 19 May 2013, the last day of the 2012/13 season, when Arsenal secured third spot, one point ahead of fourth-place Spurs.

Here’s my visualisation of the history. I wonder whether there’ll be such a day in this 2015/16 season.

Creating aliases without breaking your blends

Aliases are lovely. They allow you to see what I’ve always termed “friendly names” for field values as opposed to codes and the like. You may have legal entities that are referenced as numbers in the underlying data but you want to display as the names of those entities to the users. You may have state abbreviations that you want to display in full, or a handful of currency codes that you want to show as symbols or full currency names.

I learnt the hard way that creating aliases can be dangerous. Originally, I had no idea that when blending, Tableau uses the alias values as opposed to the raw values coming in on the data feeds. And I learnt that the hard way.

Let’s say my primary data source contains currency codes (EUR, GBP, USD etc.) and this is blended to a secondary data source that contains FX rates that also use those codes. If I add aliases ($, £, €) to the currency codes in the primary data source, then my blend breaks. (I could fix it by applying the same aliases to the secondary data source. But this is not always efficient or wanted.)

To me, this is undesired behaviour. I see aliases as being purely presentational. If I am using a field in two data sources to blend, I want Tableau to respect the raw value rather than some name I’ve chosen to to display to the user.

There is, however, a simple way round the problem: Duplicate your field.

Use the duplicate field to blend on, and apply your aliases to the original field. That way, the integrity of the original field’s name remains intact, and its aliases can be used in presenting the information to the user.

Allow the user to select which chart to see

I’ve often been asked to produce dashboards whose views change fundamentally based on the choices of the viewer.

For example, a user may want to select whether to show a line chart or a bar chart; or they may want to choose which secondary data source to use as the basis for reporting.

Often, such user requirements can be accommodated by being creative with your worksheet designs. And some of the posts here may help with some of that artistry. But sometimes, you may need to resort to presenting different worksheets depending on the user’s selection

This can be done with a few steps.

First, create two worksheets, one for each of the visualisations that the audience may wish to see: Chart 1 and Chart 2, if you will. Once you’re happy with these views, create a string parameter that allows the user to select which chart to show. And give it a list of values that it can take:

Parameter

And now, create two new calculated fields in the primary data source being used in the two worksheets: Chart 1 selected and Chart 2 selected. The fields will be boolean, giving a TRUE/FALSE value based on whether the appropriate parameter value has been selected.

Chart equation

Now you can use this field as a filter in Chart 1, with a value of TRUE. This will only show the chart if the value of the boolean variable is TRUE, or else if the parameter has been set to Chart 1. (You cannot use a parameter directly as a filter, hence this interim step.)

If you play with the value of the parameter, you’ll see the content of the worksheet appear and disappear. Repeat the above steps for the second chart, and the two worksheets will work in tandem with one another.

Next up, we need to get those charts into a dashboard.

First, they need to sit together in their own container, so you can select either a vertical or horizontal one, either is fine.

Panes

And now you can drag each of the worksheets into that container. Click on the dropdown at the top right corner of either chart to show the parameter control. By playing with the parameter value, you’ll see the respective charts appearing and disappearing.

Now unfortunately, the title of a worksheet appears regardless of whether the content of that worksheet is present, so you’ll need to hide the title on both sheets to ensure that the hidden worksheet fully disappears. If you want a title that’s consistent for the two views, then you can put this in as a Text component (see the above selector) above the container you’ve just created. If you want one that varies depending on the chart chosen, you can use a similar technique as this one to create automatically changing titles that sit in their own container.

And there you have it. The user can select a parameter value and the chart that they’re presented with will change in front of their eyes.

Allowing the user to select what to report

Of all of the things I enjoy doing in Tableau, perhaps my favourite is allowing the user to choose what information they want to see.

Do they want to see a report showing the number of sales? Or do they want to see the total value of those sales? Do they want to see the data listed by salesperson or by customer or by region?

This is where parameters are your friend. Perhaps even your BFF.

The solution involves two steps. First, you need to create a parameter that allows the user to select their field of choice.

So first, the parameter. Create a string parameter that allows the user to select which parameter they want to see in the view.

Parameter

Here, I’ve create a parameter that allows the values “Number of invoices” or “Value of invoices”.

Next, I need to create a calculated field within my data set that uses that parameter to determine its value.

Equation

 

So this is simply using a CASE statement against our new parameter to figure out how to populate the new field “Reporting field”.

Now, you just need to use the new field in your views and make the parameter selectable by the user. When the user changes the parameter, your views will update automatically.

An identical approach can be used for alphanumeric fields. Instead of populating with the reporting field with the number of invoices or the invoice value, you could choose to make it a string field and populate it with either region or salesperson. And you’d change the permitted values of your parameter accordingly.

One limitation of this approach is that the display formatting of the two fields cannot be different. For example, I can’t have the number of invoices display as a number (1,234) and the value of the invoices display as a high currency amount (e.g. £1,234m).

You can do this in labels and tooltips via an additional calculated field:

IF [What to report] = “Value of invoices” THEN “£” ELSE “” END

Add this new field into your label or tooltip immediately after the “Reporting field” value, and it’ll slap an “m” after the number whenever the invoice value is chosen by the user.

To get the axes to be formatted differently based on the user’s selection, you instead need two completely separate worksheets, identical apart from the value that they report. A Display toggle field will then allow you to choose which one to show in your dashboard.

More on that another approach time.

Sorting by a field in a secondary data set

One of my biggest frustrations with Tableau is the limited facility that comes when linking to a secondary data set. One of these is the inability to sort fields from the secondary data source.

Maybe I have my invoiced amounts in my primary data source by relationship reference, and a secondary data source that tells me the salesperson for each relationship reference.

And I want to report the total amount invoiced by salesperson.

I can do that quite simply. I drag the invoice amount into the body of the view from the primary data source. I then flip to the secondary data source, click the chain link to make the data sets linked, and drag the salesperson into my rows.

Now if salesperson had been in the primary data source, the sort buttons in the toolbar would be clickable. Or if you clicked the dropdown arrow on the Salesperson pill in the rows, a Sort option would have been available. We could have sorted the view by amount, showing the salesperson with the highest total invoice amount at the top.

But because the field has been brought from the secondary data set, we’re presented with this.

Grey sort

And this.

Salesperson

And so we can’t sort.

Although perhaps we can, with a little hack.

If you want to sort in descending order, then create a calculated field in your primary data source that which is simply the negative of the field being reported

Equation

Now drag that field into the rows next to your Salesperson. As it is a measure, it’ll be treated as being continuous and will be green, and will only be allowed to the right of Salesperson. And because it’s coming from a different data source, it’ll appear as an aggregate field.

Now click the new pill’s dropdown and change it to discrete. It’ll turn blue, and you can now reorder the two row fields so that the aggregate variable appears first.

Rows

And now, because you have a field from the primary data source in your rows, you can click the dropdown on the AGG([Inv Amt (neg)]) field and you’ll see a Sort menu option.

Click this and you’re presented with only two options. You can either sort by the data source or you can manually configure the sort order based on the values you see. Most of the time you’ll want to sort by data source. (It only allows low to high sorting here, not high to low. Hence our need to create a negative version of our variable.)

So by sorting in ascending order by a negative field, it’s the equivalent of sorting a positive field in descending order.

Your first column headings will be the negative versions of the values in the table. Now you’ll probably want to hide this first column, so you can simply make the text white and narrow the column to its narrowest possible width.

And there you have it.

Note: if there’s any likelihood that two values in the table will be the same, you’ll need to make them subtly different. I often do this by adding a very small but variable to the calculated field that you just created. I often do this by finding a couple of numbers that are generally quite large and fluid in the data sets and dividing 1 by their product:

1/([Sales volume] * [Outstanding amount]

If you add this to the calculated field that you calculated earlier, then the two figures will be different by a fraction of a penny, and they’ll appear as separate ungrouped items in the table.