Adding the polish

At the weekend, I suffered a serious bout of viz envy. It came at the hands of Jeremy Poole in the delightful viz below, an early response to this week’s Makeover Monday. I’ve embedded it at the end of this post.

The visualisation itself is delightful. I would like to think I’d have come up with a similar concept in terms of its visualisation. (I’d already started putting together a graph on similar lines when I saw his viz.) But beyond this, what strikes me most is the way in which the entire view sits together so beautifully as a single body of work.

The fonts, the callouts, the hover actions, the backdrop, they are all beautiful.

As I develop my Tableau knowledge, I think I’m plugging the gaps in terms of my ability to create evermore involved dashboards that add richness to information. I look back on old dashboards with a slight sense of shame, which I think is a good thing. If I didn’t, I wouldn’t be learning and progressing.

However my prowess when it comes to this level of polishing needs work. I think my brother inherited the vast majority of the family’s creative genes. I am, I feel, more methodical, more analytical and certainly less creative.

With visualisations, I think I do well in getting the right objects on the page, and for them to interact well with one another. However I lack in making them look beautiful, making them sing. I would love to learn how to become better in this.

If anyone knows of any resources that might be able to help in this, please let me know. I can see what’s good and what’s not, so it’s not about learning how to spot goodness. It’s about learning the principles that will allow me to create things that are good in the first place.

St. Totteringham’s Day

Yesterday, Arsenal’s ten men came back to draw against Spurs in arguable the most important North London derby in living memory. Before the match my good friend, who is a Leicester fan, suggested an Arsenal victory would have been an ideal result, as it would have left Leicester three points clear of both North London teams, with a game in hand. I argued that a draw was ideal, as it left fewer serious competitors for the title race.

In looking into the numbers, I discovered the concept of St. Totteringham’s Day. This is the day upon which Tottenham mathematically are unable to beat Arsenal in the Premier League. Given that Arsenal have beaten Spurs for the last 20 seasons, there has been such a day in each of those years.

In these years, the earliest it took place was in 2007/08, when Spurs were unable to surpass their rivals on 8 March 2008. (Spurs finished 11th in the table, to Arsenal’s third.) The latest was on 19 May 2013, the last day of the 2012/13 season, when Arsenal secured third spot, one point ahead of fourth-place Spurs.

Here’s my visualisation of the history. I wonder whether there’ll be such a day in this 2015/16 season.

Creating aliases without breaking your blends

Aliases are lovely. They allow you to see what I’ve always termed “friendly names” for field values as opposed to codes and the like. You may have legal entities that are referenced as numbers in the underlying data but you want to display as the names of those entities to the users. You may have state abbreviations that you want to display in full, or a handful of currency codes that you want to show as symbols or full currency names.

I learnt the hard way that creating aliases can be dangerous. Originally, I had no idea that when blending, Tableau uses the alias values as opposed to the raw values coming in on the data feeds. And I learnt that the hard way.

Let’s say my primary data source contains currency codes (EUR, GBP, USD etc.) and this is blended to a secondary data source that contains FX rates that also use those codes. If I add aliases ($, £, €) to the currency codes in the primary data source, then my blend breaks. (I could fix it by applying the same aliases to the secondary data source. But this is not always efficient or wanted.)

To me, this is undesired behaviour. I see aliases as being purely presentational. If I am using a field in two data sources to blend, I want Tableau to respect the raw value rather than some name I’ve chosen to to display to the user.

There is, however, a simple way round the problem: Duplicate your field.

Use the duplicate field to blend on, and apply your aliases to the original field. That way, the integrity of the original field’s name remains intact, and its aliases can be used in presenting the information to the user.

Allow the user to select which chart to see

I’ve often been asked to produce dashboards whose views change fundamentally based on the choices of the viewer.

For example, a user may want to select whether to show a line chart or a bar chart; or they may want to choose which secondary data source to use as the basis for reporting.

Often, such user requirements can be accommodated by being creative with your worksheet designs. And some of the posts here may help with some of that artistry. But sometimes, you may need to resort to presenting different worksheets depending on the user’s selection

This can be done with a few steps.

First, create two worksheets, one for each of the visualisations that the audience may wish to see: Chart 1 and Chart 2, if you will. Once you’re happy with these views, create a string parameter that allows the user to select which chart to show. And give it a list of values that it can take:

Parameter

And now, create two new calculated fields in the primary data source being used in the two worksheets: Chart 1 selected and Chart 2 selected. The fields will be boolean, giving a TRUE/FALSE value based on whether the appropriate parameter value has been selected.

Chart equation

Now you can use this field as a filter in Chart 1, with a value of TRUE. This will only show the chart if the value of the boolean variable is TRUE, or else if the parameter has been set to Chart 1. (You cannot use a parameter directly as a filter, hence this interim step.)

If you play with the value of the parameter, you’ll see the content of the worksheet appear and disappear. Repeat the above steps for the second chart, and the two worksheets will work in tandem with one another.

Next up, we need to get those charts into a dashboard.

First, they need to sit together in their own container, so you can select either a vertical or horizontal one, either is fine.

Panes

And now you can drag each of the worksheets into that container. Click on the dropdown at the top right corner of either chart to show the parameter control. By playing with the parameter value, you’ll see the respective charts appearing and disappearing.

Now unfortunately, the title of a worksheet appears regardless of whether the content of that worksheet is present, so you’ll need to hide the title on both sheets to ensure that the hidden worksheet fully disappears. If you want a title that’s consistent for the two views, then you can put this in as a Text component (see the above selector) above the container you’ve just created. If you want one that varies depending on the chart chosen, you can use a similar technique as this one to create automatically changing titles that sit in their own container.

And there you have it. The user can select a parameter value and the chart that they’re presented with will change in front of their eyes.

Allowing the user to select what to report

Of all of the things I enjoy doing in Tableau, perhaps my favourite is allowing the user to choose what information they want to see.

Do they want to see a report showing the number of sales? Or do they want to see the total value of those sales? Do they want to see the data listed by salesperson or by customer or by region?

This is where parameters are your friend. Perhaps even your BFF.

The solution involves two steps. First, you need to create a parameter that allows the user to select their field of choice.

So first, the parameter. Create a string parameter that allows the user to select which parameter they want to see in the view.

Parameter

Here, I’ve create a parameter that allows the values “Number of invoices” or “Value of invoices”.

Next, I need to create a calculated field within my data set that uses that parameter to determine its value.

Equation

 

So this is simply using a CASE statement against our new parameter to figure out how to populate the new field “Reporting field”.

Now, you just need to use the new field in your views and make the parameter selectable by the user. When the user changes the parameter, your views will update automatically.

An identical approach can be used for alphanumeric fields. Instead of populating with the reporting field with the number of invoices or the invoice value, you could choose to make it a string field and populate it with either region or salesperson. And you’d change the permitted values of your parameter accordingly.

One limitation of this approach is that the display formatting of the two fields cannot be different. For example, I can’t have the number of invoices display as a number (1,234) and the value of the invoices display as a high currency amount (e.g. £1,234m).

You can do this in labels and tooltips via an additional calculated field:

IF [What to report] = “Value of invoices” THEN “£” ELSE “” END

Add this new field into your label or tooltip immediately after the “Reporting field” value, and it’ll slap an “m” after the number whenever the invoice value is chosen by the user.

To get the axes to be formatted differently based on the user’s selection, you instead need two completely separate worksheets, identical apart from the value that they report. A Display toggle field will then allow you to choose which one to show in your dashboard.

More on that another approach time.

Sorting by a field in a secondary data set

One of my biggest frustrations with Tableau is the limited facility that comes when linking to a secondary data set. One of these is the inability to sort fields from the secondary data source.

Maybe I have my invoiced amounts in my primary data source by relationship reference, and a secondary data source that tells me the salesperson for each relationship reference.

And I want to report the total amount invoiced by salesperson.

I can do that quite simply. I drag the invoice amount into the body of the view from the primary data source. I then flip to the secondary data source, click the chain link to make the data sets linked, and drag the salesperson into my rows.

Now if salesperson had been in the primary data source, the sort buttons in the toolbar would be clickable. Or if you clicked the dropdown arrow on the Salesperson pill in the rows, a Sort option would have been available. We could have sorted the view by amount, showing the salesperson with the highest total invoice amount at the top.

But because the field has been brought from the secondary data set, we’re presented with this.

Grey sort

And this.

Salesperson

And so we can’t sort.

Although perhaps we can, with a little hack.

If you want to sort in descending order, then create a calculated field in your primary data source that which is simply the negative of the field being reported

Equation

Now drag that field into the rows next to your Salesperson. As it is a measure, it’ll be treated as being continuous and will be green, and will only be allowed to the right of Salesperson. And because it’s coming from a different data source, it’ll appear as an aggregate field.

Now click the new pill’s dropdown and change it to discrete. It’ll turn blue, and you can now reorder the two row fields so that the aggregate variable appears first.

Rows

And now, because you have a field from the primary data source in your rows, you can click the dropdown on the AGG([Inv Amt (neg)]) field and you’ll see a Sort menu option.

Click this and you’re presented with only two options. You can either sort by the data source or you can manually configure the sort order based on the values you see. Most of the time you’ll want to sort by data source. (It only allows low to high sorting here, not high to low. Hence our need to create a negative version of our variable.)

So by sorting in ascending order by a negative field, it’s the equivalent of sorting a positive field in descending order.

Your first column headings will be the negative versions of the values in the table. Now you’ll probably want to hide this first column, so you can simply make the text white and narrow the column to its narrowest possible width.

And there you have it.

Note: if there’s any likelihood that two values in the table will be the same, you’ll need to make them subtly different. I often do this by adding a very small but variable to the calculated field that you just created. I often do this by finding a couple of numbers that are generally quite large and fluid in the data sets and dividing 1 by their product:

1/([Sales volume] * [Outstanding amount]

If you add this to the calculated field that you calculated earlier, then the two figures will be different by a fraction of a penny, and they’ll appear as separate ungrouped items in the table.

The GCloud Dashboard

Far and away my most popular dashboard is my GCloud dashboard, with over 4,000 clicks to date. It’s gone through various iterations over the last 18 months or so, but its primary offering has pretty much settled down into something that I think is flexible and summarises quite well the underlying data. It can be found here.

The GCloud framework was set up by government in 2012 as a way of disaggregating its IT provision. Its intent was to open up provision to a wider set of small and medium-sized enterprises, and become less reliant on the traditional big IT suppliers.

And as its name suggests, its intent was to bring about a much more cloud-based approach to IT provision.

The underlying data is published monthly (usually on around the 20-something of the month for the prior month) and can be found here.

The dashboard itself uses the feature that allows the user to choose their time unit for reporting. Otherwise, it’s a relatively basic presentation of the data. Perhaps it could do with another overhaul. But even so, I think it shows how GCloud has evolved, and gives the user the flexibility to drill down on any aspect.

My friend Alan posted earlier today his interpretation of the data here, which is well worth a read. It’s lovely to see someone presenting commentary based on cuts of data drawn from your own dashboard.

How this monkey discovered Tableau

I started working part-time as a contractor for a small public service company associated with the telecoms industry in January 2013. My role was to lead on all aspects of reporting – internal reporting to understand the overall health of our operations; and external reporting to our stakeholders and shareholders against formally measured KPIs and SLAs.

At first, I cobbled together some stuff in the world of Excel that I know and love. But it was clear that something more strategic would be needed to allow more dynamic, automated and strategic reporting. So we conducted an evaluation of three packages that at the time were all unknown to me: Tableau, Qlikview and MicroStrategy.

Tableau came out on top against our requirements, and so we bought it in May 2013.

Perhaps the most grating aspect of the product was its use of the terms rows and columns. In the spreadsheet world whence I’d come, these terms had a very different meaning to their meaning in the new Tableau world. (In some respects, “across” and “up” might be more suitable terms. Maybe that’s for another post.)

We had two days of dedicated training from the inimitable Emiliano. This was useful to a degree, but in some respects things hadn’t yet gelled with the product. As a newbie, the package was rather daunting, and perhaps some of the training was too advanced for us at the time, coming so soon after we had started using the package.

That said, if nothing else it gave us a huge insight into what the tool was capable of. We went live on Tableau 8.0 and went through a couple of minor upgrades before upgrading to version 9.0 in summer 2015. (Note to self: Must get round to upgrading to 9.2.)

I was doing a couple of days a week with this client, the other three days of my week being spent doing business analysis for a large bank: specifying data feeds into a billing system, and managing various migrations onto the new system.

But about a year ago, I heard that the bank was using Tableau for reporting, and so I threw my hat into the ring and developed a very rudimentary dashboard to show what might be possible to report on their business area. (My having now used the product for 18 months, I think they liked what they saw.)

And the rest is history. Nowadays, much of the time I spend with the bank is focused on developing and enhancing dashboards, introducing new elements, pushing the boundaries (at least my boundaries) of the software. I find it thoroughly rewarding. And I continue to use Tableau daily on the Telecoms contract, enhancing the existing offering andintroducing new stuff. Having recently implemented Alteryx there, I’m keen to use this to enhance the way in which data is presented to Tableau to make things more efficient and to take the offering to the next level.

For the last 27 years (yes, literally), whenever I’ve received some data, my automatic instinct has been to open my spreadsheet package of choice and play: Lotus 1-2-3 for five years, then a short period of Quattro Pro, and 21 years of Excel. Nowadays, when I receive some structured data, I invariably throw it into Tableau and try to make it sing.

My view is that working in Tableau with two clients – operating in completely different fields – makes me better. I encounter problems producing something for the bank that I learn how to solve, and these then benefit my telecoms client; and vice versa.

As for Tableau, I adore the software. I adore the company and its ethos. And I adore its community. And long may I continue to use it.

Aside

Polishing your dashboard: Tidy your tooltips and axis titles

If you want to polish your dashboard, make sure your tooltips and axis titles are tidy.

Tableau will always create default values for these, but these are often ugly, particularly if you’ve got table calculations going on.

Before you publish your dashboard, go into each sheet and check the titles of your axes. And make sure the structure of your tooltips makes sense from a user perspective.

I find that these two pieces are often overlooked. I’ve been known to overlook them myself.  Getting them right will likely pass your users by. But getting them wrong will ensure they don’t forget, for all the wrong reasons.

DATETRUNC by decade

I recently wrote a post describing how to allow users to decide the level at which their time-based data was displayed: year, quarter, month, week, day etc.

The solution uses the DATETRUNC function. However one weakness of this is that it does not allow decade-based reporting. The DATETRUNC function only goes as high as the YEAR unit.

This evening, I had a desire to allow the user to report at either year or decade level. So I set out to tackle the problem.

In the parameter field (described in the aforementioned post), I added the ability to select Decade. And I used the following formula to define the reporting field:

if [Date unit] = “decade” then str(year([Date])-(year([Date]) % 10))+”s”
elseif [Date unit] = “year” then str(year([Date]))
end

The initial clause is invoked when when Decade is selected. The last part, featuring the % symbol, is the equivalent of the MOD function in Excel. It divides the year of the date by 10 and brings back the remainder.

So if you have a date in the year 1987, it’ll do the calculation 1987 – 7, bringing back 1980. The “s” is strung onto the end to make “1980s.

If you need the field to appear as and be treated as a date, then things get a little more tricky. Unfortunately, you can’t use the DATETRUNC feature against the parameter, as whenever Decade is selected, it’ll break, even if that clause isn’t being invoked. (Rather annoying.) And so you’ll need to use an IF THEN ELSEIF to populate based on each of the parameter’s values:

IF [parameter] = “Week” then DATETRUNC(‘week’,[Date])

ELSEIF [parameter] = “Month” then DATETRUNC(‘month’,[Date])

ELSEIF [parameter] = “Year” then DATETRUNC(‘year’,[Date])

ELSEIF [parameter] = “Decade” then date(“01/01/”+str(year([Date])-(year([Date]) % 10)))

END

Not the most elegant. But it should do the trick.