Читать книгу Tableau Your Data! - Murray Daniel G. - Страница 14

Part I
Desktop
Chapter 2
Connecting to Your Data

Оглавление

“I think a manager’s world is not black and white. It’s a world filled with uncertainties and dilemmas. The sort of thing that would leave any neophyte moaning, “What the Hell is this.”

Gordon Mackenzie8

It would be nice if all the data you needed to access resided in one place, but it doesn’t. Your data is scattered over multiple databases, text files, spreadsheets, and public services. Tableau’s ability to directly connect to a wide variety of data sources makes it easier to analyze data residing in different places. At the time of this writing, fifty different connectors are available in Tableau for Windows (twenty-two connectors for the Mac version). You can analyze spreadsheets, public data tools, analytic databases, Hadoop, and a large variety of general-purpose databases as well as data cubes.

What You Will Learn in This Chapter

Tableau Software has made the user interaction easier for data connections, joins, and data blending. A new Connect Page persists in the worksheet view that provides easier access to all of your data connections. Joining tables is now a more visual experience, and the new Data Interpreter provides easy tools for dealing with poorly formatted spreadsheet and text file data sources. The Data Interpreter provides a better way to deal with typical spreadsheet problems. Readers of the first edition of this book will find a lot of new content in this chapter.

We start by introducing the Connect Page in the context of connecting to a local file. Then you will learn about connecting to databases and cloud data sources. After that, we discuss values that Tableau generates when you connect to any data source. You learn the difference between a direct connection to a data source and using Tableau’s data extract engine, as well as the advantages of each type of connection.

If you spend a lot of time working with data, you know that data preparation normally takes a lot more time than the actual data analysis. In that context, we will introduce Tableau V9.0’s new Data Interpreter by working with some spreadsheet data downloaded from the U.S. Census Bureau. Then you learn how to perform joins between different tables in a database or different tabs in a spreadsheet. Then you learn how to blend data from different data sources in a single visualization.

How to Connect to Your Data

The most fundamental skill in Tableau is connecting to your data. You can connect to local files on your computer, database files on servers, and public data sources in the cloud. In this section, you learn more about the details of the connecting to different types of data sources. Before you start working through connection examples, let’s look at one part of the Start page that we didn’t open in Chapter 1. Clicking the More Servers option opens the expanded pane of database connection possibilities. Figure 2-1 shows the Windows and Mac editions of Tableau Desktop versions.


Figure 2-1: More Servers pane


The exposed list of connections includes databases, data cubes, and cloud-based services. To access a database, you must install the driver particular to the data source. Installation normally takes a couple of minutes. You can find the drivers at www.tableausoftware.com/support/drivers.

Frequently used databases will appear in the list of sources included in the main Connect pane below the (To a Server) section. The arrangement of the connections is dynamic and dependent on the frequency of usage. On the right side of Figure 2-1, the Google Analytics connection appears second and then Google BigQuery. That is because I connected to Google Analytics yesterday while in Tableau Desktop for Mac to analyze website activity. I also used Google BigQuery for some additional analysis.

If Tableau doesn’t provide a dedicated connector for a database you want to analyze, try the Other Databases (ODBC) option. That connection utilizes the Open Database Connectivity standard.

The saved data sources area you see at the bottom left of the Connect pane displays data sources that you have saved for easy access. Tableau also provides some sample training data sources in this area by default. The exact number and type will depend on the version of Tableau Desktop (Windows or Mac).

You learn how to save a data source later in this chapter. Saved data source files (.tds) are found on your computer’s hard disk in the data sources directory under the My Tableau repository. If you are logged into Tableau Server, you may also see saved data sources on your server’s repository. Next you’ll learn how to make a connection to a local file on your computer.

Connecting to Desktop Sources

Now you will connect to an Excel spreadsheet data source. The files used in this example and in the examples for the rest of the chapter can be found on the book’s companion website. See Appendix F, “Companion Website,” for the address. Download the Chapter 2 file and put it in a folder on your computer. Open Tableau Desktop and click the Excel option. This exposes the Open window, providing a view of the directories on your computer. It should look like Figure 2-2.

Navigate to the location where you downloaded the Superstore for TYD2 spreadsheet and select the file; then click the Open button you see in the lower-right side of Figure 2-2 to connect the spreadsheet file to Tableau. Doing this establishes the connection. You should now see the Connect Page in Figure 2-3.


Figure 2-2: Open file window


Figure 2-3: The Connect Page


Look at the left pane. The items below the sheets area are individual tabs that are contained in the spreadsheet. Double-clicking the Orders sheet will cause that table to appear in the join area. You can also drag and drop the Orders sheet into the join area. Doing this establishes a live connection to the sheet in your workbook. Save your work. Let’s pause for a moment and go through the contents of the Connect Page in more detail.

Understanding the Connect Page

The Connect Page replaces the Connect to Data screen and connection window used until the release of Tableau Desktop V8.2 in 2014. It provides a more visual interface for connecting to data and joining tables. It also centralizes other tools for analyzing the contents of the data, performing data extracts, and restructuring data. You will learn how to use the new data cleaning features later in this chapter in the “The Data Interpreter” section.

The Left Pane Area

On the left side of the page, you can rename the data source connection and view the related sheets contained in the data source. At the top left of Figure 2-3, you see the connection currently in view. The connection has been renamed as Orders (Superstore for TYD2). You can rename the connection at the top left of the page by entering your own text and pressing Enter.

Immediately below in the Workbook area you see the filename of the data source. The sheet area contains the individual worksheet tabs from the spreadsheet data source. If the data source is a database, tables contained in the database would be listed. To see the contents of a particular sheet, look at Figure 2-4.

Hovering your mouse over the sheet of interest exposes the View Data icon to the right of the sheet name. Clicking that icon will open a tabular view of the data. This is similar to the Preview area but allows you to see the data before you place the sheet in the join area.

Connection Options

The upper-right area in Figure 2-3 contains connection options. Tableau uses a live connection to the data source by default. Clicking the Extract radio button enables you to exact the data into Tableau’s proprietary data engine. Using an extract can significantly improve performance. It also allows you to view database files remotely even if you don’t have access to the Internet. Extracting compresses the data and allows you to save the data extract on your computer. I’ll cover the details about data extracts later in this chapter.


Figure 2-4: Viewing sheet content


To the right of the connection area is the Filters option. Selecting the Add option allows you to filter the data source. A running tally of the filters applied is also displayed. By filtering a direct connection, you may improve performance by eliminating unneeded data from your analysis.

The whitespace containing the orders sheet in Figure 2-3 is the join area. Starting with desktop V8.2, Tableau made defining joins a more visual experience. This approach makes the concept of joining tables more accessible to new users. You learn about the nuances of joining tables later in this chapter.

The Preview area in the bottom half of the page displays the rows and columns of the sheets placed in the join area. The two small icons at the top left of the Preview area permit you to toggle between the Data Source view visible in the Preview area in Figure 2-3 and the Manage metadata view in Figure 2-5. The Show aliases and Show hidden fields check boxes allow you to display or not display renamed or hidden items in the preview.

The Data interpreter helps you deal with problematic source data. It is turned on by clicking the button above the Preview area as shown in Figure 2-3. The Rows preview box on the top-right side of the Preview area displays the row count contained within the data source. If you are connected to a very large dataset, Tableau will initially limit the row count to 10,000 records. You can enter any number you want as the upper limit in the Rows text box.


Figure 2-5: Manage metadata view


Saving Data Sources and Workbooks

Saving a data source adds the connection to your Start page at the bottom of the Connect pane. Saving a workbook combines the data source connection metadata and all of the work you have done in one file, the workbook. You will be saving workbooks much more often than data sources. Once you’ve established saved data connections to the sources you use frequently, you won’t have to edit them very often.

Saving a Data Source

There are a few steps required to save a data source. Figure 2-6 shows the menu options required.


Figure 2-6: Saving a data source


Clicking the Sheet 1 tab in the lower left of the Data Source page takes you to the worksheet. Once there, point at the data pane displaying the data source. In Figure 2-6, the filename is Dans Superstore TYD2. The following are the steps required to save the data source:

1. Point at the data connection and right-click to expose the menu.

2. Select the Add to Saved Data Sources menu option.

3. Click the Save button in the Add to Saved Data Sources window.

Notice in Figure 2-6 that the data source is being saved under the name assigned earlier. It was saved to the data sources directory on my computer as a Tableau Data Source (*.tds) file. Now, whenever I open Tableau Desktop, my Start page will include this data source, as you see in Figure 2-7.

The Start page includes the Dans Superstore TYD2 connection, but my workbook (Tableau – Book 1) is not saved yet.

Saving a Tableau

Before saving the workbook, drag the Product Category field from the Dimensions shelf to the Rows shelf. Then drag the Sales field from the Measures shelf to the Columns shelf. You should now have a bar chart in the view. Figure 2-8 shows one way to save the workbook.


Figure 2-7: Saved data sources


Figure 2-8: Saving the workbook


You can select File and then Save from the menu or just click the Save icon. Name the file “My first workbook” and then click the Save button, as you see in Figure 2-8. You have now saved your work as a Tableau Workbook (*.twb) file. In this scenario, you not only save the connection metadata but the work you’ve done in your workbook. Next, let’s look at how you can connect to a database that resides on a server.

Connecting to a Database

Databases have an additional level of security – requiring you to enter a server name and user credentials to access the data. The username and password you enter are assigned in the database, meaning the security credentials and the amount of access granted are controlled by the database – not Tableau. Figure 2-9 shows the connection window for a MySQL database.


Figure 2-9: Connection to MySQL


If you need to access a database source and you don’t know the server address, your username, and your password, you must request that information from your database administrator. After you receive the requested server connection information, you will access the same Connect Page you worked with in the previous example in Figure 2-9, but you will probably have a much larger number of tables displayed under the sheets section. Finding the tables you need for your analysis is facilitated by the Search filter.

Tableau’s manual provides specific details and screenshots for accessing all of the data sources that you can connect to. Figure 2-10 is a screenshot of the online manual.

Because Tableau frequently adds new connectors, the online manual is the best source of information regarding data connections. Go to Help ⇒ Open Help ⇒ Supported Data Sources in Tableau’s online manual to find specific details on connecting to different data sources. Click the Technical Specifications link in the manual to find the latest drivers for each database.


Figure 2-10: Tableau online manual


Connecting to Cloud Services

The increasing quantity and variety of data available on the Internet falls into three categories:

• Public domain datasets

• Commercial data services

• Cloud database platforms

The United State Census Bureau provides free population and business data. The World Bank provides a wide variety of country data, and many other public data repositories have emerged over the past decade. This data can be used to augment your proprietary data.

There are also a growing number of commercial data sources. Tableau currently provides connectors to

• Google Analytics

• Google Big Query

• Amazon Redshift

• Salesforce.com

• Open Data Protocol (ODATA)

• Microsoft Windows Azure Marketplace

The Google Analytics connector can be used to create customized click-stream analysis of web traffic. Google Big Query and Amazon Redshift connectors allow you to leverage storage and data processing services offered by Google and Amazon. Both enable you to purchase petabyte-scale database processing capacity on demand. There is also a connector for the popular cloud-based CRM tool, Salesforce.com, as well as the related Salesforce data products (Force.com and Database.com). Microsoft supplies data over the web via the Windows Azure Marketplace DataMarket and was the founding developer of OData (Open Data Protocol), facilitating the creation and consumption of REST APIs.

Connecting to BigQuery

Let’s look at one of the cloud database platforms, Google BigQuery. The connection screens are displayed in Figure 2-11. You can read all of the details about connecting to BigQuery in the help manual, but the setup requires only two steps. First you log in to your Google account; then you authorize Tableau to access your account.


Figure 2-11: BigQuery connection


I’ve used BigQuery to build dashboards that have over 100 million records, with less than two-second latency using a standard Internet connection. More businesses are using cloud data services because they are secure and reliable and perform well in many use cases. Tableau Software also provides a free cloud-based service, called Tableau Public, for publishing workbooks and dashboards.

Tableau Public

Tableau Public is the largest deployment of Tableau Server in the world. Thousands of people use it to share dashboards and visualizations with others. Figure 2-12 shows an example dashboard published on Tableau Public that was used in a blog post.


Figure 2-12: Tableau Public


After you sign up for a free account, two steps are required to share a dashboard on Tableau Public: Build a workbook that includes visualizations and dashboards and then publish the content you wish to share to your Tableau Public account.

The left side of Figure 2-12 shows a dashboard in Tableau Desktop and the menu options for publishing. To publish a workbook, go to the Server menu, select Tableau Public, and then pick the Save to Tableau Public menu option. If you haven’t logged into the service, you’ll be prompted to log in; then you can define exactly what parts of your workbook you want to publish, along with other options. The right side of Figure 2-12 shows the dashboard in the Tableau Public environment. Note the Share menu option at the bottom right of the image. Clicking that option exposes the embed code and link. If you want to embed the dashboard in a website, you copy that code and paste it into your post. You can also share a link to your published material on Twitter or Facebook.

Tableau has continued to expand the amount of data that Tableau Public users are able to publish. If you don’t have access to a licensed copy of Tableau Desktop, download the free Tableau Public desktop version. It works just like the paid desktop license with the notable exception being that the only place you can save your work is Tableau Public. Be careful not to publish proprietary data there as it is available to everyone without restriction.

What Are Generated Values?

When you connect to a data source, Tableau creates new fields that make difficult tasks easier. You see these fields in the data pane when you connect to a data source.

Figure 2-13 zooms in on the data pane area. If you want to follow along, the companion website contains the workbook used to build the figures in this section. Open that workbook or connect to one of the sample datasets in your saved data sources.


Figure 2-13: Tableau-generated fields


Measure Names, Measure Values, and Number of Records are always present. If your dimensions include standard geographic place names, Tableau will also generate center-point geocode coordinates.

Measure Names and Measure Values

Measure Names and Measure Values can be used as shortcuts for viewing all the measures in your dataset or to express multiple measures on a single axis.

In Figure 2-14, you can see that two measures are shown, SUM (Profit) and SUM (Sales). These appear as separate columns in the same bar chart. The generated value, Measure Names, is used in the Columns shelf to separate the bars. Measure Names is also used on the Marks card to distinguish color and on the Filters shelf to limit the number of measures shown in the view. Measure Values contains the data, and this is shown as rows as you would expect from this type of bar chart.


Figure 2-14: Multiple measures on an axis


The side-by-side bar chart in Figure 2-14 was created by multi-selecting one dimension and two measures. Using Show Me the side-by-side bar chart was selected to create the view. Tableau automatically applied Measure Names to the Columns shelf and plotted both measures on the horizontal axis. The Measure Names quick filter was exposed by right-clicking Measure Names in the Filters shelf and selecting the Show Quick Filter menu option. Using the Measure Names quick filter, you can add or remove measures from the axis. This style of chart works well only if the measures being plotted have similar value ranges.

There are many ways to combine multiple measures on a single axis. You learn those details in Chapter 3.

Tableau Geocoding

If your data includes standard geographic fields, such as Country, State, Province, City, or Postal Codes – denoted by a small globe icon – Tableau will automatically generate the longitude and latitude values for the center points of each geographic entity displayed in your visualization. If Tableau doesn’t recognize a geographic dimension, you can edit the field by right-clicking it and selecting the desired geographic unit. Figure 2-15 shows a map created using country, state, and city and then using Show Me to display a symbol map. The map is filtered using the region field to show only the United States.


Figure 2-15: Automatic geocoding


You can edit the color of the marks in the map by clicking the Color button in the Marks card and then selecting the desired color. Figure 2-15 shows the Color dialog box open with Transparency at 50 % and a black border. The marks in the map were styled using the Marks card’s Color – changing the color transparency to 50 % and adding a black border. This makes overlapping clusters of marks easier to see. Hovering over a mark exposes a small dialog box (tooltip) that includes additional details about the mark. The contents of tooltips can be edited by clicking the tooltip button on the Marks card.


Конец ознакомительного фрагмента. Купить книгу

8

Gordon Mackenzie, Orbiting the Giant Hairball: A Corporate Fool’s Guide to Surviving with Grace (New York: Viking, 1998), p. 88.

Tableau Your Data!

Подняться наверх