Introduction

Categorical data are often reported in frequency tables, which display the number of times each response to a survey item was selected. However, there are many ways to visualize frequency data, and readers often find the plots introduced in this tutorial easier to understand than a table.
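
As a point of comparison, a frequency table for a single categorical variable can be produced with the table function. The short sketch below uses the gear variable from the built-in mtcars data frame; the name gear_counts is chosen here only for illustration.

```r
# Frequency table: number of cars for each number of forward gears
gear_counts <- table(mtcars$gear)
print(gear_counts)

# The same counts expressed as proportions of all cars in the data frame
print(round(prop.table(gear_counts), 2))
```

The plots in the rest of this tutorial start from tables like this one and turn the counts into bar heights, slice sizes, or tile areas.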


Bar Plot

A bar plot is used to display the frequency count for categorical data. The following figure is a bar plot showing the number of automobiles with three, four, and five gears according to the mtcars data frame.

These types of visuals are often more effective than a table full of numbers, and they are easy to generate with R.

Demonstration: Bar Plot

The following script creates a simple bar plot. Note: this is one long R command that has been broken up over several lines to make it easier to understand.

  • Line 2: This creates a bar plot using the barplot function. The first argument sent to the function is the data source for the heights of each bar in the plot. In this case, R creates a table from the gear variable in mtcars and then uses that table as data input for the plot. All of the other lines in this script embellish the bar plot to make it more usable.
  • Line 3: The “main” argument sets the main title for the bar plot. In general, for any graphic in R “main” is used to set the title of the graph.
  • Line 4: This creates the label for the x-axis.
  • Line 5: This creates the label for the y-axis.
  • Line 6: This sets the color palette for the graph. In this case, the “cm.colors” palette is used. Three colors were requested from that palette, but specifying any number larger than three would also have worked and would have produced a slightly different set of colors. Experimentation is needed to find the most suitable palette for any given graph. (Note: setting colors on graphs was introduced in Visualizing Descriptives.)

    # Simple Bar Plot
    barplot(height = table(mtcars$gear),
      main = "Number of Cars By Gears",
      xlab = "Gears",
      ylab = "Count",
      col = cm.colors(3)
    )

The DataCamp interface generates graphics in a Plots tab but because of the size of the interface those plots are “squished” and impossible to read. Click the double-headed arrow button on the Plots tab to open the graph in a larger window for evaluation and copying to a document. If the graphic does not open in a larger window then temporarily pause the browser’s pop-up blocker.
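
The remark about requesting more than three colors from a palette can be checked directly. The following is a minimal sketch using only base R; the names pal3 and pal5 are illustrative and do not appear in the script above.

```r
# Ask the same palette for three and for five colors
pal3 <- cm.colors(3)
pal5 <- cm.colors(5)

print(pal3)
print(pal5[1:3])  # the first three of five are spaced more closely

# Only the first three colors are needed for the three bars, so the
# larger palette also works but gives the plot a different look
barplot(height = table(mtcars$gear),
  main = "Number of Cars By Gears",
  xlab = "Gears",
  ylab = "Count",
  col = pal5
)
```

Comparing the two printed vectors shows why the result differs: a larger request divides the same gradient into smaller steps.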

Skill Check: Bar Plot

Using the stackloss data frame, create a bar plot for Water.Temp. The plot should have a main title of “Water Temperature”, the X-axis should have a label of “Number”, the Y-axis should have a label of “Temperature”, and the color palette should be “cm.colors(9).”

Solution:

    barplot(height = table(stackloss$Water.Temp),
      main = "Water Temperature",
      xlab = "Number",
      ylab = "Temperature",
      col = cm.colors(9)
    )

Clustered Bar Plot

A clustered bar plot (sometimes called a “grouped bar plot”) displays two or more categorical variables. In general, clustered bar plots are best at showing relationships between variables but not so good for determining the absolute size of each category. The following plot shows the number of passengers on board the Titanic when it sank. While it is easy to determine that there were many more males than females on board, it is not possible to read the exact bar height of, for example, third class males.

Demonstration: Clustered Bar Plot

The following script creates a clustered bar plot.

  • Line 2: This begins the barplot function. It creates a table that contains the counts for cyl and gear in the mtcars data frame and then uses that table to produce the bar plot. Note the order of the variables in the table command: the grouping variable is listed second. In this example, the cars are grouped by gears, and within each group one bar is displayed for each number of cylinders. The height of each bar is determined by the count of cars in each group.
  • Lines 3-6: These lines are essentially the same as for a simple bar plot as described above.
  • Line 7: Setting legend to TRUE displays a legend in the corner of the plot. Whenever more than one variable is being plotted it is important to display a legend for the reader. In this case, the legend displays the colors used for the cyl variable.
  • Line 8: A stacked bar is the default type of plot but Line 8 instructs R to create a plot with the variables beside each other. “Stacked” plots are described in the next section of this tutorial.
  • Line 9: This rather odd-looking line adds a title to the legend. Without it, users would be confused about what the various colors used in the plot mean.

    # Clustered Bar Plot
    barplot(height = table(mtcars$cyl, mtcars$gear),
      main = "Number of Cars by Gears and Cylinders",
      xlab = "Gears",
      ylab = "Count",
      col = topo.colors(3),
      legend = TRUE,
      beside = TRUE,
      args.legend = list(title = "Cylinders")
    )
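
The note about the order of the variables in the table command is easier to see by printing the table itself. The sketch below is illustrative; the names cyl_by_gear and gear_by_cyl are assumptions introduced here.

```r
# Two-way table: rows are cylinders (first variable),
# columns are gears (second variable, the grouping variable)
cyl_by_gear <- table(mtcars$cyl, mtcars$gear)
print(cyl_by_gear)

# Each column becomes one cluster of bars on the x-axis
print(colnames(cyl_by_gear))

# Transposing the table swaps the roles: clusters by cylinders instead
gear_by_cyl <- t(cyl_by_gear)
print(colnames(gear_by_cyl))
```

Passing the transposed table to barplot would group the cars by cylinders, so the x-axis label and the legend title would need to be swapped as well.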

Clustered Bar Plot with Gradient Colors

Designers must be careful about using multiple hues on a single visual display since that creates what is sometimes called “clown’s pants” due to the extreme patchy color scheme. Plots with that type of coloration can be distracting and unusable. Instead, it is generally a best practice to use only shades of the same color or gradients from one color to another. As a comparison, the following script produces the bar plot seen in the previous figure but using only shades of blue.

  • Line 2: This line creates a custom palette named colpal (for “color palette”). The colorRampPalette function returns a new function, so colpal can be used like any other R command that creates color codes for plots. In this case, colpal will create the codes for color gradients between blue and white. Note: In order to make plots more usable for readers who are color blind, only one hue should be selected along with either white or black.
  • Lines 4-7: These are the same as found in the previous bar plot script.
  • Line 8: This line sets the color for this plot by using colpal which was created in Line 2 and specifying three colors.
  • Lines 9-12: These are the same as found in the previous bar plot script.

    # Clustered Bar Plot With Gradient Colors
    colpal <- colorRampPalette(c("blue", "white"))

    barplot(height = table(mtcars$cyl, mtcars$gear),
      main = "Number of Cars by Gears and Cylinders",
      xlab = "Gears",
      ylab = "Count",
      col = colpal(3),
      legend = TRUE,
      beside = TRUE,
      args.legend = list(title = "Cylinders")
    )
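
To see what the custom palette actually contains, it can help to inspect colpal outside of a plot. This is a small sketch using the same blue-to-white palette; the name shades is introduced here only for illustration.

```r
# colorRampPalette() returns a function, not a vector of colors
colpal <- colorRampPalette(c("blue", "white"))
print(is.function(colpal))

# Calling that function with n produces n color codes along the gradient
shades <- colpal(3)
print(shades)  # first entry is pure blue, last entry is white
```

Requesting more colors, for example colpal(5), returns a finer gradient between the same two end points.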

Skill Check: Clustered Bar Plot

Using the infert data frame, create a clustered bar plot for spontaneous grouped by education. The plot should have a main title of “Spontaneous Cases By Education Level”, the X-axis should have a label of “Education Level”, the Y-axis should have a label of “Count”, the colors should be from a custom palette using orange to white, and the legend should have a title of “Spontaneous.”

Solution:

    colpal <- colorRampPalette(c("orange", "white"))

    barplot(height = table(infert$spontaneous, infert$education),
      main = "Spontaneous Cases By Education Level",
      xlab = "Education Level",
      ylab = "Count",
      col = colpal(3),
      legend = TRUE,
      beside = TRUE,
      args.legend = list(title = "Spontaneous")
    )

Stacked Bar Plot

A stacked bar plot has one variable stacked on top of another. In general, these are very difficult to read and should only be used to make broad generalizations. Consider, for example, the following figure. This plot shows admissions for the University of California at Berkeley for six different programs. The top part of each bar (in brown) is the number admitted while the bottom part of each bar (in green) is the number rejected. Look at programs C and D. Were more students accepted in C or in D? Because these two values do not have the same baseline, it is impossible to tell for certain which is larger.

Demonstration: Stacked Bar Plot

The following script is the same barplot function used in the clustered bar plots above, except “beside = TRUE” is missing. By default, bar plots are stacked in R so if the “beside” argument is missing (or set to “FALSE”) then the result will be a stacked bar plot.

    # Stacked Bar Plot With Gradient Colors
    colpal <- colorRampPalette(c("Brown", "white"))

    barplot(height = table(mtcars$cyl, mtcars$gear),
      main = "Number of Cars by Gears and Cylinders",
      xlab = "Gears",
      ylab = "Count",
      col = colpal(3),
      legend = TRUE,
      args.legend = list(title = "Cylinders")
    )

It should be evident that the bar plot created in the above script is not very useful. While it is fairly easy to see that the number of 8-cylinder cars with three gears is much larger than the other categories, it is difficult to determine, for example, how many cars have five gears and eight cylinders. This difficulty is even worse when there are more than three levels for either of the two variables being plotted.
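
When an exact value is needed, it is usually simpler to read it from the underlying table than from the stacked bars. The following sketch looks up the count mentioned above; the name cyl_by_gear is an illustrative assumption.

```r
# The table that the stacked bar plot is drawn from
cyl_by_gear <- table(mtcars$cyl, mtcars$gear)

# Exact count of 8-cylinder cars with five gears
print(cyl_by_gear["8", "5"])

# Total height of each stacked bar (all cars with 3, 4, and 5 gears)
print(colSums(cyl_by_gear))
```

Reporting a few of these numbers alongside the plot lets the reader keep the broad picture from the graphic without having to estimate segment heights.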

Skill Check: Stacked Bar Plot

Using the infert data frame, create a stacked bar plot for induced grouped by education. The plot should have a main title of “Induced Cases By Education Level”, the X-axis should have a label of “Education Level”, the Y-axis should have a label of “Count”, the colors should be from a custom palette using purple to white, and the legend should have a title of “Induced.”

Solution:

    colpal <- colorRampPalette(c("purple", "white"))

    barplot(height = table(infert$induced, infert$education),
      main = "Induced Cases By Education Level",
      xlab = "Education Level",
      ylab = "Count",
      col = colpal(3),
      legend = TRUE,
      args.legend = list(title = "Induced")
    )

Pie Chart

A pie chart is commonly used to display categorical data; however, pie charts are notoriously difficult to interpret, especially if the writer uses some sort of 3-D effect or “exploded” slices. The human brain seems able to easily compare the heights of two or more bars, as in bar plots, but the areas of two or more slices of a pie chart are difficult to compare. For this reason, pie charts should be avoided in research reports. If they are used at all, they should only illustrate one slice’s relationship to the whole, not compare one slice to another, and no more than four or five slices should ever be presented on one chart.

The following figure shows the results of an experiment to compare the effectiveness of various feed supplements on the growth rate of chickens. This figure illustrates the problem with pie charts. Notice that “casein” seems to promote growth better than “horsebean,” but it is impossible to determine if “casein” is better than “sunflower” from this chart.

Demonstration: Pie Chart

The following script creates a pie chart.

  • Line 2: This starts a pie chart function. The “x” parameter is the data that needs to be charted. In this line, the feed variable in the chickwts data frame is extracted to a table since the pie chart function expects input in the form of a table.
  • Lines 3-4: These lines define the main title and colors used for the pie chart. These parameters are the same as those seen in other graphs in this lab.
  • Line 5: This tells R to use the labels used in the feed variable as the labels on the pie chart.

    # Count of Chicks by Feed
    pie(x = table(chickwts$feed),
      main = "Count of Chicks by Feed",
      col = rainbow(6),
      labels = c(levels(chickwts$feed))
      )

Note: The pie chart shows the number of chicks that were given a particular type of feed. Even though the pie “slices” are very nearly the same size there was a slightly different number of chicks on each type of feed.
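
The counts behind the slices can be printed to confirm how close they are. This is a minimal sketch; the name feed_counts is introduced here for illustration only.

```r
# Number of chicks on each feed type, which the slices represent
feed_counts <- table(chickwts$feed)
print(feed_counts)

# The spread between the smallest and largest groups is small,
# which is why the slices look nearly equal
print(range(feed_counts))

# The table names match the factor levels used for the labels
print(identical(names(feed_counts), levels(chickwts$feed)))
```

Differences this small are easy to read from the printed table but hard to judge from slice areas, which is the weakness of pie charts described earlier.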

Skill Check: Pie Chart

Create a pie chart for state.division. Note: state.division is a single factor vector rather than a data frame, so it is not necessary to use the “$” operator. The chart should have a main title of “States by Division”, nine colors should be used from the topo.colors palette, and the labels should be the levels of the state.division vector.

Solution:

    pie(x = table(state.division),
      main = "States by Division",
      col = topo.colors(9),
      labels = levels(state.division)
      )

Heat Map

Heat maps use colors to depict the counts of variables and are commonly found around election time to depict how precincts are voting, red for Republican and blue for Democrat. They are also routinely used on weather maps to depict areas with the greatest probability for rain or snow. While heat maps can be displayed in a geographical format where, for example, the various states are shaded to represent some factor, they are also commonly seen as a grid. The following figure shows a heat map of various socioeconomic indicators by province in Switzerland from 1888.

In a heat map produced by R, lighter colors represent larger numbers. Thus, the province with the highest fertility rate is Franches-Mnt since it has the lightest color and the province with the least agriculture is Courtelary since it has the darkest color for those variables. Interpreting the heat map can be a challenge for the researcher. In some cases a light color would be positive and in others negative. For example, the highest education level would be in Neuveville (positive) but the highest infant mortality would be in Porrentruy (negative). Also, the colors are often very similar and difficult to distinguish. For example, for “examination” Cossonay has a numeric value of 22 while Aigle has 21. These two colors are slightly different but it would be difficult to detect that from the image. Often, the best that can be done with a heat map is identifying broad generalizations.

Demonstration: Heat Map

The following script creates a heat map for the USJudgeRatings data frame.

  • Line 1: R can store data in several different formats, and many of them, such as vectors and data frames, are used in other tutorials in this series. Heat maps require data in matrix format, so this line converts the first 20 rows (out of 43) of the data frame into a matrix named hmap. Notice how those rows are specified: [20:1,] selects rows 1 through 20 in reverse order. Finally, “x” is the name of the parameter in the as.matrix function that receives the input data.
  • Line 2: This is the start of the heat map function. This line instructs R to create a heat map from the hmap matrix. R designates the input variable for heatmap to be “x.”
  • Line 3: The main title of the heatmap is “US Judge Ratings.”
  • Line 4: The x-axis is labeled “Characteristic.”
  • Line 5: This suppresses the row “dendrogram” that is used to order the rows. The best way to see what this line does is to comment it out and re-run the script.
  • Line 6: This suppresses the column “dendrogram.”
  • Line 7: Sets the heat map to scale the rows. In this way, the color for each row cell is calculated such that the entire row’s mean is zero and the standard deviation is one. The other option is to scale “column” and researchers would want to try both to see which provides a better heat map.
  • Line 8: This sets the right and bottom margins. The values were found by simple trial-and-error to produce the most legible heat map.

    hmap <- as.matrix(x = USJudgeRatings[20:1,])
    heatmap(x = hmap,
      main = "US Judge Ratings",
      xlab = "Characteristic",
      Rowv=NA,
      Colv=NA,
      scale="row",
      margins=c(8,3)
    )
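
The row scaling described for Line 7 can be reproduced by hand to see what “mean of zero and standard deviation of one” means for each row. The sketch below illustrates the idea and is not the internal code of the heatmap function; the name row_scaled is an assumption introduced here.

```r
# Same matrix as in the demonstration: first 20 judges, in reverse order
hmap <- as.matrix(x = USJudgeRatings[20:1, ])

# Standardize each row: subtract the row mean, divide by the row SD.
# scale() works on columns, so the matrix is transposed before and after.
row_scaled <- t(scale(t(hmap)))

# Every row now has mean 0 and standard deviation 1
print(round(rowMeans(row_scaled), 10))
print(apply(row_scaled, 1, sd))
```

Because each row is standardized separately, colors can be compared within one judge's row but not between judges. Scaling by column reverses that trade-off.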

Skill Check: Heat Map

Create a heat map for the first 20 rows of the attitude data frame. The heat map should have a main title of “Clerical Employees Attitude”, the label for the X-axis should be “Characteristic”, and the label for the Y-axis should be “Department.” Specify that Rowv and Colv are both NA. Scale by “row” and set the margins to c(8,3).

Solution:

    hmap <- as.matrix(x = attitude[1:20,])

    heatmap(x = hmap,
      main = "Clerical Employees Attitude",
      xlab = "Characteristic",
      ylab = "Department",
      Rowv=NA,
      Colv=NA,
      scale="row",
      margins=c(8,3)
    )

Mosaic Plot

A mosaic plot indicates the relative counts of items in a data frame by sizing areas on a grid. The following figure is a mosaic plot that indicates the relationship between the number of gears and cylinders in several cars. Notice that 8-cylinder cars overwhelmingly have three gears while 4-cylinder cars tend to have four gears. This plot gives a quick visual representation of the relationship between two categorical variables, much as a pie chart shows how a single categorical variable divides into parts of a whole. Mosaic plots suffer from the same weakness as pie charts, since areas are hard to compare, and are generally rather difficult to interpret.

Demonstration: Mosaic Plot

The following script creates a mosaic plot.

  • Line 2: A mosaic plot requires the input to be in table format, so this line creates a table from the gear and cyl variables. That table is passed to the plot function as the “x” argument.
  • Lines 3-6: These are similar to those used for other graphics functions and should be fairly easy to understand.

    # Mosaic Plots Using MTCars
    plot(x = table(mtcars$gear, mtcars$cyl),
      main = "Gears vs Cylinders",
      xlab = "Gears",
      ylab = "Cylinders",
      col = topo.colors(3)
      )
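
The tile sizes in a mosaic plot reflect proportions that can also be computed from the same table. The following sketch prints those proportions; the name gear_by_cyl is an illustrative assumption.

```r
# Table used for the mosaic plot: rows are gears, columns are cylinders
gear_by_cyl <- table(mtcars$gear, mtcars$cyl)

# Share of all cars that falls in each gear group
print(round(prop.table(rowSums(gear_by_cyl)), 2))

# Within each gear group, the share of 4-, 6-, and 8-cylinder cars
print(round(prop.table(gear_by_cyl, margin = 1), 2))
```

Printing these values next to the plot makes it easier to confirm impressions such as 8-cylinder cars dominating the three-gear group.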

Skill Check: Mosaic Plot

Create a mosaic plot of induced by education for the infert data frame. The plot should have a main title of “Compare Induced and Education”, the label for the X-axis should be “Number Induced”, and the label for the Y-axis should be “Education Level.” Select three “terrain.colors” for the plot.

Solution:

    plot(x = table(infert$induced, infert$education),
      main = "Compare Induced and Education",
      xlab = "Number Induced",
      ylab = "Education Level",
      col = terrain.colors(3)
      )