Box Plots in 8 Steps in Excel

Overview

A box plot is a simple method of displaying data by splitting the data in quartiles. Each quartile of a data set contains one quarter of the data points. The Q1 quartile contains the quarter of the data set that has the lowest values. Quartile Q4 contains the quarter of the data set that has the highest values. An example of a box plot is the following:

(Click On Image To See a Larger Version)

The box plot is sometimes called the box-and-whisker diagram because the highest and lowest quartiles of data are represented by whiskers above and below the boxes that contain quartiles 2 and 3 of the data set. The inner 50-percent of the data are contained in quartiles 2 and 3. The outer 50-percent of the data points are contained in quartiles 1 and 4. The border between the 2nd quartile (purple box) and the 3rd quartile (yellow box) is the data set’s median. One half of all data points are above and below the median. Sometimes box plots also plot the mean is done here.

The following data set can be used to create a box plot in Excel:

(Click On Image To See a Larger Version)

A box plot in Excel of the data would appear as follows:

(Click On Image To See a Larger Version)

The box plot was created with Excel charting tools using the following information:

(Click On Image To See a Larger Version)

The information is displayed in the box plot as follows:

(Click On Image To See a Larger Version)

Information That Box Plots Provide

Box plots provide basic information about the distribution of data. Box plots provide visual information:

1. The variation or spread of the data is indicated by the width of each quartile. The wider a quartile is, the greater the spread or variance in that quartile is likely to be.

2. Any skew of the data set is indicated by the lengths of the upper and lower whiskers relative to each other or by how close the median and mean are to each other. A data set is skewed in a direction if a histogram of the data set would show a long tail in that direction. The skew is in the direction of the long data tail. A longer upper whisker would indicate a skew in the positive direction while a longer lower whisker would indicate that the data is skewed in the negative direction. If the mean is greater than the median, the data likely has a longer tail in the positive direction and is therefore likely to be positively skewed. When the mean is less than the median, a negative skew is indicated.

3. The range of data is indicated by the difference between the end points of the upper and lower whiskers, which are the min and max data values.

4. The interquartile range (IQR) is the middle half of all data points. 50-percent of all data points lie in the 2nd and 3rd quartiles that are the boxes shown in the Q2 and Q3 boxes

5. Box plots provide the following measures of central tendency: the mean and the median.

6. Box plot provide visual indication of whether a factor is significant. A factor that is significant will display a visible difference in central tendency and dispersion in a box plot of sample groups of different levels of that factor. This is demonstrated as follows:

The following two images show box plots of three samples that have the same means in both images. The difference is the degree of dispersion. The small amount of within-group variance makes the differences between the sample groups readily apparent in a box plot.

(Click On Image To See a Larger Version)

The relatively larger amount of within-group variance in the sample groups in the second image that follows obscures the difference between the sample groups. ANOVA (Analysis of Variance) combines the differences in sample means and sample variances to determine whether at least one sample group is significantly different than the other sample groups and therefore likely to have come from a different population than the other sample groups.

(Click On Image To See a Larger Version)

Box plot are basic analysis tools and do not provide information such as the shape of the distribution of a data set. A histogram is a good tool to create a visual representation of that shape.

Below are the steps to create a box plot in Excel:

Step 1) Sort the Sample Groups in Descending Order

This is not absolutely necessary but simply makes the data a bit more intuitive to work with. The Excel Sort tool can easily be used for this.

(Click On Image To See a Larger Version)

Step 2) Calculate Measures of Central Tendency and Dispersion for Each Sample Group

(Click On Image To See a Larger Version)

Step 3) Create the Bottom Sections of a Stacked Column Chart

The box plots for the sample are created with a stacked column chart in Excel. The chart’s first data series becomes the bottom section of the stacked columns. The second data series that is established for the chart becomes the second section of the columns. The third data series that is established for the chart becomes the third section of the columns and so on.

The bottom section, which is created in this step, goes from 0 to Q1, the top of the first quartile. This bottom section will be configured to have no fill or outline and will therefore be invisible.

The next higher section will be the Q2 quartile. The section of the stacked column will be configured with purple fill and black outline.

The next higher section will be the Q3 quartile. This section of the stacked column will be configured with yellow fill and black outline.

The upper whisker is then created by attaching a positive error bar that has the length of the Q4 quartile to the Q3 column section.

The lower whisker is then created by attaching a negative error bar that has the length of the Q1 quartile to the bottom, invisible column section.

Create a stacked chart by selecting the Insert tab and then selecting the following 2-D Column, Stacked Column chart option:

(Click On Image To See a Larger Version)

This creates a blank chart. To add the first data series, which will become the bottom section of the columns, right-click on the blank chart and then click Select Data in the pop-up menu that appears with the right-click on the chart as shown below:

(Click On Image To See a Larger Version)

The Select Data dialogue box then appears. Add the first data series by clicking Add. This will bring up the Edit Series dialogue box as shown below. In this dialogue box input the location of the series name and the series data values. This will be a blank series and the name is designated by the blank cell F10. The series values are in cells G10:I10, which contains a copy of the Q1 values in row 6.

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

This produces the following graph, which contains the bottom section of the stacked columns. The default color for this section is blue.

(Click On Image To See a Larger Version)

To remove the blue color, right click on any of the blue columns and then select the No Fill option of the pop-up menu that appears with the right-click. It should also be noted that this data series has also been designated having no name by selecting cell F10, which is a blank cell, to provide the name for the data series.

(Click On Image To See a Larger Version)

After the blue color is removed, the bottom sections of the stacked columns are still there, but they are now invisible as shown in the following diagram:

(Click On Image To See a Larger Version)

Step 4) Create the Next Highest Section of the Stacked Columns

The next section of each stacked column will be the purple box representing the Q2 data. This section will sit on top of the invisible bottom column section that extends to the end of the Q1 quartile (which is the beginning of the Q2 quartile that is now being constructed).

This column section is created by right-clicking the chart and then clicking the Select Data from the pop-up menu that appears with the right-click.

(Click On Image To See a Larger Version)

The Select Data dialogue box then appears. Add the second data series by clicking Add. This will bring up the Edit Series dialogue box as shown below. In this dialogue box input the location of the series name and the series data values. This will be Q2 quartile series and the name is designated as 2Q Box from that value in cell the blank cell F11. The series values are in cells G11:I11, which contains a copy of the Q2 values in row 7.

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

The default color for these column sections is orange. This change be changed by right-clicking any one of the orange boxes on the chart and then selecting purple fill and a solid line outline in the dialogue box that appears with the right-click as shown below:

(Click On Image To See a Larger Version)

The current state of the graph is now shown as follows. The yellow boxes representing the Q3 quartile data will be created in the next step.

(Click On Image To See a Larger Version)

Step 5) Create the Next Highest Section of the Stacked Columns

The next section of each stacked column will be the yellow box representing the Q3 data. This section will sit on top of the purple column section that extends to the end of the Q2 quartile.

This column section is created by right-clicking the chart and then clicking the Select Data from the pop-up menu that appears with the right-click.

(Click On Image To See a Larger Version)

The Select Data dialogue box then appears. Add the second data series by clicking Add. This will bring up the Edit Series dialogue box as shown below. In this dialogue box input the location of the series name and the series data values. This will be Q3 quartile series and the name is designated as 3Q Box from that value in cell the blank cell F12. The series values are in cells G12:I12, which contains a copy of the Q3 values in row 8.

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

The chart now appears as follows. The default color of the third section of the stacked column is grey.

(Click On Image To See a Larger Version)

The upper box can now be converted from grey to yellow with a block outline by right-clicking on any of the grey boxes and select yellow fill and black outline from the dialogue box that appears with the right-click.

(Click On Image To See a Larger Version)

The stacked data columns only have the generic labels 1, 2, and 3 on the horizontal axis. Horizontal axis labels can be added by right-clicking the chart once again. This will bring up the Select Data dialogue box. Click the Edit button for the Horizontal Axis Labels as shown below:

(Click On Image To See a Larger Version)

Designate the horizontal axis labels to be the contents of cells G2:I2, which are Sample 1, Sample 2, and Sample 3.

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

The resulting chart is show as follows and is now ready to have the upper and lower whiskers installed.

(Click On Image To See a Larger Version)

Step 6) Create the Upper Whiskers

The whiskers will extend either up or down from the top of the data box that they are attached to. The upper whiskers, which represent the Q4 quartile, will extend from the top of the Q3 data box. Do that by first selecting any yellow Q3 box. This will highlight all three Q3 boxes along with the data row that they represent.

(Click On Image To See a Larger Version)

Add the Error bar by clicking Design / Add Chart Element / Error Bars / More Error Bar Options as shown in the following diagram:

(Click On Image To See a Larger Version)

Select Custom Error Bars from the dialogue box and then select the Whisker+ data in cells G14:I14 as show in the following diagram:

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

The final step is to format the Error Bars. The Format Error Bars dialogue box is on the right of the page as shown in the previous image. The upper Error Bar should be configured in the Plus direction, with a Cap, and using the values from the Whisker+ row as the values for the length.

(Click On Image To See a Larger Version)

The upper whiskers will now appear as follows:

(Click On Image To See a Larger Version)

Step 7) Create the Lower Whiskers

The whiskers will extend either up or down from the top of the data box that they are attached to. The lower whiskers, which represent the Q1 quartile, will extend from the top of the invisible Q1 data section of the stacked column. Do that by first selecting any transparent Q1 section. This will highlight all three transparent Q1 sections along with the data row that they represent.

(Click On Image To See a Larger Version)

Add the Error bar by clicking Design / Add Chart Element / Error Bars / More Error Bar Options as shown in the following diagram:

(Click On Image To See a Larger Version)

Set the Error Bar direction to Minus, add a Cap, and designate the Whisker- row data to be the length of these whiskers.

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

(Click On Image To See a Larger Version)

Here is the resulting box plot graph with the upper and lower whiskers attached to the Q2/Q3 boxes.

(Click On Image To See a Larger Version)

Excel Master Series Blog Directory

Statistical Topics And Articles In

This Blog

You Will Become an Excel Statistical Master!