Showing posts with label statistics excel. Show all posts
Showing posts with label statistics excel. Show all posts

Tuesday, December 27, 2011

Simplifying Excel Pivot Table and Pivot Chart Setup

Simplifying Excel Pivot

Table and Pivot Chart

Setup

Overview of Excel Pivot Tables

The Excel pivot table is an incredibly convenient tool to summarize and better understand large amounts of data by creating instant cross tabulations that can be quickly re-arranged to highlight different aspects of the data. Excel pivot tables give the user the ability to instantly summarize data in almost any possible configuration. Pivot tables allow you to quickly pull useful information out of large amounts of raw data. Important insights about the data can be readily obtained from Excel pivot tables that might be difficult to discover otherwise.

The most common use of pivot tables is to summarize data by all available categories and subcategories within the raw data. The incredible versatility of pivot tables allows you to expand and collapse all levels of data and instantly reshuffle all categories and subcategories in order to obtain just the right information and insights.

Overview of Excel Pivot Charts

The Excel pivot chart is an excellent supplement to any Excel pivot chart. The pivot chart provides a graphical representation of the pivot table's information that enables quicker comprehensive of trends and comparisons in the data. The pivot chart can instantly created in Excel in a single step after the pivot table has been constructed. The pivot chart a number of different chart types that facilitate even faster comprehension of data's messages.

All of that power and versatility can leave a person new to pivot tables wondering where to start. Fortunately, pivot tables are quite a bit more intuitive than they may initially appear. An easy, step-by-step process will be provided in this article to quickly create a pivot table with multiple configurations on the following data set:

Pivot Tables in Excel - Raw Data Table

Step 1 – Determine the Overall Objective

The most important aspect of creating an Excel pivot table is to establish the main goal and any lesser goals of the analysis. These goals will drive every aspect of the pivot table analysis going forward.

Two following goals will be established for this pivot table analysis:

  1. Compare sales results of the two salespeople

  2. Compare sales results of the two products

Step 2 – Determine the Query Question That Meets Each Goal

As often happens, getting the right answer depends on asking the right question. The query question determines how the data will be arranged and summarized within the pivot table. The following framework can be used to translate the goal determined in step 1 to a correct query question.

Here is a good way to structure the query question for each goal:

Compare _______________

For each _______________

For each _______________

Goal 1 – Compare Sales Results of the Two Salespeople

Compare: Sales Results of Each Salesperson

For each: Product

For each: Quarter

Goal 2 – Compare Sales Results of Each Product

Compare: Sales Results of Each Product

For each: Quarter

Step 3 – Create the Pivot Table That Implements Each Query Question

When the query question has been correctly created, it is usually a simple and straightforward to configure a pivot table that answers the query question. An empty pivot table dialogue box can be brought up in Excel as follows:

Pivot Tables in Excel - Insert Pivot Table

Insert / PivotTable will bring up the following empty pivot table dialogue box.

Pivot Tables in Excel - Empty Dialogue Box

The data table should be arranged in contiguous (touching) rows. The Table/Range consists of the entire table with headers.

Pivot Tables in Excel - Select Data

The location where the pivot table should be placed will be the cell address of the upper left corner of the output.

Pivot Tables in Excel - Completed Dialogue Box

Clicking OK brings up the following PivotTable Field dialogue box. All data column headers from the data table that was selected are shown in the empty PivotTable Field dialogue box.

Pivot Tables in Excel - Empty Dialogue Box

The column headers can now be arranged in the dialogue box as follows to answer the query question corresponding to each goal.

Goal 1 – Compare Sales Results of the Two Salespeople

Compare: Sales Results of Each Salesperson

For each: Product

For each: Quarter

Pivot Tables in Excel - 1st Pivot Table Field Arrangement

All columns headers have been checked. Headers of categorical (non-numeric) data are placed as Rows within the dialogue box. Categorical data include Quarter, Product, and Salesperson. These categorical data labels are placed in the dialogue box in the order that they appeared in the query question.

Numeric data to be evaluated is summed up under Values in the dialogue box.

The following pivot tables is now produced.

Pivot Tables in Excel - 1st Pivot Table

Step 4 – Create the Pivot Chart From the Pivot Table

As soon as the pivot table is created, the corresponding pivot chart for that pivot table can be created in a single step. Simply click anywhere on the pivot table, click on the PivotTable Tools Design tab and then select Pivot Chart as follows:

Pivot Charts in Excel - Insert Pivot Chart

This will bring up the the chart type dialogue box. In this case the 3-D Stacked Column chart was selected.

Pivot Charts in Excel - Chart Type

The resulting pivot chart is the following:

Pivot Charts in Excel - Pivot Chart 1

The chart provides a more intuitive comparison between sales of the two salespeople than does the table.

If the query question were changed to the following:

Compare: Sales Results of Each Salesperson

For each: Quarter

For each: Product

then the PivotTable Fields dialogue box would be reconfigured to implement the new query question by simply dragging the Product column label to the top of Rows to reflect the order in the new query question.

Pivot Tables in Excel - 2nd Pivot Table Field Arrangement

This would produce the following pivot table.

Pivot Tables in Excel - 2nd Pivot Table

These changes to the table are instantly reflected in the automatically updated pivot chart as follows:

Pivot Charts in Excel - Pivot Chart 2

Showing the data in separate columns in the pivot table for each quarter can be implemented by dragging the Quarter column header to Columns in the PivotTable Fields dialogue box as follows:

Pivot Tables in Excel - 3rd Pivot Table Field Arrangement

This instantly produces the following pivot table data for each quarter in its own column.

Pivot Tables in Excel - 3rd Pivot Table

When the data's arrangement is significantly changed within the pivot table, the pivot chart type sometimes should be changed to provide the most intuitive presentation of the data. In this case the pivot chart type was changed from a 3-D Stacked Column chart to a 3-D Clustered Column chart to create greater clarity. In the PivotChart Design tools tab, select Change Chart Type as follows:

Pivot Charts in Excel - Change Chart Type

The 3-D Clustered Chart is selected from the following Chart Type dialogue box:

Pivot Charts in Excel - Select Chart Type

This instantly produces the following pivot chart:

Pivot Charts in Excel - Chart 3

Goal 2 – Compare Sales Results of Each Product

Compare: Sales Results of Each Product

For each: Quarter

The Salesperson column label can be unchecked because the query question does not request any information about individual salespeople.

Pivot Tables in Excel - 4th Pivot Table Field Arrangement

The following pivot table is produced as a result of the Quarter and Product under Rows:

Pivot Tables in Excel - 4th Pivot Table

The following pivot chart more intuitively represents the data of this pivot table in the 3-D Stacked Column chart type:

Pivot Charts in Excel - Chart 4

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

Sunday, August 22, 2010

Use the Excel t Test To Find Out What the Best Days To Sell Are

The t Test in Excel

Can Determine What Your

Best Sales Days Are


It's always great to know what day of the week you can expect to get peak sales. The t test in Excel can provide that information for you. It's quite a simple test to run, as you will see. This blog article will walk you step-by-step through a t Test in Excel. The t Test compares two groups of samples and determines whether the mean of one sample is different than the other. Most types of t Tests require that each sample group have the same number of samples and also have the same variance. This Excel t Test has neither of those requirements.
 

Here is the scenario we are going to test: Suppose that you have been monitoring your daily sales for about a year. Your two best sales during the week are normally Monday and Wednesday. You would like to know which of those two days really does produce the best sales. You've tracked Monday sales for the last 40 weeks and Wednesday sales for the last 42 weeks. Mean sales for Wednesday is a bit higher than mean sales for Monday, but you would like to know with 95% certainty whether the difference in means is not just by chance and that Wednesday really is a better day for selling.

You can run your data through an Excel t Test and know within a minute whether Wednesday really is the best sales day. Excel has several built-in t Tests. The specific Excel t Test we will use is called the "t Test: Two Sample Assuming Unequal Variances." This t Test allows for two samples that have unequal sizes and variances. The only requirement is that both samples are Normally distributed. This will be discussed shortly. In Excel 2003, this test can be accessed through this menu path: Tools / Data Analysis / t Test: Two Sample Assuming Unequal Variances. Before we perform this t Test, we need to have a discussion of what the t Test is.


t Test - General Description

This test will tell you whether the difference between the before and after numbers is genuine or whether this difference could merely have been the result of chance. Overall a t-test compares two means and determines within a specified degree of certainty whether the two means really are different, or whether the difference might have occurred by chance.

t Test for Two Samples Having
Unequal Sizes and Variances

The t Test that can be applied to two samples with unequal sizes and unequal variances determines whether the means of both samples are the same.  In other words, this test evaluates within a specified degree of certainty whether the measured difference between the meaqns is real or could have occurred merely by chance.

Before we start discussing this specific test in detail, The t-test needs to be generally explained. The basic question to be answered is:


The t Test - What Is It?

The t test is a statistics test generally used to test whether means of populations are different. In the t test, a t value is calculated based upon the difference in the means and variances of the two populations. The greater the t value, the more certain it is that the means are different.

The t value can be generally described as follows:

t value = (Difference between the group means) / (Variability of the groups)

There are many variations of the t test. Each has its own specific formula for calculating a t value for the sampled data sets. All of the t value formulas can be described by the above formula.

The Higher the t Value - The More Likely the Groups Are Different

The higher the t value is, the more likely it is that the two means are different. If the two groups being compared have a high degree of variance (t value has a high denominator), it is much harder to tell them apart. On the other hand, if the two groups being compared have a low degree of variance (the t value has a low denominator), it is much easier to tell the two groups apart.

The Lower the Combined Variance, the Higher the t Value

The illustrations below should clarify how the degree of variance in the two groups determines how easy or difficult it is to state that the means of the two groups are really different. The t test quantifies this relationship and provides a way to determine whether the measured difference between two means can be considered real or not based upon the amount of variance in both groups. Here are illustrations that should clarify this relationship.
t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel

We can see that pair of data sets on the right are much easier to differentiate because they have much less overlap than the pair of data sets on the right. The overlap represents the overall variability between the two data sets in each pair. The higher the total variablility within the pair of data sets, the higher will be the denominator in the t value formula. The higher the denominator, the lower the t value for the pair of data sets. The lower the t value, the less likely it is that the two data sets are separate data sets with different means.



T-Test for Two Samples Having Unequal Sizes and Variances

The t Test for comparing two samples with unequal sizes and variance is a variation of the t Test called Welch's t Test. It is not the classic Student's t Test, which does not allow for samples having unqual variances.

We are going to use this t test to determine within 95% certainty whether the means sales from Wednesdays is different than the mean sales from Monday. We have measured from the last 42 Wednesdays and the last 40 Mondays and we will apply this Excel t test to determine whether the measured difference between the means is real or not.

A Little Bit More About This t Test

The t Test in general is a special case of one-way (sometimes called “single factor”) ANOVA. This paired two-sample student’s t test is applied when there is a natural pairing of samples. It is most often used to determine whether “before” and “after” means of a sample of the same objects have changed during an experiment. One really great thing about this t test is that the paired two-sample t test does not require that the variances of both populations to be the same.

Here is the formula to calculate the t value for a two-sample t test of unequal variances if you are testing to determine whether there difference between the two samples:

t value =      [ X1 - X2 ] / [ SQRT( (s1^2 / n1) + (s2^2/n2) ) ]

Degree of Freedom  = df =

[ [ (s1^2 / n1) + (s2^2 / n2) ]^2 ] / [ ( [ (s1^2 / n1)^2 ] / [n1 -1 ] ) + ( [ (s2^2 / n2)^2 ] / [n2 -1 ] ) ]

X1 and X2 are the sample means. s1 and s2 are the sample standard deviations, and n1 and n2 are sample sizes.

You can see that this follows the general formula for calculating the t value in a t test, which is:

t value = (Difference between the group means) / (Variability of the groups)


The t value is a specific point on the x-axis in the t distribution (student’s t distribution). If this t value falls outside the region of required certainty, it can be stated that the two means are probably different. If this t value falls within the region of required certainty, it cannot be stated that the two means are probably different.

The required region of certainty depends upon the degree of certainty required in the test. If 95% certainty is required, then the required region of certainty consists of 95% of the area under the student’s t distribution. The outer 5% is the region of uncertainty. This is also referred to as α (alpha) or the degree of significance. If the t value is large enough to be located all the way out on the x-axis in the 5% region of uncertainty, it can be stated within 95% certainty that the two means are different.

A t test can be a one-tailed test or a two-tailed test. A one-tailed test determines whether the means are different in one specific direction. For example, a one-tailed test could be used to determine only if the mean of the “after” measurements is greater than the mean of the “before” measurements. A two-tailed test determines whether the two means are merely different.



Two-Tailed t Test Is More Stringent


The two-tailed test is more stringent because the area in the outer tails outside of the region of required degree of certainty is split into two tails. For example, if the required degree of certainty is 95% on a two-tailed test, the calculated t value must be all the way out in the outer 2.5% of either tail for the t test to conclude within 95% certainty that the means are different.

One-Tailed t Test Is Less Stringent

A one-tailed test is less stringent. If the required degree of certainty is 95% on a one-tailed test, the calculated t value only has to be within the outer 5% of whatever tail is being tested to be able to state the two means are probably different.

Doing The Two-Sample t Test for Unequal Variances in Excel

We are testing to determine whether there really is a difference between mean sales on Monday and mean sales on Wednesday.

The data need to be arranged in Excel as follows:

t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel

Click on Image To See Enlarged View

The t Test we are about to use allows for different sample sizes and different variances, but that standard requirement for all t Tests is that both samples being compared are Normally distributed. There are a number of different ways of doing this. For brevity, we are going to do it the simplest possible way. We will make an Excel histogram of each sample's data and simple eyeball the shape of the histogram. If the shape of each histogram resembles the Normal curve, we will go with it.

There are a number of better ways of checking for Normality and here is a link to an article in this blog which describes how to do a simple but more accurate Excel Normality test called the Normal Probability Plot.

The Excel histogram is a simple thing to construct. If you haven't ever done one, here is a link to an article in this blog which shows how to create a histogram in Excel from sample data.
Completed histograms for each of the two samples are as follows:

histogram, t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel


histogram, t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel



Both histograms appear to be Normally distributed so we can use t Test to compare the two samples. If either sample is not Normally distributed, the t test cannot be used because the output is likely to be totally incorrect. If either sample is not Normally distributed, we must use a nonparametric test such as the Mann-Whitney U Test to compare the samples.

 
Before we run the t Test, we would like to take a look at a description of of each sample. In Exzcel 2003, this can be quickly done by the following tool: Tools / Data Analysis / Descriptive Statistics. The Descriptive Statistics for each sample are as follows:



t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel




Now, access this Excel t Test as follows (this is Excel 2003):

Tools / Data Analysis / t-Test: Two Sample Assuming Unequal Variances

This following dialogue box will appear:
t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel
Click on Image To See Enlarged View

Input the data as followings:

Variable 1 Range: Select everything that is highlighted yellow, including the label “Monday Sales.”

Variable 2 Range: Select everything that is highlighted tan, including the label “Wednesday Sales.”

Hypothesized Mean Difference: 0

Labels: Check the box because you included the labels for Variables 1 and 2.

Alpha: This depends on your desired degree of certainty. 0.05, if you desired 95% certainty. 0.20 if you desire 80% certainty.

Output Range: Select the cell that you want the upper left corner of the output to appear in.

Hit “OK” to run the analysis and the following Excel output appears:
t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel


Click on Image To See Enlarged View


This output can be interpreted as follows:

The t value is -6.088.

α = 0.05 = 1 - Required Degree of Certainty = 1 - 95%


p Value (1-Tailed) = 1.88E-08

p Value (2-Tailed) = 3.77E-08


One-tailed Test

This t value has a greater absolute value (6.088) than the critical t value for a one-tailed test (1.664). We can therefore state with 95% certainty that there really is a difference between Wednesday sales and Monday sales.

The above conclusion can also be reached because the p Value for the one-tailed test (highlighted in light red on the Excel output) is 1.88E-08. This is much less than alpha (0.05). The p Value being less than alpha is an equivalent result to the t value being greater than the t critical value.


Two-Tailed Test

The same result is arrived at for the two-tailed test. The two-tailed test is more stringent because the alpha region of uncertainty (5% of the area under the student’s t distribution curve) is now divided between both outer tails. The t value needs to be larger for the two-tailed test to wind up in the outer 2.5% area of either outer tail.

In this case, the t value was large enough to be positioned in the outer 2.5% of either outer tail. The absolute value of t value (6.088) is much larger than the critical t value for the two-tailed test (1.990). This indicates that it can be stated with 95% certainty that there really is a difference between Wednesday sales and Monday sales in this case as well.The p value calculated for the two-tailed test (3.77E-08) is much smaller than alpha (0.05). This is an equivalent result to the above.





Hand Calculation of the t Value and p Value

Let’s calculate the t value and p values for the one and two-tailed tests by hand to make sure that Excel has done a correct job. The t value is stated as the t statistic.

Here is the original test data Excel Descriptive Statistics: 
t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel

t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel

Click on Images To See Enlarged View


Here is the hand calculation of the t value and p values for the one and two-tailed tests for this Two-Sample t Test Assuming Unequal Variance. The hand calculations below of the t Value and p Values agree with the Excel outputs. There are very slight differences due to rounding differences:

t value = [ X1 - X2 ] / [ SQRT( (s1^2 / n1) + (s2^2/n2) ) ]


Degree of Freedom = df =

[ [ (s1^2 / n1) + (s2^2 / n2) ]^2 ] / [ ( [ (s1^2 / n1)^2 ] / [n1 -1 ] ) + ( [ (s2^2 / n2)^2 ] / [n2 -1 ] ) ]

The Degrees of Freedom calucation must be rounded to the nearest whole number, which in this case is 80.


X1 and X2 are the sample means. s1 and s2 are the sample standard deviations, and n1 and n2 are sample sizes.


p Value = TDIST ( T Statistic, df, Number of Tails )

Here are the actual calculations done by hand in Excel:
t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel


Click on Image To See Enlarged View

The Two Sample t Test Assuming Unequal Variances.is a very simple test to run in Excel and can be applied to nearly any aspect of your marketing program to see if one group of samples is different from another group of samples. One note: both sample groups must be continuous and measured using the using the same scale.


t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel
Please post any comments you have on this article. Your opinion is highly valued!


 

If You Like This, Then Share It...
t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel t tests in excel, t test, normal distribution, normally distributed, statistical teets in excel Technorati Reddit t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel t tests in excel, t test, normal distribution, normally distributed, statistical tests in excel


Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic