Saturday, October 30, 2010

Creating a Simple Moving Average in 3 Steps in Excel

Creating a Simple Moving

Average in 3 Steps in Excel

Simple Moving Average in Excel - 2-Interval, 3_Interval, 6-Interval Charts
(Click On Image To See a Larger Version)

This is one of the following three articles on Time Series Analysis in Excel

Forecasting With Exponential Smoothing in Excel

Forecasting With the Weighted Moving Average in Excel

Forecasting With the Simple Moving Average in Excel

Overview of the Moving Average

The moving average is a statistical technique used to smooth out short-term fluctuations in a series of data in order to more easily recognize longer-term trends or cycles. The moving average is sometimes referred to as a rolling average or a running average. A moving average is a series of numbers, each of which represents the average of an interval of specified number of previous periods. The larger the interval, the more smoothing occurs. The smaller the interval, the more that the moving average resembles the actual data series.

Moving averages perform the following three functions:

  1. Smoothing the data, which means to improve the fit of the data to a line.

  2. Reducing the effect of temporary variation and random noise.

  3. Highlighting outliers above or below the trend.

The moving average is one of the most widely used statistical techniques in industry to identify data trends. For example, sales managers commonly view three-month moving averages of sales data. The article will compare a two-month, three-month, and six-month simple moving averages of the same sale data. The moving average is used quite often in technical analysis of financial data such as stock returns and in economics to locate trends in macroeconomic time series such as employment.

There are a number of variations of the moving average. The most-commonly employed are the simple moving average, the weighted moving average, and the exponential moving average. Performing each of these techniques in Excel will be covered in detail in separate articles in this blog. Here is a brief overview of each of these three techniques.

Simple Moving Average

Every point in a simple moving average is the average of a specified number of previous periods. This blog article will provide a detailed explanation of the implementation of this technique in Excel.

Weighted Moving Average

Points in the weighted moving average also represent an average of a specified number of previous periods. The weighted moving average applies different weighting to certain previous periods; quite often the more recent periods are given greater weight. A link to another article in this blog which provides a detailed explanation of the implementation of this technique in Excel is as follows:

http://blog.excelmasterseries.com/2010/11/weighted-moving-average-accurate-simple.html

Exponential Moving Average

Points in the exponential moving average also represent an average of a specified number of previous periods. Exponential smoothing applies weighting factors to previous periods that decrease exponentially, never reaching zero. As a result exponential smoothing takes into account all previous periods instead of a designated number of previous periods that the weighted moving average does. A link to another article in this blog which provides a detailed explanation of the implementation of this technique in Excel is as follows:

http://blog.excelmasterseries.com/2010/11/excel-marketing-forecasting-technique-3.html

The following describes the 3-step process of creating a simple moving average of time-series data in Excel;

Step 1 – Graph the Original Data in a Time-Series Plot

The line chart is the most commonly-used Excel chart to graph time-series data. An example of such an Excel chart used to plot 13 periods of sales data is shown as follows:

Simple Moving Average in Excel - Raw Data
(Click On Image To See a Larger Version)

Step 2 – Create the Moving Average in Excel

Excel provides the Moving Average tool within the Data Analysis menu. The Moving Average tool creates a simple moving average from a data series.

Simple Moving Average in Excel - Data Analysis Tool
(Click On Image To See a Larger Version)

The Moving Average dialogue box should be filled out as follows in order to create a moving average of the previous 2 periods of data for each data point.

Simple Moving Average in Excel - 2-Interval Dialogue Box
(Click On Image To See a Larger Version)

The output of the 2-period moving average is shown as follows, along with the formulas that were used to calculate the value of each point in the moving average.

Simple Moving Average in Excel - Output
(Click On Image To See a Larger Version)

Step 3 – Add the Moving Average Series to the Chart

This data should now be added to the chart containing the original time line of sales data. The data will simply be added as one more data series in the chart. To do that, right-click anywhere on the chart and a menu will pop up. Hit Select Data to add the new series of data. The moving average series will be added by completing the Edit Series dialogue box as follows:

Simple Moving Average in Excel - Add Data Series to Chart0
(Click On Image To See a Larger Version)

Simple Moving Average in Excel - New Data Series Information
(Click On Image To See a Larger Version)

The chart containing the original data series and that data’s 2-interval simple moving average is shown as follows. Note that the moving average line is quite a bit smoother and raw data’s deviations above and below the trend line are much more apparent. The overall trend is now much more apparent as well.

Simple Moving Average in Excel - 2-Interval Chart
(Click On Image To See a Larger Version)

A 3-interval moving average can be created and placed on the chart using the same procedure as follows:

Simple Moving Average in Excel - 3-Interval Dialogue Box
(Click On Image To See a Larger Version)

Simple Moving Average in Excel - Output
(Click On Image To See a Larger Version)

It is interesting to note that the 2-interval simple moving average creates a smoother graph than the3-interval simple moving average. In this case the 2-interval simple moving average might be the more desirable than the 3-interval moving average.

Simple Moving Average in Excel - 3-Interval Chart
(Click On Image To See a Larger Version)

For comparison, a 6-interval simple moving average will be calculated and added to the chart in the same way as follows:

Simple Moving Average in Excel - 6-Interval Dialogue Box
(Click On Image To See a Larger Version)

Simple Moving Average in Excel - Output
(Click On Image To See a Larger Version)

Simple Moving Average in Excel - 2-Interval, 3-Interval, 6-Interval Chart
(Click On Image To See a Larger Version)

As expected,the 6-interval simple moving average is significantly smoother than the 2 or 3-interval simple moving averages. A smoother graph more closely fits a straight line.

Analyzing Forecast Accuracy

Accuracy can be described as goodness of fit. The two components of forecast accuracy are the following:

Forecast Bias – The tendency of a forecast to be consistently higher or lower than actual values of a time series. Forecast bias is the sum of all error divided by the number of periods as follows:

Bias = ∑Et/n = ∑(Yt-act – Yt-est)/n

A positive bias indicates a tendency to under-forecast. A negative bias indicates a tendency to over-forecast. Bias does not measure accuracy because positive and negative error cancel each other out.

Forecast Error – The difference between actual values of a time series and the predicted values of the forecast. The most common measures of forecast error are the following:

MAD – Mean Absolute Deviation

MAD calculates the average absolute value of the error and is computed with the following formula:

MAD = ∑ |Et| / n = ∑ |(Yt-act – Yt-est)| / n

Averaging the absolute values of the errors eliminates the canceling effect of positive and negative errors. The smaller the MAD, the better the model is.

MSE – Mean Squared Error

MSE is a popular measure of error that eliminates the cancelling effect of positive and negative errors by summing the squares of the error with the following formula:

MSE = ∑ Et2 / n = ∑ (Yt-act – Yt-est)2 / n

Large error terms tend to exaggerate MSE because the error terms are all squared. RMSE (Root Square Mean) reduces this problem by taking the square root of MSE.

MAPE – Mean Absolute Percent Error

MAPE also eliminates the cancelling effect of positive and negative errors by summing the absolute values of the error terms. MAPE calculates the sum of the percent error terms with the following formula:

MAPE = ∑ ( |Et| / Yt-act ) * 100% / n = ∑ ( |(Yt-act – Yt-est)| / Yt-act ) * 100% / n

By summing percent error terms, MAPE can be used to compare forecasting models that use different scales of measurement.

Calculating Bias, MAD, MSE, RMSE, and MAPE in Excel For the Simple Moving Average

Bias, MAD, MSE, RMSE, and MAPE will be calculated in Excel to evaluate the 2-interval, 3-interval, and 6-interval simple moving average forecast obtained in this article and shown as follows:



Simple Moving Average in Excel - Simple Moving Averages
(Click On Image To See a Larger Version)

The first step is to calculate Et, Et2, |Et|, |Et| / Yt-act , and then sum them as follows:

Simple Moving Average in Excel - 2-Interval Initial Calculations
(Click On Image To See a Larger Version)

Bias, MAD, MSE, MAPE and RMSE can be calculated as follows:

Simple Moving Average in Excel - 2-Interval Final Calculations
(Click On Image To See a Larger Version)

The same calculations are now performed to calculate Bias, MAD, MSE, MAPE and RMSE for the 3-interval simple moving average.

Simple_Mov_Av_15_3-Interval_Initial_Calculations_600
(Click On Image To See a Larger Version)

Simple_Mov_Av_16_3-Interval_Final_Calculations_600
(Click On Image To See a Larger Version)

The same calculations are now performed to calculate Bias, MAD, MSE, MAPE and RMSE for the 6-interval simple moving average.

Simple_Mov_Av_17_6-Interval_Initial_Calculations_600
(Click On Image To See a Larger Version)

Simple_Mov_Av_18_6-Interval_Final_Calculations_600
(Click On Image To See a Larger Version)

Bias, MAD, MSE, MAPE and RMSE are summarized for the 2-interval, 3-interval, and 6-interval simple moving averages as follows. The 3-interval simple moving average is the model that most closely fits that actual data.

Simple Moving Average in Excel - Error Summarization
(Click On Image To See a Larger Version)

 Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic

 

Monday, October 18, 2010

Taguchi Testing - Is It Good For Landing Page Optimization?

Taguchi Testing 

Is It Good For Landing

Page Optimization?



The short answer is: Taguchi Testing is Fractional Factorial Multivariate Testing and is therefore not a good tool for landing page optimization. Let's elaborate on that.

What Is Taguchi Testing?

Taguchi Testing is a variation of fractional factorial multivariate testing that was developed in the 1950's and 1960's by a Japanese mathematician named Genichi Taguchi. His testing methods, which were originally developed to improve manufacturing quality control, have gained popularity in the field of landing page optimization.

Multivariate tests are simply tests that have more than one input variable. Full factorial tests are tests that analyze every possible combination of inputs variables. Fractional factorial tests attempt to isolate and test only the subset of inputs that are deemed in advance to be important to the output. The Taguchi method is a set of partial factorial techniques that attempt to determine the best combination of attributes in the presence of a lot of variance or noise.

As mentioned, the Taguchi method was developed for use in manufacturing quality control. The Taguchi method has now been adopted as a tool for landing page optimization. The differences between a manufacturing floor and a landing page make Taguchi testing an incorrect tool for landing page optimization. This will be examined later in this article.  


Full Factorial Testing

The most well-known landing page optimization tool that uses full factorial testing is the Google Web Site Optimizer. This is probably the best free tool available for multivariate testing (testing lots of factors at once) on landing pages. The Google Web Site Optimizer will statistically determine which combination of landing page variables are most likely to produce the highest number of conversions. The Google Web Site Optimizer in its current state does not analyze interactions between the landing page variables being tested but only tells how each tested combination of variables performed in comparison with all others.

The Google Web Site Optimizer is also a great tool for A/B split-testing. This involves testing only one variable. Typically only two variations of one variable are being tested against each other. One variation is declared the winner when the Web Site Optimizer calculates that it has achieved an established percent certainty that it converts better than the opponent. For example, we might declare a variation to be the winner as soon as we are 80% certain that this variation outperforms its opponent.

For A/B split-testing, I prefer to use an Excel model that performs the same statistical test as the Google Web Site Optimizer (a one-tailed, two-sample, unpaired hypothesis test of proportion) but doesn't require any of the set-up steps that the Google Web Site Optimizer does. Here is a link to a blog article describing this Excel model with a video showing its use.


Taguchi Method Drawbacks

Taguchi's method of fractional factorial testing for landing page optimization has major drawbacks compared to full factorial landing page testing. They are as follows:

1) Fractional factorial methods assume that interactions between variables do not exist. This assumption is totally invalid for landing pages. Very strong variable interactions normally do exist on landing pages. For example, any landing page designer knows that a mismatch between a landing page headline and the body text will wipe out conversions.

Typically you will find lower order interactions occurring on landing pages. Lower order interactions are interactions that occur between a small number of variables, usually 2 or 3. Higher order interactions (interactions between more than 3 variables) are less common and usually much less important. In order for an interaction to be material and important, one of the factors usually has to be significant on its own   If you ignore interactions during landing page optimization, you will most likely not get the best results.

2) Fractional Factorial methods can only be used to test a small number of landing page combinations simultaneously. Typically the upper limit of the number of separate landing page combinations that can be tested simulataneously using fractional factorial methods is several hundred. Brainstorming marketers will quickly hit this limit after coming up with just a few factors and a couple of variations of each factor. 

Some landing page optimization terminology should be presented here. A variable or factor is an element on the landing page that you are varying during the test. A value is one of the states that a variable or factor (these two terms both mean the same) can take during your test. The branching factor is the number of values that an single variable or factor can take. Each variable has its own specific branching factor. A recipe is a unique combination of variable values available for a test. Another way of expressing this point (#2) would be to say "Fractional Factorial methods can only be used to test small number of recipes simultaneously."

3) Fractional Factorial methods are highly restrictive to test design. Fractional factorial methods do not allow the test designer much freedom when choosing the number of variables or the branching factor for each variable. The Taguchi method uses a matrix structure that works with less than two dozen very specific combinations of number of factors and branching level for the factors. The test designer must construct the test using one of those combinations of factor levels and branching factors. Full factorial methods have none of these restrictions.

4) Fractional Factorial methods require guessing at which factors to include in test. The restrictive nature of Fractional Factorial test design requires that the test designer pick the factors that he or she believes to be most important. The individual biases of the test designer will affect the selection of factors to include in the test.

5) Allocating more bandwidth to the baseline is not possible with Taguchi. The baseline is the current recipe that we are trying to beat with new recipes. It is very important that measurements of the baseline be valid because these measurements are the basis for comparison against results obtained for each recipe tested. To ensure validity of the baseline's measurements, it is a good idea to allocate at least 15% data collection (bandwidth) to sampling the baseline recipe. This type of data throttling is not possible with Fractional Factorial methods such as Taguchi. It is easily done with Full Factorial test methods.


Reasons for the Taguchi Mismatch

Genichi Taguchi developed his testing methods in the 1950's and 1960's to improve quality control on the manufacturing environment. His methods have become popular today in the field of landing page testing. The differences between manufacturing environment, for which the Taguchi method was intended, and today's landing page environment  create the mismatch that makes the Taguchi not the best choice for landing page optimization. Here are the main reasons for the mismatch:

1) Expensive manufacturing prototypes vs. free landing page prototypes. Retooling a production line for a new recipe is expensive. One of the major goals of Taguchi was to keep testing cost down by reducing the number of recipes to a minimum. In landing page testing, there is no additional cost to create more recipes (new variations of a landing page that will be shown to site visitors).

2) Manufacturing costs require a small test sizes vs. unlimited landing page test sizes. The high costs of manufacturing prototypes made small test sizes necessary. The Taguchi method keeps test size small by guessing at and testing only the most important factors. On the other hand, Full Factorial landing page testing methods and the low cost of creating new landing page recipes enables simultaneous testing of millions of recipes.

3) Small manufacturing test sizes could not test, and therefore did not assume, interaction between variables. Landing pages are known to have very strong interactions between variables. The Taguchi method was designed to assume no interaction between variables. That assumption can easily lead to incorrect results during landing page testing.

4) Manufacturing environment tests are smaller because statistical significance is normally reached quicker. Landing pages normally have low conversion rates and therefore require much larger test sizes to reach statistical significance. Manufacturing environment tests normally are designed to have a high probability of success. Landing page success rates (conversion rates) are typically below 1%.

5) Manufacturing test data is often continuous vs. Landing page data which is discrete and unrelated. Continuous data allows the test researcher to take a smaller number of samples and interpolate results for intermediate data points that were not collected.
 The possibility of interpolating continuous variable test results allows for smaller test sizes. Landing page variables are typically discrete, unrelated choices and therefore and do not allow interpolation for intermediate data ranges that were not collected.


Summary

Taguchi testing's origins in the manufacturing environment make it not the best tool for landing page optimization. Full Factorial methods should be used whenever possible to account for variable interaction and to allow for the widest possible number of recipes being tested.

Excel Master Series Blog Directory

Statistical Topics and Articles In Each Topic