Home |
Search |
Today's Posts |
#1
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
I have 5 independent datasets that I am analyzing with descriptive statistics
and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
#2
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Wikipedia has a pretty good discussion of kurtosis. See, especially, the
section "Kurtosis of well known distributions": http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on Wikipedia apply). A normal distribution should have a kurtosis of 0. A normal distribution should also have a skew of 0. So, based on the results you give, none of your distributions are normal. But they don't seem to be either extremely peaked or extremely skewed. Positive numbers for SKEW indicate that the distribution is skewed to the right and negative skew indicates that the distribution is skewed to the left. Does this help give context to the numbers you're getting? Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: I have 5 independent datasets that I am analyzing with descriptive statistics and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
#3
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Thank you for your super-fast response Dave!
Now, before I posted this today, I did some research on the internet and I did found the same document in Wikipedia. I wish it could be easier to determine if there is a limit for this values, let's say -1 to +1, where 0 is the ideal value for the normal distribution. The problem is, what can I conclude when I see things like Kurtosis = 2.94948 and Skewness = 1.01299, Kurtosis = 3.72673 and Skewness = 1.81848, etc... the purpose of this project is to test for normality as we move forward identifying and removing outliers (the given values do have outliers present). In theory, we should get a nearly close normal distribution as we remove outliers after careful examination. One more thing I am concerned of, is the Excel help file when it is limiting the values to n=255. If I have samples much greater than that, are the functions just taking the first 255 values??? that's a little bit confusing. Finally, is there any other rigorous test (from the ones listed in Wikipedia) built-in in Excel? "Dave F" wrote: Wikipedia has a pretty good discussion of kurtosis. See, especially, the section "Kurtosis of well known distributions": http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on Wikipedia apply). A normal distribution should have a kurtosis of 0. A normal distribution should also have a skew of 0. So, based on the results you give, none of your distributions are normal. But they don't seem to be either extremely peaked or extremely skewed. Positive numbers for SKEW indicate that the distribution is skewed to the right and negative skew indicates that the distribution is skewed to the left. Does this help give context to the numbers you're getting? Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: I have 5 independent datasets that I am analyzing with descriptive statistics and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
#4
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
All good questions. If normal distribution should have a kurtosis and
skewness of 0, and your kurtosis and skewness numbers are not 0, or close to it, then you don't have a normal distribution. The greater the absolute value of your kurtosis and skewness numbers are, the more peaked/skewed your distributions are. As to Excel's limits for these two functions: I don't know the answer to that. Maybe someone else on these discussion groups does. One thing to consider is that there are software packages out there designed specifically for statistical analysis, which may be more accurate/powerful than Excel. Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: Thank you for your super-fast response Dave! Now, before I posted this today, I did some research on the internet and I did found the same document in Wikipedia. I wish it could be easier to determine if there is a limit for this values, let's say -1 to +1, where 0 is the ideal value for the normal distribution. The problem is, what can I conclude when I see things like Kurtosis = 2.94948 and Skewness = 1.01299, Kurtosis = 3.72673 and Skewness = 1.81848, etc... the purpose of this project is to test for normality as we move forward identifying and removing outliers (the given values do have outliers present). In theory, we should get a nearly close normal distribution as we remove outliers after careful examination. One more thing I am concerned of, is the Excel help file when it is limiting the values to n=255. If I have samples much greater than that, are the functions just taking the first 255 values??? that's a little bit confusing. Finally, is there any other rigorous test (from the ones listed in Wikipedia) built-in in Excel? "Dave F" wrote: Wikipedia has a pretty good discussion of kurtosis. See, especially, the section "Kurtosis of well known distributions": http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on Wikipedia apply). A normal distribution should have a kurtosis of 0. A normal distribution should also have a skew of 0. So, based on the results you give, none of your distributions are normal. But they don't seem to be either extremely peaked or extremely skewed. Positive numbers for SKEW indicate that the distribution is skewed to the right and negative skew indicates that the distribution is skewed to the left. Does this help give context to the numbers you're getting? Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: I have 5 independent datasets that I am analyzing with descriptive statistics and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
#5
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Biometrika Tables for Statisticians, vol 1, Tables 34B and 34C give critical
values for testing normality with the sample skewness and sample kurtosis. The largest sample size in the table is 2000, where |Kurtosis|0.3 represents a statistically significant departure from normality. However, with sample sizes in your range, you have tremendous power to detect minor departures of little or no practical meaning. Unless there is a useful alternative model for the analysis, I would tend to look a probability plot http://www.usfca.edu/~middleton/demand.pdf instead of a hypothesis test to assess normality. Jerry "Quco" wrote: Thank you for your super-fast response Dave! Now, before I posted this today, I did some research on the internet and I did found the same document in Wikipedia. I wish it could be easier to determine if there is a limit for this values, let's say -1 to +1, where 0 is the ideal value for the normal distribution. The problem is, what can I conclude when I see things like Kurtosis = 2.94948 and Skewness = 1.01299, Kurtosis = 3.72673 and Skewness = 1.81848, etc... the purpose of this project is to test for normality as we move forward identifying and removing outliers (the given values do have outliers present). In theory, we should get a nearly close normal distribution as we remove outliers after careful examination. One more thing I am concerned of, is the Excel help file when it is limiting the values to n=255. If I have samples much greater than that, are the functions just taking the first 255 values??? that's a little bit confusing. Finally, is there any other rigorous test (from the ones listed in Wikipedia) built-in in Excel? "Dave F" wrote: Wikipedia has a pretty good discussion of kurtosis. See, especially, the section "Kurtosis of well known distributions": http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on Wikipedia apply). A normal distribution should have a kurtosis of 0. A normal distribution should also have a skew of 0. So, based on the results you give, none of your distributions are normal. But they don't seem to be either extremely peaked or extremely skewed. Positive numbers for SKEW indicate that the distribution is skewed to the right and negative skew indicates that the distribution is skewed to the left. Does this help give context to the numbers you're getting? Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: I have 5 independent datasets that I am analyzing with descriptive statistics and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
#6
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Thank you all for your help and references. I am starting to think if I am
using the correct approach, but before I explain this further, I have to say that a professor from the University of Hawai at Manoa has an in-depth explanation about Excel's Skewness and Kurtosis funcions in this website: http://jalt.org/test/bro_1.htm His article brings more significanse to the values we get from Excel. Now, what am I trying to do in this project is: I have two different sets of elevations represented by a series of points (13814 total). One set came from ground measurements, and another set came from aerial measurements. The datasets I am using to analyse the data are simply the absolute value of the differences between this two sets of elevations. I then subdivide them in four different groups to get descriptive statistics plus the Skewness and Kurtosis statistics. The groups a points falling on hard surfaces (3026 total), points falling on short grass (7645 total), points falling on tall grass (1583 total) and points falling on forested areas and canopies (1560 total). I would expect smaller differences in the first two groups, and greater differences in the last two groups, but that's not always the case because we have some outliers. I am trying to compare the distributions for each group with a normal distribution using the KURT and SKEW functions to come out with a firm and solid explanation. But now I am not sure if I am using the correct approach to do this. "Jerry W. Lewis" wrote: Biometrika Tables for Statisticians, vol 1, Tables 34B and 34C give critical values for testing normality with the sample skewness and sample kurtosis. The largest sample size in the table is 2000, where |Kurtosis|0.3 represents a statistically significant departure from normality. However, with sample sizes in your range, you have tremendous power to detect minor departures of little or no practical meaning. Unless there is a useful alternative model for the analysis, I would tend to look a probability plot http://www.usfca.edu/~middleton/demand.pdf instead of a hypothesis test to assess normality. Jerry "Quco" wrote: Thank you for your super-fast response Dave! Now, before I posted this today, I did some research on the internet and I did found the same document in Wikipedia. I wish it could be easier to determine if there is a limit for this values, let's say -1 to +1, where 0 is the ideal value for the normal distribution. The problem is, what can I conclude when I see things like Kurtosis = 2.94948 and Skewness = 1.01299, Kurtosis = 3.72673 and Skewness = 1.81848, etc... the purpose of this project is to test for normality as we move forward identifying and removing outliers (the given values do have outliers present). In theory, we should get a nearly close normal distribution as we remove outliers after careful examination. One more thing I am concerned of, is the Excel help file when it is limiting the values to n=255. If I have samples much greater than that, are the functions just taking the first 255 values??? that's a little bit confusing. Finally, is there any other rigorous test (from the ones listed in Wikipedia) built-in in Excel? "Dave F" wrote: Wikipedia has a pretty good discussion of kurtosis. See, especially, the section "Kurtosis of well known distributions": http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on Wikipedia apply). A normal distribution should have a kurtosis of 0. A normal distribution should also have a skew of 0. So, based on the results you give, none of your distributions are normal. But they don't seem to be either extremely peaked or extremely skewed. Positive numbers for SKEW indicate that the distribution is skewed to the right and negative skew indicates that the distribution is skewed to the left. Does this help give context to the numbers you're getting? Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: I have 5 independent datasets that I am analyzing with descriptive statistics and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
#7
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
I would tend to expect the differences (instead of the absolute differences)
to be normally distributed. If the differences are normally distributed, then the absolute differences whould tend to have a skewness around 0.995, and a kurtosis around 0.869. Jerry "Quco" wrote: Thank you all for your help and references. I am starting to think if I am using the correct approach, but before I explain this further, I have to say that a professor from the University of Hawai at Manoa has an in-depth explanation about Excel's Skewness and Kurtosis funcions in this website: http://jalt.org/test/bro_1.htm His article brings more significanse to the values we get from Excel. Now, what am I trying to do in this project is: I have two different sets of elevations represented by a series of points (13814 total). One set came from ground measurements, and another set came from aerial measurements. The datasets I am using to analyse the data are simply the absolute value of the differences between this two sets of elevations. I then subdivide them in four different groups to get descriptive statistics plus the Skewness and Kurtosis statistics. The groups a points falling on hard surfaces (3026 total), points falling on short grass (7645 total), points falling on tall grass (1583 total) and points falling on forested areas and canopies (1560 total). I would expect smaller differences in the first two groups, and greater differences in the last two groups, but that's not always the case because we have some outliers. I am trying to compare the distributions for each group with a normal distribution using the KURT and SKEW functions to come out with a firm and solid explanation. But now I am not sure if I am using the correct approach to do this. "Jerry W. Lewis" wrote: Biometrika Tables for Statisticians, vol 1, Tables 34B and 34C give critical values for testing normality with the sample skewness and sample kurtosis. The largest sample size in the table is 2000, where |Kurtosis|0.3 represents a statistically significant departure from normality. However, with sample sizes in your range, you have tremendous power to detect minor departures of little or no practical meaning. Unless there is a useful alternative model for the analysis, I would tend to look a probability plot http://www.usfca.edu/~middleton/demand.pdf instead of a hypothesis test to assess normality. Jerry "Quco" wrote: Thank you for your super-fast response Dave! Now, before I posted this today, I did some research on the internet and I did found the same document in Wikipedia. I wish it could be easier to determine if there is a limit for this values, let's say -1 to +1, where 0 is the ideal value for the normal distribution. The problem is, what can I conclude when I see things like Kurtosis = 2.94948 and Skewness = 1.01299, Kurtosis = 3.72673 and Skewness = 1.81848, etc... the purpose of this project is to test for normality as we move forward identifying and removing outliers (the given values do have outliers present). In theory, we should get a nearly close normal distribution as we remove outliers after careful examination. One more thing I am concerned of, is the Excel help file when it is limiting the values to n=255. If I have samples much greater than that, are the functions just taking the first 255 values??? that's a little bit confusing. Finally, is there any other rigorous test (from the ones listed in Wikipedia) built-in in Excel? "Dave F" wrote: Wikipedia has a pretty good discussion of kurtosis. See, especially, the section "Kurtosis of well known distributions": http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on Wikipedia apply). A normal distribution should have a kurtosis of 0. A normal distribution should also have a skew of 0. So, based on the results you give, none of your distributions are normal. But they don't seem to be either extremely peaked or extremely skewed. Positive numbers for SKEW indicate that the distribution is skewed to the right and negative skew indicates that the distribution is skewed to the left. Does this help give context to the numbers you're getting? Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: I have 5 independent datasets that I am analyzing with descriptive statistics and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
#8
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
are those values you would expect from a typical gaussian (normal)
distribution? "Jerry W. Lewis" wrote: I would tend to expect the differences (instead of the absolute differences) to be normally distributed. If the differences are normally distributed, then the absolute differences whould tend to have a skewness around 0.995, and a kurtosis around 0.869. Jerry "Quco" wrote: Thank you all for your help and references. I am starting to think if I am using the correct approach, but before I explain this further, I have to say that a professor from the University of Hawai at Manoa has an in-depth explanation about Excel's Skewness and Kurtosis funcions in this website: http://jalt.org/test/bro_1.htm His article brings more significanse to the values we get from Excel. Now, what am I trying to do in this project is: I have two different sets of elevations represented by a series of points (13814 total). One set came from ground measurements, and another set came from aerial measurements. The datasets I am using to analyse the data are simply the absolute value of the differences between this two sets of elevations. I then subdivide them in four different groups to get descriptive statistics plus the Skewness and Kurtosis statistics. The groups a points falling on hard surfaces (3026 total), points falling on short grass (7645 total), points falling on tall grass (1583 total) and points falling on forested areas and canopies (1560 total). I would expect smaller differences in the first two groups, and greater differences in the last two groups, but that's not always the case because we have some outliers. I am trying to compare the distributions for each group with a normal distribution using the KURT and SKEW functions to come out with a firm and solid explanation. But now I am not sure if I am using the correct approach to do this. "Jerry W. Lewis" wrote: Biometrika Tables for Statisticians, vol 1, Tables 34B and 34C give critical values for testing normality with the sample skewness and sample kurtosis. The largest sample size in the table is 2000, where |Kurtosis|0.3 represents a statistically significant departure from normality. However, with sample sizes in your range, you have tremendous power to detect minor departures of little or no practical meaning. Unless there is a useful alternative model for the analysis, I would tend to look a probability plot http://www.usfca.edu/~middleton/demand.pdf instead of a hypothesis test to assess normality. Jerry "Quco" wrote: Thank you for your super-fast response Dave! Now, before I posted this today, I did some research on the internet and I did found the same document in Wikipedia. I wish it could be easier to determine if there is a limit for this values, let's say -1 to +1, where 0 is the ideal value for the normal distribution. The problem is, what can I conclude when I see things like Kurtosis = 2.94948 and Skewness = 1.01299, Kurtosis = 3.72673 and Skewness = 1.81848, etc... the purpose of this project is to test for normality as we move forward identifying and removing outliers (the given values do have outliers present). In theory, we should get a nearly close normal distribution as we remove outliers after careful examination. One more thing I am concerned of, is the Excel help file when it is limiting the values to n=255. If I have samples much greater than that, are the functions just taking the first 255 values??? that's a little bit confusing. Finally, is there any other rigorous test (from the ones listed in Wikipedia) built-in in Excel? "Dave F" wrote: Wikipedia has a pretty good discussion of kurtosis. See, especially, the section "Kurtosis of well known distributions": http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on Wikipedia apply). A normal distribution should have a kurtosis of 0. A normal distribution should also have a skew of 0. So, based on the results you give, none of your distributions are normal. But they don't seem to be either extremely peaked or extremely skewed. Positive numbers for SKEW indicate that the distribution is skewed to the right and negative skew indicates that the distribution is skewed to the left. Does this help give context to the numbers you're getting? Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: I have 5 independent datasets that I am analyzing with descriptive statistics and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
#9
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
If X follows the Normal(0,1) distribution, then the population parameters are
Skewness(X)=0 Excess Kurtosis(X)=0 Skewness(|X|)=0.995... Excess Kurtosis(|X|)=0.869... Jerry "Quco" wrote: are those values you would expect from a typical gaussian (normal) distribution? "Jerry W. Lewis" wrote: I would tend to expect the differences (instead of the absolute differences) to be normally distributed. If the differences are normally distributed, then the absolute differences whould tend to have a skewness around 0.995, and a kurtosis around 0.869. Jerry "Quco" wrote: Thank you all for your help and references. I am starting to think if I am using the correct approach, but before I explain this further, I have to say that a professor from the University of Hawai at Manoa has an in-depth explanation about Excel's Skewness and Kurtosis funcions in this website: http://jalt.org/test/bro_1.htm His article brings more significanse to the values we get from Excel. Now, what am I trying to do in this project is: I have two different sets of elevations represented by a series of points (13814 total). One set came from ground measurements, and another set came from aerial measurements. The datasets I am using to analyse the data are simply the absolute value of the differences between this two sets of elevations. I then subdivide them in four different groups to get descriptive statistics plus the Skewness and Kurtosis statistics. The groups a points falling on hard surfaces (3026 total), points falling on short grass (7645 total), points falling on tall grass (1583 total) and points falling on forested areas and canopies (1560 total). I would expect smaller differences in the first two groups, and greater differences in the last two groups, but that's not always the case because we have some outliers. I am trying to compare the distributions for each group with a normal distribution using the KURT and SKEW functions to come out with a firm and solid explanation. But now I am not sure if I am using the correct approach to do this. "Jerry W. Lewis" wrote: Biometrika Tables for Statisticians, vol 1, Tables 34B and 34C give critical values for testing normality with the sample skewness and sample kurtosis. The largest sample size in the table is 2000, where |Kurtosis|0.3 represents a statistically significant departure from normality. However, with sample sizes in your range, you have tremendous power to detect minor departures of little or no practical meaning. Unless there is a useful alternative model for the analysis, I would tend to look a probability plot http://www.usfca.edu/~middleton/demand.pdf instead of a hypothesis test to assess normality. Jerry "Quco" wrote: Thank you for your super-fast response Dave! Now, before I posted this today, I did some research on the internet and I did found the same document in Wikipedia. I wish it could be easier to determine if there is a limit for this values, let's say -1 to +1, where 0 is the ideal value for the normal distribution. The problem is, what can I conclude when I see things like Kurtosis = 2.94948 and Skewness = 1.01299, Kurtosis = 3.72673 and Skewness = 1.81848, etc... the purpose of this project is to test for normality as we move forward identifying and removing outliers (the given values do have outliers present). In theory, we should get a nearly close normal distribution as we remove outliers after careful examination. One more thing I am concerned of, is the Excel help file when it is limiting the values to n=255. If I have samples much greater than that, are the functions just taking the first 255 values??? that's a little bit confusing. Finally, is there any other rigorous test (from the ones listed in Wikipedia) built-in in Excel? "Dave F" wrote: Wikipedia has a pretty good discussion of kurtosis. See, especially, the section "Kurtosis of well known distributions": http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on Wikipedia apply). A normal distribution should have a kurtosis of 0. A normal distribution should also have a skew of 0. So, based on the results you give, none of your distributions are normal. But they don't seem to be either extremely peaked or extremely skewed. Positive numbers for SKEW indicate that the distribution is skewed to the right and negative skew indicates that the distribution is skewed to the left. Does this help give context to the numbers you're getting? Dave -- A hint to posters: Specific, detailed questions are more likely to be answered than questions that provide no detail about your problem. "Quco" wrote: I have 5 independent datasets that I am analyzing with descriptive statistics and Gaussian (normal) distributions to search for outliers. During the process, I decided to see if there is a way to €śtest€ť a sample for €śnormality€ť since Im getting some graphs that do not represent normal distributions. I want to know why. So, I found the functions KURT and SKEW, but I need some advice about their significance. I do not understand the values I am getting from these functions and the help files in Excel 2007 are not very specific. (I wish they could expand more the description listed under €śExcel Home Function reference Statistical€ť). This is what they say: // KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. Number1, number2, ... are 1 to 255 arguments for which you want to calculate kurtosis. You can also use a single array or a reference to an array instead of arguments separated by commas. // // SKEW(number1,number2,...) Returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values. Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values. Number1, number2 ... are 1 to 255 arguments for which you want to calculate skewness. You can also use a single array or a reference to an array instead of arguments separated by commas. // For instance: My datasets range from 75,00 entries to 13,845 entries. I am getting a Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis = 0.59906 and Skewness = 0.91872 for another one, but I dont know what they mean. Can anybody help me to discuss/explain/expand these values? |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
graph a normal distribution with skew and kurtosis | Charts and Charting in Excel | |||
SKEW and KURT functions | Excel Discussion (Misc queries) | |||
efficiency: database functions vs. math functions vs. array formula | Excel Discussion (Misc queries) | |||
Looking for a site with functions that substitute the ATP functions | Excel Worksheet Functions | |||
Nesting functions in the functions dialog box | Excel Worksheet Functions |