Home |
Search |
Today's Posts |
#1
|
|||
|
|||
Median Formula not working properly
Hi,
I used the following formula to find the median for a range of numbers. I assume variable p in the below formula as 0.5. =PERCENTILE(A1:A7,PERCENTRANK(PROB(A1:A7,B1:B7/SUM(B1:B7),,A1:A7),p,20)) The problem is for some range of data the above formula is not working properly. And for some other range of data this formula seems working fine. I took the formula from one of the posts in this forum and the link is, http://www.excelbanter.com/showthread.php?t=135558 I have attached a sheet with some sample data. Please help me to resolve this issue. Many Thanks in advance. |
#2
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Median Formula not working properly
"RK_Excel" wrote:
I used the following formula to find the median for a range of numbers. I assume variable p in the below formula as 0.5. =PERCENTILE(A1:A7,PERCENTRANK(PROB(A1:A7,B1:B7/SUM(B1:B7),,A1:A7),p,20)) [....] |Download: http://www.excelbanter.com/attachment.php?attachmentid=663| The short answer: this is a problem in PROB. It is arguable whether or not to call it a defect. (Although I think it is avoidable.) And I do not think __we__ can do anything to avoid the problem reliably because it arises from a sum that is computed internally. Ergo, this paradigm is not a reliable way to compute the median of grouped data. There are steps that we can take that might mitigate the problem with specific sets of data. See the details below. But first.... ----- You have some mistakes. But ironically, they help to identify the root cause of the problem. First, the formula in G3 is not array-entered (you pressed just Enter instead of ctrl+shift+Enter). It should be, just as the formula in G8 is. Ironically, that is unrelated to the #NUM error in G3. But if PROB had not found the conditions that triggered the #NUM error, PERCENTRANK would have returned a #N/A error. Second, the debug formulas in D3:D198 are incorrect. That is, they are a misinterpretation of the array-entered formula in G3. The correct interpretation is to normally-enter the following formula into E3 (just press Enter), then copy down through E197: =PROB($A$3:$A$197,$B$3:$B$197/SUM($B$3:$B$197),,A3) Note: I changed the range from rows 3:198 to rows 3:197 to avoid the PROB defect, just for discussion purposes. Alternatively, you can select E3:E197 and array-enter the following formula (press ctrl+shift+Enter instead of just Enter): =PROB(A3:A197,B3:B197/SUM(B3:B197),,A3:A197) This has the same effect as the normally-entered formulas. But it is easier to create directly from the original formula in G3. I could explain the difference in interpretation, if you wish. But it is not germane to the central issue, namely: why does PROB return #NUM errors? ----- Now select E3:E198 and array-enter the same formula (press ctrl+shift+Enter instead of just Enter), extending the ranges to row 198, to wit: =PROB(A3:A198,B3:B198/SUM(B3:B198),,A3:A198) Note that PROB returns an array of #NUM errors. Why? Because PROB has determined that SUMPRODUCT(B3:B198/SUM(B3:B198)) is not 1. And indeed it is not. Note that =SUMPRODUCT(B3:B198/SUM(B3:B198))-1-0 returns about -1.22E-15, indicating that SUMPRODUCT(B3:B198/SUM(B3:B198)) is infinitesimally less than 1. (Note: The "redundant" -0 is needed to ensure that Excel does not arbitrarily "correct" the true arithmetic result to be exactly zero, based the dubious heuristic poorly described under the misleading title "Example When a Value Reaches Zero" at http://support.microsoft.com/kb/78113.) The inaccuracy is a normal by-product of the native binary computer arithmetic, which Excel relies on. Such infinitesimal differences are very common. Usually they simply go unnoticed. Given that they are "very common", the real mystery is: why does PROB raise a #NUM error for some infinitesimal differences, but not for others? I believe your incorrect formulas in D3:D198 provide a hint. Note that some of the formulas in D3:D198 return a #NUM error. Copying the text of the second PROB parameter in D3, if we normally-enter the formula =SUMPRODUCT($B$3:B3/SUM($B$3:B3))-1-0 into F3 and copy down through F198, some analysis reveals the following observations: 1. Only 32 differences are exactly zero. 2. Of the remaining 166 differences, the min difference is about -2.33E-15, and the max difference is about about 2.66E-15. 3. But there are only 46 #NUM errors in D3:D198. 4. Among the rows with #NUM errors, the min difference is about -2.33E-15, and the max difference is about -5.55E-16 (closer to zero). Ergo, PROB does tolerate some infinitesimal differences from 1 for the sum of the probabilities, as it really should, IMHO. But it draws the line at "large" infinitesimal differences. What is "large"? I speculate that it is any infinitesimal difference that causes the result to be something other than 1 when rounded to 15 significant digits. Specifically: -5*2^-53 (about -5.55E-16) or smaller (further from zero); and about 23*2^-52 (about 5.11E-15) or larger. ------ I believe that explains the anomalous #NUM errors from PROB. The question is: can we avoid them? In general, I do not believe there is any reliable way to avoid them, since the sum is computed internally. But the following steps, together or separately, might remedy #NUM errors for __specific__ sets of data. 1. Reduce the number of line items (rows). For example, in RK-Excel's data, several rows of data could be combined because the prices are identical. This is even more true if the prices were rounded to the cent, as I believe they should be. The fewer the number of items summed, the smaller the accumulated infinitesimal differences. 2. Instead of entering the array expression B3:B198/SUM(B3:B198) as a parameter of the PROB function, enter the normally-entered formula (just press Enter) into J4 and copy down through J198: =B4/SUM($B$3:$B$198) Then normally-enter the following formula into J3 (just press Enter): =1-SUMPRODUCT(B4:B198/SUM($B$3:$B$198))-0 Then array-enter the following formula in G3 (press ctrl+shift+Enter, not just Enter): =PERCENTILE(A3:A198,PERCENTRANK(PROB(A3:A198,J3:J1 98,,A3:A198),0.5,20)) Caveat: Although I suspect this will work 99.9999% :-) of the time, it might not always work because of the specific way that Excel might compute the sum within the PROB function. 3. I might add that the following "common sense" alternative probably will __not__ remedy the problem reliably. In fact, it might exacerbate it. (In fact, it does not remedy the problem with RK-Excel's data.) That would be: replace that array expression B3:B198/SUM(B3:B198) with the array expression array expression --TEXT(B3:B198/SUM(B3:B198),"0.00000000000000E+0"), which rounds each probability to 15 significant digits. Although it might seem to remedy the problem with some data sets, that would be a coincidence. There still might be too large a cumulative infinitesimal difference in the sum due to the fact that Intel-compatible CPUs actually do arithmetic with larger precision. |
#3
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
Median Formula not working properly
PS.... I wrote:
In general, I do not believe there is any reliable way to avoid them, since the sum is computed internally. But the following steps, together or separately, might remedy #NUM errors for __specific__ sets of data. 1. Reduce the number of line items (rows). As a demonstration of the quirkiness of this approach (by itself), it might be noted that in RK-Excel's workbook, PROB has no problem with the 346 lines of data in M3:N346, much more than the 198 lines of data in A3:B198. |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Median If in Excel not working | Excel Worksheet Functions | |||
Formula Not Working Properly | Excel Programming | |||
Formula and worksheet will not working properly | Excel Worksheet Functions | |||
formula not working properly | Excel Worksheet Functions | |||
But not working properly | Excel Discussion (Misc queries) |