;+ ; NAME: ; robomean ; PURPOSE: (one line) ; Robust statistical moments of the data. ; DESCRIPTION: ; This routine computes the average, average deviation, standard ; deviation, variance, skew and kurtosis of the input data. The various ; output quantities are always returned as floating point scalars. ; The statistics are computed with no regard for the dimensionality of ; the input data. ; ; The statistics are robust in that the data is searched for outliers ; after the moments are computed. If outliers are found they are ; removed and the statistics are computed again. This continues until ; either no outliers are found or if the removal of outliers has an ; insignificant effect on the statistics. ; CATEGORY: ; Statistics ; CALLING SEQUENCE: ; robomean,data,thresh,eps,avg,avgdev,stddev,var,skew,kurt,nfinal ; INPUTS: ; data - Input data to be analyzed. ; thresh - Deviation from the mean to signify an outlier. ; eps - Smallest significant change in mean in units of std dev. ; OPTIONAL INPUT PARAMETERS: ; None. ; INPUT KEYWORD PARAMETERS: ; bad - Array of flags, 0=good, 1=bad. Any points removed from the sample ; will have this flag set to 1 upon return. Note that if you ; do not supply this array, you will not get the resulting bad ; flag output even if you supply the keyword. ; OUTPUT KEYWORD PARAMETERS: ; STDMEAN : Optional return of standard deviation of the mean. ; OUTPUTS: ; avg - Sample mean. ; avgdev - Average deviation of the data from the mean. ; stddev - Standard deviation of the data from the mean. ; var - Variance of the data from the mean. ; skew - Skewness, third statistical moment. ; kurt - Kurtosis, fourth statistical moment. ; nfinal - Number of points used in the final result. ; new - Vector of 'clean' pixels (optional). ; COMMON BLOCKS: ; None. ; SIDE EFFECTS: ; None. ; RESTRICTIONS: ; None. ; PROCEDURE: ; Standard techniques, see section 13.1 in Numerical Recipies. The ; thresh and eps values are not tremendously important. Thresh=5.0 ; and eps=0.5 appear to work pretty well most of the time. ; MODIFICATION HISTORY: ; Written by Marc W. Buie, Lowell Observatory, 1992 Jan 20. ; Fix - nfinal was not being returned to calling program. ; Doug Loucks, Lowell Observatory, 1992 Oct 22. ; Fix - Sense of test to determine additional refinement was ; not correct for some cases. Changed logic to parallel ; the C version, since the WHILE statement is now available ; in IDL. ; Mod - Added by Marc Buie: Argument 'new,' allowing the 'clean' pixels ; to be returned to the caller. ; Mod - 2/11/94, DWL, Added keyword STDMEAN to permit return of this ; value to the caller. ; 97/03/23 - MWB, added BAD keyword and operation. This program's operation ; should not be any different than what it was if you don't supply BAD. ; 98/07/08, MWB, changed so that NaN values are automatically flagged as bad. ; 2000/11/2, MWB, cosmetic rewrite. ;- pro robomean, data, thresh, eps, avg, avgdev, stddev, var, skew, kurt, $ nfinal, new, STDMEAN=r_stdmean, BAD=bad ; figure out if there are input bad flags, if there are input bad flags ; use them, and save any changes to be returned. Without input flags ; the work to remember what's trimmed out is not done. savebad = n_elements(bad) eq n_elements(data) ; First, bypass anything in the input data that is not finite. if savebad then begin nans = where(finite(data) eq 0,count) if count ne 0 then bad[nans]=1 good = where(bad eq 0,count) endif else begin good = where(finite(data) eq 1,count) endelse ; Do the first pass, watch out for pathological cases too. if count eq 0 then begin print, 'All points removed.' avg=0. avgdev=0. stddev=0. var=0. skew=0. kurt=0. return endif else if count eq 1 then begin avg=data[good[0]] avgdev=0. stddev=0. var=0. skew=0. kurt=0. return endif else begin Moment4, data[good], avg, avgdev, stddev, var, skew, kurt n = count endelse ; Compute errors on the statistical moments. skewerr = SQRT(6.0/n) kurterr = SQRT(24.0/n) ; Safety, stddev should really never be zero. if stddev eq 0 then stddev=max(data) ; compute standard deviation of the mean if n gt 1 then $ stdmean = stddev / SQRT(n-1.0) $ else $ stdmean = stddev ; First weeding pass, anything that is more than thresh sigma from the ; mean is marked as bad. The sigma used is the standard deviation of ; the data. if savebad then begin nogood = where( abs(data[good]-avg)/stddev gt thresh, badcount) if badcount ne 0 then bad[good[nogood]]=1 good = WHERE( bad eq 0, count) endif else begin good0 = WHERE( ABS( data[good]-avg ) / stddev LE thresh, count) good = good[good0] endelse ; Safety valve, if nothing good here, return results prior to the ; first weeding pass. if count eq 0 then begin new = data nfinal = n_elements(new) r_stdmean = stddev return endif ; Restrict data to just the good data. new = data[ good ] ; Set nfinal to the number before first weeding nfinal = n ; set the new length to the nuber after first weeding newlen = count ; set the oldmean to a high value to get things started, this value ; ensures that oldmean won't pass the test and the following loop ; will run at least once. oldmean = avg + 2.0*eps*stddev ; Loop to whittle away at the data. This loop will continue to run as ; long as there is data to work with and the termination conditions have ; not yet been met. If the number of points has changed AND the average ; has changed by more than eps*stdmean then another iteration will be ; done. Another iteration will also be done if the kurtosis changes by ; more than thresh times the kurtosis error. The kurtosis based test is ; quite important in this process. In some cases, you would quit too ; early if you just looked at the change in the mean. while ( (newlen ne n) and ( (ABS( oldmean-avg ) / stdmean gt eps) or $ ( kurt gt thresh*kurterr ) ) ) do begin ; save the results from the last pass oldmean = avg n = newlen ; Compute new statistical moments Moment4, new, avg, avgdev, stddev, var, skew, kurt ; safety valve if stddev eq 0 then stddev=max(data) ; errors skewerr = SQRT(6.0/n) kurterr = SQRT(24.0/n) if n gt 1 then $ stdmean = stddev / SQRT(n-1.0) $ else $ stdmean = stddev ; The next filtering pass if savebad then begin nogood = where( abs(data[good]-avg)/stddev gt thresh, badcount) if badcount ne 0 then bad[good[nogood]]=1 good = WHERE( bad eq 0, count) if count ne 0 then new = data[good] endif else begin good = WHERE( ABS( new-avg ) / stddev le thresh, count) if count ne 0 then new = new[ good ] endelse if count eq 0 then MESSAGE, 'All points removed.' newlen = count endwhile nfinal = n r_stdmean = stdmean end