ReGenesees-package {ReGenesees}    R Documentation

ReGenesees: a Package for Design-Based and Model-Assisted Analysis of Complex Sample Surveys

Description

ReGenesees is an R package for design-based and model-assisted analysis of complex sample surveys. It handles multistage, stratified, clustered, unequally weighted survey designs. Sampling variance estimation for nonlinear (smooth) estimators is obtained by Taylor-series linearization. Sampling variance estimation for multistage designs can be obtained either under the Ultimate Cluster approximation or by means of an actual multistage computation. Estimates, standard errors, confidence intervals and design effects are provided for: Totals, Means, absolute and relative Frequency Distributions (marginal, conditional and joint), Ratios, Shares and Ratios of Shares, Multiple Regression Coefficients and Quantiles (variance via the Woodruff method). ReGenesees also handles Complex Estimators, i.e. any user-defined estimator that can be expressed as an analytic function of Horvitz-Thompson or Calibration estimators of Totals or Means, by automatically linearizing them. The Design Covariance and Correlation between Complex Estimators are also provided. All the analyses above can be carried out for arbitrary subpopulations. In addition, ReGenesees can trim calibration weights while preserving all the calibration constraints. Lastly, ReGenesees offers a Generalized Variance Functions (GVF) infrastructure, i.e. facilities for defining, fitting, testing and plotting GVF models, and for exploiting them to predict variance estimates.
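The following is a standard textbook illustration of Taylor-series linearization. It is not a description of ReGenesees internals. Consider the ratio of two Horvitz-Thompson totals, with design weights $d_k$ over the sample $s$. The nonlinear estimator is replaced by the weighted total of a linearized variable $z_k$. The variance of that total is then estimated with whatever variance estimator suits the design, for example the Ultimate Cluster approximation:

```latex
\hat{R} = \frac{\hat{Y}}{\hat{X}} = \frac{\sum_{k \in s} d_k\, y_k}{\sum_{k \in s} d_k\, x_k},
\qquad
z_k = \frac{y_k - \hat{R}\, x_k}{\hat{X}},
\qquad
\hat{V}(\hat{R}) \approx \hat{V}\Big(\sum_{k \in s} d_k\, z_k\Big)
```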

The ReGenesees package is the fundamental building block of a full-fledged R-based software system: the ReGenesees System. The latter has a clear-cut two-layer architecture. The application layer of the system is embedded into package ReGenesees. A second R package, called ReGenesees.GUI, implements the presentation layer of the system, namely a user-friendly Tcl/Tk GUI.

A Quick Reading Guide to the Reference Manual

This reference manual reports a documentation entry for each (user-visible) function of package ReGenesees. As you may have noticed by reading section ‘R topics documented’ (page 1 of the pdf manual), these documentation entries are automatically sorted in alphabetical order of the function names. Such an ordering gives no clue about where a user should start reading, nor about the best way to proceed further.

In section ‘Table of Contents’, I tried to cluster the most important topics documented in the reference manual into a few broad groups, based both on the statistical goals and on the software design of the underlying functions.

Moreover, I provided a relevance code for each documented topic/function. The meanings of these codes, along with the corresponding reading suggestions, are reported in the following table:

Relevance Codes Legend

CODE    RELEVANCE           READING SUGGESTION
 ***    Very Important......Read these topics as soon as possible. A clear
                            understanding of these functions is mandatory
                            in order to use the package profitably.

  **    Important...........Read these topics once you have gained some
                            experience with (at least some of) the 'Very
                            Important' functions.

   *    Useful..............These functions are ancillary (albeit in
                            different ways) to the 'Very Important' and
                            'Important' ones (and their usage is generally
                            simpler).

   .    Advanced............These topics are very relevant but, unfortunately,
                            quite difficult. As they involve technical
                            details, you should postpone reading them until
                            you become familiar with the package.

Important Notice
It goes without saying that the ‘Examples’ sections at the end of each documented topic represent a crucial part of this reference manual.

TABLE OF CONTENTS

Survey Design

***  e.svydesign..........Specification of a Complex Survey Design
  *  weights..............Retrieve Sampling Units Weights
  *  find.lon.strata......Find Strata with Lonely PSUs
 **  collapse.strata......Collapse Strata Technique for Eliminating
                          Lonely PSUs
  *  des.addvars..........Add Variables to Design Objects
  *  des.merge............Merge New Survey Data into Design Objects

Calibration

 **  pop.template.........Template Data Frame for Known Population Totals
  *  population.check.....Compliance Test for Known Totals Data Frames
  *  pop.desc.............Natural Language Description of Known Totals
                          Templates
 **  fill.template........Fill the Known Totals Template for a
                          Calibration Task
  *  bounds.hint..........A Hint for Range Restricted Calibration
***  e.calibrate..........Calibration of Survey Weights
  *  check.cal............Calibration Convergence Check
 **  trimcal..............Trim Calibration Weights while Preserving
                          Calibration Constraints
  *  g.range..............Range of g-Weights
  *  get.residuals........Calibration Residuals of Interest Variables
  *  ext.calibrated.......Make ReGenesees Digest Externally Calibrated
                          Weights
  .  contrasts.RG.........Set, Reset or Switch Off Contrasts for
                          Calibration Models
  .  %into%...............Compress Nested Factors

Estimates and Sampling Errors

***  svystatTM............Estimation of Totals and Means in
                          Subpopulations
***  svystatR.............Estimation of Ratios in Subpopulations
***  svystatS.............Estimation of Shares in Subpopulations
***  svystatSR............Estimation of Share Ratios in Subpopulations
***  svystatB.............Estimation of Population Regression Coefficients
***  svystatQ.............Estimation of Quantiles in Subpopulations
***  svystatL.............Estimation of Complex Estimators in
                          Subpopulations
 **  aux.estimates........Quick Estimates of Auxiliary Variables Totals
 **  CoV, Corr............Design Covariance and Correlation of Complex
                          Estimators in Subpopulations
  *  write.svystat........Export Survey Statistics
  *  extractors...........Extractor Functions for Variability Statistics
  .  ReGenesees.options...Variance Estimation Options for the ReGenesees
                          Package

Generalized Variance Functions Method

***  GVF.db...............Archive of Registered GVF Models
***  gvf.input............Prepare Input Data to Fit GVF Models
***  svystat..............Compute Many Estimates and Errors in Just a
                          Single Shot
***  fit.gvf..............Fit GVF Models
 **  plot.gvf.fit.........Diagnostic Plots for Fitted GVF Models
 **  drop.gvf.points......Drop Outliers and Refit a GVF Model
  *  getR2, AIC, BIC......Quality Measures on Fitted GVF Models
  *  getBest..............Identify the Best Fit GVF Model
***  predictCV............Predict CV Values via Fitted GVF Models
  *  gvf.misc.............Miscellanea: Methods for Fitted GVF Models 
  *  estimator.kind.......Which Estimator Did Generate these
                          Survey Statistics?

Utilities

  *  Zapsmall.............Zapsmall Data Frame Columns and Numeric Vectors

Data Sets

 **  data.examples........Artificial Household Survey Data
 **  fpcdat...............A Small But Not Trivial Artificial Sample
                          Data Set
 **  sbs..................Artificial Structural Business Statistics Data
 **  AF.gvf...............Example Data for GVF Model Fitting

The ordering of the above ‘Table of Contents’ only loosely reflects the procedural sequence in which functions can be used. For instance, while you cannot apply function e.calibrate unless you have previously built a design object by using e.svydesign, you can use, e.g., function collapse.strata even after calibration. As a further example, all functions in group ‘Estimates and Sampling Errors’ can be used on objects created by e.svydesign (yielding estimates and sampling errors for functions of Horvitz-Thompson estimators), as well as on objects created by e.calibrate (yielding estimates and sampling errors for functions of Calibration estimators).
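The sketch below makes the Horvitz-Thompson versus Calibration distinction concrete. It is written in Python purely as a self-contained illustration. It is not ReGenesees code and does not mirror the e.calibrate interface. It makes the following assumptions:

- a single auxiliary variable;
- linear (chi-square distance) calibration with no range restrictions;
- made-up toy data.

The helpers weighted_total and linear_calibration are hypothetical and exist only for this example. One and the same estimation function gives a Horvitz-Thompson estimate when fed design weights and a Calibration estimate when fed calibrated weights.

```python
# Illustrative sketch only: not ReGenesees code. Toy data are made up.


def weighted_total(values, weights):
    """Return sum_k weights[k] * values[k] (a Horvitz-Thompson-type total
    when the weights are design weights)."""
    return sum(w * v for w, v in zip(weights, values))


def linear_calibration(d, x, t_x):
    """Calibrate design weights d on ONE auxiliary variable x (hypothetical helper).

    Uses the chi-square distance with unit scale factors, whose closed-form
    solution is w_k = d_k * (1 + lam * x_k), with lam chosen so that the
    calibration constraint sum_k w_k * x_k == t_x holds. No bounds are applied.
    """
    denom = sum(dk * xk * xk for dk, xk in zip(d, x))
    lam = (t_x - weighted_total(x, d)) / denom
    return [dk * (1.0 + lam * xk) for dk, xk in zip(d, x)]


# Hypothetical sample: design weights, auxiliary and interest variables.
d = [10.0, 10.0, 20.0, 20.0]
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 7.0, 8.0]
t_x = 150.0  # hypothetical known population total of x

w = linear_calibration(d, x, t_x)

print("HT estimate of Y:         ", weighted_total(y, d))
print("Calibration estimate of Y:", weighted_total(y, w))
print("Calibrated total of x:    ", weighted_total(x, w))  # equals t_x up to rounding
```

The calibrated weights reproduce the known total of x. The two estimates of Y therefore differ only through the weights they use. This is consistent with the estimation functions accepting design objects built by either e.svydesign or e.calibrate.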


[Package ReGenesees version 2.0 Index]