Program PRESENCE ver 12.31

Synopsis

This software was developed to enable estimation of the Proportion of Area Occupied (PAO), or similarly the probability that a site is occupied, by a species of interest according to the model presented by MacKenzie et al. (2002) 1.

Typically, species are not guaranteed to be detected even when present at a site, hence the naïve estimate of PAO given by:

                # sites where species detected
         PAO = --------------------------------
                    total # sites surveyed

will underestimate the true PAO. MacKenzie et al. (2002) propose that by repeated surveying of the sites, the probability of detecting the species can be estimated, which then enables unbiased estimation of PAO. MacKenzie et al. (2003)2 extended this model to also enable estimation of colonization and local extinction probabilities. These models are briefly discussed here.
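As a minimal illustration of the naïve estimate, the following Python sketch computes it from a list of detection histories. The function name 'naive_pao' and the encoding of each history as a string of 1's and 0's are assumptions made for this example; they are not part of PRESENCE.

```python
def naive_pao(histories):
    """Naive PAO: fraction of surveyed sites with at least one detection."""
    if not histories:
        raise ValueError("need at least one surveyed site")
    detected = sum(1 for h in histories if "1" in h)
    return detected / len(histories)


# Hypothetical data: four sites, four surveys each, detections at two sites.
example = ["1001", "0000", "0100", "0000"]
print(naive_pao(example))
```

Any occupied site whose history happens to be all zeros is counted as unoccupied here, which is why this estimate tends to understate occupancy when detection is imperfect.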

Contents

The Basic Sampling Scheme

N sites are surveyed over time where the intent is to establish the presence or absence of a species. The sites may constitute a naturally occurring sampling unit such as a discrete pond or patch of vegetation; a monitoring station; or a quadrat chosen from a predefined area of interest. The occupancy state of sites may change over time; however, during the study there are periods when it is reasonable to assume that, for all sites, no changes are occurring (e.g., within a single breeding season for migratory birds). The study therefore comprises K primary sampling periods (seasons), between which changes in the occupancy state of sites may occur. Within each season j, investigators use an appropriate technique to detect the species at k[j] surveys of each site.

The species may or may not be detected during a survey and is not falsely detected when absent (except under the false-positive model). The resulting detection history for each site may be expressed as K vectors of 1's and 0's, indicating detection and nondetection of the species respectively. We denote the detection history for site i at primary sampling period j as X[i,j], and the complete detection history for site i, over all primary periods, as X[i]. The single season model results when K=1, and the multiple season model when K>1.


Single Season Model

MacKenzie et al. (2002)1 present a model for estimating the site occupancy probability (or PAO) for a target species, in situations where the species is not guaranteed to be detected even when present at a site. Let ψ be the probability a site is occupied and p[j] be the probability of detecting the species in the jth survey, given it is present at the site. They use a probabilistic argument to describe the observed detection history for a site over a series of surveys. For example the probability of observing the history 1001 (denoting the species was detected in the first and fourth surveys of the site) is:

ψ x p[1](1-p[2])(1-p[3])p[4].

The probability of never detecting the species at a site (0000) would therefore be,

ψ x (1-p[1])(1-p[2])(1-p[3])(1-p[4]) + (1-ψ),

which represents the fact that either the species was there, but was never detected, or the species was genuinely absent from the site (1-ψ). By combining these probabilistic statements for all N sites, maximum likelihood estimates of the model parameters can be obtained.
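The probability statements above can be sketched in Python as follows. This is an illustrative sketch, not the PRESENCE implementation: the function names, the string encoding of histories, and the numeric values of ψ and p are all assumptions chosen for the example.

```python
import math


def history_probability(history, psi, p):
    """Probability of one site's detection history (single season model).

    history: string of '1'/'0', one character per survey
    psi:     probability the site is occupied
    p:       detection probabilities, one per survey
    """
    if len(history) != len(p):
        raise ValueError("history and p must have the same length")
    prob_given_occupied = 1.0
    for h, pj in zip(history, p):
        prob_given_occupied *= pj if h == "1" else (1.0 - pj)
    prob = psi * prob_given_occupied
    if "1" not in history:
        # Never detected: the site may also be genuinely unoccupied.
        prob += 1.0 - psi
    return prob


def neg_log_likelihood(histories, psi, p):
    """Negative log-likelihood over all sites, assuming independent sites."""
    return -sum(math.log(history_probability(h, psi, p)) for h in histories)


p = [0.5, 0.4, 0.3, 0.6]  # hypothetical detection probabilities
print(history_probability("1001", 0.8, p))
print(history_probability("0000", 0.8, p))
```

Minimizing 'neg_log_likelihood' over ψ and p with a numerical optimizer is one way to obtain the maximum likelihood estimates described above.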

The model framework of MacKenzie et al. (2002) is flexible enough to allow for missing observations: occasions when sites were not surveyed. Missing observations may result by design (it is not logistically possible to always sample all sites), or by accident (a technician's vehicle may break down en route). In effect, a missing observation supplies no information about the detection or nondetection of the species, which is exactly how the model treats such values.
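One way to picture this treatment of missing observations is that a missed survey simply contributes no factor to the probability of the history. The sketch below assumes a missing survey is encoded as '-' in the history string; that encoding and the function name are choices made for this example only.

```python
def history_probability_with_missing(history, psi, p, missing="-"):
    """Single season history probability, skipping surveys that were not done."""
    if len(history) != len(p):
        raise ValueError("history and p must have the same length")
    prob_given_occupied = 1.0
    detected = False
    for h, pj in zip(history, p):
        if h == missing:
            continue  # no survey, so no information and no factor
        if h == "1":
            prob_given_occupied *= pj
            detected = True
        else:
            prob_given_occupied *= 1.0 - pj
    prob = psi * prob_given_occupied
    if not detected:
        prob += 1.0 - psi  # site may be unoccupied
    return prob


p = [0.5, 0.4, 0.3, 0.6]  # hypothetical detection probabilities
print(history_probability_with_missing("1-01", 0.8, p))
print(history_probability_with_missing("----", 0.8, p))
```

A site that was never surveyed has probability 1 for its (empty) history, so it contributes nothing to the likelihood.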

The model also enables parameters to be functions of covariates. For example, occupancy probability may be a function of habitat, while detection probability is a function of environmental conditions such as air temperature. The model therefore allows relationships between occupancy state and site characteristics to be investigated. Covariates are entered into the model by way of the logistic model (or logit link).

A key assumption of the single season model is that all parameters are constant across sites. Failure of this creates heterogeneity. Unmodeled heterogeneity in detection probabilities will cause occupancy to be underestimated. If there is unmodelled heterogeneity in occupancy probabilities, then it is believed that the estimates will represent an average level of occupancy, provided detection probabilities are not directly related to the probability of occupancy.

Another major assumption of the MacKenzie et al. (2002) model (Single-season) is that the occupancy state of the sites does not change for the duration of the surveying. Situations where this may be violated, for instance, would be for species with large home ranges, where the species may temporarily be absent from the site during the surveying. If this process of temporary absence from the site may be viewed as a random process (e.g., the species tosses a coin to decide whether it will be present at the site today), then this assumption may be relaxed. However, this will alter the interpretation of the model parameters ("occupancy" should be interpreted as "use" and "detection" as "in the site and detected"). More systematic mechanisms for temporary absences may be more problematic and create unknown biases. Users are reminded, however, that the model assumes closure of the sites at the species level, not at the individual level, so there may be some movement of individuals to/from sites without overly affecting the model.

Model Overview

Currently, there are many types of models that can be fit to detection/nondetection data within Program PRESENCE.

In all models, estimated parameters (ψ, p, γ, ε, ...) may be modelled as functions of site-specific, or site-survey-specific covariates.

Downloading Program PRESENCE

Installing Program PRESENCE

Here is the procedure to install on a Windows© system:

Running the program

This model is a special case of the multi-season-multi-state model and can be run in PRESENCE as a multi-season-multi-state model with only one season.


Royle N-Mixture Model

The Royle N-Mixture model (Royle, 2004)6 estimates population size from temporally replicated point-count data at a number of sample sites. The variation in these point-counts provides information about the distribution of site-specific population size (N). Input data for this model are the counts of the number of individuals observed at each survey (instead of the usual '1' or '0') at each sample site.

Parameters estimated under the assumption of a Poisson distribution:


Multiple Season Model or Dynamic Occupancy model

The multiple season model (MacKenzie et al., 2003) 2 extends the single season model by introducing two additional parameters, ε[t] and γ[t]. These parameters are, respectively, the probability a species becomes locally extinct or colonizes a site between seasons t and t+1.

Parameters:

For example, if the detection history 101 000 was observed at a site (denoting the species was detected in the first and third surveys of the site in the first season; not detected otherwise), the probability of this occurring could be expressed as:

ψ x p[1,1](1-p[1,2])p[1,3] x {(1-ε[1])(1-p[2,1])(1-p[2,2])(1-p[2,3]) + ε[1]}.

This represents the fact that after the first season, the species either did not go locally extinct (1-ε[1]) but was undetected in the 3 surveys in season 2 ((1-p[2,1])(1-p[2,2])(1-p[2,3])), or did go locally extinct (ε[1]) between the first and second seasons.
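For histories with more seasons, or where the species was not detected in the first season, the same reasoning can be applied season by season. The sketch below tracks the joint probability of "the history so far, and the site currently occupied" and "the history so far, and the site currently unoccupied". The function name, argument layout and numeric values are assumptions for illustration and are not taken from PRESENCE.

```python
def season_probability(history, p, occupied):
    """Probability of one season's history given the occupancy state."""
    if not occupied:
        # No false positives: any detection is impossible at an empty site.
        return 0.0 if "1" in history else 1.0
    prob = 1.0
    for h, pj in zip(history, p):
        prob *= pj if h == "1" else 1.0 - pj
    return prob


def multi_season_probability(histories, psi, gamma, eps, p):
    """Probability of a full detection history under the multiple season model.

    histories: list of strings, one per season
    gamma/eps: colonization/extinction probabilities, one per interval
    p:         list of per-season lists of detection probabilities
    """
    occ = psi * season_probability(histories[0], p[0], True)
    unocc = (1.0 - psi) * season_probability(histories[0], p[0], False)
    for t in range(1, len(histories)):
        g, e = gamma[t - 1], eps[t - 1]
        # Move both states across the interval, then apply this season's data.
        next_occ = occ * (1.0 - e) + unocc * g
        next_unocc = occ * e + unocc * (1.0 - g)
        occ = next_occ * season_probability(histories[t], p[t], True)
        unocc = next_unocc * season_probability(histories[t], p[t], False)
    return occ + unocc


half = [0.5, 0.5, 0.5]  # hypothetical detection probabilities
prob = multi_season_probability(["101", "000"], 0.75, [0.1], [0.2], [half, half])
print(prob)
```

For the history 101 000 the colonization probability drops out, because the site is known to be occupied in the first season, and the result agrees with the closed-form expression above.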

The model may also be reparameterized in terms of ψ[t] and ε[t]; or ψ[t] and γ[t], as in some situations this may be a more meaningful parameterization (in terms of overall occupancy) than in terms of the underlying processes. As in the single season model, parameters may be functions of covariates using the logit link.

Note this model does not allow for a so-called "rescue effect", where the local extinction of a colony is "rescued" by the re-colonization of the site before the unoccupied site can be observed (i.e., the site becomes unoccupied and then re-occupied within a single interval between seasons). Such an effect is sometimes included in metapopulation models. However, while a rescue effect is biologically plausible, it cannot be estimated (without some potentially unrealistic strict assumptions) from the type of data we are considering here, nor from the type of data often collected in metapopulation studies. The main argument for not including a rescue effect is: why should the rescue of the colony be limited to an arbitrary single event, when possibly there may be a number of opportunities between two seasons for the rescue to occur? To reduce the possibility of having unobserved changes in the occupancy state of sites, the sampling scheme should be designed to reflect the appropriate time scale of the system under study.

Alternate Parameterizations

The initial parameterization uses a single initial occupancy parameter, k-1 extinction parameters (assuming k seasons), k-1 colonization parameters, and T detection parameters (assuming T surveys). Once these parameters are estimated, other quantities of interest can be computed. Occupancy in other seasons can be computed as:
ψ2 = ψinitial*(1-ε1) + (1-ψinitial)*γ1
ψ3 = ψ2*(1-ε2) + (1-ψ2)*γ2
ψ4 = ψ3*(1-ε3) + (1-ψ3)*γ3
 :         :        :            :        :
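The recursion above is straightforward to compute. The sketch below is illustrative only; the function name and the numeric values are assumptions.

```python
def seasonal_occupancy(psi_initial, eps, gamma):
    """Derive occupancy for every season from the initial parameterization.

    eps[t] and gamma[t] apply to the interval after season t (0-based).
    Returns a list with one occupancy value per season.
    """
    if len(eps) != len(gamma):
        raise ValueError("eps and gamma must have the same length")
    psi = [psi_initial]
    for e, g in zip(eps, gamma):
        current = psi[-1]
        psi.append(current * (1.0 - e) + (1.0 - current) * g)
    return psi


# Hypothetical values: three seasons, constant extinction and colonization.
print(seasonal_occupancy(0.75, [0.2, 0.2], [0.1, 0.1]))
```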
 
Sometimes, it is desirable to model seasonal occupancy as a function of some covariates. Since seasonal occupancy is derived from ε[i] and γ[i] rather than estimated directly, this cannot be done under the initial parameterization.

An alternate parameterization in PRESENCE uses k occupancy parameters, k-1 extinction parameters, and T detection parameters. The k-1 colonization parameters are then computed from the seasonal ψ's and ε's by solving the above equations for γi.

           ψ2 - ψinitial*(1-ε1)
 γ1 = ---------------------------------
                 (1-ψinitial)
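Solving the occupancy recursion for the colonization probability, as in the expression above, can be sketched as follows. The function name and values are assumptions for illustration.

```python
def colonization_from_occupancy(psi_t, psi_next, eps_t):
    """Colonization probability implied by two seasonal occupancies and eps."""
    if psi_t >= 1.0:
        raise ValueError("colonization is undefined when every site is occupied")
    return (psi_next - psi_t * (1.0 - eps_t)) / (1.0 - psi_t)


# Hypothetical values: occupancy 0.75 then 0.625, extinction probability 0.2.
print(colonization_from_occupancy(0.75, 0.625, 0.2))
```

Nothing in this arithmetic forces the result to lie between 0 and 1. For example, ψ values of 0.75 and 0.9 with ε = 0.5 imply a value above 1, which indicates that such a combination of occupancy and extinction values is not attainable.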
 
By selecting this parameterization, it's now possible to build a model where seasonal occupancy (ψi) is a function of a seasonal covariate.

Similarly, we could have estimated the colonization parameters and computed the extinction parameters. This parameterization is sometimes useful if the above parameterization fails to converge on reasonable estimates.

Finally, PRESENCE can model extinction and colonization in such a way that the proportion that go locally extinct is the same as the proportion that don't colonize (ε=1-γ).


Multi-season-staggered-entry model

This model relaxes the closure assumption such that a site may be colonized and go locally extinct once during the surveys (i.e., delayed arrival and/or early departure). It is described by Kendall et al. (2013)17 and is a simple extension of the single-season-staggered-entry model. Input data are in the same format as for the multi-season model.

Parameters:


Multi-season False-positive detection model

The False-positive detection model (Miller et al., 2013)17 extends the single-season false-positive model to allow changes in occupancy between seasons. The input format is similar to that of the single-season false-positive model.

Parameters:


Multi-season Spatial Dependence Model

This model is an extension of the Single-season Spatial Dependence model. The parameters for the single-season Spatial Dependence model are replicated for each season, with additional parameters for colonization (γ) and extinction (ε) for each interval between seasons.

Auto-logistic models

With the multi-season model, it may be of interest to model extinction or colonization between seasons t-1 and t as a function of site occupancy for neighboring sites in season t-1.

If you're willing to assume that all sites are 'neighbors' of each other (i.e., if the individual organisms can move anywhere within the study area), then this model can be run by simply using psi1 as a covariate name in the design matrix. When PRESENCE goes to compute colonization or extinction, it will compute the average "conditional" occupancy of neighboring sites in season t-1 and use that value as a covariate in computing colonization/extinction in season t. By specifying "upsi" instead of "psi1", PRESENCE will compute average "unconditional" occupancy of neighboring sites instead of "conditional" occupancy.

Note: "Conditional occupancy" in this case, means conditional on the detection history for the season, not on the entire detection history.

In the case where it is desired to define neighbors specifically for each site, PRESENCE can read a file which defines the neighbors of each site. This file should be a text file containing 1's and 0's where 1 denotes that site s is a neighbor of site r. The format of the file is rows of contiguous 1's and 0's, one row for each focal site. Each row should contain a string of k characters (where k = number of sites in the study area). For example, if there were 10 sites, the neighbor file might look like this:

 0100000000
 1000000000
 0001100000
 0010100000
 0011000000
 0000001111
 0000010111
 0000011011
 0000011101
 0000011110
 
In this example, sites 1 and 2 are neighbors of each other, sites 3,4,5 are neighbors, and sites 6-10 are neighbors. So, if colonization between seasons t-1 and t is modeled as a function of average neighborhood occupancy at time t-1, colonization for site 2 in season t will be a function of the occupancy of its only neighbor, site 1, in season t-1.

In some cases, sites may not all be of the same size or habitat quality. In these cases, it would be preferable to weight the average occupancy of neighboring sites by a value indicating the size/quality of each neighboring site. For example, if all sites except site 5 are nearly the same size, but site 5 is twice as large as the other sites, we would want the occupancy for site 5 to have more influence on colonization/extinction of other sites than the occupancy of other sites. So, the neighbor file would be:

 0100000000 1.0
 1000000000 1.0
 0001100000 1.0
 0010100000 1.0
 0011000000 2.0
 0000001111 1.0
 0000010111 1.0
 0000011011 1.0
 0000011101 1.0
 0000011110 1.0
 
In this example, sites 3,4,5 are neighbors, but the average occupancy used in the computation of colonization for those sites will be computed as:

logit(γ[site4,t]) = β0 + β1*X[site4,t-1] + ...

where X is the auto-logistic covariate and is computed as:

X[site4,t-1] = (W[site3] * ψ[site3,t-1] + W[site5] * ψ[site5,t-1]) / (W[site3] + W[site5])

Using the weights in the example:

X[site4,t-1] = (1.0 * ψ[site3,t-1] + 2.0 * ψ[site5,t-1]) / (1.0 + 2.0)

(note: X = auto-logistic covariate and W = weight.)
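The weighted neighbor average can be sketched in Python as below. This is an illustration of the calculation described above, not PRESENCE code: the function names are hypothetical, the occupancy values are made up, and the parser assumes the optional weight is the second whitespace-separated field on each row.

```python
def parse_neighbor_file(text):
    """Parse neighbor-file text into (neighbors, weights).

    neighbors[r][s] is True when site s is a neighbor of focal site r.
    weights[s] is the optional second column of row s (1.0 if absent).
    """
    neighbors, weights = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        neighbors.append([c == "1" for c in fields[0]])
        weights.append(float(fields[1]) if len(fields) > 1 else 1.0)
    return neighbors, weights


def autologistic_covariate(site, psi_prev, neighbors, weights):
    """Weighted mean of last season's occupancy over the neighbors of a site.

    site is a 0-based index; psi_prev[s] is occupancy of site s in season t-1.
    """
    idx = [s for s, is_neighbor in enumerate(neighbors[site]) if is_neighbor]
    total_weight = sum(weights[s] for s in idx)
    if total_weight == 0:
        raise ValueError("site has no neighbors with positive weight")
    return sum(weights[s] * psi_prev[s] for s in idx) / total_weight


NEIGHBOR_TEXT = """\
0100000000 1.0
1000000000 1.0
0001100000 1.0
0010100000 1.0
0011000000 2.0
0000001111 1.0
0000010111 1.0
0000011011 1.0
0000011101 1.0
0000011110 1.0
"""

neighbors, weights = parse_neighbor_file(NEIGHBOR_TEXT)
psi_prev = [0.5] * 10   # hypothetical occupancy in season t-1
psi_prev[2] = 0.6       # site 3
psi_prev[4] = 0.3       # site 5
x_site4 = autologistic_covariate(3, psi_prev, neighbors, weights)
print(x_site4)
```

With these made-up values the covariate for site 4 is (1.0 * 0.6 + 2.0 * 0.3) / 3.0, so the larger site 5 counts twice as much as site 3.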

If PRESENCE finds this auto-logistic covariate name, 'psi1', in the design matrix, it will ask for a neighbor text file. If you don't specify a file, PRESENCE will assume all sites are neighbors of each other and all have equal weight (as described initially). If a file is specified, the filename will be saved in the results file (*.pa3), and will be used in all future auto-logistic models. If the file is specified and does not contain a column of weights, all sites will be assumed to have equal weight.


Multi-season-Multi-state Model

This is an extension of the Single-season multi-state model. After the first season, occupancy status can change between each season.

Parameters:

A more general way of parameterizing this model is as follows:

Like the single-season-multi-state model, the parameter p21i would be zero since it would be impossible to observe breeding (state 2) if the species is in state 1 (non-breeding).

So, the first parameterization assumes that there are only 3 occupancy states:

The other parameterization can also be used and the relationship between the parameters is:

This parameterization is more general than the first parameterization, but needs constraints like the ones mentioned above in order to be identifiable. The advantage of this parameterization is that it is possible to have more than 3 occupancy states.

Integrated-habitat-occupancy models

This model generates estimates of changes in occupancy state in relation to changes in habitat state.

Input data consists of the following codes:

0=species not detected at site, habitat state=A,
1=species detected at site, habitat state=A,
2=species not detected at site, habitat state=B,
3=species detected at site, habitat state=B.

Parameters:

Note: X = habitat state can be A or B. All parameters except π and ψ can be indexed by survey (subscript i,j in multi-season model framework).

Multi-season Two-Species Model

This is an extension of the two-species model (MacKenzie et al., 2004)16, allowing the computation of occupancy, colonization, extinction and detection parameters of two species along with conditional probabilities when the other species is present or detected.

Parameters:

Input data for this model is in the same form as the single-species, single-season model except that the first half of the detection history records are assumed to be species A, and the second half of the records are assumed to be species B. So, if there are 60 sites, the input would consist of 120 detection history records. Records 1-60 would be the site-detection history records for sites 1-60, species A, and records 61-120 would be the site-detection history records for sites 1-60, species B.
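The stacked layout described above can be pictured with a short sketch that splits the records back into one list per species. The function name and the example records are hypothetical.

```python
def split_two_species(records):
    """Split stacked detection-history records into (species_a, species_b).

    The first half of the records is taken as species A and the second half
    as species B, with sites in the same order in both halves.
    """
    if len(records) % 2 != 0:
        raise ValueError("expected an even number of records")
    half = len(records) // 2
    return records[:half], records[half:]


# Hypothetical input: 3 sites, so 6 records.
records = ["101", "000", "010", "001", "000", "110"]
species_a, species_b = split_two_species(records)
for site, (a, b) in enumerate(zip(species_a, species_b), start=1):
    print(site, a, b)
```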

Alternate Input for 2-species model-
Instead of repeating each site for each species, the following codes can be used for this model:

If there are any 2's or 3's in the data, PRESENCE will assume this form of input.

WinBugs Analysis

A rudimentary capability has been programmed into PRESENCE to do an analysis for select models using the Bayesian statistical package, WinBugs. For certain model types, a button will appear on the 'Run model' dialog window with the caption 'Run WinBugs'. When this button is clicked, PRESENCE will generate the model code and data files needed by WinBugs to analyze the data. PRESENCE will then call WinBugs to do the analysis, then display a Notepad window with the results. Eventually, PRESENCE will be able to do more models, but at the moment, only the single-season model is available. Note: WinBugs must be installed for this feature to work.

Model Run Options


Single Season Output

The results for fitting the single season model to the data are stored in the results database. To view the output of a specific model, position the cursor over the desired model name, click with the right mouse button, then select 'view output' with the left mouse button. If the 'list data' option had been selected, the input data will appear at the beginning of the output. Next come the number of sites, sampling occasions and missing observations in the dataset, followed by the number of parameters in the model, twice the negative log-likelihood and Akaike's Information Criterion (AIC). For simple single-season models, after the model AIC has been output, the estimated coefficients for the logistic model and their variance-covariance matrix are printed. These are the 'beta' or untransformed estimates which can be used to compute the 'real' parameters (psi,p) in the model.
Untransformed Estimates of coefficients for covariates (Beta's)
==============================================================================
                                            estimate   std.error
A1     : psi                                1.098612 (0.248070)
B1     : p1                                -0.000000 (0.112774)

Variance-Covariance Matrix of Untransformed estimates (Beta's):
              A1         B1  
     A1    0.061539  -0.004103
     B1   -0.004103   0.012718
------------------------------

============================================================

   Individual Site estimates of <Psi>
        Site         Survey         Psi     Std.err     95% conf. interval
1             0         survey_1:  0.7500   0.0465     0.6485 - 0.8299 

============================================================

   Individual Site estimates of <p>
        Site         Survey           p     Std.err     95% conf. interval
1             0         survey_1:  0.5000   0.0282     0.4450 - 0.5550 
1             0         survey_2:  0.5000   0.0282     0.4450 - 0.5550 
1             0         survey_3:  0.5000   0.0282     0.4450 - 0.5550 
1             0         survey_4:  0.5000   0.0282     0.4450 - 0.5550 
1             0         survey_5:  0.5000   0.0282     0.4450 - 0.5550 

============================================================
 DERIVED parameter - Psi-conditional : [Pr(occ | detection history)]

        Site                     psi-cond  Std.err     95% conf. interval
     1        0                    0.0857  0.0315     0.0240 - 0.1474 
     2   site_2                    1.0000  0.0000     1.0000 - 1.0000 
     3   site_3                    1.0000  0.0000     1.0000 - 1.0000 
     4   site_4                    1.0000  0.0000     1.0000 - 1.0000 

Note that if no covariates are used in the estimation of a real parameter, all sites will have the same value for that parameter and PRESENCE will only print the estimate for the 1st site.
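To show how a 'real' estimate relates to its beta, the sketch below back-transforms the psi row of the listing above (estimate 1.098612, standard error 0.248070). It assumes a delta-method standard error and a confidence interval built on the logit scale and then back-transformed; with those assumptions the results agree with the printed Psi line to four decimals, which suggests, but does not confirm, that PRESENCE computes them this way. The function names are hypothetical.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def real_estimate(beta, se_beta, z=1.96):
    """Back-transform an intercept-only beta to the probability scale.

    Returns (estimate, standard error, lower limit, upper limit).
    """
    est = expit(beta)
    se = est * (1.0 - est) * se_beta   # delta method (assumed)
    lower = expit(beta - z * se_beta)  # interval formed on the logit scale
    upper = expit(beta + z * se_beta)
    return est, se, lower, upper


est, se, lower, upper = real_estimate(1.098612, 0.248070)
print(round(est, 4), round(se, 4), round(lower, 4), round(upper, 4))
```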

After printing the 'real' parameters, a 'derived' parameter (psi-conditional) is printed. This parameter is the probability that a site is occupied, given its particular detection history. So, all sites where a detection occurred will have a value of 1.0 for this parameter. Sites with no detections will have a value less than or equal to the unconditional occupancy estimate (psi). "Conditional occupancy" can be computed as:

        ψ Π(1-pj)
ψc =  ----------------   if no detections, = 1 if at least 1 detection
      ψ Π(1-pj) + 1-ψ

This parameter is useful for generating maps of species occurrence.
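The conditional occupancy formula can be checked against the example output above, where psi = 0.75 and p = 0.5 for each of five surveys. The function name and string encoding are assumptions made for this sketch.

```python
def conditional_occupancy(history, psi, p):
    """Probability a site is occupied given its detection history."""
    if "1" in history:
        return 1.0  # a detection confirms occupancy
    prob_all_missed = 1.0
    for pj in p:
        prob_all_missed *= 1.0 - pj
    numerator = psi * prob_all_missed
    return numerator / (numerator + 1.0 - psi)


psi_cond = conditional_occupancy("00000", 0.75, [0.5] * 5)
print(round(psi_cond, 4))
```

Rounded to four decimals this matches the psi-cond value printed for the first site in the listing (0.0857).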

Multiple Season Output

The results for fitting the multiple season model to the data will appear in a Notepad window. As for single season models, the output begins with a listing of the input data (if desired), followed by the number of sites, sampling occasions and missing observations in the dataset. The design matrices are printed followed by the number of parameters in the model, twice the negative log-likelihood and Akaike's Information Criterion (AIC). The untransformed parameter estimates and their associated variance-covariance matrix are printed, followed by the 'real' parameter estimates.
Untransformed Estimates of coefficients for covariates (Beta's)
==============================================================================
                                            estimate   std.error
A1     : psi                                1.098612 (0.411930)
B1     : gam1                              -2.197225 (2.394989)
C1     : eps1                              -1.386294 (0.468723)
D1     : p1                                -0.000000 (0.159541)

Variance-Covariance Matrix of Untransformed estimates (Beta's):
              A1         B1         C1         D1  
     A1    0.169686  -0.670645  -0.036365  -0.033936
     B1   -0.670645   5.735971   0.459619   0.191898
     C1   -0.036365   0.459619   0.219701   0.027272
     D1   -0.033936   0.191898   0.027272   0.025453
------------------------------
Model has been fit using the logistic link.

   Individual Site estimates of Psi:
        Site         Survey         Psi     Std.err     95% conf. interval
1             0         survey_1:  0.7500   0.0772     0.5723 - 0.8706 

   Individual Site estimates of Gamma:
        Site         Survey       Gamma     Std.err     95% conf. interval
1             0         survey_1:  0.1000   0.2155     0.0010 - 0.9239 

   Individual Site estimates of Eps:
        Site         Survey         Eps     Std.err     95% conf. interval
1             0         survey_1:  0.2000   0.0750     0.0907 - 0.3852 

   Individual Site estimates of p:
        Site         Survey           p     Std.err     95% conf. interval
1             0         survey_1:  0.5000   0.0399     0.4225 - 0.5775 
1             0         survey_2:  0.5000   0.0399     0.4225 - 0.5775 
1             0         survey_3:  0.5000   0.0399     0.4225 - 0.5775 
1             0         survey_4:  0.5000   0.0399     0.4225 - 0.5775 
1             0         survey_5:  0.5000   0.0399     0.4225 - 0.5775


Tools and Settings

Single Season Simulation

This simple simulation routine is included so that users may get a general feel for how the model of MacKenzie et al. (2002)1 performs under a specific set of circumstances and sampling designs. Scenarios may either be entered from a tab-delimited ASCII text file (see below for details), or by entering the scenario directly.

Where the previously described simulation procedure is intended as a basic learning tool, this procedure is designed to address a specific question.

There are two general sampling designs that can be investigated; sampling only a subset of sites more intensively to estimate detection probabilities; or halting the repeated sampling of sites after the species is first detected. Both designs are compatible with the MacKenzie et al. (2002)1 model. A single-group model with constant p is fit to the simulated data. Results are written to a file named 'presence.out' and loaded into the Notepad editor when completed.

This simulation file should be set up as follows (see SimExample.txt).

The first line should consist of 4 integer values;

The next N lines of the file hold the true occupancy and detection probabilities for each site. The first column in each line is the occupancy probability, and the following T columns contain the probability of detecting the species (given presence) during each survey.

The first NI of these lines represent the sites that will be sampled more intensively. For the remaining sites, if T0 < T then PRESENCE is still expecting to read in T detection probabilities, however these will not be used.
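To give a feel for what one simulated dataset looks like, the sketch below generates detection histories from per-site rows laid out like those described above (occupancy probability followed by T detection probabilities). It is a simplified illustration with hypothetical names: every site is surveyed T times, and neither of the two sampling designs is implemented.

```python
import random


def simulate_histories(rows, rng):
    """Simulate one detection history per site.

    rows: one list per site, [psi, p_1, ..., p_T]
    rng:  a random.Random instance, so results can be reproduced
    """
    histories = []
    for row in rows:
        psi, p = row[0], row[1:]
        occupied = rng.random() < psi
        # An unoccupied site can never produce a detection.
        histories.append(
            [1 if occupied and rng.random() < pj else 0 for pj in p]
        )
    return histories


rows = [[0.8, 0.2, 0.4, 0.3, 0.2, 0.6] for _ in range(20)]
histories = simulate_histories(rows, random.Random(1))
for h in histories[:5]:
    print("".join(str(x) for x in h))
```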

The final line of the file consists of 3 integer values;

SimExample.txt - sample simulation input file

20	10	5	3		
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
1	0	500			

Model averaging

Just as there is uncertainty in model parameters, there is also uncertainty in model selection. There may be several models with (relatively) equal support, according to AIC values. In this case, parameters may be averaged among all models, such that models with greater support have more weight in the computed average than models with little support.
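A common way to turn AIC values into model weights is the Akaike weight of Burnham and Anderson (1998)4. The sketch below assumes that weighting; the help text does not state the exact formula PRESENCE uses, and the function names, AIC values and estimates are made up for illustration.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: relative support for each model in the set."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, aic_values):
    """Weighted average of one parameter's estimates across models."""
    weights = akaike_weights(aic_values)
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical example: three models and their estimates of psi.
aics = [210.4, 211.0, 218.9]
psi_hats = [0.72, 0.78, 0.60]
print(akaike_weights(aics))
print(model_average(psi_hats, aics))
```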

Model averaged parameters may be computed in PRESENCE by selecting the "Model averaging" menu from the "Tools" menu. PRESENCE will obtain estimates for each parameter for each site (and survey if applicable) along with the weight of each model and compute a weighted average estimate. The two options for output are:


Help menu


Problems/Questions

The most frequent question/problem that occurs is about a message in the output warning that the optimization routine has possibly not reached the maximum likelihood value. The program then prints the number of significant digits achieved when the optimization stopped. This situation occurs when the optimization routine is attempting to find the maximum of a function, when the function is relatively 'flat' (thinking in two dimensions). If you look at the logit transformation described in this help file, you'll see that this can occur if a parameter is near zero or one. When a parameter is near zero, the 'untransformed' or 'beta' parameter associated with it is a large negative number. To get the 'real' parameter, the untransformed parameter is plugged into the logit equation (exp(x)/(1+exp(x))). So, plugging in -30 for x in this equation gives almost the same value as -40. This causes the optimization routine to think the function is 'flat' and gives the warning message.
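The flatness described above is easy to see numerically. The following sketch simply evaluates the equation from the text at -30 and -40; the function name is an arbitrary choice.

```python
import math


def expit(x):
    """The equation from the text: exp(x) / (1 + exp(x))."""
    return math.exp(x) / (1.0 + math.exp(x))


at_minus_30 = expit(-30.0)
at_minus_40 = expit(-40.0)
print(at_minus_30, at_minus_40, abs(at_minus_30 - at_minus_40))
```

Both values are indistinguishable from zero at any practical precision, so moving the beta between -30 and -40 changes the likelihood by almost nothing.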

Examining many simulated and real datasets, we have found that this message can be safely ignored if the number of significant digits is 3 or larger. The number of significant digits it reports is not the number of digits you can trust in the parameter estimates. We have found that even when it reports 2 significant digits, the estimates are accurate to 4 or more decimal places.

When the message occurs with the number of significant digits less than two, it usually indicates insufficient data for the desired model, or model overparameterization. This is sometimes accompanied by a warning about the variance-covariance matrix. If this happens, the model may need to be simplified.

In some cases, poor starting values for the parameters can cause the problems noted above. This can be solved by giving better initial values to the program when running the model. For example, if detection probabilities are very small, and the default starting values of 0.5 are far away from the final expected parameter values, the optimization routine may fail. The solution would be to input small initial values (on the logit scale) for the model so the optimization routine does not have to search very far. Since simpler models converge more readily than complex ones, it is usually best to start with simple models, so you have starting values for complex ones if needed.

Another indication of a problem with convergence is strange standard error values. PRESENCE may or may not print the error message described above, but the model may still be over-parameterized. It is very important to check estimates and standard errors for all models to make sure they are reasonable. Note that large standard errors of the 'real' parameters (psi,p,eps,gam,...) indicate problems, but large standard errors for the untransformed (beta) parameters are not necessarily a problem. Particularly, when a beta value is large (>10 or <-10), the standard error is usually large also.


Resources

Credits/Acknowledgments

PRESENCE was developed by Darryl MacKenzie of Proteus Research & Consulting Ltd. under contract to U.S. Geological Survey as part of their Amphibian Research and Monitoring Initiative.

Versions 2 and up of PRESENCE were developed by Jim Hines of the U.S. Geological Survey.

Currently, we don't know of any bugs in PRESENCE, although that doesn't mean there aren't any (yes, detection probability is less than 1.0!). If you find some, feel free to let us know.

Jim Hines  jhines@usgs.gov
Darryl MacKenzie  darryl@proteus.co.nz

Appendix

Covariates

Program PRESENCE makes the distinction between two types of covariates. Site-specific covariates are covariates that are constant for a site within a season. Examples would be habitat type, patch size, distance to nearest patch, or generalized weather patterns such as drought or El Niño years. Sampling-occasion covariates are covariates that may change with each survey of a site, for example local environmental conditions such as temperature, precipitation or cloud cover; time of day; or observer. Covariates are entered into the models using the logistic model.

Detection probabilities may be functions of either site-specific or sampling occasion covariates, while all other parameters may be functions of site-specific covariates only.

Sampling-occasion covariates may be missing, and are assumed to correspond to a missing detection/nondetection observation. When a covariate is being used that has missing values that do not correspond with a missing detection/nondetection observation, the detection/nondetection data are also treated as missing. Site-specific covariates cannot have missing values, unless the site was never surveyed during that season.

An important note about continuous covariates! Because of the way the logit-link works, if the average value of a covariate is a long way from zero, then PRESENCE may not be able to find the true maximum likelihood estimates of the model parameters, which will give you bogus results. An indication that there might be a problem is that the estimates themselves look suspicious, the variance-covariance matrix might include a huge value, and/or you get a warning about a non-invertible variance-covariance matrix. The best approach is to transform your data onto another scale which is still meaningful to you. You could divide the covariate values by some constant (e.g., rather than entering 80% humidity as 80.0, use 0.80); subtract the average of the covariates from each observed value (i.e., X* = X - average(X's)); or some combination of the two. Such transformations can be carried out by PRESENCE (in the edit menu) or done easily with a spreadsheet and the modified values pasted back into the Data Window.
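For users who prepare covariates outside PRESENCE, the two transformations mentioned above can be sketched as follows. The function names and humidity values are hypothetical.

```python
def rescale(values, divisor):
    """Divide every covariate value by a constant."""
    return [v / divisor for v in values]


def center(values):
    """Subtract the average from every covariate value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


humidity_percent = [80.0, 65.0, 95.0]   # hypothetical observations
print(rescale(humidity_percent, 100.0))  # proportions instead of percentages
print(center(humidity_percent))          # deviations from the average
```

Either transformation changes the scale on which the fitted coefficients are interpreted, so the same transformation must be applied to any covariate values later used for prediction.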

Logistic Model or Logit Link

The logistic model can be used to investigate potential relationships between probabilities (the response) and covariates (the explanatory variables), as it ensures that response values stay between 0 and 1. The logistic model is defined as:

log_e(y / (1 - y)) = Xβ,

where y is the probability; X is a row vector containing the covariate values; and β is a column vector of coefficient values that are to be estimated. Equivalently, y = exp(Xβ) / (1 + exp(Xβ)).

Large positive values for Xβ make y tend to 1, while large negative values make y tend to 0. If Xβ = 0, then y = 0.5.
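The two forms of the logistic model given above can be checked numerically with a short Python sketch. The function names, the covariate values and the coefficients are hypothetical. Splitting inv_logit into two branches is a numerical-stability choice made for this illustration and says nothing about how PRESENCE computes the link internally.

```python
import math


def logit(y):
    """log_e(y / (1 - y)) for a probability y strictly between 0 and 1."""
    return math.log(y / (1.0 - y))


def inv_logit(eta):
    """exp(eta) / (1 + exp(eta)), arranged so that exp() never overflows."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def linear_predictor(x, beta):
    """X beta for a single row of covariate values x and coefficients beta."""
    return sum(xi * bi for xi, bi in zip(x, beta))


x = [1.0, 0.8]      # intercept term and one covariate (hypothetical values)
beta = [-0.4, 0.5]  # hypothetical coefficient values

eta = linear_predictor(x, beta)  # -0.4 + 0.8 * 0.5 = 0.0
p = inv_logit(eta)               # X beta = 0 gives a probability of 0.5

print(eta, p)
print(inv_logit(10.0), inv_logit(-10.0))  # close to 1 and close to 0
```

Applying logit to the output of inv_logit recovers the linear predictor, which shows that the two definitions describe the same model.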


Literature Cited

1 MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. A. Royle and C. A. Langtimm. 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83(8): 2248-2255.

2 MacKenzie, D. I., J. D. Nichols, J. E. Hines, M. G. Knutson and A. B. Franklin. 2003. Estimating site occupancy, colonization and local extinction probabilities when a species is not detected with certainty. Ecology 84: 2200-2207.

3 Pledger, S. 2000. Unified maximum likelihood estimates for closed capture-recapture models using mixtures. Biometrics 56: 434-442.

4 Burnham, K. P. and D. R. Anderson. 1998. Model selection and inference. Springer-Verlag, New York, USA.

5 MacKenzie, D.I., J.D. Nichols, M.E. Seamans and R.J. Gutiérrez. 2009. Modeling species occurrence dynamics with multiple states and imperfect detection. Ecology 90: 823-835.

6 Royle, J.A. 2004. N-Mixture Models for Estimating Population Size from Spatially Replicated Counts. Biometrics 60: 108-115.

7 Royle, J.A., and J.D. Nichols. 2003. Estimating Abundance from Repeated Presence-Absence Data or Point Counts. Ecology 84(3): 777-790.

8 MacKenzie, D. I., J. D. Nichols, J. A. Royle, K. Pollock, L. Bailey and J. E. Hines. 2006. Occupancy Estimation and Modeling - Inferring Patterns and Dynamics of Species Occurrence. Elsevier Publishing.

9 Nichols, J.D., L.L. Bailey, A.F. O'Connell Jr., N.W. Talancy, E.H.C. Grant, A.T. Gilbert, E.M. Annand, T.P. Husband and J.E. Hines. 2008. Multi-scale occupancy estimation and modelling using multiple detection methods. Journal of Applied Ecology 45(5):1321-1329.

10 Royle, J. A. 2004. N-mixture models for estimating population size from spatially replicated counts. Biometrics 60:108-115.

11 J. E. Hines, J. D. Nichols, J. A. Royle, D. I. MacKenzie, A. M. Gopalaswamy, N. Samba Kumar, and K. U. Karanth. 2010. Tigers on trails: occupancy modeling for cluster sampling. Ecological Applications 20:1456-1466.

12 Miller, D.A., J.D. Nichols, B.T. McClintock, E.H.C. Grant, L.L. Bailey, and L. Weir. 2011. Improving Occupancy Estimation When Two Types Of Observational Error Occur: Non-Detection And Species Misidentification. Ecology 92:1422-1428.

13 Kendall, W.L., J.E. Hines, J.D. Nichols, and E.H. Grant. 2013. Relaxing The Closure Assumption In Single-Season Occupancy Models: Staggered Arrival And Departure Times. Ecology 94(3):610-617.

14 D.I. MacKenzie, J.D. Nichols, L.L. Bailey, and J.E. Hines. 2011. An Integrated Model of Habitat and Species Occurrence Dynamics. Methods in Ecology and Evolution 2:612-622.

15 Royle, J. A., and J. D. Nichols. 2003. Estimating abundance from repeated presence-absence data or point counts. Ecology 84:777-790.

16 MacKenzie, D.I., L. L. Bailey, and J.D. Nichols. 2004. Investigating species co-occurrence patterns when species are detected imperfectly. Journal of Animal Ecology 73: 546-555.

17 Miller, D.A.W., J.D. Nichols, J.A. Gude, L.N. Rich, K.M. Podruzny, J.E. Hines, and M.S. Mitchell. 2013. Determining occurrence dynamics when false positives occur: estimating the range dynamics of wolves from public survey data. PLoS ONE 8(6): e65808. doi:10.1371/journal.pone.0065808.

18 Chambert, T., E.H. Campbell-Grant, D.A.W. Miller, J.D. Nichols, K.P. Mulder and A.B. Brand. 2018. Two-species occupancy modelling accounting for species misidentification and non-detection. Methods in Ecology and Evolution, https://doi.org/10.1111/2041-210X.12985