MAXWELL INSTITUTE for MATHEMATICAL SCIENCES
Statistics Seminars
Abstracts


Nial Friel: Marginal likelihood estimation via power posteriors

Model choice plays an increasingly important role in Statistics. From a Bayesian perspective a crucial goal is to compute the marginal likelihood of the data for a given model. This however is typically a difficult task since it amounts to integrating over all model parameters. The aim of this paper is to illustrate how this may be achieved using ideas from thermodynamic integration or path sampling. We show how the marginal likelihood can be computed via MCMC methods on modified posterior distributions for each model. This then allows Bayes factors or posterior model probabilities to be calculated. We show that this approach requires very little tuning, and is straightforward to implement. The new method is illustrated in a variety of challenging statistical settings. This work is in collaboration with Prof Tony Pettitt, QUT, Brisbane.
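As a rough illustration of the thermodynamic integration idea, the sketch below uses the standard identity log p(y) = ∫₀¹ E_t[log p(y|θ)] dt, where the expectation is taken under the power posterior p_t(θ|y) ∝ p(y|θ)^t p(θ). Everything specific here is an assumption made for the example and is not taken from the talk: the conjugate normal model (so that exact draws can stand in for MCMC), the temperature ladder t_i = (i/N)^5, the trapezoid rule, and all function names.

```python
import math
import random


def log_lik(theta, y):
    """Log-likelihood log p(y | theta) for y_i ~ N(theta, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - theta) ** 2 for yi in y)


def power_posterior_draws(y, t, tau2, n_draws, rng):
    """Draws from p_t(theta | y), proportional to p(y | theta)^t * N(theta; 0, tau2).

    Conjugacy makes this a normal distribution, so exact sampling is used
    here; in a non-conjugate model this step would be an MCMC run.
    """
    precision = t * len(y) + 1.0 / tau2
    mean = t * sum(y) / precision
    sd = math.sqrt(1.0 / precision)
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def log_marginal_power_posterior(y, tau2=1.0, n_temps=40, n_draws=1000, seed=1):
    """Estimate log p(y) by integrating E_t[log p(y | theta)] over t in [0, 1]."""
    rng = random.Random(seed)
    # Assumed ladder: temperatures packed near t = 0, where the integrand changes fastest.
    temps = [(i / n_temps) ** 5 for i in range(n_temps + 1)]
    expected = []
    for t in temps:
        draws = power_posterior_draws(y, t, tau2, n_draws, rng)
        expected.append(sum(log_lik(th, y) for th in draws) / n_draws)
    # Trapezoid rule over the temperature ladder.
    return sum(
        (temps[i + 1] - temps[i]) * (expected[i + 1] + expected[i]) / 2.0
        for i in range(n_temps)
    )


def log_marginal_exact(y, tau2=1.0):
    """Closed-form log p(y) for the same model, used only as a check."""
    n, s = len(y), sum(y)
    ss = sum(yi * yi for yi in y)
    quad = ss - tau2 * s * s / (1.0 + n * tau2)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1.0 + n * tau2) - 0.5 * quad


if __name__ == "__main__":
    data_rng = random.Random(0)
    y = [data_rng.gauss(1.0, 1.0) for _ in range(20)]
    print("power posterior estimate:", log_marginal_power_posterior(y))
    print("exact value:             ", log_marginal_exact(y))
```

Running one such estimate per candidate model and differencing the results gives log Bayes factors; the closed-form check is only available because the toy model is conjugate.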

Robert Puch-Solis: Statistical evaluation of partial fingerprints

Recent court challenges have highlighted the need for statistical research on fingerprint identification. This paper proposes a model for computing likelihood ratios to assess the evidential value of comparisons with any number of minutiæ. In contrast with previous research, the model, in addition to minutia type and direction, incorporates spatial relationships of minutiæ without introducing probabilistic independence assumptions. The model has a very promising discriminating power and the likelihood ratios that it provides are very indicative of identity of source. Furthermore, the model is able to support identity of source strongly or very strongly in a significant proportion of cases, even when considering configurations with few minutiæ.

Alex Cook: Return of the Giant Hogweed: modelling the invasion of Britain by a dangerous alien plant

The presentation is given as part of Heriot-Watt's mathematical biology talks and is based on the joint work of Alex Cook, Glenn Marion and Gavin Gibson. I'll be talking about the work done during the last year of my PhD as part of BioSS's contribution to the cross-disciplinary, international EU-funded ALARM project. This has involved modelling the spread of alien plants in GB: one such plant is Giant Hogweed (Heracleum mantegazzianum) from SW Asia, which has been damaging Britain's biodiversity since it was introduced in the 19th C and which is dangerous to human health. After constructing a spatio-temporal stochastic model for its spread that takes account of covariates such as the heterogeneous land-cover and climate of the island, we then fit the model directly to observed data. Fitting the model was non-trivial and involved the use of Markov chain Monte Carlo techniques. The nature of the approach taken means that temporal predictions of the future spread of the weed can be made, consistent with the invasion history. This does not appear to be possible with existing statistical techniques, which assume the process has reached equilibrium. The approach we have taken can be generalised to other biological systems exhibiting stochastic variability.

Adam Butler: Conditional extremes of a Markov chain

Catastrophic events often occur when two or more processes simultaneously become extreme - for example, floods in coastal towns may only occur when tide and river levels are both unusually high, and a simple machine may only fail when all of the components within the machine fail at roughly the same time. Quantification of the risks associated with such events requires that we understand dependence between random variables at extreme levels, and the development of models for multivariate extremes is currently an area of active research within both statistics and probability. In this talk we present a new model to describe dependence at extreme levels within a Markov chain, and discuss potential statistical applications of this model. The key features of our model are that it is able to allow for asymptotic independence as well as asymptotic dependence, and that it provides a parsimonious description of extremal dependence within a fairly broad class of Markov models. The talk does not require any prior knowledge of extreme value theory.

Yuanhua Feng: A Slowly Changing Vector Random Walk Model

This talk is based on recent joint work with Keming Yu (Brunel University, London). A multivariate random walk model with slowly changing parameters is introduced and investigated in detail. The proposed model is not jointly stationary but locally jointly stationary, where not only the drifts and the cross-covariances but also the cross-correlations between single series are allowed to change slowly over time. Hence the proposed model is particularly useful for modelling and forecasting the value of financial portfolios under very complex market conditions. Local linear and kernel approaches are proposed for estimating the drift and the cross-covariance functions, respectively. The asymptotic biases, variances and covariances of the proposed estimators are obtained. The properties of the estimated value of a given portfolio are also studied in detail. Results on the estimated prediction errors for future values of a single stock or of a portfolio are derived. Practical relevance of the proposal is illustrated by application to several foreign exchange rates.

Brian Francis: Classifying criminal activity: A latent variable approach to longitudinal event data

This work is concerned with classifying longitudinal event data for a set of individuals, specifically identifying homogeneous periods of activity within such a longitudinal time series, and in classifying these periods of activity. Such data can arise in many circumstances. The talk concentrates on offending data but such data can also occur in education and psychology, for example. While much work has been carried out related to the classification of offenders, we are interested in local classification, examining which offences co-occur at the same point in time. Latent class and other latent models will be presented, and the results illustrated by a large sample of offenders taken from the Offenders Index. The benefits and problems of such analyses will be presented.

Stijn Bierman: The uses of Bayesian image restoration techniques in the analysis of species atlas survey data with spatially varying non-detection probabilities

Species atlases are, because of their wide geographical and taxonomical coverage, one of the most important sources of information on the distribution of species over large spatial scales. These atlases are essentially databases that consist of (date-stamped) records of observed presences of plant species in cells of a regular grid that has been superimposed on the landscape. A pervasive problem in the interpretation of these data, is that it cannot be assumed that an absence of a record of a species from a grid cell means that the species does not occur somewhere in that grid cell, unless detection probabilities equal one. In addition, coverage of the grid is often uneven so that detection probabilities can be expected to be spatially varying.

We propose to use Bayesian image restoration techniques to incorporate spatially varying detection probabilities in the analysis of species atlas data. These techniques are based upon the parameterisation of presence/absence models using Markov Chain Monte Carlo techniques, while treating the occupancy states of grid cells with no recorded presences as unobserved (latent) variables. I will present several examples of the application of this methodology in the analysis of the British and German atlases of vascular plants.

Christl Donnelly: Bovine TB: Science, policy and dogma

Recent publications have helped to resolve contradictions in past evidence on the impact of badger culling on TB incidence in cattle. In this controversial area, mathematical and statistical analyses are providing valuable new insights into the complex disease system involving cattle, wildlife and potentially humans.

Donnelly CA et al. Positive and negative effects of widespread badger culling on tuberculosis in cattle. Nature advance online publication, 14 December 2005 (doi:10.1038/nature04454). http://www.nature.com/nature/journal/vaop/ncurrent/pdf/nature04454.pdf

Woodroffe R, Donnelly CA et al. Journal of Applied Ecology online early, 14 December 2005 (doi:10.1111/j.1365-2664.2005.01144.x). http://www.blackwell-synergy.com/doi/abs/10.1111/j.1365-2664.2005.01144.x

Cox DR, Donnelly CA, et al. Simple model for tuberculosis in cattle and badgers. PNAS, 102, 17588-17593, 6 December 2005. http://www.pnas.org/cgi/content/abstract/102/49/17588

Nick Fieller: Statistical facial identification

There are many existing systems for automatic facial recognition which select the best available match to a questioned image of a human face to one or more images selected from a database of known people. These systems are successful and widely used in areas such as security surveillance. However, they do not attempt to provide any quantitative measure of quality of match but only give the best available match. In this respect they fall short of a facial identification which can give evidential information of use in a court of law. The work described here provides a statistically based method which can remedy these defects. It is based on landmark identification of facial features and routine techniques of shape analysis to provide measures of inter and intra variability of measured facial features, thus allowing a more statistical assessment of facial identification.

Nuala Sheehan: Mendelian randomisation: causal inference using instrumental variables

In epidemiological research, the effect of a potentially modifiable phenotype or "exposure" on a "disease" is often of public health interest. Inferences on this effect can be distorted in the presence of confounders affecting both phenotype and disease. Issues of confounding require causal rather than associational arguments. Mendelian randomisation (see Davey Smith & Ebrahim 2003, for example) is a method for deriving unconfounded estimates of such causal relationships and basically exploits the fact that a gene known to affect the phenotype can often be reasonably assumed not to be itself associated with any confounding factors and thus has an indirect effect on the disease. It is well known in the economics and causal literature (e.g. Pearl, 2000) that these properties define an instrumental variable but they are minimal in the sense that they only permit unique identification of the causal effect of the phenotype on the disease status in the presence of additional and fairly strong assumptions. These assumptions relate to the distributions of the variables e.g. multivariate normality, and the nature of the dependencies between them, e.g. linear. These assumptions are explored in the context of standard epidemiological applications and the ideas illustrated using directed acyclic graphs.

David Elston: Applications of mixed models in ecology

Ecological data sets are often highly structured, and so complex modelling of random effects is necessary to make valid inferences. Fortunately, ecologists as a whole have been receptive to the need to use mixed models so that random variation is properly described, creating opportunities for fruitful collaborations with statisticians. After giving a brief overview of random effect models and their estimation using the method of residual maximum likelihood, I will describe a number of applications I have developed in my scientific collaborations that raise interesting methodological issues. These will include: a generalised linear mixed model for overdispersed counts of ticks on grouse chicks; a random coefficients model for within-year growth rate of sand-eels; using random effects to smooth an ordered sequence of regression coefficients; and a multivariate random effects model for compositional data on diet selection.

David Hand: Size matters

The ideas of measurement are so ubiquitous that we often fail to notice them: they are simply parts of the conceptual universe in which we function. However, it has not always been thus and sometimes, even now, rips in this usually unnoticed background fabric appear, casting doubts on one's view of the way the world works. Occasionally these tears have serious, even fatal consequences. This talk looks at the conceptual infrastructure of quantification, showing how humans have constructed it, how it can be interpreted, and how it is manipulated to make valid inferences about the real world. The talk is illustrated with measurement tools from psychology, medicine, physics, economics and other areas.

Mark Heiler: A nonparametric regression cross spectrum for multivariate time series

We consider dependence structures in multivariate time series that are characterized by deterministic trends. Results from spectral analysis for stationary processes are extended to deterministic trend functions. A regression cross covariance and spectrum are defined. Estimation of these quantities is based on wavelet thresholding.

Adam Butler: Analysing trends in the magnitude & frequency of extreme events

Extreme value theory provides a relatively robust, asymptotically motivated, basis for drawing statistical inferences about the magnitude & frequency of events which are extreme and rare. Extreme value methods are widely used in disciplines such as hydrology, finance and engineering to quantify the risks associated with catastrophic events such as the breach of a sea wall, the failure of a component within a machine, or a collapse in the value of a portfolio of investments.

Primary interest often lies in describing changes in the magnitude and frequency with which extreme events occur, and we review parametric and nonparametric approaches to the detection and quantification of trends in the characteristics of extreme events. We illustrate the different approaches by analysing changes in the characteristics of North Sea storm surges during the second half of the 20th century, a substantive application which requires us to adapt and extend the existing statistical methodology.

Alex Cook: Optimal design of experiments in stochastic population processes

When designing an epidemiological experiment in which we expose plants or animals to disease, we make certain choices---number of hosts, timing of observations, duration of the experiment, etc---and typically would be guided by intuition and resource constraints only. We have shown previously that this may result in considerable wastage of resources.

In this talk I introduce the topic of design of experiments for stochastic systems---such as epidemics---that are highly non-linear, and have been heretofore neglected by much of the optimal design literature. We adapt the idea introduced by Müller and colleagues to sample from the utility space using MCMC, in conjunction with inferential moment closure techniques developed by Krishnarajah and colleagues. We show how observation times may be selected to maximise the expected information from two stochastic systems---a death process (for which existing results have been found) and an SI epidemic, which has been used previously to model a plant-pathogen system studied by our collaborators in Cambridge.

Aris Perperoglou: Overdispersion modelling with individual deviance effects and penalized likelihood

Overdispersion is common when modelling discrete data like counts or fractions. We propose to introduce and explicitly estimate individual deviance effects (one for each observation), constrained by a ridge penalty. This turns out to be an effective way to absorb overdispersion, to get correct standard errors and to detect systematic patterns. Large but very sparse systems of penalized likelihood equations have to be solved. We present fast and compact algorithms for fitting, estimation of standard errors and computation of the effective dimension. We will present our methodology with applications to counts, binomial, survival data as well as smoothing of mortality surfaces.

Roger Koenker: Parametric Links for Binary Response Models (jointly with Jungmo Yoon)

The familiar logit and probit models provide convenient settings for many binary response applications, but a larger class of link functions may be occasionally desirable. Two parametric families of link functions are suggested: the Gosset link based on the Student t latent variable model with the degrees of freedom parameter controlling the tail behavior, and the Pregibon link based on the (generalized) Tukey λ family with two shape parameters controlling skewness and tail behavior. Both Bayesian and maximum likelihood methods for estimation and inference are explored. Implementations of the methods are illustrated in the R environment.

John Nelder: Double hierarchical generalized linear models, a broad class with a new fitting algorithm based on extended likelihood

The class of double hierarchical generalized linear models is based on GLMs, and allows three extensions: (1) joint estimation of models for mean and dispersion; (2) random effects in the linear predictor for the mean with a distribution that is any conjugate of a GLM distribution; (3) random effects similarly for the linear predictor of the dispersion model. Spline terms and correlations between random effects may be specified. Fitting is done by maximizing a form of extended likelihood called the h-likelihood. The algorithm reduces to fitting an interconnected set of GLMs. The algorithm does not require prior probabilities, quadrature, or the EM algorithm, and is much faster than many existing methods.

Chris Jones: Parametric Families of Distributions and, Inter Alia, Nonparametric Quantile

The more major part of this talk is about conceptual and theoretical aspects of some new parametric families of distributions and the more minor part is about some of their intriguing practical ramifications for "nonparametric" quantile estimation and regression. I will first describe a couple of novel ways of generating three- and/or four-parameter families of continuous univariate distributions. These will afford skewness and the first a variety of tail weights, heavy if required, with obvious consequences for providing alternatives for practical statistical modelling, circumventing the need for some of the more ad hoc methods of robust statistics. I will then consider some aspects of the interaction between the second of these families of distributions and kernel-based quantile estimation and quantile regression. While I won't be addressing problems in either Finance or Actuarial Science directly, heavy-tailed distributions have particular links with the former and nonparametric regression with the latter!

Tereza Neocleous: A likelihood ratio approach to evidence evaluation in the form of discrete data

Likelihood ratios provide a natural way of computing the value of evidence under competing propositions. Models for likelihood ratios have been developed for continuous data (e.g. glass refractive index). Such methods are also desirable for discrete data such as pollen counts or gun-shot residue particles. Challenges in development of discrete models include the presence of zeros and the lack of sufficient amounts of background data. In this talk a number of approaches to obtaining likelihood ratios for count data will be discussed and illustrated using pollen data.

Adrian Bowman and Sarah Barry: Statistics with a Human Face

Stereo-photogrammetry provides high-resolution data defining the shape of three-dimensional objects. One example of its application is in a collaborative study of the growth of children's faces. The clinical aims of the study are to describe the facial shape and growth of healthy children and to contrast this with the shape and growth of children who have been born with a cleft lip and/or palate and who have subsequently undergone surgical repair. Information can be extracted in a variety of forms. Methods of analysing landmark shape data are well developed but landmarks alone clearly do not adequately represent the very much richer information present in each digitised face. Facial curves with clear anatomical meaning have also been extracted. In order to exploit the full extent of the information present in the images, standardised meshes, whose nodes correspond across individuals, have also been fitted. Some of the issues involved in analysing data of these types will be discussed and illustrated on the facial growth study. These include graphical exploration, the measurement of asymmetry and longitudinal modelling.

Iain Currie: From Yates algorithm to multidimensional smoothing with GLAM

Data with an array structure are common in statistics. An early example is the factorial design and Yates (1937) gave an efficient algorithm for computing the factorial effects in such a design. The generalized linear model (GLM) of Nelder & Wedderburn (1972) gives a unified approach to analysing regression problems with non-normal error structure. However, this analysis ignores any array structure in the data or the model. We develop an arithmetic of arrays which generalizes Yates algorithm and which allows us to define the expectation of a data array as a sequence of linear operations on a coefficient array. This arithmetic also leads to low storage, high speed computation in the scoring algorithm of the GLM. We call such a model a generalized linear array model or GLAM. We apply the method to the smoothing of multidimensional arrays. Some examples are presented.
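For readers unfamiliar with the starting point of this talk, here is a minimal sketch of Yates's algorithm for a 2^k factorial design, assuming the responses are supplied in standard order, e.g. (1), a, b, ab for k = 2. The function name and the example numbers are illustrative only, and the sketch covers the classical algorithm, not the array arithmetic of GLAM itself.

```python
def yates(responses):
    """Return [total, contrast_1, ..., contrast_(2^k - 1)] in standard order.

    Each of the k passes replaces the column by the sums of consecutive
    pairs followed by the differences (second minus first) of the same pairs.
    """
    n = len(responses)
    if n == 0 or n & (n - 1):
        raise ValueError("number of responses must be a power of two")
    column = list(responses)
    k = n.bit_length() - 1  # number of two-level factors
    for _ in range(k):
        pairs = [(column[i], column[i + 1]) for i in range(0, n, 2)]
        column = [a + b for a, b in pairs] + [b - a for a, b in pairs]
    return column


if __name__ == "__main__":
    # 2^2 design with responses (1)=1, a=2, b=3, ab=4
    print(yates([1, 2, 3, 4]))  # [10, 2, 4, 0]: total, A, B, AB contrasts
```

Dividing each contrast by 2^(k-1) gives the usual factorial effect estimates. The algorithm needs only k passes of additions and subtractions rather than a full design-matrix multiplication, which is the kind of array-based saving the abstract describes generalizing.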

This seminar will take place at 3.15 pm in Room 6206, James Clerk Maxwell Building at the King's Buildings site in Mayfield Road. Tea and coffee will be available after the seminar in the Mathematics School, Staff Common Room (5212).

Ruth King: A Bayesian S.tudy of O.rnithological S.urveys

Over recent years, there has been increasing concern relating to many wildlife species leading to surveys being undertaken to study many of these populations. We will focus on data typically collected on UK bird populations: survey data and ring-recovery data. Interest lies in both estimating the change in population size over time, and identifying the factors that contribute to this changing population.

We consider a state-space approach to take into account that the survey data are only estimates of the total population size. In addition we demonstrate the increased precision that can be obtained when jointly analysing survey data with ring-recovery data often available. Finally, we wish to discriminate between competing biological hypotheses, in order to explain the changes in population size over time. We consider a Bayesian approach and use reversible jump MCMC to simultaneously explore model and parameter space to obtain both parameter estimates and posterior model probabilities. The methods are applied to a real data set relating to the UK lapwing population and a variety of interesting results presented.

David Elston: Some ecological applications of models with random effects

Most scientific studies concentrate on factors affecting the mean value of a response variable, described by the fixed effects in a mixed model. The role of random effects is then to ensure correct inference about the fixed effects. However, in some situations, the scientific hypothesis relates not to the fixed effects but to the variances of the random effects, whilst in other situations it is the random effect assumption that makes useful modelling possible. I will describe analyses of some ecological data sets to illustrate these possibilities, making some observations about random effect models including the relationship between inference based on REML estimation and MCMC sampling from the full likelihood.

Paul Garthwaite: Selection of weights for weighted model averaging

Suppose a quantity is to be predicted and various models could be used. The approach in model selection is to use just the prediction of the model that appears to be best. An alternative is to form a weighted average of the predictions given by the different models. But what weights should be given to the different models? Should the weight given to a model be reduced if it is very similar to another model? What if two models are virtually identical - should they each be given half the weight that they would otherwise receive?

This talk considers methods of assigning weights on the basis of the correlation structure between models. Different weighting strategies are proposed and desirable properties in a weighting scheme are suggested. Simulation is used to compare the weighting schemes in situations where optimal weights can be determined.

Frank Ball: Statistical inference for epidemics within a population of households

This talk is concerned with a stochastic model for the spread of an SIR (susceptible → infective → removed) epidemic among a closed, finite population that contains several types of individual and is partitioned into households. A pseudolikelihood framework is presented for making statistical inference about the parameters governing such epidemics from final outcome data, when possibly only some of the households in the population are observed. The framework includes parameter estimation, hypothesis tests and goodness-of-fit. Asymptotic properties of the procedures are derived when the number of households in both the sample and the population are large, which correctly account for dependencies between households. The methodology is illustrated by applications to data on a variola minor outbreak in Sao Paulo and to data on influenza outbreaks in Tecumseh, Michigan. (joint work with Owen Lyne)

David Firth: Working with over-parameterized models

A parametric representation of a statistical model may involve some redundancy; that is, the mapping from parameter space to family of distributions may be many-to-one. Such over-parameterized representations are often very useful conceptually, but can cause computational and inferential problems (ridges in the likelihood, non-estimable parameter combinations). For linear and generalized-linear models, well-known approaches use either a reduced basis or a generalized matrix inverse. In this talk I will discuss how to work with over-parameterized nonlinear models. Aspects covered will include maximum-likelihood computation, detection of non-identifiability, and presentation of results. Some implications for Bayesian analysis will also be touched upon. The work is motivated by the design of an R package to specify and fit general regression models involving multiplicative interaction terms; these include the (G)AMMI models that are used for example in crop science to represent genotype-by-environment effects, as well as various much-used models for categorical data in social research.

Peter Avery: The analysis of a family study ascertained through hypertensive probands

The problems that can arise in the analysis of a large family study will be discussed and exemplified using a study where various quantitative variables associated with hypertension were measured. Possible solutions will be discussed. Issues examined will be: normalisation; adjustment for covariates; allowance for the effect of drug treatment; possible biases in assays; multiple testing problems when looking at genome wide scans and candidate genes; the problem that some genetic markers (SNPs) are more informative than others and the impact this has for localisation of a possible gene affecting a character. (Joint work with Bernard Keavney)

Agostino Nobile: Bayesian analysis of finite mixtures using the allocation sampler

Statistical finite mixtures have been receiving much attention as a conceptually simple way of relaxing distributional assumptions. Within the Bayesian approach, MCMC methods, notably Green's reversible jump methodology, have made it feasible to estimate finite mixtures with an unknown number of components. Some issues remain open, among them the choice of prior distribution for the number k of mixture components, and the necessity of designing efficient reversible jump moves for each parametric family of components being mixed.

In this talk I will provide support for the use of a Poi(1) prior distribution for k. I will also present a new MCMC scheme, the allocation sampler, which can be applied, with minimal changes, to any family of component distributions, under the assumption that the component parameters can be integrated out of the model analytically. Artificial and real data sets will be used to illustrate the method. In part, this is joint work with Alastair Fearnside.

Alexander McNeil: Statistical inference for dependent defaults and credit migrations

Any portfolio credit risk model that is to be used to calculate a loss distribution associated with defaults and changes in rating must address the challenge of modeling dependent defaults and dependent rating migrations. Most industry models (such as KMV, CreditMetrics, CreditRisk+) incorporate mechanisms for modeling this dependence, generally by assuming conditional independence of defaults and migrations given common economic factors. However, the calibration of these mechanisms is often quite ad hoc, despite the fact that the tail of the portfolio loss distribution is extremely sensitive to small changes in the parameters governing dependence.

We consider the problem of making formal statistical inference for such models based on historical default and rating migration data. In the solution we propose, portfolio credit models are represented as generalized linear mixed models (GLMMs) and inference is made using Markov chain Monte Carlo (MCMC) techniques. This general framework allows quite complex models where the random effects essentially play the role of unobserved latent factors influencing default and migration rates; to capture economic cycle effects the latent factors are allowed to have a dynamic time-series structure. An empirical study of Standard and Poors data shows strong evidence for economic cycles and also reveals pronounced sectoral heterogeneity in default and migration rates.

Andrew Patton: Data-Based Ranking of Realised Volatility Estimators

I propose a feasible method of ranking RV estimators based on actual returns data. In contrast, most rankings of RV estimators currently in the literature are either graphical in nature, most notably the "volatility signature plot", or rely on asymptotic approximations of the mean-squared errors of the estimators, or on simulations. The proposed method relies on the existence of a volatility proxy that is unbiased for the variable of interest, and satisfies a certain "zero correlation" condition. The zero correlation condition has some similarities with instrumental variables estimation. The volatility proxy must be unbiased but it does not need to be very precise; a simple and widely-available proxy for conditional variance is the daily squared return. In an application to IBM equity return volatility, I find that a simple realised volatility estimator based on 5-minute returns performs as well as a wide variety of competing estimators.

Amanda Hepler: Object-oriented graphical representations of complex patterns of evidence

(not available)

Catalina Stefanescu: The credit rating process: a Bayesian approach

Credit risk arises as obligors have a likelihood of defaulting on pre-arranged payments. The New Basel Capital Accord has set a new framework for credit risk management for financial institutions, but its implementation in any financial institution raises a number of technical modelling challenges. In this talk we focus on a new methodology for modeling and estimating transition probabilities between credit classes in a bank's rating system. First, we develop a new statistical model that describes the typical credit rating process that most major banks employ. Second, we describe a Bayesian hierarchical framework for model calibration, using Markov Chain Monte Carlo techniques implemented through Gibbs sampling. This approach allows us to address the technical issues related to the estimation of default probabilities from low default portfolios. Third, we apply this methodology to the analysis of an extended rating transitions data set from Standard and Poor's covering 1981 to 2004, and we examine both the in-sample and out-of-sample performance of the credit rating process models. The results of this research provide a framework that banks and other financial institutions can use to show that their internal rating systems produce estimates of rating transition probabilities that are reasonable from a regulatory perspective.

Shiqing Ling: A General Approach to Goodness-of-fit Tests for Time Series Models

We provide a general approach to goodness-of-fit tests across a wide variety of time series models, including for example linear models, nonlinear models, short memory models, long memory models, constant-variance models and ARCH/GARCH models. The test statistic is generically based on a linear transformation of a score-marked empirical process, which converges to the maximum of the square of a standard Brownian motion under the null hypothesis; its critical values are easy to obtain. Our test has nontrivial local power under local alternative hypotheses. (This is joint work with Howell Tong of the London School of Economics and Political Science.)

Christine Hackett: Linkage Analysis in a Mixed Population of Blackcurrant: Some Statistical Detective Work

The estimation of a linkage map of molecular markers is a prerequisite of studies to locate genes affecting important quantitative traits. The estimation is straightforward if markers can be scored on a population derived from a cross between two inbred parents, but this is not possible in many plant species, especially bushy or tree species. This talk focuses on the analysis of a mapping population in one such species, blackcurrant, and uses some exploratory statistics and simple genetic models to uncover some interesting features of the population.

Ashay Kadam: Issuer Heterogeneity in Credit Ratings Migration

Sources of heterogeneity in rating migration behavior are explored using a continuous time Markov chain based framework. Working in continuous time circumvents the embedding problem, mitigates the censoring effect and facilitates term structure modelling with arbitrary prediction horizons. Classical estimation provides ample evidence of heterogeneity. However, adopting a Bayesian estimation procedure can help mitigate the problems arising from data sparsity and reduce estimation error. The transition probability matrices estimated for different issuer profiles can be quite different from each other. Using the CreditRisk+ framework, and a sample credit portfolio, it can be shown that ignoring heterogeneity may give erroneous estimates of VaR and a misleading picture of the risk capital.
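
As a rough illustration of why a continuous-time formulation allows arbitrary prediction horizons, the sketch below computes transition probabilities P(t) = exp(Qt) from a generator matrix Q for any horizon t. The three-state generator (two rating classes plus an absorbing default state) and its rates are invented for illustration and are not taken from the talk; the matrix exponential is approximated by a truncated Taylor series only to keep the example free of external libraries.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=60):
    """Approximate P(t) = exp(Q t) by a truncated Taylor series."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (Q t)^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


# Hypothetical generator (rates per year); each row sums to zero and the
# last state (default) is absorbing.
Q = [
    [-0.10, 0.08, 0.02],
    [0.05, -0.15, 0.10],
    [0.00, 0.00, 0.00],
]

P_half = transition_matrix(Q, 0.5)  # six-month horizon
P_one = transition_matrix(Q, 1.0)   # one-year horizon
P_five = transition_matrix(Q, 5.0)  # five-year horizon
```

Because every horizon is derived from the same generator, the matrices are mutually consistent: for example, applying the six-month matrix twice reproduces the one-year matrix.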

Jan Hoem: The Second Demographic Transition in countries in transition, as seen by an extension of a demographic method of longitudinal analysis

In a study of young women's start of first conjugal unions in four selected countries in Central and Eastern Europe (Russia, Romania, Bulgaria, and Hungary) from 1980 through 2004, based on data from the national Generations and Gender Surveys, we use an extension of piecewise-constant hazard regression to analyze jointly the competing risks of entry into a cohabitational and a marital union. This extension allows us to compare trends over time and relative risks of covariates ACROSS the two competing risks, a comparison which is infeasible otherwise. In this manner we find, among many other things, that marriage risks have dropped dramatically in all four countries since well before the fall of communism and thus before the societal transition to a market economy got underway around 1990. There has also been a counterpart increase in cohabitation in Russia, Romania, and Hungary, which is a clear manifestation of the Second Demographic Transition in these countries. In Bulgaria, entry rates into cohabitation were surprisingly stable during the last twenty years of the twentieth century, and they actually FELL during the last years of our period of observation. This does not preclude the Second Demographic Transition having reached Bulgaria also, as is seen when we discover that rates of conversion of cohabitations into marriages have declined radically ever since the 1980s, which means that consensual unions have lasted progressively longer. Evidently, the Second Demographic Transition is not a unitary movement that reached all countries in Central and Eastern Europe at roughly the same time; it is present but its manifestation depends on national circumstances.

Iain Currie: Smooth models of mortality with period shocks

We suppose that we have mortality data arranged in two-way tables of deaths and exposures classified by age at death and year of death. It is natural to suppose that there is a smooth underlying force of mortality, the mortality surface, that varies with age and year (or period). However, observed mortality is subject to more than stochastic deviation from this smooth surface; for example, flu epidemics, hot summers or cold winters can disproportionately affect the mortality of certain age groups in particular years. We call such an effect a period shock.

We describe the mortality surface with an additive model with two components: the underlying smooth surface is modelled with 2-dimensional $P$-splines; the period shocks are modelled with a 1-dimensional $P$-spline in the age direction for each year. This large regression model is expressed as an additive generalized linear array model (Currie, Durban and Eilers, 2006). The method is illustrated with Swedish mortality data taken from the Human Mortality Database.

Anthony Pettitt: Statistical Inference for Assessing Infection Control Measures for the Transmission of Pathogens in Hospitals

Patients can acquire infections from pathogen sources within hospitals and certain pathogens appear to be found mainly in hospitals. Methicillin-resistant Staphylococcus aureus (MRSA) is an example of a hospital-acquired pathogen that continues to be of particular concern to patients and hospital management. Patients colonised with MRSA can develop severe infections which lead to increased patient morbidity and costs for the hospital. Pathogen transmission to a patient occurs indirectly via health-care workers who do not regularly perform hand hygiene. Infection control measures that can be considered include quarantine for colonised patients and improved hand hygiene for health-care workers.

The talk develops statistical methods and models in order to assess the effectiveness of the two control measures (i) isolation and (ii) improved hand hygiene. For isolation, data from a prospective study carried out in a London hospital is considered and statistical models based on detailed patient data are used to determine the effectiveness of isolation. The approach is Bayesian and involves Monte Carlo sampling.

For hand hygiene it is not possible, for ethical and practical reasons, to carry out a prospective study to investigate various levels of hand hygiene. Instead, hand hygiene effects are investigated by simulation using parameter values estimated from data on health-care worker hand hygiene and weekly colonisation incidence collected from a hospital ward in Brisbane. Utilising a deterministic model for vector-borne transmission of diseases, a Markov model is developed and used to estimate important transmission parameters. Unfortunately, for one transmission parameter there is little information available; an alternative approach based on the deterministic model eliminates this parameter, allowing the effects of changing hand hygiene to be investigated using simulation.

Conclusions about the effectiveness of the two infection control measures will be discussed and, from a modelling point of view, some conclusions will be made contrasting simulation models with statistical studies.

The talk involves collaborative work with Marie Forrester, Emma McBryde, Ben Cooper, Gavin Gibson and Sean McElwain.

Rainer Schulz: Renting versus Owning and the Role of Income Risk: The Case of Germany

For most households, choosing whether to rent or buy a home is a difficult, multifaceted problem. Households not only have to grapple with the uncertainties of future movements in rents and house prices and with the substantial cost of changing residence; housing tenure decisions are further complicated if households' exposure to labour income risk varies across occupations, industries and regions. In that case, potential correlations with these background risks may influence the rent-or-buy decision. In this study, we present preliminary empirical evidence, derived from the German Socio-Economic Panel (GSOEP), that both labour income growth and rent growth vary across industries and regions. We find that income-rent correlations have a statistically significant influence on industry-specific average rental shares in West German federal states. However, the economic significance of the relationship between real rent growth and real income growth for the decision to rent or own is rather low. A one-standard-deviation change in the income-rent correlation implies an increase in rental shares of about 1.75 percentage points. (Work in collaboration with Martin Wersing and Axel Werwatz.)

Malcolm Faddy: Analysing hospital length of stay data: models that fit, models that don't and does it matter?

Hospital length of stay data typically show a distribution with a mode near zero and a long right tail, and can be hard to model adequately. Traditional models include the gamma and log-normal distributions, both with a quadratic variance-mean relationship. Phase-type distributions which describe the length of time to absorption of a Markov chain with a single absorbing state also have a quadratic variance-mean relationship. Covariates of interest include an estimate of the length of stay for an uncomplicated admission, with excess length of stay modelled relative to this quantity either multiplicatively or additively. A number of different models can therefore be constructed, and the results of fitting these models will be discussed in terms of goodness of fit, significance of covariate effects and estimation of quantities of interest to health economists.
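
As a brief illustration of the quadratic variance-mean relationship mentioned above, the standard results for the first two families are given below. They are stated under the assumption that the gamma shape parameter $k$ and the log-normal log-scale variance $\sigma^2$ are held fixed while the mean $\mu$ varies; the symbols are introduced here for illustration and are not the speaker's.

\[
\text{Gamma: } \operatorname{Var}(Y) = \frac{\mu^2}{k}, \qquad
\text{Log-normal: } \operatorname{Var}(Y) = \left(e^{\sigma^2} - 1\right)\mu^2 .
\]

In both cases the variance is proportional to $\mu^2$, so the coefficient of variation does not depend on the mean.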

Peter Hall: Robustness of Multiple Hypothesis Testing Procedures Against Dependence

Problems involving classification of high-dimensional data, and ‘highly multiple’ hypothesis testing, arise frequently in the analysis of genetic data and complex signals. In this talk we show that, in the context of multiple hypothesis testing, the assumption of independence is much less of an issue in high-dimensional settings than in conventional, low-dimensional ones. This is particularly true when the null distributions of test statistics are relatively light-tailed, for instance when they can plausibly be based on Normal approximations. These issues are related to the ‘upper tail independence’ property, which is familiar in problems involving risk analysis. Similar methods and ideas also lead to new insights for heavy-tailed data.

Guy Nason: Costationarity and Tests of Stationarity for Locally Stationary Time Series with Applications to Econometrics

Many real-world time series are assumed to be stationary even when they are not, sometimes with disastrous consequences. This talk introduces some new tests for time series stationarity. Given two time series it is often interesting to ask whether there is any association between them. Various methods have been devised to address this question (mostly for stationary series): cross-correlation, cross-spectral analysis and cointegration. We introduce a new concept, called costationarity, which looks for linear combinations of locally stationary time series that are stationary. If two time series are costationary then there exists a non-trivial, stochastic relationship that can be exploited. We explain how our costationarity determination works, apply it to the FTSE100 and SP500 time series, and show that the log-returns of these series are costationary.

Stephen Richards: Applying survival models to pensioner mortality

Data from insurance portfolios and pension schemes lend themselves particularly well to the application of survival models. In addition to the traditional actuarial risk-rating factors of age, gender and policy size, we find that the use of geodemographic profiles based on postcode provides a major boost in explaining risk variation. The use of heterogeneity or frailty models can determine whether there are further rating factors supported by a data set. The use of bootstrapping can determine if there might be further financially important variation not accounted for in a mortality model, while the use of weights in model-fitting can help limit any mis-statement of financial risk.

David Spiegelhalter: League tables, rankings and regression-to-the-mean: using animations and graphics for public communication

to follow

Natalia Bochkina: Univariate hypotheses testing for gene expression data in a hierarchical Bayesian framework

To compare the means between two conditions (such as disease versus healthy samples) for a large number of variables given a small number of replicates, we consider two types of Bayesian hierarchical models: with noninformative ("objective") and mixture priors for the difference between the means. In the mixture model, we study sensitivity to the choice of prior on simulated data and choose the best model using mixed posterior predictive checks. In the model with noninformative prior, we propose to conduct the inference using adaptive interval hypothesis testing where the interval depends on the variability of each variable. These approaches will be illustrated on gene expression data sets produced by the BAIR consortium (www.bair.org.uk). (Joint work with Alex Lewin and Sylvia Richardson, Imperial College London.)

John Haslett: Space-Time Modelling in the Reconstruction of the European Palaeoclimate

The authors are engaged in an ambitious research programme whose objective is the use of statistical models to make inferences concerning the climate of Europe for the past 15,000 years. We have reported in detail (Haslett et al., 2006) on the methods used to reconstruct climate at a single site - Sluggan Moss in Ireland. The modelling approach is that of Bayesian Space-Time processes. The basic data are multivariate counts of pollen at different depths in lake sediment. Modern data on climate and vegetation provide the essential 'training' information which allows inference on past climate. The essential hypothesis is that past climates in, e.g., Ireland correspond to climates somewhere on the Earth in modern times.

Considerable uncertainty surrounds the entire exercise: radiocarbon dating and sediment depth allow inference about the calendar age of the samples; the multivariate 'response' of vegetation (and thus pollen) to multivariate climate does not lend itself to parametric modelling; pollen counts are hugely zero-inflated and are also subject to sampling variation; several species comprise sub-types which favour separate climate regimes, but have pollen that are indistinguishable. The likelihoods for pollen given palaeo-climate are thus rich with challenge. A joint prior for palaeoclimate space-time history provides possibilities, when inferring the palaeoclimate corresponding to a single sample, for borrowing strength from other samples at other locations in space-time. The essential feature of the joint prior is a model in which climate changes 'smoothly' in space and time; however there is strong evidence of occasional very rapid episodes of past climate change.

The computational methodology relies on MCMC and, for large models such as this, convergence is not assured even by very long runs, of days and weeks. Here we report on progress with (a) avoiding MCMC and (b) uncertainties in modelling chronologies. Here we focus exclusively on the time domain and discuss reconstruction at a single site.

David Banks: Inference on Graphs, Trees, and Partitions

In a surprising range of situations, one encounters samples in which the observations are combinatorial objects (e.g., in social network theory, cluster analyses, multiple protein phylogeny, and card-sorting experiments). This talk shows how a number of standard statistical goals, such as inference on centre and dispersion, confidence regions, and a version of the linear model, can all be realized in such circumstances.

Caitlin Buck: Estimating radiocarbon calibration curves

In addition to being crucial to the establishment of archaeological chronologies, radiocarbon dating is vital to the establishment of time lines for many Holocene and late Pleistocene palaeoclimatic studies and palaeoenvironmental reconstructions. The calibration curves necessary to map radiocarbon to calendar ages were originally estimated using only measurements on known age tree-rings. More recently, however, the types of records available for calibration have diversified and a large group of scientists (known as the IntCal Working Group, IWG) with a wide range of backgrounds has come together to create internationally-agreed estimates of the calibration curves. In 2002, I was recruited to the IWG and asked to offer advice on statistical methods for curve construction. In collaboration with Paul Blackwell, I devised a tailor-made Bayesian curve estimation method which was adopted by the IWG for making all of the 2004 internationally-agreed radiocarbon calibration curve estimates. In this talk I will report on that work and on the on-going work that will eventually provide models, methods and software for rolling updates to the curve estimates.


Thomas Richardson: Estimation of the relative risk and risk difference

I will first review well-known differences between odds ratios, relative risks and risk differences. These results motivate the development of methods, analogous to logistic regression, for estimating the latter two quantities. I will then describe simple parametrizations that facilitate maximum-likelihood estimation of the relative risk and risk-difference. Further, these parametrizations allow for doubly-robust g-estimation of the relative risk and risk difference.


Tim Bedford: Constructing multivariate minimum information distributions with copulas

The presentation looks at ways to build multivariate distributions given only a partial specification. The problem of constructing such distributions arises naturally in risk analysis when using expert generated data. The basic tool used to build up these distributions is the copula, a model for bivariate dependency. Vines provide a structure to extend bivariate models to multivariate models, incorporating additional information about multivariate dependency. We consider ways of eliciting information about the copulas and how the minimum information principle can be used to find specific distributions meeting the partial specifications. This talk covers joint work of the presenter together with collaborators Roger Cooke, Hans Meeuwissen, Daniel Lewandowski and Dorota Kurowicka over a number of years.


Murray Lark: Spatial variation in the soil: pollutants, greenhouse gases and forensic intelligence

Properties of the soil are influenced by processes that occur over a wide range of scales from the molecule to the globe. As a result the spatial variation of soil is substantial, and poses a challenge if we want to predict its behaviour or monitor its condition. In this seminar I will show how geostatistical methods, based on the assumption that soil properties can be treated as realizations of spatially correlated random functions, have been applied to a range of problems in soil science. The assumptions of stationarity in the variance, which underpin most standard geostatistical methods, can be relaxed when estimation and prediction are recast in terms of the linear mixed model. I shall show how geostatistical approaches have been applied to problems such as mapping pollutants around a smelter, predicting greenhouse gas emissions from the soil and using a rich soil database to provide forensic intelligence.


Tertius de Wet: LULU Smoothers and Some of their Distributional Properties

LULU smoothers are a class of non-linear smoothers based on compositions of minima and maxima over different window sizes. They have been shown to possess very attractive mathematical properties compared to other non-linear smoothers - see e.g. Rohwer [2]. Although they have been studied fairly extensively for their mathematical properties, little attention has been given to their distributional and statistical properties. However, some progress has recently been made with the latter - see Conradie, de Wet and Jankowitz [1]. In this talk LULU smoothers will be introduced and a short review given of their mathematical properties. We will then give some new results on the distributions of these smoothers, for the simplest ones as well as for more complex ones in the class. Furthermore, we will derive some asymptotic results for the smoothers when the window size tends to infinity. These limiting distributions are given in terms of the class of extreme value distributions. We also give their limiting behaviour in terms of that of the second largest order statistic. Finally, some numerical and simulation results will also be given to illustrate their behaviour.
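
To make "compositions of minima and maxima" concrete, here is a minimal sketch of the two basic operators, usually written L_n and U_n, from which LULU smoothers are built. L_n takes the maximum of the minima over all windows of length n+1 that contain a point, and U_n swaps the roles of minimum and maximum. The function names and the boundary rule (repeating the end values) are assumptions made for this illustration; they are not taken from the talk.

```python
def _padded(x, n):
    """Extend the sequence by repeating its end values n times (an assumption)."""
    return [x[0]] * n + list(x) + [x[-1]] * n


def lower(x, n):
    """L_n: for each i, the max over windows of length n+1 containing i of the window min."""
    p = _padded(x, n)
    return [
        max(min(p[j:j + n + 1]) for j in range(i - n, i + 1))
        for i in range(n, n + len(x))
    ]


def upper(x, n):
    """U_n: for each i, the min over windows of length n+1 containing i of the window max."""
    p = _padded(x, n)
    return [
        min(max(p[j:j + n + 1]) for j in range(i - n, i + 1))
        for i in range(n, n + len(x))
    ]


signal = [2, 2, 9, 2, 2, -5, 2, 2]  # one upward and one downward impulse
peaks_removed = lower(signal, 1)    # L_1 removes the upward impulse
smoothed = upper(peaks_removed, 1)  # U_1 then removes the downward impulse
```

On this toy signal the composition of U_1 after L_1 returns a constant sequence, and at every position the output of L_1 is no larger than the input, which in turn is no larger than the output of U_1.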

Tony Robinson: The Popularity and Problems of Mixture Modelling

Over the last decade or so, interest in using mixtures of statistical models for all types of applications has risen rapidly from both the frequentist and Bayesian points of view. What is so appealing about such models? Are they always good news? Mainly through examples, I shall discuss the reasons for this interest and try to point out what you might gain, what you might lose and the care you should take when adopting a mixture modelling approach.

Mark Briers: Particle smoothing for non-linear non-Gaussian state space models

Two-filter smoothing is a principled approach for performing optimal smoothing in non-linear non-Gaussian state-space models where the smoothing distributions are computed through the combination of ‘forward’ and ‘backward’ time filters. The ‘forward’ filter is the standard optimal Bayesian filter but the ‘backward’ filter, generally referred to as the backward information filter, is not a probability measure on the space of the hidden Markov process. In cases where the backward information filter can be computed in closed form, this technical point is irrelevant. However, for general state-space models where there is no closed form expression, this prohibits the use of flexible numerical techniques such as Sequential Monte Carlo (SMC) to approximate the two-filter smoothing formula. We propose here a generalised two-filter smoothing formula which only requires approximating probability distributions and applies to any state-space model, removing the need to make restrictive assumptions used in previous approaches to this problem. SMC algorithms are proposed to implement this recursion and we illustrate their performance on various problems.
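
For orientation, the standard two-filter identity can be written as follows. The notation is assumed here rather than taken from the talk: hidden states $x_t$, observations $y_t$ for $t = 1, \dots, T$, transition density $f(x_{t+1} \mid x_t)$ and observation density $g(y_t \mid x_t)$.

\[
p(x_t \mid y_{1:T}) \;\propto\; p(x_t \mid y_{1:t-1})\, p(y_{t:T} \mid x_t),
\]
\[
p(y_{t:T} \mid x_t) = g(y_t \mid x_t) \int f(x_{t+1} \mid x_t)\, p(y_{t+1:T} \mid x_{t+1})\, dx_{t+1}, \qquad p(y_T \mid x_T) = g(y_T \mid x_T).
\]

The backward information filter $p(y_{t:T} \mid x_t)$ is a likelihood in $x_t$ rather than a density, so its integral over $x_t$ need not be finite.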

Dimitris Fouskakis: Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care

In the field of quality of health care measurement, one approach to assessing patient sickness at admission involves a logistic regression of mortality within 30 days of admission on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale, employing classical variable selection methods to find an "optimal" subset of 10--20 indicators. Such "benefit-only" methods ignore the considerable differences among the sickness indicators in cost of data collection, an issue that is crucial when admission sickness is used to drive programs (now implemented or under consideration in several countries, including the U.S. and U.K.) that attempt to identify substandard hospitals by comparing observed and expected mortality rates (given admission sickness). When both data-collection cost and accuracy of prediction of 30-day mortality are considered, a large variable-selection problem arises in which costly variables that do not predict well enough should be omitted from the final scale.

In this work (a) we develop a method for solving this problem based on posterior model odds, arising from a prior distribution that (1) accounts for the cost of each variable and (2) results in a set of posterior model probabilities which corresponds to a generalized cost-adjusted version of the Bayesian information criterion (BIC), and (b) we compare this method with a decision-theoretic cost-benefit approach based on maximizing expected utility. We use reversible-jump Markov chain Monte Carlo (RJMCMC) methods to search the model space, and we check the stability of our findings with two variants of the MCMC model composition (MC3) algorithm. We find substantial agreement between the decision-theoretic and cost-adjusted-BIC methods; the latter provides a principled approach to performing a cost-benefit trade-off that avoids ambiguities in identification of an appropriate utility structure. Our cost-benefit approach results in a set of models with a noticeable reduction in cost and dimensionality, and only a minor decrease in predictive performance, when compared with models arising from benefit-only analyses.

Ioannis Ntzoufras (Athens University of Economics and Business, Greece)

The measurement and improvement of the quality of health care are important areas of current research and development. An indirect way to evaluate the quality of hospital care is to compare the observed mortality rates at each of a number of hospitals with their expected rates, given the sickness at admission of their patients. Patient sickness at admission is often assessed by using logistic regression of mortality, for example within 30 days of admission, on a fairly large number of sickness indicators to construct a sickness scale, employing classical variable selection methods --- which trade off prediction accuracy against parsimony --- to find an "optimal" subset of 10--20 indicators. When the goal is the creation of a sickness scale that may be used prospectively to measure quality of care on a new set of patients in a cost-effective manner, traditional variable selection methods can produce sub-optimal subsets, since they do not account for differences in the data collection costs of the available predictors.

In settings of this type, with two desirable criteria that compete --- here, high predictive accuracy and low cost --- a method must be found to achieve a joint optimisation. Here we present a computational strategy to search the model space and select variables under the restriction of an upper cost bound imposed by the management of the project. The practical relevance of the selected variable subsets using the method of this paper is ensured by enforcing an overall limit on the total data collection cost of each subset: the search is conducted only among models whose cost does not exceed this budgetary restriction.

Conventional model search algorithms in our setting will fail if the best model under no cost restrictions lies outside the imposed cost limit and when collinear predictors with high predictive ability are present. The reason for this failure is the existence of multiple modes with movement paths that are forbidden due to the cost restriction. To solve this problem, in this paper we develop a population-based trans-dimensional reversible-jump Markov chain Monte Carlo (population RJMCMC) algorithm, in which ideas from the population-based MCMC and simulated tempering algorithms are combined. Comparing our method with standard RJMCMC, we find that the population-based RJMCMC algorithm moves successfully and more efficiently between distant neighbourhoods of "good" models, achieves convergence faster and has smaller Monte Carlo standard errors for a given amount of time. In a case study of n = 2,532 pneumonia patients on whom p = 83 sickness indicators were measured, with marginal costs varying from smallest to largest across the predictor variables by a factor of 20, the final model chosen by population RJMCMC, both on the basis of highest posterior probability and specifying the median probability model, is clinically sensible for pneumonia patients and achieves good predictive ability while capping data collection costs.

Francisco-José Pérez-Reche: Statistical models for complex systems: from avalanches in solids to epidemics in complex networks

The behaviour of many systems is complex in the sense that it cannot be understood just by knowing the properties of simpler elementary constituents. It is often the case that the large-scale behaviour is mainly dictated by interactions on microscopic scales. Examples of such systems include human economies, fungal colonies, the Earth's crust and the shape-memory metal alloys used for making devices. The science of complexity defines a multidisciplinary field which is interesting from both fundamental and applied viewpoints. In particular, understanding complex systems is essential to predict catastrophic events such as epidemic outbreaks or avalanches leading to material rupture.

In this seminar, we will mainly focus on epidemic modelling in systems with stochasticity and heterogeneity. The presented approach is based on the SIR (Susceptible/Infected/Removed) model analysed by means of percolation theory and methods from the statistical mechanics of lattice systems. We will illustrate the interdisciplinary character of the presented ideas by suggesting some analogies with similar approaches to the description of complexity in materials. In particular, the presented formalism will be used to describe how different types of heterogeneity affect the possibility of an epidemic outbreak in realistically complex systems including root systems, neural ensembles, and soil. Finally, a method for estimating the probability of an epidemic outbreak and its final size in realistically complex networks is suggested.
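
As a toy illustration of the link between SIR epidemics and percolation, the sketch below simulates an SIR process on a square lattice in which each infected site gets one chance to infect each susceptible nearest neighbour, with probability equal to a transmissibility parameter, before it is removed. The lattice size, the homogeneous transmissibility and the function name are assumptions made for this example; the talk concerns heterogeneous systems and more realistic networks.

```python
import random
from collections import deque


def sir_final_size(side, transmissibility, rng):
    """Run one SIR epidemic seeded at the lattice centre; return the number ever infected."""
    start = (side // 2, side // 2)
    ever_infected = {start}
    frontier = deque([start])  # infected sites that have not yet tried to transmit
    while frontier:
        x, y = frontier.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (x + dx, y + dy)
            inside = 0 <= nb[0] < side and 0 <= nb[1] < side
            if inside and nb not in ever_infected:
                # One independent transmission attempt per infected-susceptible pair.
                if rng.random() < transmissibility:
                    ever_infected.add(nb)
                    frontier.append(nb)
    return len(ever_infected)


rng = random.Random(0)
sizes_low = [sir_final_size(21, 0.2, rng) for _ in range(200)]
sizes_high = [sir_final_size(21, 0.8, rng) for _ in range(200)]
mean_low = sum(sizes_low) / len(sizes_low)
mean_high = sum(sizes_high) / len(sizes_high)
```

Each pair of neighbouring sites is tried at most once, so the set of sites ever infected is the bond-percolation cluster containing the seed. Final sizes therefore stay small at low transmissibility and span most of the lattice at high transmissibility.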

Adam Kleczkowski: Modelling framework for prediction and control of epidemic outbreaks

One of the main goals of epidemiological modelling is to assess the risks of large disease outbreaks and to find strategies to prevent and control them. Decision-making depends, in the first instance, on understanding the risk and the associated losses, which can then be traded against the cost of treatment. Although many epidemics are characterized by large variability among individual outbreaks, individual epidemics often follow a well-defined trajectory which is much more predictable in the short term than the ensemble (collection) of potential epidemics. We introduce a modelling framework that allows us to deal with individual replicated outbreaks, based upon a Bayesian hierarchical analysis. Information about 'similar' replicate epidemics can be incorporated into a hierarchical model, allowing both ensemble and individual parameters to be estimated. We use the modelling framework to analyse two replicated experiments involving the spread of a common plant pathogen, Rhizoctonia solani, on radish. In the first experiment we study the response of the pathogen to a biocontrol agent, Trichoderma viride. The rate of primary (soil-to-plant) infection is found to be the most variable factor determining the final size of epidemics. Breakdown of biological control in some replicates results in high levels of primary infection and increased variability. We then expand the model to include pre-symptomatic stages and quantify the rates of transition between unobserved classes. Finally, we consider various control strategies aimed at reducing the risk and take into account trade-offs with the loss of healthy plants. Although we use a specific system to illustrate our approach, the modelling framework is generic and can be applied to any system in which groups of individuals move between locations and can carry disease agents without symptoms. The results have important consequences for parameter estimation, inference and prediction for emerging epidemic outbreaks.

Darren Wilkinson: Stochastic modelling and Bayesian inference for biochemical network dynamics

This talk will provide an overview of computationally intensive methods for stochastic modelling and Bayesian inference for problems in computational systems biology. Particular emphasis will be placed on the problem of inferring the rate constants of mechanistic stochastic biochemical network models using high-resolution time course data, such as that obtained from single-cell fluorescence microscopy studies. The computational difficulties associated with "exact" methods make approximate techniques attractive. There are many possible approaches to approximation, including methods based on diffusion approximations, and methods exploiting stochastic model "emulators".
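
To make the idea of a mechanistic stochastic network model concrete, here is a minimal sketch of exact stochastic simulation (the Gillespie algorithm) for a toy immigration-death system. The two reactions, the rate constants `k` and `c`, and the function name are assumptions chosen for illustration and are not taken from the talk; the inference methods discussed in the abstract are not implemented here.

```python
import random


def gillespie_immigration_death(x0, k, c, t_max, rng):
    """Simulate 0 -> X (hazard k) and X -> 0 (hazard c * x) exactly up to time t_max.

    Returns the event times and the molecule count after each event,
    starting with the initial condition at time 0.
    """
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        h_birth = k            # production hazard, independent of the state
        h_death = c * x        # degradation hazard, proportional to the count
        h_total = h_birth + h_death
        if h_total == 0.0:     # no reaction can fire: the process is absorbed
            break
        t += rng.expovariate(h_total)   # exponential waiting time to the next event
        if t > t_max:
            break
        # Choose which reaction fires, with probability proportional to its hazard.
        if rng.random() * h_total < h_birth:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states
```

Inferring `k` and `c` from a time course produced by a simulator like this is the kind of problem the abstract refers to. For realistic networks, simulating every reaction event becomes expensive, which is one motivation for the approximate approaches mentioned above.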

Theodore Kypraios: Statistical Analysis of Hospital Infection Data: Models, Inference and Model Choice

High-profile hospital "superbugs" such as methicillin-resistant Staphylococcus aureus (MRSA) have a major impact on healthcare within the UK and elsewhere. Despite enormous research attention, many basic questions concerning the spread of such pathogens remain unanswered. For instance, what value do specific control measures such as isolation have? How is the spread in the ward related to "colonisation pressure"? What role do antibiotics play? How useful is it to have new rapid molecular tests instead of conventional culture-based swab tests?
We construct a wide range of biologically meaningful stochastic transmission models that overcome unrealistic assumptions of methods previously used in the literature, in order to address specific scientific hypotheses of interest using detailed data from hospital studies. Efficient Markov chain Monte Carlo (MCMC) algorithms are developed to draw Bayesian inference for the parameters which govern transmission. The extent to which the data support specific scientific hypotheses is investigated by comparing different models within a Bayesian framework using a trans-dimensional MCMC algorithm. A method of matching the within-model prior distributions is also discussed, as a way to avoid miscalculation of the Bayes factors. Finally, the methodology is illustrated by analysing real data obtained from a hospital in Boston.

Daniel Clark: First-moment filtering for spatial independent cluster processes

Dynamic clustering requires the estimation of the evolution of a spatial dynamic cluster process in time based on a sequence of partial observation sets. A suitable generalisation of the Bayes filter to this system would provide us with an optimal estimate of the multi-cluster multi-object state based on measurements received up to the current time step and an analogous forward-backward smoother could refine previous estimates based on current measurements. Based on the assumption of independent cluster processes, we describe a generalisation of the optimal Bayes filter and forward-backward smoother for dynamic clustering. The full independent-cluster Bayes filter requires the association of all possible measurements to potential points within potential clusters. This approach is computationally infeasible in almost all situations and so we derive a first-moment approximation of the independent-cluster Bayes filter, inspired by the first-moment multi-object Bayes filter derived by Mahler in the aerospace community.