Heriot-Watt University

School of Mathematical and Computer Sciences
Actuarial Mathematics and Statistics


BSc in Actuarial Mathematics and Statistics + main subject version
BSc in Statistics + main subject version
BSc in Financial Mathematics


CATEGORICAL DATA ANALYSIS - F73SJ2 - SPRING 2005


DATASETS

You can download some principal data sets here:


THERE ARE TWO WORKBOOKS FOR THIS MODULE. THEY CONTAIN BACKGROUND THEORETICAL MATERIAL, DATA SETS, COMPUTING HELP, DISPLAYS, RESULTS OF FITTING MODELS, AND TUTORIALS



REPORT ON 2005 PROJECTS

Project 1 Report available here

Project 2 Report available here



HELP WITH USING R
The latest (2005) version of RJG's guide to using R is available here
R Reference Sheets


PROJECTS

There will be two projects; these contribute to the assessment for the module stream F73SF1, 3SJ2, 3SM3. You will work on them during weeks 4-5 and 9-10.
Project 1 will be given out on the Wednesday of Week 3.
Project 2 will be given out on the Wednesday of Week 8.

You can download the plagiarism declaration forms here
Project 1 plagiarism declaration form
Project 2 plagiarism declaration form


2004 PROJECTS AND REPORTS ARE AVAILABLE HERE
Project 1: A model for claim numbers. Project available here
RJG's Report on the project Report available here
Project 2: Smoking prevalence. Project available here
RJG's Report on the project Report available here

SOME EARLIER PROJECT REPORTS ARE ALSO HERE
2003 COURSE

Project 1: Birds in hedges/Publish and be modelled
This project is discussed as two examples in Workbook 1.
A commentary/report on the project is also available here (pdf file):
Report available here

Project 2: Red Kites in Wales
The project is available here (pdf file):
Red Kites project
A commentary/report on the project is now available (pdf file):
Report available here


HAND-OUTS

These are all already incorporated in the workbooks.


TUTORIALS

These are all already incorporated in the workbooks. Here is a description of their content.

Tutorial 1: Simulations using R (various distributions, Poisson process); inference for the Poisson process; dispersion and likelihood ratio statistics, tests of homogeneity
Tutorial 2: Goodness-of-fit tests, trend test, log-linear trend and cyclic models
Tutorial 3: Two-way tables (2 x 2, r x s) - tests and model fitting; quasi-independence and other situations
Tutorial 4: Offsetting a deterministic vector, logistic regression
Tutorial 5: Three-way tables
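Tutorial 1 deals with dispersion and likelihood ratio statistics for testing whether a set of counts x_1, ..., x_n are homogeneous, that is, whether they come from a common Poisson distribution. The sketch below assumes the usual forms of these statistics. The dispersion statistic is D = sum (x_i - xbar)^2 / xbar. The likelihood ratio statistic is G^2 = 2 sum x_i log(x_i / xbar). Under homogeneity, each is referred to a chi-squared distribution on n - 1 degrees of freedom. The module itself uses R. Python appears here only for illustration, and the function name is hypothetical.

```python
import math


def poisson_homogeneity_stats(counts):
    """Dispersion statistic D and likelihood ratio statistic G^2 for testing
    whether counts x_1..x_n come from a common Poisson distribution.

    Returns (D, G2, df); under homogeneity both statistics are approximately
    chi-squared on df = n - 1 degrees of freedom.
    (Hypothetical helper, for illustration only.)
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    xbar = sum(counts) / n
    if xbar == 0:
        raise ValueError("all counts are zero")
    # Dispersion statistic: sum of squared deviations divided by the sample mean
    d = sum((x - xbar) ** 2 for x in counts) / xbar
    # LR statistic; a count of 0 contributes 0, since x*log(x) -> 0 as x -> 0
    g2 = 2 * sum(x * math.log(x / xbar) for x in counts if x > 0)
    return d, g2, n - 1


d, g2, df = poisson_homogeneity_stats([2, 4, 6])
print(d, round(g2, 4), df)
```

The general Poisson deviance is 2 sum [x_i log(x_i / mu) - (x_i - mu)]. With mu = xbar, the second term sums to zero, so G^2 reduces to the simpler form used above.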


TUTORIAL SOLUTIONS are available here (activated at appropriate times)

Tutorial 1 Solutions
Tutorial 2 Solutions
Tutorial 3 Solutions
Tutorial 4 Solutions
Tutorial 5 Solutions




AIMS OF MODULE
  • to develop students' abilities in understanding and solving practical statistical problems involving categorical data
  • to present theory and techniques for the analysis of categorical data
  • to enable students to learn how to choose appropriate techniques, analyse categorical data, and present results

LEARNING OUTCOMES
At the end of the module, students should be able to:
  • recognise categorical data, and summarise data in categorical form where appropriate
  • appreciate the methods available for the analysis of such data and distinguish between methods which are appropriate and those which are not
  • demonstrate skills in using standard analytic methods for single and two-way classification data, including the use of Poisson and multinomial models for data, odds ratios, and Pearson's chi-squared and likelihood ratio statistics
  • appreciate the need for, and the structure and usefulness of, generalised linear models
  • fit, and interpret the results of fitting, generalised linear models, including log-linear models (for example trend models) and logistic regression models
  • select and fit good models in three-way situations using a formal hierarchical approach
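The two-way methods listed above can be illustrated for a 2 x 2 table [[a, b], [c, d]]. The sketch below computes the odds ratio ad/(bc) and two statistics for testing independence: Pearson's chi-squared statistic and the likelihood ratio statistic. It assumes the standard expected counts under independence, (row total x column total) / grand total. Both statistics are referred to a chi-squared distribution on 1 degree of freedom. As with the other example, Python is used only for illustration (the module works in R), and the function name is hypothetical.

```python
import math


def two_by_two_stats(table):
    """Odds ratio, Pearson X^2 and likelihood ratio G^2 for a 2x2 table
    [[a, b], [c, d]], testing independence of rows and columns (1 df).
    (Hypothetical helper, for illustration only.)
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    # Sample odds ratio; assumes b and c are non-zero
    odds_ratio = (a * d) / (b * c)
    x2 = 0.0
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = rows[i] * cols[j] / n  # fitted count under independence
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes 0 to G^2
                g2 += 2 * observed * math.log(observed / expected)
    return odds_ratio, x2, g2


print(two_by_two_stats([[10, 20], [20, 10]]))
```

For the example table every expected count is 15. This gives an odds ratio of 0.25 and X^2 = 100/15, about 6.67.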
SUMMARY

The module is based on two workbooks, which will be given out to all students and which contain the necessary background and theory, data sets, and worked examples. There will be three lectures per week; the emphasis will be on your learning at your own pace and in your chosen manner rather than on every detail being taught. In lectures we will consider the main points of the theory and applications described in the workbooks, and we will go over the worked illustrations and examples.

Practical applications will be emphasised throughout; most of the practical work will be done on computer using R (and very occasionally Minitab).

Note that you will be expected to learn some of the material for yourself both from studying the content of the workbooks and from doing the practical work contained in the tutorials. Not every part of the work will be taught as such.

Each student will be expected to attend one computer lab (at which RJG will be present) in week 2, week 4 or 5 (or both), week 7, and week 9. These will be held on Mondays at 12.15hrs and 14.15hrs in the main lab on the ground floor of the Scott Russell building. In addition, labs are reserved for students taking this module at other times (see the timetable on the door of the lab).


TIMETABLE

Class times (including lectures, tutorials, and worked-example sessions using R):
Weeks 1 - 4 and 6 - 9 Tuesday 14.15, Wednesday 09.15, 10.15
There are no lectures in weeks 5 or 10.

Labs (SR G13): Weeks 2, 4, 5, 7, 9
Monday 12.15, 14.15
The labs in Weeks 4, 5 and 9 will be special "Project labs".


CONTENT

  1. Introduction
  2. Poisson process and associated distributions
    1. Bernoulli trials and related distributions
    2. Poisson process and related distributions
    3. Inference for the Poisson distribution
    4. Dispersion and likelihood ratio tests for Poisson data
    5. Mixed and compound Poisson distributions
  3. Single classifications
    1. Binary classifications
    2. Qualitative categories
    3. Ordered categories
    4. Goodness-of-fit tests for frequency distributions
    5. Residuals
    6. Trend models
    7. Cyclic models
    8. A brief introduction to generalised linear models
  4. Two-way classifications
    1. Factors and responses
    2. Distribution theory for 2 x 2 tables
    3. Statistical tests for 2 x 2 tables
    4. The odds ratio for a 2 x 2 table
    5. Log-linear models for 2 x 2 tables
    6. r x s tables
    7. Some other situations and special cases
    8. Taking into account a "deterministic denominator" - using an offset
  5. Logistic regression
  6. Three-way classifications
    1. Introduction
    2. Hierarchic log-linear models

READING

  • Plackett, RL (1981) The Analysis of Categorical Data (2nd ed.), Griffin
  • Everitt, BS (1992) The Analysis of Contingency Tables, Chapman and Hall
  • Fienberg, SE (1991) The Analysis of Cross-Classified Categorical Data, MIT
  • Venables, WN, & Ripley, BD (1994) Modern Applied Statistics with S-Plus, Springer
  • Krause, A & Olson, M (2000) The Basics of S and S-PLUS (2nd ed.), Springer



Help

Please email R.J.Gray@ma.hw.ac.uk if you have any problems in connection with this course.

Note that, to mail directly from Netscape, you will need to configure it via the Options, Mail and News Preferences menu to show your correct name and email address (e.g. J.Bloggs@hw.ac.uk), especially if you would like a reply (this will then come to your usual mailbox). Alternatively you can use your usual mailer (e.g. Simeon) to send email.



Roger Gray
Created: 21 December 2000. Last modified: 11 January 2005