Course list (partial; CWRU courses only):
• Data Analysis I: Basic exploratory data analysis for a univariate response with single or multiple covariates. Graphical methods, data summarization, and model fitting using the S-PLUS computing language. Simple and multiple linear regression. Emphasis on model selection criteria, on diagnostics to assess goodness of fit, and on interpretation. Techniques include transformation, smoothing, median polish, and robust/resistant methods. Case studies and analysis of individual data sets.
• Data Analysis II: Extensions of exploratory data analysis and
modeling to multivariate response observations and to non-Gaussian data.
Singular value decomposition and projection, principal components, factor
analysis and latent structure analysis, discriminant
analysis and clustering techniques, cross-validation, EM algorithm, and CART.
Introduction to generalized linear models.
Case studies of complex data sets with multiple objectives for
analysis.
• Bayesian Data Analysis: Principles of Bayesian theory, methodology
and applications. Methods for forming prior distributions using conjugate
families, reference priors and empirically based priors. Derivation of
posterior and predictive distributions and their moments. Properties when
common distributions such as binomial, normal or other exponential family
distributions are used. Hierarchical models. Computational techniques including
Markov chain
• Statistical Computing: Basic topics in statistical computing: floating point arithmetic; semi-numerical computation, including generation and tests of random numbers, Monte Carlo methods, variance reduction methods, stochastic models and simulation studies; numerical computation, including numerical linear algebra, optimization and root-finding, and numerical integration; statistical computing methods, e.g., resampling methods, EM algorithms, Gibbs sampling, and projection pursuit.
• Stochastic Modeling: Introduction to stochastic modeling of data, with emphasis on models and statistical analysis of data with a significant temporal and/or spatial structure. Markovian and semi-Markovian models, point processes, point cluster models, queuing models, risk models, likelihood methods, estimating equations.
• Theoretical Statistics: Point estimation: maximum likelihood, moment
estimators. Methods of evaluating estimators, including mean squared error,
consistency, "best" unbiasedness, and sufficiency. Hypothesis testing: likelihood ratio and union-intersection tests. Properties of
tests, including power function and bias. Interval estimation.
• Linear Models: Theory of least squares estimation, interval estimation and tests for
models with normally distributed errors. Regression on dummy variables. ANOVA, ANCOVA. Variance component models. Model diagnostics. Robust
regression. Analysis of longitudinal data.
• Advanced Techniques in Data Analysis: Topics drawn from re-sampling methods (including
bootstrapping), MCMC (Gibbs sampling), nonparametric curve and surface fitting,
kernel density estimation, projection pursuit, time series (time permitting),
approaches to model uncertainty, models for repeated measures and
structural-functional models, statistical inference for non-statistical
mathematical models of large systems.
• Theory and Methods of Experimental Design: Experimental design for polynomial regression models
and for multi-factor models. Theory for construction of increased efficiency
designs including fractional factorials, Latin squares. Designs for response
surfaces. GOSSETT-generated optimal designs for nonstandard problems.
• Survival Data Analysis: Basic concepts of survival analysis, including hazard
function, survival function, types of censoring, Kaplan-Meier estimates,
log-rank tests, and the generalized Wilcoxon tests.
Parametric inference will include exponential and Weibull
distributions with and without censoring. The proportional hazards model.
• Statistical Consulting: Statistical
consulting under the guidance of the instructor.
• STAT METHOD/ANALYSIS OF DNA: Background on low
level processing and generation of high throughput genomic data. Detection of
differentially expressed genes via FDR theory, empirical Bayes,
resampling-based approaches, and ANOVA methods, including
non-Bayesian and fully Bayesian approaches. Optimality and theoretical measures
of performance. Empirical comparisons and case studies.
• PRINCIPLES OF GENETIC EPID: A
survey of the basic principles, concepts and methods of the discipline of
genetic epidemiology, which focuses on the role of genetic factors in human
disease and their interaction with environmental and cultural factors. Many
important human disorders appear to exhibit a genetic component; hence the
integrated approaches of genetic epidemiology bring together epidemiological
and human genetic perspectives in order to answer critical questions about
human disease. Methods of inference based upon data from individuals, pairs of
relatives, and pedigrees will be considered.
• Real Analysis: Real and complex measure theory, integral
theorems. Banach spaces. Riesz
representation theorem, functional analysis, closed graph theorem, open
mapping theorem. Weak topology. Hilbert spaces.
Fourier series, etc.
• Abstract Algebra: Basic properties of groups, rings, modules and fields.
Finitely generated modules over principal ideal domains, canonical forms for
matrices; categories and functors; tensor product of
modules, bilinear and quadratic forms; field extensions; fundamental theorem of
Galois theory, solving equations by radicals.
• Set Topology: Metric spaces,
topological spaces, and continuous functions. Compactness. Connectedness. Path
connectedness. Topological manifolds. Topological groups. Polyhedra. Simplicial complexes. Fundamental groups.
• Algebraic Topology: The fundamental group and covering spaces; van Kampen's theorem. Higher homotopy
groups. Long exact sequence of a pair. Homology theory; chain complexes; short and
long exact sequences; Mayer-Vietoris sequence.
Homology of surfaces and complexes; applications.
• Topological Dynamical Systems: Research topic with Professor Wu.
• Graph Theory: Building blocks, trees, connectedness,
matching, traversability. NP-completeness. Major COP problems and
algorithms.
• Combinatorics: Permutations, combinations and variations. Principle
of inclusion and exclusion. Generating functions. Difference equations.
Partitions.
• Algorithm Analysis: Sorting, searching, set manipulation,
graph algorithms, matrix operations, polynomial manipulation, and fast Fourier
transforms. Through specific examples and general techniques, the course covers
the design of efficient algorithms as well as the analysis of the efficiency of
particular algorithms. Certain important problems for which no efficient
algorithms are known (NP-complete problems) are discussed in order to illustrate
the intrinsic difficulty that can sometimes preclude efficient algorithmic solutions.
• Database Systems: Basic issues in file processing and database management systems. Physical data organization. Relational databases. Database design. Relational query languages, SQL. Query languages. Query optimization. Database integrity and security. Object-oriented databases. Object-oriented query languages, OQL, XML.
• Operating Systems: CPU scheduling, memory management, concurrent processes,
semaphores, monitors, deadlocks, secondary storage management, file systems,
protection, UNIX operating system, fork, exec, wait, UNIX System V IPCs, sockets, remote procedure calls, threads. Must be
proficient in the C programming language.