550:438: Statistical Methods for Computer Intrusion Detection

David Marchette

Spring, 2008

Tuesdays, 9:30 -- 12:00, Wyman Park Conference Center, Rm. 421

Project Schedule

April 15:
Lee: Statistical attacks against ssh
Morris: Statistical prediction of group action
Reilly: Detecting backdoors

April 22:
Abramowitz: Detecting cheating in online games
Lui: Bioinformatics approach to intrusion detection
Saba: Cross-site scripting attacks

April 29:
Nallasami: Statistical analysis of network packet characteristics
Dhong: User profiling

Syllabus

Grading

Homework counts for 50% of the grade and the project (see below) for the other 50%. Homework is due on the Tuesday after it is assigned, unless otherwise stated. It is important that you try all the problems in each homework. Late homework will not be accepted unless you have arranged it in advance or have an extremely good excuse. I am much more concerned that you have attempted the homework than that you did it perfectly.

Projects

The project involves giving a presentation on a topic related to computer security. The presentation will be 40 minutes long (about 30 slides is probably right). A typical talk will consist of a discussion of either a particular topic (such as Honey Pots) or a particular paper. Topics will be assigned on a first-come, first-served basis, so choose soon. You may choose one of the papers/topics below, or come up with your own topic or find your own paper(s) (clear it with me, of course).

Some Project Topics

Timing attacks against ssh: ssh-timing.pdf

Honey pots/honey nets

Stepping stones 1: Detecting Backdoors

Stepping stones 2: Stepping Stones

Stepping stones 3: Multiscale Stepping Stones

Analyzing alerts: analyzingalerts.pdf

Identifying encrypted web pages: sun02statistical.pdf

Bioinformatics ID: coull01intrusion.pdf

Spam detection 1: spamrank.pdf

Spam detection 2: p243-zhang.pdf

Spam detection 3: nsf-nij-emt.pdf

Spam detection 4: gyongyi2006link.pdf

User profiling: p1-pusara.pdf

SVM for ID: ocsvm.pdf

Malware detection: oakland05.pdf

Host ID: fwraps-final.pdf

Denial of service attacks: final.pdf

Your favorite attack/virus/worm (a statistical assessment or detector)

The statistics of attack costs: how are they measured?

Data sets for intrusion detection research

Topics

Introduction to the R Language

Introduction to Pattern Recognition

Introduction to Networking

Common Network Attacks

Backscatter

Passive Fingerprinting

Host Attacks

Anomalies

Program and User Profiling

Spam Detection

Viruses, Worms and Trojans

Other topics may be covered if time permits, or may be covered in the student presentations.

Lectures

Brief Intro to R
Brief Intro to Pattern Recognition
Brief High Level Intro to Networking
Network attacks
Passive fingerprinting
Host attacks
Backscatter from DDoS
User profiling
Trojans (the disassembly stuff)
Scan statistics
More user profiling
Still more user profiling

Obtaining R

R is available at: http://lib.stat.cmu.edu/R/CRAN/.

Information on R can be obtained at the above site, and at www.R-project.org. See the documents link for papers and manuals for R.

Code

R code for parallel coordinates
R code for a greedy feature selection algorithm
R code showing some exploratory data analysis
R hints text file. This will be updated as I think of things you might need to know.

Homework

homework1 Due 2/5 before class.

homework2 Due 2/12 before class.

homework3 Due 2/19 before class.

t.data needed for the homework.

Code for partial solution to homework 3.
homework4 Due 2/26 before class.
Evaluating_IDs_DARPA_1998.pdf
McHugh critique

The MIT/LL data and critiques
homework5 Due 3/4 before class.
passive.dat
passive.info column labels for passive.dat
hw5.R Code for parts of homework 5.

homework6 Due 3/11 before class.
hw6.R Code for parts of homework 6.

hw6.R Modification of homework 6 to use a generalized inverse (can handle variables that have zero variance).

homework7 Due 3/25 before class.

Suggested Reading

jain-review.pdf
Learning Pattern Classification--A Survey
Tutorial on Recursive Partitioning (rpart)
TCP/IP Fingerprinting
Smashing the stack
Behavior profiling
Behavior-based approach to email
slammer.pdf
worms.pdf
Evaluating_IDs_DARPA_1998.pdf
Evaluating ID systems
McHugh critique
Insertion and evasion
Detecting Intrusions via statistics
MV Stats for audit trails
Sequence matching
Latent space approaches to social networks
Visualizing social networks
Visualizing social groups
Aberrant Behavior Detection
On the spread of viruses on the Internet
Digital Fingerprinting Science News article
Watch that Passwd
Drive-by Pharming
Statistical Approach to Spam
SpamRank
Link Spam Detection
Semantics-Aware Malware Detection
A Covariance Analysis Model for DDoS Attack Detection
Robust and Efficient Detection of DDoS Attacks
Statistical-based SYN-flooding Detection
Storm Worm
User Re-Authentication via Mouse Movements
Activity profiling of Windows users
Evaluation of Statistical Spam Filtering
Cooties: fun with PDAs