Course "Algorithmic Problems Around the Web"
Yury Lifshits. Caltech, Fall'07, CS101.2
The course is over. Thanks for participating!


Date Slides Handouts
1 October 2007 Topics for projects Slides for print
3 October 2007 Nearest Neighbor Search by Branch and Bound Slides for print
4 October 2007 Experimental Projects on Web Algorithms Invited lecture at CS141a. Slides for print
8 October 2007 More Branch and Bound Algorithms Slides for print
8 October 2007 Recommendations in the Scientific World Guest lecture by John Delacruz.
9 October 2007 Similarity Search: A Web Perspective Short version for CMI Retreat. Slides for print
10 October 2007 Walking and Matrix-Based Algorithms Slides for print
18 October 2007 Similarity Search: A Web Perspective Full version for Google Tech Talk Slides for print
22 October 2007 Locality-Sensitive Hashing Slides for print
24 October 2007 Progress report by Mike Olson (whiteboard)
29 October 2007 Nearest Neighbors via Inner Product Test Slides for print
31 October 2007 Brief Intro to Association Rules and Minhashing by Yury Person
5 October 2007 Progress report by Hsuan-Tien Lin (whiteboard)
7 October 2007 Mining Users-Ads Bipartite Graph by Peter Sadowski
12 October 2007 Point-to-Point Shortest Path Algorithms with Preprocessing (12 Mb)
by Andrew Goldberg
14 October 2007 Nearset neighbors in small doubling dimension Slides for print
19 October 2007 "Predictive Discrete Latent Factor Models" (Best Paper of KDD 2007)
presented by Kevin Ko (whiteboard)
21 October 2007 Semantic Search Slides for print


Participants: Michael Olson, Peter Sadowski, Shankar Kalyanaraman, Neil Halelamien, Chess Stetson, Dan Wilhelm, (Kevin) Chih-Kai Ko, Hsuan-Tien Lin, Xuan Luo, Eyal Rozenman, Vera Asodi, Shengyu Zhang, Shripad Thite, Yury Person, John Delacruz, Yuval Cassuto, Leonard Schulman, K. Mani Chandy. Janice (Qian) Wang

Our Projects

Similarity search in combinatorial framework Shengyu Zhang & YL
Data mining in bipartite graphs Peter Sadowski, Kevin Ko & YL
Event discovery in blogs Michael Olson & YL
Social network vizualization Yury Person & YL

Topics for Projects: References

T1. Nearest Neighbors for Sparse Vectors Zobel/Moffat, YL:Yandex Talk (Rare point method), Lifshits/Nowotka
T2. Low-Distortion Embeddings for Social Networks Slivkins, Elementary J-L Lemma
T3. Disorder Method for Nearest Neighbors Goyal/Lifshits/Schütze
T4. 3-Step Nearest Neighbors YL: Large Scale Graph Algorithms (video)
T5. Probabilistic Nearest Neighbors Hoffmann/Lifshits/Nowotka, Clarkson
E1. Recommendations for Blog Posts Adomavicius/Tuzhilin
E2. Click-Through Rate Prediction Lifshits/Nowotka, SSA 07, YL Webguide lecture (video)
E3. Social Networks Visualization Slivkins, Pampalk/Rauber/Merkl
E4. Disorder Analysis Goyal/Lifshits/Schütze

General links


DATA SETS

Yandex LLC is providing two extremely interesting datasets for this course: (1) snapshot of Russian-speaking blogosphere with comments, friendship and hyperlinks and (2) monthly log of all advertising campaigns in Yandex.Direct (the largest Russian ad broker). You have to sign a simple NDA form for using them in experiments.