General
Mining Massive Data Sets
Orodja za analizo velikih podatkovnih baz (Tools for Analysis of Large Databases)
Instructor: Jure Leskovec
Instructor at FRI: Matej Guid
Email: matej.guid@fri.uni-lj.si
Schedule: This course starts in the second week of January. We will follow the CS246 schedule, which means you will also have to complete homework assignments during the exam break.
In March, the course will be held in P04 on Tuesdays at 17:15.
The course covers data mining and machine learning algorithms for analyzing very large amounts of data. Topics include: MapReduce and Spark/Hadoop, Frequent Itemsets and Association Rules, Near Neighbor Search in High Dimensions, Locality-Sensitive Hashing (LSH), Dimensionality Reduction, Recommendation Systems, Clustering, Analysis of Massive Graphs, Link Analysis (PageRank, HITS), Web Spam (TrustRank), Proximity Search on Graphs, Large-Scale Supervised Machine Learning, Mining Data Streams, Learning through Experimentation, Web Advertising, and Optimizing Submodular Functions. The course is offered in collaboration with Stanford University, where it is taught as CS246, and the lectures are given by the Stanford instructor. Lectures are not attended live; instead, you will follow them through video recordings, which will be available for download. At FRI, we will organize short weekly reviews of the lecture material and consultation sessions.
Zoom meeting instructions
For online sessions, we will use the following Zoom link:
https://uni-lj-si.zoom.us/j/97525554907?pwd=MmJ4NHFGbklMbVpYb281VEZxdG1rUT09
Meeting ID: 975 2555 4907
Passcode: 067962
USEFUL LINKS
Course website: http://web.stanford.edu/class/cs246/
Important info:
- course information handout (PDF): http://web.stanford.edu/class/cs246/handouts/CS246_Info_Handout.pdf
Lectures
- 2022: https://snap.stanford.edu/class/cs246-videos-2022/
  username: cs246, password: mining2022
- 2021: Lecture Videos (Google Drive):
  - Introduction; MapReduce and Spark (Tue March 30)
  - Frequent Itemsets Mining (Thu April 1)
- 2019: http://snap.stanford.edu/class/cs246-videos-2019/
- 2018: http://snap.stanford.edu/class/cs246-videos-2018/
Additional materials: https://web.stanford.edu/class/cs246/index.html#schedule
Reference text: http://www.mmds.org/
Weekly Colab notebooks:
- you will find them directly on the http://web.stanford.edu/class/cs246/ website,
- they are posted every Thursday,
- they are due one week later, on Thursday at 23:59 Pacific Time (PT), but we recommend submitting earlier,
- submit them via this website (below).
Assignments and grading:
- 4 homework assignments requiring coding and theory (40%)
- Final exam (30%)
- Weekly Colab notebooks (30%)
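As a worked example of the weighting above, the sketch below combines component scores into a final percentage. It assumes every component is scored on a 0 to 100 scale and that the four homework assignments count equally toward the 40% homework share; neither assumption is stated in the course materials, and the names `final_score` and `WEIGHTS` are hypothetical, used only for illustration.

```python
# Hypothetical helper: combines component scores (each assumed to be on a 0-100 scale)
# using the weights listed above: homework 40%, final exam 30%, Colab notebooks 30%.
WEIGHTS = {"homework": 0.40, "final_exam": 0.30, "colab": 0.30}


def final_score(homework: list[float], final_exam: float, colab: float) -> float:
    """Return the weighted final score on a 0-100 scale.

    Assumption: the four homework assignments count equally toward the homework share.
    """
    if len(homework) != 4:
        raise ValueError("expected scores for 4 homework assignments")
    homework_avg = sum(homework) / len(homework)
    return (
        WEIGHTS["homework"] * homework_avg
        + WEIGHTS["final_exam"] * final_exam
        + WEIGHTS["colab"] * colab
    )


if __name__ == "__main__":
    # Made-up scores: homework average 85, final exam 70, Colab notebooks 90.
    # 0.4 * 85 + 0.3 * 70 + 0.3 * 90 = 34 + 21 + 27 = 82
    print(round(final_score([80, 90, 85, 85], final_exam=70, colab=90), 2))  # prints 82.0
```

Because the three weights sum to 1, a student who scores 100 on every component ends up with exactly 100.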