Swedish name: Statistisk inlärning med högdimensionella data
This syllabus is valid from 2024-08-26 until further notice
Syllabus for courses starting after 2024-08-26
Course code: 5MS084
Credit points: 7.5
Education level: Second cycle
Main Field of Study and progress level:
Mathematical Statistics: Second cycle, has only first-cycle course/s as entry requirements
Grading scale: Pass with distinction, Pass with merit, Pass, Fail
Responsible department: Department of Mathematics and Mathematical Statistics
Established by: Faculty Board of Science and Technology, 2022-03-14
Revised by: Faculty Board of Science and Technology, 2024-02-20
This course provides comprehensive knowledge of data science and statistical learning, in both breadth and depth. It covers traditional and state-of-the-art methods and algorithms in these fields, together with the fundamental theory behind them. After passing the course, students should have a strong ability to solve problems using data. Students are also expected to develop a strong capacity for independent study, so that they can understand and learn newly developed methods and algorithms on their own.
Module 1 (3 credits): Theory
Three families of approaches to dimensionality reduction are covered: spectral methods (multidimensional scaling, Isomap, kernel PCA, etc.), manifold learning (locally linear embedding, Hessian eigenmapping, t-distributed stochastic neighbor embedding, etc.), and deep neural network-based methods (autoencoders, variational autoencoders, etc.). As special cases of dimensionality reduction, shrinkage and feature selection methods such as ridge regression, the LASSO, and feature importance are also discussed. The supervised learning approaches covered include kernel-based methods (kernel ridge regression, support vector machines, etc.), ensemble methods (random forests and AdaBoost), neural networks, and various deep learning approaches and architectures. Unsupervised learning approaches are also included, in particular clustering algorithms such as density-based methods and spectral clustering. Deep learning-based unsupervised methods, such as generative adversarial networks and their variants, are covered as well. Finally, the course discusses fundamental mathematical theory concerning kernel methods, ensemble methods, penalized approaches, shallow networks, the gradient descent algorithm, universal estimators, and the fundamental theorem of learning.
Module 2 (4.5 credits): Computer labs
The module covers the analysis of several data sets using the statistical methods included in the course. The analyses are carried out in either R or Python. In the module, students write thorough reports describing the analyses and their results.
For a passing grade, the student must be able to
Knowledge and understanding
Skills
Judgement and approach
Admission to the course requires 90 ECTS credits, including 7.5 ECTS credits in Computer Programming, 7.5 ECTS credits in Multivariate Data Analysis, and 12 ECTS credits in Mathematical Statistics, or equivalent. Proficiency in English and Swedish equivalent to the level required for basic eligibility for higher education is also required.
The teaching in Module 1 takes the form of lectures and lessons. The teaching in Module 2 takes the form of supervised lab work.
Module 1 is assessed through a written exam and is awarded one of the following grades: Fail (U) or Pass (G). The grade is based on the score on the exam. Each lab report is awarded one of the following grades: Fail (U) or Pass (G), and is also given a score. For Module 2 to be awarded the grade Pass (G), all lab reports must be approved. For the course as a whole, one of the following grades is awarded: Fail (U), Pass (3), Pass with merit (4), or Pass with distinction (5). The grade for the whole course is determined by the total score on the lab reports and the exam, where the lab reports make up 2/3 and the written exam 1/3 of the total score.
Deviations from the examination form stated in the syllabus can be made for a student who has a decision on pedagogical support due to disability. Individual adaptation of the examination form shall be considered based on the student's needs. The examination form is adapted within the framework of the expected learning outcomes of the course syllabus. At the request of the student, the course coordinator, in consultation with the examiner, must promptly decide on the adapted examination form. The decision shall then be communicated to the student.
A student who has been awarded a passing grade for the course cannot be re-assessed for a higher grade. Students who do not pass a test or examination on the original date are given another date to retake it. A student who has sat two examinations for a course or part of a course without passing either has the right to have another examiner appointed, provided there are no specific reasons for not doing so (Chapter 6, Section 22, Higher Education Ordinance). The request for a new examiner is made to the Head of the Department of Mathematics and Mathematical Statistics. Examinations based on this course syllabus are guaranteed to be offered for two years after the date of the student's first registration for the course.
Credit transfer
All students have the right to have their previous education or equivalent, and their working life experience, evaluated for possible credit in the corresponding education at Umeå University. Applications should be addressed to Student Services/Degree Evaluation Office. More information on credit transfer can be found on the student web pages of Umeå University, http://www.student.umu.se, and in the Higher Education Ordinance (Chapter 6). If the application is denied, the decision can be appealed (as per the Higher Education Ordinance, Chapter 12) to Överklagandenämnden för högskolan (the Higher Education Appeals Board). This includes partially denied applications.
This course cannot be included in a degree together with another course with similar content. When in doubt, the student should consult the director of studies at the Department of Mathematics and Mathematical Statistics. The course can also be included in the subject area of computational science and engineering.
In the event that the syllabus ceases to apply or undergoes major changes, students are guaranteed at least three examinations (including the regular examination opportunity) according to the regulations of the syllabus under which they were originally registered, over a period of at most two years from the time the previous syllabus ceased to apply or the course ended.
An Introduction to Statistical Learning: with Applications in R
James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert
Second edition. New York, NY: Springer, 2021. xv, 607 pages.
ISBN: 9781071614204
Mandatory