Swedish name: Big data och analys av högdimensionella data
This syllabus is valid from 2020-08-17 until further notice
Syllabus for courses starting after 2020-08-17
Course code: 5MS062
Credit points: 7.5
Education level: Second cycle
Main Field of Study and progress level:
Mathematical Statistics: Second cycle, has only first-cycle course/s as entry requirements
Grading scale: Pass with distinction, Pass with merit, Pass, Fail
Responsible department: Department of Mathematics and Mathematical Statistics
Revised by: Faculty Board of Science and Technology, 2020-05-04
Element 1 (2 hp): Theory.
This Element discusses what characterizes big data and high-dimensional data, including a historical background and examples of applications. Regression analysis, including the maximum likelihood and least squares methods, is reviewed. The general classification problem is introduced, and the goals of classification and how performance is measured are discussed. Validation methods, including cross-validation and evaluation with independent test data, are also included. The theory and applications of logistic regression analysis and linear and quadratic discriminant analysis (LDA and QDA) are covered. Variable selection for classification problems, ridge regression, lasso and principal component analysis (PCA) are treated, as well as how these methods can be used together with logistic regression, LDA and QDA. The statistical software R and interesting program libraries for it are introduced, including a discussion of a worked example containing variable selection, classification and evaluation. Furthermore, the methods K-nearest neighbours (KNN), support vector machines (SVM) and random forests are covered. The general problem of cluster analysis is introduced, and the goals of cluster analysis and how performance (robustness) is measured are discussed. In connection with this, hierarchical cluster analysis, k-means and self-organizing maps (SOM) are treated.
Element 2 (5.5 hp): Computer labs.
The Element covers the analysis of several data sets using the statistical methods included in the course. The analyses are conducted in the programming language R. In this Element, students are expected to write thorough reports on the analyses and their results.
For a passing grade, the student must be able to
Knowledge and understanding
Skills
Judgement and approach
The course requires 90 ECTS including 12 ECTS in Mathematical Statistics and 7.5 ECTS in Computer Programming, or equivalent. Proficiency in English equivalent to Swedish upper secondary course English 5/A. Where the language of instruction is Swedish, applicants must prove proficiency in Swedish to the level required for basic eligibility for higher studies.
The teaching in Element 1 takes the form of lectures and lessons. The teaching in Element 2 takes the form of supervised lab work.
Element 1 is assessed through written lab reports and a written exam. Each lab report is awarded one of the judgements Fail (U) or Pass (G) and is also given a score. Element 1 and Element 2 are each awarded one of the judgements Fail (U) or Pass (G). For Element 2 to be awarded the judgement Pass (G), all lab reports must be approved. For the course as a whole, one of the following grades is awarded: Fail (U), Pass (3), Pass with merit (4), Pass with distinction (5). The grade for the whole course is determined by the total score on the lab reports and the exam, where the lab reports constitute 2/3 and the written exam 1/3 of the total score.
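As an illustration of the 2/3 and 1/3 weighting, the sketch below combines a lab score and an exam score into a total. It assumes that each score is first expressed as a fraction of its own maximum; the syllabus does not state the point scales or the thresholds for the grades 3, 4 and 5, so the function name `total_score`, the example point values and the normalisation are hypothetical, and no grade boundaries are computed.

```python
from fractions import Fraction

# Weights stated in the syllabus: lab reports 2/3, written exam 1/3.
LAB_WEIGHT = Fraction(2, 3)
EXAM_WEIGHT = Fraction(1, 3)


def total_score(lab_fraction, exam_fraction):
    """Return the weighted total as a fraction of the maximum total.

    Assumption (not stated in the syllabus): both inputs are scores
    already divided by their respective maximum, so each lies in [0, 1].
    """
    for value in (lab_fraction, exam_fraction):
        if not 0 <= value <= 1:
            raise ValueError("scores must be fractions between 0 and 1")
    return LAB_WEIGHT * lab_fraction + EXAM_WEIGHT * exam_fraction


# Hypothetical example: 45 of 60 lab points and 18 of 30 exam points.
example = total_score(Fraction(45, 60), Fraction(18, 30))
print(float(example))  # 0.7
```

With these made-up numbers the lab reports contribute 2/3 × 0.75 = 0.5 and the exam 1/3 × 0.6 = 0.2, giving a total of 0.7 of the maximum.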
A student who has been awarded a passing grade for the course cannot be reassessed for a higher grade. Students who do not pass a test or examination on the original date are given another date to retake the examination. A student who has sat two examinations for a course or a part of a course, without passing either examination, has the right to have another examiner appointed, provided there are no specific reasons for not doing so (Chapter 6, Section 22, HEO). The request for a new examiner is made to the Head of the Department of Mathematics and Mathematical Statistics. Examinations based on this course syllabus are guaranteed to be offered for two years after the date of the student's first registration for the course.
Credit transfer
All students have the right to have their previous education or equivalent, and their working life experience, evaluated for possible consideration in the corresponding education at Umeå University. Application forms should be addressed to Student services/Degree evaluation office. More information regarding credit transfer can be found on the student web pages of Umeå University, http://www.student.umu.se, and in the Higher Education Ordinance (chapter 6). If denied, the application can be appealed (as per the Higher Education Ordinance, chapter 12) to Överklagandenämnden för högskolan. This includes partially denied applications.
This course cannot be included in a degree together with another course with similar contents. When in doubt, the student should consult the director of studies at the Department of Mathematics and Mathematical Statistics. The course can also be included in the subject area of computational science and engineering.
An Introduction to Statistical Learning : with Applications in R
James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert
New York, NY: Springer, 2013. xiv, 426 p., 150 ill., 146 ill. in color.
ISBN: 9781461471370
Mandatory