Elective subject: Data science
General information
Course code: | ECM4SE24EC-L18 |
Erasmus / ISCED code: | (no data) / (no data) |
Course name: | Elective subject: Data science |
Unit: | Akademia Finansów i Biznesu Vistula |
Groups: | |
ECTS credits and other: | 0 or 3.00 (depending on the programme) |
Language of instruction: | English |
Short description: |
(in English only) The objective of the course is to introduce students to the most important models used in modern data science. Students will learn when to apply these models, implement them in the R programming language, and use their results in various areas of business activity, working from real business cases. |
Full description: |
(in English only) Classes take place in the summer semester on Tuesdays from 12:00 to 14:00 in computer lab 127.

Data science models are at the core of many big data applications, which are transforming almost every aspect of business activity. There are tens of thousands of vacancies for data scientists globally, offering an average salary of 100,000 USD per year. Two major languages are used in data science globally: R and Python. This course provides students with the knowledge and skills required of a professional data scientist working in R.

For each model we will present its theoretical foundations and discuss when to apply it, how it works, and how to interpret its results. We will then write R code applying the model to real business and economic data, including predicting economic outcomes, analyzing human resources processes in businesses, predicting student performance, and verifying whether a computer can replace a human in business decision-making. After the course students will have a good understanding of the models, will be able to code them in R, and will have a set of R programs ready to use in their future jobs.

Course content:
1. Introduction to data science: statistical learning, predictive analytics, supervised and unsupervised learning models (2 hours)
2. Data collection (including web scraping), cleaning and preparation, including dealing with missing values (4 hours)
3. Key features of R and RStudio (2 hours)
4. Linear regression in R (2-4 hours)
5. Logit classification in R, confusion matrix and ROC curve (2 hours)
6. Validation and cross-validation methods (2 hours)
7. Midterm test and discussion of the capstone project (2 hours)
8. Tree-based models: regression and classification trees (CART) in R (2-4 hours)
9. Random forest in R (2 hours)
10. Clustering models in R (2-4 hours)
11. Presentation and discussion of the final capstone project (2-4 hours)

Depending on the pace of the course, we may also cover some of the following topics, based on student preferences:
12. Support Vector Machines in R
13. Bayesian models in R
14. Dimension reduction methods in R
15. Automatic selection of best models
16. Bagging and boosting in R

Total: 30 hours |
Literature: |
(in English only) Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, “An Introduction to Statistical Learning”; a free PDF can be downloaded from the authors' website: http://www-bcf.usc.edu/~gareth/ISL/

Internet resources: CRAN-R, stackoverflow, datacamp, kaggle; specific links will be given during lectures.

PowerPoint presentations and short readings will be available for each class, and students are expected to read them before the class.

While computers will be available in the computer lab, students who have their own laptops are advised to install the free software used in the course (open-source R, RStudio and OpenRefine) so that they can complete home assignments. |
Learning outcomes: |
(in English only) The student:
- has extensive knowledge of data science reasoning, allowing him/her to verify hypotheses about the characteristics and behaviors of studied populations and the course of phenomena and processes
- can build and assess linear and non-linear models describing economic phenomena, forecast on the basis of data science modeling, and interpret the results of model analyses
- shows the ability to co-operate in a team (group), both as the supervisor and as the supervised
- can prepare, individually or in a team, a social project and predict its diverse consequences (outcomes) |
Assessment methods and criteria: |
(in English only) Grade components:
- Midterm test (individual): 30%
- Short presentation of a given model (team): 20%
- Presentation of the final capstone project (team): 50%

Classes will be organized as follows:
- students will self-organize into teams of three students on average
- I will first discuss the theoretical foundations of each model, then present a case in R using real or simulated data (we will use the free RStudio software)
- teams will be required to apply the learned model to data sets available on kaggle.com (or collected by the students) to verify that they have understood the model; randomly selected team(s) will give a short presentation at the beginning of the next class
- during the last class(es) teams will present their capstone projects

Each presentation (short and final) will be graded as follows: half of the grade will be given by the class in an anonymous poll, and half will be given by me.

Criteria for judging the project presentations:
- a clearly and properly stated project goal
- use of a proper model to achieve the goal
- application of proper data science techniques and proper R code for model estimation
- correct interpretation of model results
- correct connection between model results and the project goal
- clarity and quality of the presentation (slides and team performance)
- whether the presentation was exciting or boring |
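The grade arithmetic described above can be illustrated with a short sketch. It is written in Python purely for illustration; the function names, the variable names, and the 0-100 scale are assumptions, since the syllabus states only the weights and the half/half split between the class poll and the instructor.

```python
# Component weights as stated in the assessment scheme.
# All names and the 0-100 scale are illustrative assumptions.
WEIGHTS = {"midterm": 0.30, "short_presentation": 0.20, "capstone": 0.50}


def presentation_score(class_poll: float, instructor: float) -> float:
    """Half of a presentation grade comes from the class poll, half from the instructor."""
    return 0.5 * class_poll + 0.5 * instructor


def final_score(midterm: float, short_presentation: float, capstone: float) -> float:
    """Weighted course score; assumes every component uses the same 0-100 scale."""
    components = {
        "midterm": midterm,
        "short_presentation": short_presentation,
        "capstone": capstone,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Hypothetical marks for one student in one team.
short = presentation_score(class_poll=70, instructor=90)      # 80.0
capstone = presentation_score(class_poll=85, instructor=95)   # 90.0
print(final_score(midterm=80, short_presentation=short, capstone=capstone))
```

With these hypothetical marks the weighted score is 0.30 × 80 + 0.20 × 80 + 0.50 × 90 = 85. How such a score maps onto the final grade is not specified in the syllabus.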
Classes in the "Summer semester 2017/2018" term (completed)
Period: | 2018-02-16 - 2018-07-15 |
Class type: | Lecture, 30 hours |
Coordinators: | Krzysztof Rybiński |
Group instructors: | Krzysztof Rybiński |
Assessment: | Course - graded credit; Lecture - exam / graded credit / credit on the zal-std2 scale |
Copyright is held by Akademia Finansów i Biznesu Vistula.