OM 420/620 Predictive Business Analytics – Winter, 2019
Required Textbooks — Chapters of the following books will be assigned:
• An Introduction to Statistical Learning: with Applications in R by G. James, D. Witten, T. Hastie, R. Tibshirani (available online: http://www-bcf.usc.edu/~gareth/ISL/)
Optional Textbooks:
• The Art of R Programming: A Tour of Statistical Software Design by N. Matloff
• R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by H. Wickham, G. Grolemund (available online: http://r4ds.had.co.nz/)
• The Elements of Statistical Learning by T. Hastie, R. Tibshirani, J. Friedman (available online: https://web.stanford.edu/~hastie/ElemStatLearn/)
Software (available on the lab computers):
• R Programming Language
R resources:
www.datacamp.com: Many courses, from introductory to advanced
www.rstudio.com/resources/webinars/: Many great webinars on R and RStudio
http://stackoverflow.com/questions/tagged/r: The coding cookbook of the Internet Age!
https://rweekly.org: See how R is being used in the real world + tutorials
twitter.com/hadleywickham: Follow Hadley to stay up to date on R and data science
Accounting, Operations, and Information Systems
Instructor: Mostafa Rezaei
Email: [email protected]
Lectures: Bus B18, Mon. & Wed. 11 am - 12:20 pm
Office hours: TBA    Office: 4-27
Lab sessions: TBA    Lab: B18
Learning Objectives:
The objective of this course is for students to build a solid foundation in predictive business analytics.
Because business analytics has applications in finance, marketing, and operations, the course covers examples and includes practical exercises in a variety of areas, such as:
• Finance – how can we predict the risk of an investment?
• Marketing – which customers should we target directly, given their predicted probability of responding to a certain advertisement?
• Operations – how can we predict the probability that a customer will be a no-show?
The course’s goal is twofold:
• To equip a proficient analyst with the practical skills required for the role: give students the knowledge of how to extract data from relational databases, prepare the data for analysis, build basic predictive models using data mining software, and prepare reports that managers can easily understand
• To supply an essential skill for a qualified manager: give students the knowledge to interpret reports and recommendations that a manager might receive from business analysts, and to decide the best course of action
Since understanding the past is a basis for predicting the future, this course covers the following two dimensions of business analytics:
• Descriptive analytics, which “uses data to figure out what happened in the past”
• Predictive analytics, which “uses data to find out what could happen in the future”
The third dimension of business analytics is prescriptive analytics, which “uses data to prescribe the best course of action to increase the chances of realizing the best outcome”. It is covered by other OM courses such as OM461, OM471, and OM422.
Topics covered:
• Programming Basics:
- R basics
- Data Structures
• Descriptive Analytics:
- Data Transformation
- Relational Databases
- Visualization
- Exploratory Analysis
- K-means Clustering
- Hierarchical Clustering
• Predictive Analytics:
- Basics of Predictive Models
- Cross-Validation
- Logistic Regression
- Classification and Regression Trees
- Bagging, Random Forest, Boosting
Depending on time, we might cover other topics such as regularization or recommendation systems.
Programming Background:
The class does not assume any programming background, but we will use the R programming language extensively. We will cover the basics of programming and R in the first few lectures, and we might hold optional lab sessions with the instructor to help students who need more practice.
Evaluation:
• Assignments: 30%
- Five individual (i.e., not group) assignments. Each will count towards your final mark (i.e., no assignment is dropped).
- No late assignment is accepted.
• Midterm: 30%
- It is an open-book exam, tentatively on March 4 (Monday).
• Final group project: 35%
- Teams of 2. Undergrads with undergrads, MBAs with MBAs. If there is an odd number of students, only one group of 3 will be allowed.
- Project presentation 10% (last week of class), project write-up 25% (due: April 14 at 11pm)
- More details will be given during the course. However, the project will consist of finding interesting information in a real-world data set and using this information to make predictions for better decisions.
• Participation: 5%
- Participating in activities during lectures
Final grades will be based on the overall class performance. OM620 students are evaluated separately from students in OM420.
Course Absence Policy:
If a student is absent and is unable to complete the group project/exam within the given guidelines due to illness, the following procedures shall apply. In the case of group work, no extension will be provided to the group unless there are indications that all group members were incapacitated by illness. If the midterm exam is missed, no alternate exam will be available; instead, half of the weight of the midterm will be assigned to the group project and the other half to the assignments.
Web Site General Information:
The address of the web site is https://eclass.srv.ualberta.ca/
You can access the course web site with your CCID and password. Please contact IST (780-492-9400) or the [email protected] for assistance if you do not have your CCID or password.
Accommodating Disabilities:
Students who require accommodations in this course due to a disability affecting mobility, vision, hearing, learning, or mental or physical health are advised to discuss their needs with Specialized Support and Disability Services, 2-800 Students' Union Building, 492-3381 (phone) or 492-7269 (TTY) and to contact me as soon as possible so that we can discuss appropriate arrangements.
Academic misconduct:
Students who commit any act of plagiarism, cheating, or misrepresentation in this course will be penalized. All assignments (except for the group project) are to be completed individually.
However, I recognize the value of studying together and comparing notes when working on assignments. To help you judge what I consider acceptable and non-acceptable collaboration, consider the following.
Do:
• Discuss the course material with other students
• Ask classmates for help when you are stumped
• Offer help to other students
• Do your own work.
Don’t:
• Discuss numerical answers with other students
• Use someone else's words without proper attribution
- The best way to avoid using another student's words is to never look at another student's written answers to an assignment
- If you cite an article, book, web page, or any other source in your project report, then you must include complete information about that source
• Copy another student's spreadsheet file, SQL file, or any other computer file
- There are no exceptions to this rule. Copying another student's file for an assignment (or another group's work, for the group project) is not acceptable under any circumstances. It is irrelevant whether the copying is done electronically or manually.
The University of Alberta is committed to the highest standards of academic integrity and honesty. Students are expected to be familiar with these standards regarding academic honesty and to uphold the policies of the University in this respect. Students are particularly urged to familiarize themselves with the provisions of the Code of Student Behaviour (online at www.ualberta.ca/secretariat/appeals.htm) and avoid any behaviour which could potentially result in suspicions of cheating, plagiarism, misrepresentation of facts and/or participation in an offence. Academic dishonesty is a serious offence and can result in suspension or expulsion from the University.
Academic dishonesty in this course will be prosecuted severely. See the Frequently Asked Questions (on the course web site) for some guidelines on what we consider acceptable and unacceptable behaviour.
Policy about course outlines can be found in §23.4(2) of the University Calendar.
TENTATIVE SCHEDULE
Note: This is a general guideline for the semester. To accommodate student interests, we may find it necessary to alter the schedule as the semester progresses. Topics will be covered in sequence; however, it may be necessary to go faster or slower than indicated.
Date    Topic                                                           Due Dates
Jan 7   Introduction + Programming in R
Jan 9   Programming in R
Jan 14  Data Structures: Vectors, Matrices, and Lists
Jan 16  Data Structures: Data frames
Jan 21  Data Structures: Data frames                                    HW1 due: Jan 20, 11 pm
Jan 23  Data Transformation 1: filter(), arrange(), select(), mutate()
Jan 28  Data Transformation 2: group_by(), summarise()
Jan 30  Data Transformation 2: group_by(), summarise()
Feb 4   Relational data: Keys, mutating joins                           HW2 due: Feb 3, 11 pm
Feb 6   Relational data: mutating joins, filtering joins
Feb 11  Visualization
Feb 13  Exploratory data analysis
Feb 18  No class - Holiday                                              HW3 due: Feb 17, 11 pm
Feb 20  No class - Reading week
Feb 25  Exploratory data analysis
Feb 27  Review for midterm
Mar 4   Midterm exam (tentative)
Mar 6   Introduction to predictive models
Mar 11  K-means clustering and Hierarchical clustering
Mar 13  Classification: Logistic regression
Mar 18  Classification: Evaluating classifiers                          HW4 due: Mar 17, 11 pm
Mar 20  Cross-validation
Mar 25  Decision trees: Regression trees and Classification trees
Mar 27  Decision trees: Bagging, random forests, boosting
Apr 1   Recommendation system                                           HW5 due: Mar 31, 11 pm
Apr 3   Q&A Session for Final Project
Apr 8   Student Group Project Presentations
Apr 10  Student Group Project Presentations                             Project write-up due: Apr 14, 11 pm