ECBS 5146 SQL and Different Shapes of Data

Instructors

Laszlo Sallo (sallol@ceu.edu)

Contributors

Laszlo Sallo is an IT Manager with a Software Engineering background. Over the past 20+ years, he participated in more than 150 international projects. As Delivery Head of EPAM Systems, Laszlo is leading a large portfolio of projects, including data-focused projects, combining his skills in Agile Project Management and Data Engineering. Laszlo holds an M.S. in Business Analytics from Central European University. He is an instructor at CEU since 2016.

Prerequisites

Students should take ECBS5154 - Mathematics and Informatics Pre-session for Business Analytics

Schedule

Setup Download files required for the lesson
00:00 1. Course intro. First steps in SQL. What is data engineering and why it is important for data analysts?
How to access a data set, stored in a database?
How to load various data files (eg. csv) into database?
01:30 2. DB administration. Queries. How do you make sure your data stored, can be accessed only by authorized persons?
How you query your database to obtain data required for your analytics?
03:00 3. Aggregations. As analyst, one the most important operation you do, is data aggregation. How SQL supports data aggregation?
04:30 4. Data modeling. Joins. How do you design/model a database?
In normalized database, the data is structured in a way to avoid data redundancy and support consistency. As analyst, this structure is not always the best fit for analytics. How do you merge one or more tables to get the required data set?
06:00 5. Stored Procedures. How can simplify the maintainable of long queries with similarities?
How can you build business logic with SQL?
How can you clean/fix a database with corrupted/bad quality data unfit for your analytics?
07:30 6. Data warehouse architectures. What is a Data Warehouse?
As analyst, how can you create a simple analytical platform using SQL DB?
09:00 7. Different Shapes of Data. APIs. What the popular flat file formats used by data analysts?
How I get data from the internet?
10:30 8. Command Line Exercises. How can an analyst use bash to explore/modify data stored in a flat file?
How can an analyst build simple bash tool to perform basic (but fast) analytics?
Can be Linux bash used on Windows/MacOS?
12:00 9. Knime. How the data science landscape looks like in regard to tools?
How to implement an end-to-end data analytics workflow integrated with multiple sources?
How to do rapid prototype a data analytics workflow with low code tools, opposed to using a programing language (R,Python etc)?
13:30 10. Data Architecture Tradeoffs. Why 80% of classical Data Warehouse projects fails?
Why RDBMS is less trendy these days?
What are the expected characteristics of a data store these days?
What are the limitation of a classical RDBMS and how the “New tools” are solving these limitations?
15:00 11. NoSQL. What is polyglot persistence and how solves the data storge problems of the present days?
What are the common treats of NoSQL solutions?
How can a data analyst choose the appropriate data solution tailored to her/his task?
18:00 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.