ECBS 5146 SQL and Different Shapes of Data: Term project 1

Format

The term project will be completed individually.

What to use as template / guiding material

Typically, the course material and exercises provided here suffice, particularly the chapters on Data warehouse architectures.

Goal

High level requirements

OPERATIONAL LAYER: Develop an operational data layer in MySQL. Import a relational dataset of your choice into your local instance. Choose data that is suitable for transformation into an analytical layer for further analysis. Ideally, you can utilize the results from HW1.

ANALYTICS: Draft a concise plan detailing potential analytics that can be performed with this dataset. Outline how the analytical data layer, ETL, and Data Mart would support these analytics (refer to the ProductSales example from class).

ANALYTICAL LAYER: Design a denormalized data structure based on your operational layer. Create the corresponding tables in MySQL.

ETL PIPLINE: Construct an ETL pipeline using Triggers and Stored procedures. Ensure each element of ETL (Extract, Transform, Load) is demonstrated.

DATA MART: Establish Views as data marts.

*Optional: create Materialized Views with Events for some of the data marts.

Delivery

Store and submit project artifacts in a folder named “Term1” in your dedicated DE1 GitHub repo. (You can store here your HW results and other artifacts as well.)

For naming conventions and structure, you have the freedom to make your own choices unless it was implicitly covered in the course (Operational layer data modeling and documentation with Enhanced Entity-Relationship (EER) diagrams or Analytical layer modeling and documentation using Star Schema or similar). I encourage you to research online and maintain consistency and coding conventions.

Testing is optional, as it was not covered in the course. However, be aware that testing might be crucial part of real world enterprise project delivery.

For documentation utilize GIT markdown and comments in SQL files or other coding artifactifacts

The project should be reproducible in a straightforward manner. In other words, any reviewer with github link should be able to run your code and achieve the same results as you without changing the code.

Grading criteria

Extra points 10 pts max [Please explicitly highlight in your documentation and code comments, if you introduced something extra into your project]

Submission

Before you begin the project, submit your planned dataset here. You may start once the Instructor approves it in the right column.
Submit your GitHub link (and nothing else) to Moodle when you are finished (ensure the repository is public!) The deadline for the term project is November 6th, EOD.