Data Engineering (Degree)
Build data pipelines and maintain the infrastructure required to process and store large datasets through our quizzes. Master data integration, data transformation, and data warehousing, and prepare to architect the infrastructure that supports data analytics.
Big Data Ecosystems with Apache Spark
This course provides comprehensive training in Apache Spark, the leading unified analytics engine for large-scale data processing. Students learn to process petabytes of data across clustered computers using Spark's core APIs (RDD, DataFrame, Dataset) for batch processing and Spark Streaming for near-real-time analytics. Mastery of Spark is essential for Data …
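The RDD model mentioned above chains transformations (flatMap, map, reduceByKey) over partitioned collections. A conceptual sketch of that style in plain Python, not Spark's actual API (PySpark's real entry point is `SparkSession`), using the classic word-count example:

```python
from collections import defaultdict
from itertools import chain

# RDD-style word count, sketched with plain Python. In PySpark this
# would read lines.flatMap(split).map(pair).reduceByKey(add).
lines = ["spark processes big data", "spark scales to big clusters"]

# flatMap: split each line into individual words
words = list(chain.from_iterable(line.split() for line in lines))

# map: pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts per word
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))  # {'spark': 2, 'processes': 1, 'big': 2, ...}
```

In real Spark, each stage runs in parallel across cluster partitions; the chain of transformations is the same.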
Cloud Data Engineering on AWS and Azure
This course focuses on building and deploying data pipelines using cloud-native services from major providers like Amazon Web Services (AWS) and Microsoft Azure. Students gain hands-on experience with services such as AWS Glue, Azure Data Factory, Redshift, Synapse Analytics, and cloud object storage (S3, ADLS). As Zambian enterprises and government …
Data Engineering Capstone: Production System Build
This culminating project course requires students to integrate all acquired knowledge to design, build, document, and present a fully functional, production-like data engineering system. Working with a real or simulated dataset from a Zambian context (e.g., utility data from ZESCO, mobile data from MTN), students will implement a complete pipeline—from …
Data Engineering Fundamentals and Pipeline Design
This foundational course introduces the core principles, roles, and responsibilities of a Data Engineer within the modern data ecosystem. Students learn the data pipeline lifecycle—from ingestion and transformation to storage and serving—and compare batch versus streaming architectures. The course establishes the critical importance of reliable, scalable data infrastructure for Zambian …
Data Governance, Quality, and Operations (DataOps)
This course addresses the non-technical pillars of successful data engineering: governance, quality assurance, and operational excellence (DataOps). Students learn to implement data lineage tracking, define and monitor data quality rules (completeness, validity, consistency), and establish CI/CD practices for data pipelines. In Zambia's regulated sectors like Banking and Telecommunications, or for …
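The three quality dimensions named above (completeness, validity, consistency) can each be stated as a rule scored over records. A minimal sketch, with hypothetical field names and thresholds:

```python
# Sample records from a hypothetical banking feed.
records = [
    {"account_id": "A1", "balance": 150.0, "currency": "ZMW"},
    {"account_id": "A2", "balance": -20.0, "currency": "ZMW"},
    {"account_id": None, "balance": 75.0, "currency": "USD"},
]

def completeness(rows, field):
    """Share of rows where the field is present (non-null)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def validity(rows, field, predicate):
    """Share of rows whose field value satisfies a business rule."""
    return sum(predicate(r[field]) for r in rows) / len(rows)

def consistency(rows, field, allowed):
    """Share of rows whose field value comes from an agreed reference set."""
    return sum(r[field] in allowed for r in rows) / len(rows)

print(completeness(records, "account_id"))             # 2 of 3 rows pass
print(validity(records, "balance", lambda b: b >= 0))  # 2 of 3 rows pass
print(consistency(records, "currency", {"ZMW"}))       # 2 of 3 rows pass
```

In practice such rules would be monitored on every pipeline run and wired into alerting, which is the operational side of DataOps this course covers.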
Data Modeling and Warehousing Techniques
This course covers the systematic design of data storage systems for analytical processing, moving beyond transactional databases. Students master dimensional modeling concepts (star and snowflake schemas), learn to design fact and dimension tables, and apply data warehouse architecture patterns (Kimball, Inmon). This skill is fundamental for building the centralized reporting …
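A star schema puts measures in a central fact table keyed to descriptive dimension tables. A minimal sketch using SQLite via Python's standard library (table and column names are illustrative, not from the course):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold descriptive attributes.
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER)")
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)")

# The fact table holds numeric measures plus a foreign key per dimension.
cur.execute("""
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date,
        product_key INTEGER REFERENCES dim_product,
        quantity    INTEGER,
        amount      REAL
    )
""")

cur.execute("INSERT INTO dim_date VALUES (20240101, 2024, 1)")
cur.execute("INSERT INTO dim_product VALUES (1, 'Mealie meal', 'Grocery')")
cur.execute("INSERT INTO fact_sales VALUES (20240101, 1, 3, 450.0)")

# A typical star join: aggregate measures, slicing by dimension attributes.
rows = cur.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_date    d ON f.date_key    = d.date_key
    WHERE d.year = 2024
    GROUP BY p.category
""").fetchall()
print(rows)  # [('Grocery', 450.0)]
```

A snowflake schema would further normalize the dimensions (e.g. splitting category into its own table); the fact table stays the same.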
Data Pipeline Orchestration with Airflow
This course focuses on the critical operational skill of orchestrating complex, interdependent data workflows using industry-standard tools like Apache Airflow. Students learn to define workflows as directed acyclic graphs (DAGs), schedule tasks, handle dependencies, monitor pipeline health, and manage alerts. Reliable orchestration is the backbone of any mature data platform, …
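The DAG idea underneath Airflow can be shown in a few lines of plain Python: tasks run only once all of their upstream dependencies have finished. This is a conceptual sketch of the scheduling model, not Airflow's actual `DAG`/operator API, and the task names are invented:

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstreams).
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"quality_check"},
    "notify": {"load"},
}

# The scheduler's core job: order tasks so every dependency runs first.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['extract', 'transform', 'quality_check', 'load', 'notify']
```

Airflow layers scheduling intervals, retries, monitoring, and alerting on top of exactly this dependency-resolution core.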
Extract, Transform, Load (ETL) Process Design
This course delves into the heart of data engineering: designing, building, and maintaining reliable ETL/ELT pipelines. Students learn to extract data from diverse sources (APIs, logs, databases), apply complex transformation logic for cleaning and business rules, and load data into target systems. The course emphasizes idempotency, error handling, and data …
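Idempotency, one of the properties emphasized above, means rerunning a load leaves the target in the same end state, so retries after failures are safe. A minimal sketch: upserting by primary key instead of blindly appending (the target here is an in-memory dict standing in for a table; all names are illustrative):

```python
target = {}  # stands in for a target table keyed by primary key

def load(rows, table):
    """Upsert each row by its primary key, so rerunning is a no-op."""
    for row in rows:
        table[row["id"]] = row

batch = [
    {"id": 1, "meter": "ZES-001", "kwh": 120},
    {"id": 2, "meter": "ZES-002", "kwh": 95},
]

load(batch, target)
load(batch, target)  # a retry after a partial failure changes nothing
print(len(target))   # 2, not 4
```

With a real database the same idea is expressed as an upsert/merge statement rather than an insert.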
Python for Data Engineering and Automation
This hands-on course focuses on using Python as the primary tool for building robust, automated data pipelines. Students learn core Python programming, then specialize in libraries critical for engineering: Pandas for data wrangling, PySpark for distributed processing, SQLAlchemy for database interaction, and Apache Airflow for workflow orchestration. The ability to …
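As a small taste of the Pandas portion of the course, a sketch of a common wrangling pattern: coerce a messy column to numbers, drop the bad rows, and aggregate per group (the column names and data are invented for illustration):

```python
import pandas as pd

# Raw readings with a messy numeric column, as often arrives from source systems.
df = pd.DataFrame({
    "region": ["Lusaka", "Lusaka", "Copperbelt", "Copperbelt"],
    "usage_mb": ["100", "150", "200", "bad"],
})

# Clean: coerce non-numeric values to NaN, then drop those rows.
df["usage_mb"] = pd.to_numeric(df["usage_mb"], errors="coerce")
df = df.dropna(subset=["usage_mb"])

# Aggregate: total usage per region.
totals = df.groupby("region")["usage_mb"].sum()
print(totals.to_dict())  # {'Copperbelt': 200.0, 'Lusaka': 250.0}
```

The same shape of logic, written against PySpark DataFrames instead, scales the cleaning and aggregation from one machine to a cluster.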