Data Engineering on AWS

Last Update: October 7, 2025

Course Overview:

The Data Engineering on AWS course prepares participants for the AWS Certified Data Engineer – Associate certification. This comprehensive course builds a solid foundation in data engineering principles, tools, and best practices so that you can perform the data engineering role in the AWS Cloud. You will progress from an introduction to the role itself to a deeper dive into building solutions with AWS services.

Course Objectives:

By completing this course, you will gain knowledge and skills in core data-related AWS services, along with the ability to implement data pipelines, monitor and troubleshoot issues, and optimize cost and performance in accordance with best practices.

Expected Outcomes:
Upon successful completion of this training, participants will be able to:

  • Understand the Data Engineering Role & Ecosystem:
    • Define the responsibilities of a data engineer.
    • Identify key personas and collaboration requirements.
  • Design and Build Scalable Data Architectures:
    • Evaluate and select AWS services for data lakes, data warehouses, and streaming architectures.
    • Apply best practices for designing secure, cost-efficient, and scalable solutions.
  • Develop and Automate Data Pipelines:
    • Orchestrate and automate batch and streaming data pipelines using services like AWS Glue, EMR, Step Functions, and Kinesis (a minimal run-and-poll sketch follows this list).
    • Implement CI/CD and IaC tools (e.g., AWS SAM, CloudFormation) for pipeline automation.
  • Secure, Monitor, and Troubleshoot Pipelines:
    • Apply AWS security tools and practices to secure data solutions.
    • Use AWS monitoring and alerting services to track performance and troubleshoot issues.
  • Optimize Performance and Costs:
    • Leverage tools for performance tuning, cost analysis, and optimization (e.g., Redshift tuning, AWS Cost Explorer).
    • Automate scaling, fault-tolerance, and pipeline improvements.
  • Hands-on Labs for Real-World Implementation:
    • Practice building batch and streaming pipelines using Amazon S3, Athena, Redshift, EMR, Glue, Kinesis, and MSK.
    • Gain experience solving real analytics challenges through guided labs.
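
As a concrete taste of the pipeline automation described above, here is a minimal boto3 sketch that starts an AWS Glue job run and polls it to completion. The job name is hypothetical; it assumes a Glue job is already defined in your account.

    import time
    import boto3

    glue = boto3.client("glue")

    JOB_NAME = "nightly-transform"  # hypothetical job name

    run_id = glue.start_job_run(JobName=JOB_NAME)["JobRunId"]

    # Poll until the run reaches a terminal state.
    while True:
        state = glue.get_job_run(JobName=JOB_NAME, RunId=run_id)["JobRun"]["JobRunState"]
        if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
            break
        time.sleep(30)

    print(f"Job run {run_id} finished with state {state}")

In production, a polling loop like this would typically be replaced by an orchestrator such as AWS Step Functions or Glue workflows, both covered later in the course.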

Target Audience

This course is intended for professionals who build and manage data pipelines in the cloud, such as data engineers, cloud engineers, ETL developers, and tech leads.

Prerequisites:

We recommend that attendees of this course have:

  • Completed AWS Cloud Practitioner Essentials or an equivalent course.
  • Prior experience with AWS core services.
  • Programming experience in one of the following languages: Python, .NET, or Java.

Day 1

Module 1: Foundations – Roles and Concepts

  • Introduction
  • Data Discovery
  • AWS Data Services and Modern Data Architecture
  • Orchestration and Automation Options

Module 2: Foundations – Tools and Considerations

  • Continuous Integration and Continuous Delivery Tools
  • Infrastructure as Code Tools
  • AWS Serverless Application Model
  • Networking Considerations
  • Cost Optimization Tools

Module 3: A Data Lake Solution – Building a Data Lake Solution

  • Set Up Storage
  • Ingest Data
  • Build Data Catalog
  • Transform Data
  • Serve Data for Consumption

Lab 1: Setting up a Data Lake on AWS

  • Use Amazon S3 as the storage layer of a data lake.
  • Organize data into layers (or zones) in Amazon S3.
  • Configure an S3 event notification to invoke an AWS Lambda function (a minimal handler sketch follows this list).
  • Create an Amazon EventBridge rule to invoke the Lambda function.
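
The lab wires an S3 event notification to a Lambda function. As a minimal sketch (not the lab's provided code), a handler that processes such notifications might look like this; note that S3 delivers object keys URL-encoded:

    import json
    import urllib.parse

    def lambda_handler(event, context):
        """Log each object referenced by an S3 event notification."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # S3 URL-encodes object keys in event payloads.
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            print(f"New object landed: s3://{bucket}/{key}")
        return {"statusCode": 200, "body": json.dumps("ok")}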

Day 2

Module 4: A Data Lake Solution – Optimizing and Securing a Data Lake Solution

  • Open Table Formats
  • Security Using AWS Lake Formation
  • Troubleshooting

Lab 2: Automate Data Lake Creation using AWS Lake Formation Blueprints

  • Create an AWS Glue workflow using a Lake Formation blueprint.
  • Automate the Lake Formation data lake setup process with an AWS Glue workflow (a minimal start-and-poll sketch follows this list).
  • Create a custom AWS Glue workflow.
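
Once a blueprint has generated a workflow, it can be run and monitored programmatically. A minimal boto3 sketch, with a hypothetical workflow name:

    import boto3

    glue = boto3.client("glue")

    # Hypothetical name; blueprints assign a workflow name at creation.
    WORKFLOW = "datalake-ingest-workflow"

    run_id = glue.start_workflow_run(Name=WORKFLOW)["RunId"]
    run = glue.get_workflow_run(Name=WORKFLOW, RunId=run_id)["Run"]
    print(run["Status"])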

Module 5: A Data Warehouse Solution – Building a Data Warehouse Solution

  • Designing the Data Warehouse Solution
  • Ingesting Data
  • Processing Data
  • Serving Data for Consumption

Lab 3: Setting up a Data Warehouse using Amazon Redshift Serverless

  • Create a data warehouse with Amazon Redshift Serverless.
  • Create a schema.
  • Create a table.
  • Load the table with sample data (a minimal Redshift Data API sketch follows this list).
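
A minimal sketch of these steps using the Redshift Data API against a serverless workgroup. The workgroup, database, bucket, and IAM role names are placeholders:

    import boto3

    rsd = boto3.client("redshift-data")

    WG, DB = "my-serverless-wg", "dev"  # hypothetical workgroup and database
    COPY_ROLE = "arn:aws:iam::123456789012:role/RedshiftCopyRole"  # placeholder

    statements = [
        "CREATE SCHEMA IF NOT EXISTS sales;",
        "CREATE TABLE IF NOT EXISTS sales.orders "
        "(order_id INT, amount DECIMAL(10,2), region VARCHAR(16));",
        # COPY loads sample data from S3 in parallel.
        f"COPY sales.orders FROM 's3://my-sample-bucket/orders/' "
        f"IAM_ROLE '{COPY_ROLE}' FORMAT AS CSV;",
    ]

    for sql in statements:
        resp = rsd.execute_statement(WorkgroupName=WG, Database=DB, Sql=sql)
        print(resp["Id"], "submitted")

Note that execute_statement is asynchronous; describe_statement can be used to check each statement's status.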

Module 6: A Data Warehouse Solution – Optimizing and Securing a Data Warehouse Solution

  • Monitoring and Optimizing Options
  • Orchestration Options
  • Security and Governance Options

Lab 4: Managing Access Control in Amazon Redshift

  • Create and manage users and roles.
  • Apply and manage column-level security.
  • Apply and manage row-level security.
  • Configure dynamic data masking (a sketch of the core SQL statements follows this list).
  • Review audit logs.
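
A sketch of the core SQL behind these controls, issued through the Redshift Data API and reusing the hypothetical sales.orders table from Lab 3. The exact policy expressions are illustrative:

    import boto3

    rsd = boto3.client("redshift-data")
    WG, DB = "my-serverless-wg", "dev"  # hypothetical names from Lab 3

    statements = [
        # Role-based access control.
        "CREATE ROLE analyst;",
        # Column-level security: grant access to selected columns only.
        "GRANT SELECT (order_id, region) ON sales.orders TO ROLE analyst;",
        # Row-level security: analysts see US rows only.
        "CREATE RLS POLICY us_only WITH (region VARCHAR(16)) USING (region = 'US');",
        "ATTACH RLS POLICY us_only ON sales.orders TO ROLE analyst;",
        "ALTER TABLE sales.orders ROW LEVEL SECURITY ON;",
        # Dynamic data masking: hide amounts from analysts.
        "CREATE MASKING POLICY hide_amount WITH (amount DECIMAL(10,2)) USING (NULL);",
        "ATTACH MASKING POLICY hide_amount ON sales.orders(amount) TO ROLE analyst;",
    ]

    for sql in statements:
        rsd.execute_statement(WorkgroupName=WG, Database=DB, Sql=sql)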

Day 3

Module 7: A Batch Data Pipeline Solution – Building a Batch Data Pipeline

  • Designing the Batch Data Pipeline
  • Ingesting Data

Module 8: A Batch Data Pipeline Solution – Implementing the Batch Data Pipeline

  • Processing and Transforming Data
  • Cataloging Data
  • Serving Data for Consumption

Lab 5: A Day in the Life of a Data Engineer

  • Create an AWS Glue crawler.
  • Create and run a job in AWS Glue Studio.
  • Explore permissions required to run AWS Glue crawlers and AWS Glue Studio jobs.
  • Query the AWS Glue Data Catalog using Amazon Athena (a minimal crawler-and-query sketch follows this list).
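
A minimal boto3 sketch of the crawler-and-query flow; the names, role, and S3 paths are placeholders for whatever the lab provisions:

    import boto3

    glue = boto3.client("glue")
    athena = boto3.client("athena")

    # Crawl an S3 prefix into the Data Catalog.
    glue.create_crawler(
        Name="orders-crawler",
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
        DatabaseName="lakehouse",
        Targets={"S3Targets": [{"Path": "s3://my-sample-bucket/orders/"}]},
    )
    glue.start_crawler(Name="orders-crawler")

    # After the crawler populates the catalog, query the table with Athena.
    athena.start_query_execution(
        QueryString="SELECT COUNT(*) FROM lakehouse.orders;",
        ResultConfiguration={"OutputLocation": "s3://my-sample-bucket/athena-results/"},
    )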

Module 9: A Batch Data Pipeline Solution – Optimizing, Orchestrating, and Securing Batch Data Pipelines

  • Optimizing the Batch Data Pipeline
  • Orchestrating the Batch Data Pipeline
  • Securing and Governance of the Batch Data Pipeline

Lab 6: Orchestrate data processing in Spark using AWS Step Functions

  • Use Amazon Simple Storage Service (Amazon S3) Event Notifications and AWS Lambda to automate the batch processing of data.
  • Use the Step Functions state machine language (a pared-down definition follows this list) to:
    • Create an on-demand Amazon EMR cluster.
    • Add an Apache Spark step job in Amazon EMR and create an Amazon Athena table to query the processed data.
    • Add an Amazon SNS topic to send a notification.
  • Validate a Step Functions state machine run.
  • Review an AWS Glue table and validate the processed data using Athena.
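
A pared-down sketch of such a state machine, defined in Python and registered with boto3. All ARNs, names, and the S3 script path are placeholders, and the Athena table step is omitted for brevity:

    import json
    import boto3

    definition = {
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                # .sync waits until the cluster is ready before moving on.
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "spark-batch",
                    "ReleaseLabel": "emr-7.0.0",
                    "Applications": [{"Name": "Spark"}],
                    "Instances": {
                        "MasterInstanceType": "m5.xlarge",
                        "InstanceCount": 1,
                        "KeepJobFlowAliveWhenNoSteps": True,
                    },
                    "ServiceRole": "EMR_DefaultRole",
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                },
                "ResultPath": "$.cluster",
                "Next": "RunSparkStep",
            },
            "RunSparkStep": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "transform",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "s3://my-sample-bucket/jobs/transform.py"],
                        },
                    },
                },
                "Next": "Notify",
            },
            "Notify": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-events",
                    "Message": "Batch pipeline complete",
                },
                "End": True,
            },
        },
    }

    sfn = boto3.client("stepfunctions")
    sfn.create_state_machine(
        name="batch-pipeline",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/StepFunctionsEMRRole",  # placeholder
    )

A production version would also terminate the cluster (elasticmapreduce:terminateCluster) and add error handling with Retry and Catch fields.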

Day 4

Module 10: A Streaming Data Pipeline Solution – Building a Streaming Data Pipeline Solution

  • Ingesting Data from Stream Sources
  • Storing Streaming Data
  • Processing Data
  • Analyzing Data

Lab 7: Streaming Analytics with Amazon Managed Service for Apache Flink

  • Build a real-time streaming analytics pipeline in Managed Apache Flink Studio, using Apache Flink and Apache Zeppelin to ingest, enrich, and analyze clickstream data with catalog data stored in Amazon S3.
  • Perform interactive data analytics and visualize the results using Apache Zeppelin notebooks with Managed Apache Flink Studio.
  • Output the data to a Kinesis data stream for further downstream processing (a minimal consumer sketch for validating the output follows this list).
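
As a small validation aid (not part of the Flink application itself), a boto3 consumer can read back a few records from the hypothetical output stream:

    import boto3

    kinesis = boto3.client("kinesis")
    STREAM = "clickstream-enriched"  # hypothetical output stream name

    # Read from the first shard, starting at the oldest available record.
    shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName=STREAM,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    for record in kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]:
        print(record["Data"])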

Module 11: A Streaming Data Pipeline Solution – Optimizing and Securing a Streaming Data Pipeline Solution

  • Optimization
  • Security and Governance

Lab 8: Introduction to Access Control with Amazon Managed Streaming for Apache Kafka

  • Publish to and consume from an MSK cluster using IAM-authenticated broker Uniform Resource Locators (URLs) with a Java demo producer and a Java demo consumer (a sketch for retrieving these URLs follows this list).
  • Learn about the IAM method to authenticate and authorize users of an MSK cluster.
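
The Java demo clients need the cluster's IAM-authenticated bootstrap brokers (port 9098, SASL_SSL with the AWS_MSK_IAM mechanism). A minimal boto3 sketch for retrieving them; the cluster ARN is a placeholder:

    import boto3

    kafka = boto3.client("kafka")

    CLUSTER_ARN = "arn:aws:kafka:us-east-1:123456789012:cluster/demo/abcd1234"  # placeholder

    brokers = kafka.get_bootstrap_brokers(ClusterArn=CLUSTER_ARN)
    # These URLs go into the Java producer/consumer bootstrap.servers property.
    print(brokers["BootstrapBrokerStringSaslIam"])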

Course Wrap-up

  • Course overview
  • AWS training courses
  • Certifications
  • Course feedback

Price From: RM5,400.00

