Course overview

About this course

This 2-day course introduces learners to the data integration capabilities of Google Cloud using Cloud Data Fusion. In this course, we discuss the challenges of data integration and the need for a data integration platform (middleware). We then examine how Cloud Data Fusion can effectively integrate data from a variety of sources and formats and help generate insights. We look at the main components of Cloud Data Fusion and how they work, how to process batch data and real-time streaming data with visual pipeline design, how to use rich metadata and data lineage tracking, and how to deploy data pipelines on various runtime engines.

Audience

  • Data Engineers
  • Data Analysts

Course details

Module 1: Introduction to Data Integration and Cloud Data Fusion
  • Data integration: what, why, challenges
  • Data integration tools used in the industry
  • User personas
  • Introduction to Cloud Data Fusion
  • Data integration critical capabilities
Module 2: Building Pipelines
  • Cloud Data Fusion architecture
  • Core concepts
  • Data pipelines and directed acyclic graphs (DAGs)
  • Pipeline lifecycle
  • Designing pipelines in Pipeline Studio
Module 3: Designing Complex Pipelines
  • Branching, merging and joining
  • Actions and notifications
  • Error handling and macros
  • Pipeline configurations, scheduling, import and export
Module 4: Pipeline Execution Environment
  • Scheduling and triggers
  • Execution environment: compute profiles and provisioners
  • Monitoring pipelines
Module 5: Building Transformations and Preparing Data with Wrangler
  • Wrangler
  • Directives
  • User-defined directives
Module 6: Connectors and Streaming Pipelines
  • Data integration architecture
  • Available connectors
  • Using the Cloud Data Loss Prevention (DLP) API
  • Reference architecture of streaming pipelines
  • Building and executing a streaming pipeline
Module 7: Metadata and Data Lineage
  • Metadata
  • Data lineage
Module 8: Summary
  • Course summary

Prerequisites

To get the most out of this course, participants are encouraged to have:

  • Completed Google Cloud Fundamentals: Big Data and Machine Learning (GCF-BDM)

Our Technology Partners

Spectrum Networks is the Authorised Learning Partner for some of the leading IT companies driving digital transformation.