Published July 7, 2012

Data Engineer

Here is a day in the life of a Data Engineer:

  • Gathering data from disparate sources.
  • Integrating data into a unified view for data consumers.
  • Preparing data for analytics and reporting.
  • Managing data pipelines for a continuous flow of data from source to destination systems.
  • Managing the complete infrastructure for the collection, processing, and storage of data.
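The activities above roughly follow the extract, transform, and load steps listed in the topic overview. Here is a minimal sketch of that flow. It assumes two made-up in-memory sources (a CSV customer export and a JSON order feed) and uses an in-memory SQLite database as the destination. Every name in it (`CUSTOMERS_CSV`, `ORDERS_JSON`, `extract`, `transform`, `load`, `run_pipeline`, the `customer_spend` table) is hypothetical and chosen for illustration only.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source data: a CSV export from one system and a JSON feed from another.
CUSTOMERS_CSV = """customer_id,name,country
1,Alice,US
2,Bob,DE
"""

ORDERS_JSON = """[
  {"order_id": 10, "customer_id": 1, "amount": 25.0},
  {"order_id": 11, "customer_id": 1, "amount": 15.5},
  {"order_id": 12, "customer_id": 2, "amount": 40.0}
]"""


def extract():
    """Gather raw records from two disparate (here: in-memory) sources."""
    customers = list(csv.DictReader(io.StringIO(CUSTOMERS_CSV)))
    orders = json.loads(ORDERS_JSON)
    return customers, orders


def transform(customers, orders):
    """Integrate both sources into one view: total spend per customer."""
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    rows = []
    for c in customers:
        cid = int(c["customer_id"])  # CSV values arrive as strings; normalize the type
        rows.append((cid, c["name"], c["country"], totals.get(cid, 0.0)))
    return rows


def load(rows, conn):
    """Write the prepared rows to a destination table ready for reporting."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_spend "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, total_spend REAL)"
    )
    # INSERT OR REPLACE keeps reruns idempotent on the primary key
    conn.executemany("INSERT OR REPLACE INTO customer_spend VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """One pass of the pipeline: source systems -> transform -> destination."""
    customers, orders = extract()
    load(transform(customers, orders), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    for row in conn.execute("SELECT * FROM customer_spend ORDER BY customer_id"):
        print(row)
```

Running the script prints one row per customer with that customer's total spend. A real pipeline would read from files, APIs, or databases, and it would run on a schedule or continuously instead of once.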

To be successful in their role, Data Engineers need a mix of technical, functional, and soft skills.

  • Technical Skills include working with different operating systems and infrastructure components such as virtual machines, networks, and application services. They also include working with databases and data warehouses, data pipelines, ETL tools, big data processing tools, and languages for querying, manipulating, and processing data.
  • Functional Skills start with an understanding of how data can be applied in the business, which is an important skill for a data engineer. They also include the ability to convert business requirements into technical specifications, an understanding of the software development lifecycle, and knowledge of data quality, privacy, security, and governance.
  • Soft Skills include interpersonal skills, teamwork and the ability to work collaboratively, and effective communication.

This could turn out to be a large section, so I’ll divide it into logical pieces. Here is an overview of the topics I’ll cover:

Data

Shell Outline

ETL, ELT & Data Pipelines Introduction

ETL vs ELT Basics

ETL - Extract

ETL - Transform

ETL - Load

ETL - Pipelines

ELT

ETL vs ELT

ETL Workflow

Staging Areas

DAG

Data Pipeline Overview

Pipeline Definition

Latency

Throughput

Use cases

Stages of Data Pipeline Processes (DPP)

Batch processing

Stream processing

Micro-batch

Batch vs Stream

Lambda architecture