Data · Mid level

Data Engineer

Build the data pipelines and infrastructure that power analytics and AI. Work with big data technologies and cloud data platforms.

Salary: $120,000 ($90,000 to $160,000)
Demand: Very High
Time to entry: 4 to 8 months
Difficulty: Mid

A Day in the Life

What a typical day looks like

I work on the pipelines that move data from where it is created to where it is used. Most mornings start with checking that all the overnight jobs finished: Airflow dashboard, dbt run logs. If something failed, that is the first priority. Then I move to the day's main work, usually a new pipeline, a refactor, or a performance fix. Afternoons often have a coordination element: meeting with analysts whose queries are slow, meeting with engineers whose system needs to emit better events, meeting with product about new data they want to capture. The role is more relational than people expect. Strong data engineers are part SQL wizard, part diplomat.

Hour-by-hour

8:30
Check Airflow. One pipeline failed at 3am due to schema drift in the source. Investigate.
9:00
Patch the source schema reader to handle the new column gracefully. Re-run the pipeline.
9:30
Standup. Update on the overnight incident and the fix. Mention that I will add a test to prevent recurrence.
10:00
Deep work. Refactor a slow dbt model. Add incremental materialisation and partition by event_date.
12:00
Lunch with a data analyst. They complain about a frequently-joined table being slow. Take notes.
13:00
Investigate the slow table. Add appropriate clustering. Re-run benchmarks. 8x faster.
14:30
Code review. Approve two dbt model PRs from a junior engineer. Suggest cleaner naming.
15:30
Architecture meeting. Discuss whether to move from Airflow to Dagster. Pros and cons on the whiteboard.
16:30
Documentation. Update the data dictionary for two newly-shipped tables. Future-Anita will thank present-Anita.
17:30
Done. Push final PR. Set Slack to away.
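
The 9:00 fix above, making a schema reader tolerate a new source column, might look roughly like the sketch below. It is a minimal illustration, not the code from that incident: the column names, the defaults, and the `normalise_record` helper are all hypothetical. The idea is to project each raw record onto the columns the pipeline expects, fill in defaults for missing ones, and report unexpected columns instead of failing on them.

```python
from typing import Any

# Hypothetical expected schema: column name -> (type, default if missing).
EXPECTED_COLUMNS: dict[str, tuple[type, Any]] = {
    "event_id": (int, None),
    "event_date": (str, None),
    "amount": (float, 0.0),
}


def normalise_record(raw: dict[str, Any]) -> tuple[dict[str, Any], set[str]]:
    """Project a raw source record onto the expected schema.

    Returns the cleaned record plus the set of unexpected column names,
    so the caller can log the drift rather than crash on it.
    """
    cleaned: dict[str, Any] = {}
    for name, (col_type, default) in EXPECTED_COLUMNS.items():
        value = raw.get(name, default)
        # Cast to the expected type; keep None for genuinely missing values.
        cleaned[name] = col_type(value) if value is not None else None
    unexpected = set(raw) - set(EXPECTED_COLUMNS)
    return cleaned, unexpected


# Made-up sample record containing a column the reader has never seen.
record, drift = normalise_record(
    {"event_id": "7", "event_date": "2024-01-01", "amount": "3.5", "new_col": "x"}
)
```

In a real pipeline the `drift` set would typically feed an alert or a log line, which is also a natural place to hang the regression test promised at standup.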

Skills you need

Required

Python, SQL, Spark/Databricks, Data Warehousing, ETL/ELT Pipelines

Nice to have

Azure Data Factory, Kafka, dbt, Airflow, Cloud Platforms

Portfolio Projects

Build these to stand out

Hands-on projects beat any CV bullet point. Pick one and finish it.

Intermediate · 2 to 3 weekends

ETL Pipeline with Airflow + dbt

Pull data from a public API daily, transform with dbt, load to a warehouse (BigQuery free tier or DuckDB). Schedule with Airflow. Add tests. Document with dbt docs.

Tech: Python, Airflow, dbt, DuckDB or BigQuery, SQL
Why it helps

Demonstrates the modern data engineering stack. Solid portfolio piece.
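
To make the shape of this project concrete, here is a minimal extract-transform-load sketch. It rests on several assumptions: the standard-library `sqlite3` module stands in for DuckDB or BigQuery, `extract` returns hard-coded made-up rows instead of calling a real API, and the table and column names are hypothetical. In the actual project the transform step would live in dbt models and Airflow would schedule the run.

```python
import sqlite3


def extract() -> list[dict]:
    # Stand-in for the daily call to a public API (rows are made up).
    return [
        {"id": 1, "city": " london ", "temp_c": "11.5"},
        {"id": 2, "city": "Paris", "temp_c": "14.0"},
    ]


def transform(rows: list[dict]) -> list[tuple]:
    # Cast types and tidy strings before loading.
    return [
        (int(r["id"]), r["city"].strip().title(), float(r["temp_c"]))
        for r in rows
    ]


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "id INTEGER PRIMARY KEY, city TEXT NOT NULL, temp_c REAL NOT NULL)"
    )
    # INSERT OR REPLACE keyed on id keeps the load idempotent:
    # re-running the same day does not create duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO readings (id, city, temp_c) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> int:
    load(conn, transform(extract()))
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection))
```

The idempotent load is the design choice worth keeping when you swap in the real stack: a scheduler will retry failed runs, so a re-run must not duplicate data.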

Advanced · 3 to 4 weekends

Streaming Data Pipeline

Build a Kafka producer/consumer system. Produce events to one topic, consume and aggregate to another. Add windowed aggregations with Kafka Streams or Flink.

Tech: Kafka, Python, Docker, possibly Flink
Why it helps

Streaming is harder than batch. Companies need senior engineers who can do it.
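
Before wiring up Kafka, it can help to see what a windowed aggregation computes. The sketch below is plain Python with no Kafka, Kafka Streams, or Flink involved: it counts events per key in fixed, non-overlapping (tumbling) windows. The 60-second window, the event shape, and the `tumbling_window_counts` name are assumptions for illustration. A real stream processor also has to handle late and out-of-order events, which this sketch ignores.

```python
from collections import defaultdict
from typing import Iterable

WINDOW_SECONDS = 60  # assumed window size for the example


def tumbling_window_counts(
    events: Iterable[tuple[int, str]],
    window_seconds: int = WINDOW_SECONDS,
) -> dict[tuple[int, str], int]:
    """Count events per (window_start, key).

    Each event is (timestamp_in_seconds, key). An event belongs to the
    window starting at the largest multiple of window_seconds that is
    less than or equal to its timestamp.
    """
    counts: dict[tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Made-up events: three fall in the window starting at 60.
sample_events = [(0, "click"), (59, "click"), (60, "click"), (61, "view"), (125, "click")]
window_counts = tumbling_window_counts(sample_events)
```

In the project itself, the consumer would read these events from the first topic and publish each window's counts to the second.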

Intermediate · 1 to 2 weekends

Open-Source Contribution

Pick a data engineering tool (dbt, Airflow, dlt, Meltano). Find a 'good first issue' on GitHub. Submit a PR. Get it merged.

Tech: Depends on the project
Why it helps

A merged open-source PR is gold on a CV. It shows you can navigate someone else's codebase.
