What I’ve learnt at Skyscanner in my first three years

I joined Skyscanner in Jan 2018 as a Data Engineer in the Data tribe. It’s been a good experience and a massive learning curve. The Coronavirus pandemic hit just as I felt I was getting into the swing of things, but it has also given me an opportunity to look back.

Here’s a little summary of what I learnt in my first three years:

Operational work matters. Reducing toil, addressing errors and keeping on-call rotations sane are important, particularly during a time of high staff turnover (due to the aviation sector shutting down during the peaks of the pandemic).
Protobuf schema design is hard. Protobuf is used to define the schemas of all events sent to the Data platform and, subsequently, how they are stored as tables in the Hive Metastore for later analytical querying. Whilst one can deprecate a field, one cannot delete it, so a schema must be designed to stay backward compatible and to anticipate future needs.
Trying to run a Kafka cluster on EC2 instances is a major undertaking. Avoid doing so where possible and use managed services such as AWS MSK or Kinesis instead.
Streaming data is not always necessary. Use batch ETL processes if you can, since they are repeatable.
A metadata catalogue of datasets is key for Data Governance and also Data Quality. Being able to identify the producers and data lineage of a table is very important.
I learnt how to build and run software which is Sarbanes-Oxley compliant, as well as how to implement GDPR Subject Access Request compliance and apply a relevant data retention period.
Cloud cost monitoring in AWS is something that engineers need to be aware of to ensure costs are kept under control. This is where I learnt about CloudHealth.
Good documentation is a key skill. Clear READMEs should document how to run, test, build and deploy the repo. This will speed up onboarding new joiners.
Pairing remains an important skill to spread knowledge in a team and complete projects faster.
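
To make the Protobuf point above a little more concrete, here is a minimal Python sketch of the kind of check that catches a breaking schema change. It is not the Protobuf API and not how our platform did it: the `Field` class, the field names and the `compatibility_problems` function are all made up for illustration, and the type check is deliberately stricter than Protobuf's real wire-compatibility rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field in a hypothetical, simplified view of a message schema."""
    number: int
    name: str
    type_name: str
    deprecated: bool = False


def compatibility_problems(old: list[Field], new: list[Field]) -> list[str]:
    """Return reasons why `new` is not a backward-compatible change to `old`."""
    new_by_number = {f.number: f for f in new}
    problems = []
    for field in old:
        current = new_by_number.get(field.number)
        if current is None:
            # A deleted field frees its number for accidental reuse later.
            problems.append(f"field {field.number} ({field.name}) was deleted")
        elif current.type_name != field.type_name:
            # Conservative assumption: any type change is treated as breaking.
            problems.append(
                f"field {field.number} ({field.name}) changed type "
                f"from {field.type_name} to {current.type_name}"
            )
    return problems


old_schema = [Field(1, "user_id", "string"), Field(2, "price", "int64")]

# Deprecating a field keeps its number and type, so old data stays readable.
deprecated = [
    Field(1, "user_id", "string"),
    Field(2, "price", "int64", deprecated=True),
    Field(3, "price_minor_units", "int64"),
]

# Deleting field 2, or changing its type, is reported as a problem.
deleted = [Field(1, "user_id", "string")]
retyped = [Field(1, "user_id", "string"), Field(2, "price", "string")]

print(compatibility_problems(old_schema, deprecated))
print(compatibility_problems(old_schema, deleted))
print(compatibility_problems(old_schema, retyped))
```

The design choice worth noting is that fields are matched by number rather than by name, since the number is what identifies a field in already-stored data. Adding a new field (number 3 here) is fine; removing or repurposing an old one is what causes trouble.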

Books worth reading:

Designing Data-Intensive Applications by Martin Kleppmann is excellent and well worth your time.

I would say that Web Operations by John Allspaw gave a good overview of how to run highly scalable web services. It gave me a good intellectual framework for understanding how to use Service Level Agreements, Service Level Indicators and Service Level Objectives to run a highly available service.

The book also gives examples of good practice when running a large web operation, such as documentation, retrospectives and blame-free incident debriefs, and ensuring runbooks exist and are up to date so that the people responding to incidents have all the relevant information available to them.
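
As a rough illustration of how a Service Level Indicator and a Service Level Objective fit together, here is a small Python sketch. The request counts, the 99.9% target and the function names are my own assumptions, and the "error budget" framing is a common way of reading an SLO rather than something taken from the book.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # Assumption: no traffic counts as fully available.
    return good_requests / total_requests


def error_budget_remaining(
    good_requests: int, total_requests: int, slo: float
) -> float:
    """Fraction of the error budget left; a negative value means overspent."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failures = total_requests * (1.0 - slo)
    if allowed_failures == 0:
        return 1.0  # No traffic, so nothing has been spent.
    actual_failures = total_requests - good_requests
    return 1.0 - actual_failures / allowed_failures


# Hypothetical month: 1,000,000 requests, 500 failures, 99.9% availability SLO.
total, good, slo = 1_000_000, 999_500, 0.999

sli = availability_sli(good, total)
remaining = error_budget_remaining(good, total, slo)

print(f"SLI: {sli:.4%}, SLO met: {sli >= slo}")
print(f"Error budget remaining: {remaining:.0%}")
```

In this made-up month the service fails 500 requests out of the roughly 1,000 the objective allows, so about half of the budget is left. The Service Level Agreement would be the external promise built on top of an objective like this one.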

Addressing technical debt is always going to be a balance between new feature work and removing blockers to future changes. I found Michael Feathers’ book, “Working Effectively with Legacy Code”, very insightful on how to manage it.

There’s another project in the works, so here’s to a better year in 2022!

This entry was posted on November 19, 2021, 14:50 and is filed under Skyscanner.

Michael Okarimia's Code Blog
