Michael Okarimia's Code Blog

Socrates UK 2016

Posted by Michael Okarimia in Conferences on August 17, 2016

I was fortunate to attend Socrates UK 2016 this year, which was hosted in the beautiful Wotton House, near Dorking in Surrey.

I’ve not attended a Socrates conference before but I’d heard great things about it, as Socrates has at heart the idea of promoting software craftsmanship.

After arriving in the early evening, and after a hearty meal, we were ushered into the main conference room for the evening Lightning Talks.

Two talks from that evening stuck in my mind. One was about Team smells, which have nothing to do with hygiene but are signs that a team is not operating optimally. A rather informative mind map of Team smells was created and discussed.

The other was Franziska Sauerwein’s presentation on Outside-in TDD, something I’ve done throughout much of my career.

As I met more and more attendees, I was struck by how friendly and welcoming they were to newcomers like myself. They shared a desire to treat software as a craft rather than simply getting code out of the door as soon as possible, and to improve the professionalism of the industry.

Day One

The conference used the Unconference format: the agenda was not planned in advance, and sessions were decided upon at the beginning of each day.

On the first day of sessions I opted to attend the following:

Anti-patterns anonymous

This was a session where the attendees sat in a large circle and took turns to share their tales of software anti-patterns. It was generally a case of recounting war stories of incompetence and project failure: each of us told of a time when a project didn’t go well, and then what we thought could have solved the problem.

Now I had an appetite for some technical sessions, so I went on to attend a session about an application built on the SMACK platform.

SMACK is an acronym listing the stack of technologies used in a single platform, namely:

Spark, a distributed data processing framework, often used to analyse large datasets
Mesos, a cluster manager (described by its project as a distributed systems kernel) that pools the resources of many nodes to host distributed systems
Akka, a toolkit for the JVM for building message-driven distributed systems using the actor model
Cassandra, a highly scalable database
Kafka, a high throughput distributed messaging system

The stack was used to build a monitoring tool that logs and analyses metrics from cloud-based servers.

Having worked on an application at 7digital which uses Kafka, I was interested to hear the problems that other session attendees had encountered and solved in their own platforms.

The next session I attended was titled Microservices vs Monoliths which became a discussion over the problems encountered with either architecture. I related my experiences of how breaking out a monolithic application into smaller APIs whilst sharing the same data store was not ideal, and how one had to consider fault tolerance when doing so.

Mashooq Badar of Codurance led a session on Serverless Architecture which I found very interesting. He talked about how his team built the AWS Loft website using only AWS technology, namely Lambda and API Gateway.

One of the main advantages of using Lambda is the very low cost. Rather than hosting your application on EC2 or Beanstalk instances, you pay only for the resources used to execute requests. Amazon bills you per request, rather than for keeping instances running while they sit idle.

After the hour was up, I moved on to a session titled Mentorship Patterns. Mentoring is a role I’ve performed at 7digital, helping apprentices improve their skills and become more proficient software developers. I learnt that I could improve as a mentor by setting out what I expect of the apprentice from the outset. Giving regular feedback was also essential, and this was something I’d do via regular 1-2-1s.

As the first day wound up, dinner was served, and in the evening the lightning talks began. Attendees were encouraged to give a short five minute talk.

Domain Driven Development Strategic Patterns was a great talk by @Ouarzy, whose blog is well worth reading.

Radical Candor: Training guidance vs feedback. I wanted this talk to continue for more than its allotted five minutes, as the concept of Radical Candor is to tell your team members constructively that they need to improve. The speaker linked to an excellent article, which opens:
It sounds so simple to say that bosses need to tell employees when they’re screwing up. But it very rarely happens.

Forty Days of fixing, by @suzyhamilton: commit to making a single change to a project to improve it, one change a day. Small changes to a large, messy project can slowly make it better by following the boy scout rule: when making a change, always leave that part of the code base in a better state than when you arrived in it.

Discussion of the book Non-violent Communication

There was a story of introducing Agile into an enterprise waterfall project, which led to a discussion of the book The Phoenix Project, a novel about how an IT project was turned around to save the company. It’s heavily inspired by The Goal, Eliyahu M. Goldratt’s classic novel on the Theory of Constraints, and has a contemporary setting.

Antony Marcano finished the lightning talks by demoing how to apply SOLID principles to PageObjects when writing Acceptance Web Tests using Selenium & WebDriver.

Day Two

Serverless architecture was a subject that had piqued my interest, so I spent a double session working on a hands-on exercise.

Mashooq Badar led a hands-on Lambda session where we set up our own AWS web app powered by Lambda. The lab was based upon his blog post on Codurance’s site.
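
For readers who have not seen one, a Lambda function behind API Gateway can be very small. The following is a minimal sketch in Python, assuming API Gateway's Lambda proxy integration. The handler name and the greeting are my own illustration, not the code from the lab.

```python
import json


def handler(event, context):
    """Entry point that Lambda invokes for each API Gateway request."""
    # With proxy integration, query string parameters arrive in the event;
    # the key is present but None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # API Gateway expects a status code, optional headers and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because the handler is a plain function, it can be called locally with a hand-built event dictionary before it is ever deployed.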

When building applications on AWS, it’s worth considering how to make them fit easily within the AWS ecosystem.

Amazon have created a guide to show how to do this

During the session I pushed my version of the Lambda gateway application up to my GitHub account for future reference. This was probably my favourite technology session at SocratesUK.

Moving back onto the soft skills required to be a good developer, I headed over to join the session titled YOU’RE a developer?!, which was a discussion on the lack of diversity in the technology sector, in particular the lack of women. There is still a cultural barrier that puts women off an industry which could do much more to become more professional. There is still a lot of sexism that goes unchallenged, and we exchanged incidents of this occurring. Then we moved on to ideas for encouraging the changes in attitude that can help the situation. There’s clearly much more that can be done.

After another hearty dinner it was time for the final evening of five minute Lightning Talks.

My former colleague Matt Butt spoke about The Dangers of Empathy and Emotional Contagion, and how to avoid empathy burnout. He recommended having a chat or Slack room in which to vent one’s negative feelings. One should try to cultivate compassion without getting emotionally involved.

Finally, Matthew Forrester demoed his code, which could create diagrams of a database schema from a YAML file.

Socrates UK was the best conference I’ve attended. It really opened my eyes to the software craftsmanship movement. Some of the practices I was already familiar with and use every day, but software craftsmanship seems to tie them together succinctly. The attendees I met really seemed to care about writing great code that solves the right problem, and were a friendly and welcoming bunch to boot.

I want to attend next year’s conference and I wholeheartedly encourage others to do so as well.

Further reading; information I learnt during the conference:

Tips for allowing software developers to develop and grow. They seem to be informed by one of my favourite software books, Peopleware by Tom DeMarco and Tim Lister.

Useful links:

Monitoring tool with SMACK architecture: instana.com

Secor is a tool to move Kafka logs into S3, created by Pinterest

Software Craftsmanship Newsletter was created by @alebaffa
https://github.com/lscc/socrates-uk/wiki

Software craftsmanship Slack room

Projects: mixter, learn CQRS via Koans(!)


RideLondon 2016

Posted by Michael Okarimia in charity on August 14, 2016

On 31 July I participated in the 2016 Prudential RideLondon cycling event, a closed road 100 mile course along the streets of London and the hills of Surrey.

I completed the course in 4 hours 57 minutes, 53 minutes faster than my time last year so I was very happy with my improved performance.

I set out to raise money for Haematology Cancer Care, a charity operating in UCL Hospital in Euston, London.

Recovering from the ride in Green Park, after finishing the ride on the Mall

On my fundraiser page I raised more than £800 for Haematology Cancer Care

Boxhill

Here are my cycle stats with all the juicy ride data:

I couldn’t have done it without the great team of fellow cyclists working together in a mini peloton!

The Finishing Team!


Improving search results at 7digital

Posted by Michael Okarimia in 7digital on April 14, 2016

Developing the search & catalogue infrastructure for the 7digital API

Technological Objectives

The biggest problem with the old search platform was that in January 2014, the average response time for a track search was 4000 milliseconds. In addition to being slow, the search results were often wrong or out of date, or the search would return errors. Customers told us that the user experience was poor and that they were often irritated by the constant stream of error messages.

The metadata of the tracks was stored in a search index that was 660 GB in size and contained 660 million documents, which is extremely large for a search index. Various tweaks were made to JVM and memory settings, but none produced a permanent improvement. An extensive investigation of the search platform followed. It found that the schema in production was indexing fields, such as track price, which were never actually searched upon. A prototype was created with a much smaller search index, around 10 GB in size. Compared with the original 660 GB, that is a reduction of more than 98%.

Technological Advancements

This new smaller search index has created a number of technological advances:

being reliant on the ~/track/details endpoint meant we always returned current results, and we were 100% consistent with the rest of the API, which eliminated the catalogue inconsistencies problem;

we could create a brand new index within an hour, meaning up to date data;

much faster average response times for ~/track/search, which fell by around 88%, from around 2600ms to around 350ms;

no more deleted documents bloating the index, thus reducing the search space;

longer document cache and filter cache durations, as the caches were only purged at the next full index every 12 hours, which helped performance;

quicker to reflect catalogue updates, as the track details were served from the SQL database which could be updated very rapidly.
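
The split described in the list above, a small index for finding tracks and the catalogue for describing them, can be sketched roughly as follows. This is a simplified illustration in Python rather than 7digital's implementation: the in-memory dictionaries stand in for the search index and the SQL-backed ~/track/details endpoint, and every name, ID and price here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    title: str
    artist: str
    price: float


# Hypothetical stand-in for the small search index: searchable text only,
# mapped to track IDs. No prices or other non-searchable fields.
SEARCH_INDEX = {
    1: "thriller michael jackson",
    2: "umbrella rihanna",
}

# Hypothetical stand-in for the SQL-backed catalogue behind ~/track/details.
CATALOGUE = {
    1: Track(1, "Thriller", "Michael Jackson", 0.99),
    2: Track(2, "Umbrella", "Rihanna", 0.99),
}


def search_ids(query):
    """Return IDs of tracks whose indexed text contains every query term."""
    terms = query.lower().split()
    return [tid for tid, text in SEARCH_INDEX.items()
            if all(term in text.split() for term in terms)]


def track_search(query):
    """Find matching IDs in the index, then load current details from the catalogue."""
    # Tracks deleted from the catalogue since the last index build are skipped.
    return [CATALOGUE[tid] for tid in search_ids(query) if tid in CATALOGUE]
```

In this sketch a price change in CATALOGUE shows up in the very next call to track_search, with no re-index, which is the property the list above relies on.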

Technological Uncertainties

Would the existing hardware cope with current levels of traffic with the new architecture?
Would the new architecture provide consistent responses across different endpoints, i.e. search results, catalogue results and chart endpoints?

Innovations

We created a public Git repository on GitHub; anyone can submit a pull request to add new synonyms for our search platform. We can choose to accept the search synonym, and the change will take effect on our search API within 12 hours of our accepting it. The repository is here: https://github.com/7digital/synonym-list

Some labels deliberately publish tribute, sound-alike and karaoke tracks with very similar names to popular tracks, in the hope that some clients mistakenly purchase them. These tracks are then ingested into our platform, and 7digital’s contract with those labels means we are obliged to make them available. At the same time, consumers of our search services complain that the karaoke and sound-alike artists appear in the search results above the genuine artists, mostly because of the repeated keywords in their track and release titles.

In order to satisfy both parties, we decided to override the default Lucene implementation of search and exclude tracks, releases and artists that contain certain words in their titles, unless the user specifically enters them as a search term. To achieve this we tweaked the search algorithm so that, by default, every search excludes tracks with the text “tribute to” anywhere in their textual description, be it the track title, track version name, release title, release version name or artist name. For example, searching for “We are the champions” now returns the tracks by the band Queen, which is what customers expect.
The results look like this: https://www.7digital.com/search/track?q=we%20are%20the%20champions%20queen

Prior to the change, all tribute acts would appear in the search results above tracks by the band Queen. To allow tribute acts to still be found, the exclusion rule will not apply if you include the term “tribute to” in your search terms, as evidenced by the results here: https://www.7digital.com/search/track?q=we%20are%20the%20champions%20tribute%20to%20queen
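
The exclusion rule can be illustrated with a short sketch. This is plain Python over dictionaries, not the Lucene override itself, and the field names and sample tracks are assumptions made for the example; only the phrase “tribute to” comes from the rule described above.

```python
# Phrases that hide a track unless the user types them.
# Illustrative list: "tribute to" is the only phrase named in the post.
EXCLUDED_PHRASES = ["tribute to"]

# Assumed field names for the textual description of a track.
TEXT_FIELDS = ("track_title", "track_version", "release_title",
               "release_version", "artist_name")


def active_exclusions(query):
    """Exclusions that still apply, i.e. phrases the user did not search for."""
    lowered = query.lower()
    return [phrase for phrase in EXCLUDED_PHRASES if phrase not in lowered]


def is_excluded(track, query):
    """True if any active phrase appears anywhere in the track's text fields."""
    text = " ".join(track.get(field, "") for field in TEXT_FIELDS).lower()
    return any(phrase in text for phrase in active_exclusions(query))


def filter_results(tracks, query):
    """Drop excluded tracks while keeping the original result order."""
    return [track for track in tracks if not is_excluded(track, query)]


queen = {"track_title": "We Are The Champions", "artist_name": "Queen"}
tribute = {"track_title": "We Are The Champions",
           "artist_name": "A Tribute To Queen"}
```

With these two sample tracks, filter_results([tribute, queen], "we are the champions queen") keeps only the Queen track, while adding “tribute to” to the query keeps both.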

Other music labels send 7digital a sound-alike recording of a popular track, and name it so its release title and track title duplicate the title of a well known track. This would mean that searching for “Umbrella” by Rihanna

Search: Dumb similarity modification of Lucene. Lucene is a capable search engine which specialises in fast full text searches. However, it works best when the documents it searches are paragraph-length natural prose, such as newspaper articles. The documents that 7digital adds to Lucene are models of the metadata of a music track in our catalogue, in the following form:

The standard implementation of Lucene gives documents containing the same term repeatedly a higher score than those that contain it once. This means that for the search term “Michael”, a result such as “The Best of Michael Jackson” by “Michael Jackson” will score higher than “Thriller” by “Michael Jackson”, because the term “Michael” appears twice in the first document but only once in the second.

In terms of matching text values this makes sense, but for a music search we want to factor in the popularity of our releases, based on sales and streams of their tracks.

Ignoring popularity leads to a poor user experience: the “Best of Michael Jackson” release is ranked as the first result, despite being much less popular than “Thriller”, which is ranked lower in the search results.

This was achieved by modifying Lucene’s term frequency weighting in the similarity algorithm.
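
To show why term frequency matters here, the following is a toy model in Python, not the Lucene code. It assumes the modification flattens term frequency so that a repeated term counts once, and it uses a made-up popularity field to break ties; both are assumptions for illustration. The square-root weighting mirrors Lucene's classic similarity.

```python
import math


def tokens(doc):
    """Lower-cased words from the title and artist fields."""
    return f"{doc['title']} {doc['artist']}".lower().split()


def default_tf(freq):
    # Lucene's classic similarity weights a term by the square root of its frequency.
    return math.sqrt(freq)


def flat_tf(freq):
    # Assumed modification: a term counts once however often it is repeated.
    return 1.0 if freq > 0 else 0.0


def score(doc, query, tf):
    """Sum the term frequency weight of each query term in the document."""
    words = tokens(doc)
    return sum(tf(words.count(term)) for term in query.lower().split())


def rank(docs, query, tf):
    """Order by text score, then by popularity (a made-up field) to break ties."""
    return sorted(docs,
                  key=lambda d: (score(d, query, tf), d["popularity"]),
                  reverse=True)


# Popularity values are invented for the example.
best_of = {"title": "The Best of Michael Jackson",
           "artist": "Michael Jackson", "popularity": 10}
thriller = {"title": "Thriller", "artist": "Michael Jackson", "popularity": 90}
```

Under default_tf the compilation ranks first for the query “Michael” because the term appears twice in it. Under flat_tf both documents score the same on text, so the more popular “Thriller” comes out on top.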

search, SOLR
