Thursday, 9 April 2020

Learnings of a practicing data scientist on building AI products

What is an AI product

An AI product is just like any other software product, with the notable exception that AI provides a critical part of the value creation logic. Building an AI product therefore requires all the same skills as other software products, combined with the relevant parts of AI know-how.

The value creation logic must be clear enough to be written in a few lines of simple sentences. No mumbo jumbo about AI, no technology name soup and no elaborate explanation of how AI enables this and that and undoubtedly cures cancer too.

 

Types of AI projects - should you tackle market or technology risk first?

In practice there are two types of AI products: an amazing value proposition with high technical risk (a.k.a. a moonshot project, e.g. a self-driving car) and a moderate value proposition with moderate technical risk (a.k.a. "do old stuff better"). An amazing value proposition with low technical risk is rarely if ever seen in practice, and products with high technical risk and a moderate value proposition are not worth doing.

It is common for the same project to combine the two aspects. Often it is possible to use a severely reduced feature set with moderate technical risk to test the market risk first and solve the hard technical problem later. Other times one has to solve the hard part first: a flying car actually has to fly, and sort of flying while being stuck on the ground doesn't cut it.

Combining AI competence with use case knowledge - avoiding the collection of smiling clueless smart people

Building AI products is really about solving the customer need and not about AI at all. Therefore it is absolutely mandatory to have use case knowledge in the team and flawless communication between the technical and use case people. In practice this can be achieved only if the use case people are part of the implementation team. Practice has shown that it is easier to teach the use case to technical people than technology to use case people, but everybody should strive for somewhat overlapping knowledge to allow for a shared vision of the mission.

The root cause of all failures is the illusion of communication. I am sure everybody has been in meetings where nobody really understands what is going on but, for various reasons, nobody is willing to ask for clarification. This is a collection of smiling, happy, clueless people and a sure sign that the project is heading towards failure. The only remedy is to bite the bullet and ask for clarifications (a.k.a. truly stupid questions) until it is crystal clear that everybody is on the same page. It might be painful and might not make friends, but it is absolutely mandatory for delivering an AI product that truly creates value.

On skill and competence

 

Nature of skill and competence

General skill is overall competence in the topic. In software engineering this means design patterns, algorithms, good development practices, testing and things of this sort.

Specific skill is know-how on a specific technology or topic. In software engineering this means using frameworks and libraries, and in some cases knowing the internals of a framework or technology. Initial specific skill is highly valuable in a short project and loses importance in long projects, as people are able to pick up the technologies used at a relatively fast pace anyhow.

 

Individual skillset

In an AI project one needs to have some level of these skills:
  • Software engineering
  • Data engineering
  • Data science/AI
  • Use case know-how
Each team member should be roughly evaluated on these skills, and the team's combined skills should be compared to the product being built. In some cases an AI-driven product can be built with relatively weak data science know-how using ready-made libraries. Yet even in these cases it is absolutely mandatory to investigate model performance thoroughly.

The self-evident conclusion is that the more competitive advantage the AI features provide, the more AI know-how the team needs.
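The comparison of team skills against product needs described above can be sketched in a few lines. This is only an illustration under assumptions: the 0-3 rating scale, the rule that at least two people should cover each skill, the function name find_skill_gaps and all the names and numbers are made up for the example.

```python
# Hypothetical 0-3 self-assessment scale: 0 = none, 3 = expert.
SKILLS = ("software_engineering", "data_engineering", "data_science", "use_case")


def find_skill_gaps(team, required, min_people=2):
    """Return {skill: [capable members]} for skills covered by too few people.

    A member counts as capable when their level meets the level the product requires.
    """
    gaps = {}
    for skill in SKILLS:
        capable = sorted(
            name for name, levels in team.items()
            if levels.get(skill, 0) >= required.get(skill, 0)
        )
        if len(capable) < min_people:
            gaps[skill] = capable
    return gaps


# Illustrative team and product requirements (made-up numbers).
team = {
    "Anna": {"software_engineering": 3, "data_engineering": 2, "data_science": 1, "use_case": 1},
    "Ben": {"software_engineering": 2, "data_engineering": 3, "data_science": 2, "use_case": 0},
    "Chen": {"software_engineering": 1, "data_engineering": 1, "data_science": 3, "use_case": 2},
}
required = {"software_engineering": 2, "data_engineering": 2, "data_science": 2, "use_case": 2}

gaps = find_skill_gaps(team, required)
print(gaps)  # {'use_case': ['Chen']}
```

Raising the required data_science level to 3, as one would for a product whose competitive advantage rests on the AI features, makes data science show up as a gap as well.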


Organizing the team

The two stereotypical ways to organize a team are
  • Everybody does everything
  • Each member has a specialty

 

Everybody does everything

In an ideal agile team any team member can take any task and push it to production. In practical teams this is rarely accomplished and often not even desired, since there are bound to be big productivity differences within the team on various tasks.

Yet if the team consists of senior people with overlapping competences, it is possible and desirable that for each task there are multiple people capable of completing it. This also requires that team members are not keen to cherry-pick the tasks that are most interesting and that fill the technology hype bingo in the CV.

The upside of this is that the team doesn't get stuck if a team member loses productivity temporarily, is on vacation or just leaves the team. In my experience this is also a very fun and enjoyable way to work, since it is impossible not to learn new things constantly.

 

Each team member has a specialty

This makes body shopping easy. Need a deep learning specialist in flower species classification? No worries, just call a suitable consulting company and rent an expert for a suitable number of euros per hour and you are all set.

However, the downsides are significant. The obvious problem is that the lack of significant overlap between team members causes big issues if a team member leaves the team or is not performing well enough. Another big problem is constant handovers and a changing team composition: usually a given specialty is needed only in one part of the project, and therefore people are joining and leaving the team constantly. The third problem is that there is only a limited number of sweet roles in any project. In AI projects too there is more perspiration than inspiration, and for team spirit it is best to share the fun and boring parts mostly evenly.

 

Practical project management - AI R&D and software engineering

 

Dailies and other agile rituals

The project progresses like a clock: the faster the clock speed, the sooner the project completes. In my experience, having a short (max 15 min) daily each day keeps people focused on the task at hand and sort of forces the project to go forward each day. Not all days are easy, but moving forward each day leads to success sooner or later. The trick to making the dailies work is to limit who attends: only people working together should be in the daily.
 
Retrospectives are an extremely valuable tool for improving team productivity. Demos not so much: demos often eat up quite a lot of working time to prepare, even if the team explicitly agrees that demos should be very lightweight. Demos are good, but if you have to choose, prioritize dailies and retrospectives over demos every time.

 

On communication

Nobody in the history of software has failed due to over-communication, but every minute another project grinds to a halt due to over-meeting. Know the difference between meetings and communication and spend your time wisely.

The team must have a shared understanding of the product vision and the next few steps of the roadmap towards it. The team cannot be successful if team members are not working towards the same goal with a common understanding of hard requirements (e.g. data cannot leave on-premise servers) and priorities.

Good communication is clear, short and to the point. AI hype and jargon are best left to sales people; they do no good when you actually have to deliver.

 

Does the agile method work in data science?

By and large it doesn't. The agile method works when the primary concern is executing tasks and iterating on the product. The problem with data science is that you are not really working on tasks but solving a puzzle or a crossword. When solving a crossword one needs to iterate between figuring out individual words and looking at the big picture.

The trick is to have two working modes. At the start, when it is not clear how to solve the AI puzzle, one should strive for maximal flexibility and freedom: do what you did during your PhD years but add teamwork to the mix. Drink too much coffee at the cafeteria, try things out on whiteboards, walk around the building or go skydiving if it helps you clear your head. When the path to the solution is more or less clear and the biggest technical risks are manageable, move towards agile project work with well-defined tasks and clear goals.

Toxic behaviour that must not be tolerated

 

Prima donnas

Sometimes an individual in the team develops prima donna behaviour, that is, the individual picks only the nice tasks and thereby forces other team members to shovel the manure. The upside is that prima donnas are often relatively competent in their area of interest, so at least the prima donna is productive on the tasks he is willing to work on. Overall, however, this is really bad for team spirit.

There are two ways to handle prima donnas:

  • The preferred solution is not to hire prima donnas, and to give each team member a boring and unpleasant task already in the trial period at the start of employment. Most people will pick up the hint and choose the team player mode.
  • Organize the whole team around the prima donna. In the extreme case the prima donna is absolutely critical to the project and you just have to tolerate the behavior. However, this is a major motivation killer for other team members. In practice the easiest way out is to hire experienced consultants with a mercenary attitude. They know how to manage the prima donna, are in the project for the short term anyhow, are primarily in it for the money and therefore don't really mind shoveling the manure. This is usually expensive and not optimal in any way, so avoid the prima donna syndrome at all costs.

 

Dumping manure downstream - Jira ticket syndrome

This is a really bad one. Even in the most agile teams there are bound to be handovers where one team member does the first part of a task and another team member does the latter part. If the first team member is interested only in his individual productivity and ignores the project perspective, the person doing the latter part is often forced to redo major parts of the earlier part too.

This happens when closing Jira tickets is seemingly more important than creating real value. The only remedy is to constantly remind everyone that no real value is created until an end-to-end solution is provided. When running a marathon one needs to run all the kilometers, not just the first one in record time.

 

Eternal lack of faith

Sometimes everybody loses faith that the project will work out. If you never lose faith you are not solving a hard problem, so occasional whining and complaining is absolutely acceptable and much preferred to cult-like positivity where everything is an 'amazing challenge' and problems are essentially denied. However, you need to make sure that nobody loses faith for long and that team members are upbeat the vast majority of the time.

Progress is the best motivator, so each team member should have a constant mixture of tasks that are relatively sure to succeed and hard ones. This way the victories carry the torch when one is getting stuck in the tar pit.

 

Not done my way

This is similar to the prima donna syndrome in that it usually happens with senior team members or startup founders, but it has none of the positive aspects of prima donnas.

Sometimes some team members require that everything is done the way they would do it. The team absolutely should have standards for code quality, ways of working and things like that. Keeping up these standards is positive behavior, but micromanaging implementation details is not. Usually there are several more or less equally good ways to solve a task, and it is absolutely unreasonable to require that everything is done just the way you would do it.

The only remedy is to wake up and smell the coffee. If you need a team, you also need to embrace different ways of solving tasks. If you cannot get your head around this, you should just quit, because the project is doomed to failure anyway. Being a jerk is a very expensive hobby.

On technology

AI components of an AI-driven software product. Note that in most cases the machine learning model update cycle is independent of the update cycle of the other software, but in some use cases the machine learning model cannot be decoupled from the rest of the software.

 

Keep collecting new data

The first version of the AI is practically always trained on a static data set. Yet as the machine learning model runs in the wild, it encounters new situations that should be added to the training set. The world also has a tendency to change, so the existing training data may no longer be representative and the model is no longer well fitted to the new circumstances. Periodic model retraining becomes necessary and data collection becomes mandatory.
 
Build the data collection into the product and do not try to add it as a feature later on.
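One way to build data collection in from day one is to wrap the model so that every prediction is stored together with its input. The following is a minimal sketch under assumptions, not a definitive design: the class name LoggingModel, the record fields and the JSON-lines format are all choices made for this example, and the lambda is a toy rule standing in for a real model.

```python
import io
import json
import time
import uuid


class LoggingModel:
    """Wraps a predict function so that every call is stored for later retraining."""

    def __init__(self, predict, model_version, sink):
        self._predict = predict
        self._model_version = model_version
        self._sink = sink  # any object with write(), e.g. a file opened in append mode

    def predict(self, features):
        prediction = self._predict(features)
        record = {
            "id": str(uuid.uuid4()),  # lets true outcomes be joined in later
            "timestamp": time.time(),
            "model_version": self._model_version,
            "features": features,
            "prediction": prediction,
        }
        self._sink.write(json.dumps(record) + "\n")  # one JSON object per line
        return prediction


# Usage with a toy rule standing in for a real model and an in-memory sink.
sink = io.StringIO()
model = LoggingModel(lambda f: f["temperature"] > 30.0, "2020-04-09", sink)
result = model.predict({"temperature": 31.5})
logged = [json.loads(line) for line in sink.getvalue().splitlines()]
```

Storing the model version with each record makes it possible to tell later which model produced which prediction when the training set is extended.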
 

Isolate machine learning components

In every AI product the vast majority of the code base has nothing to do with machine learning as such. For sure there is a lot of data-related code all over the code base, but the actual machine learning is usually a relatively well-contained functionality.

In practice the machine learning model changes independently of other software changes. In the model runner concept the machine learning model is packaged behind an interface, and the model runner serves the model to other parts of the software. This allows for easy A/B testing of models and separates machine learning from the rest of the code base.

Usually the software update cycle and the model update cycle diverge. Software is updated to add new features and to fix bugs, whereas the model is retrained to better fit the always-changing world. In the model runner pattern the model runner component is updated as part of the software update cycle, whereas models are updated as new data becomes available.

AI that actually works is very use-case-specific machine learning, so it is easy to put behind an API or inside a software component.
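The model runner idea can be sketched roughly as follows. This is one possible shape, not the only one: the class and method names, the hash-based traffic split used for A/B testing and the toy lambda models are all assumptions made for the illustration.

```python
import hashlib


class ModelRunner:
    """Serves versioned models behind one stable predict() call."""

    def __init__(self):
        self._models = {}  # version -> callable taking a features dict
        self._shares = {}  # version -> relative share of traffic

    def register(self, version, model, traffic_share):
        """Add or replace a model; callers of predict() need no code change."""
        if traffic_share < 0:
            raise ValueError("traffic_share must be non-negative")
        self._models[version] = model
        self._shares[version] = traffic_share

    def _pick_version(self, user_id):
        total = sum(self._shares.values())
        if total <= 0:
            raise RuntimeError("no model with a positive traffic share is registered")
        # Hash the user id into [0, 1); with unchanged shares a user always
        # lands on the same model version.
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 16 ** 8
        cumulative = 0.0
        chosen = None
        for version, share in self._shares.items():
            if share == 0:
                continue
            chosen = version  # last positive-share version is the fallback
            cumulative += share / total
            if bucket < cumulative:
                break
        return chosen

    def predict(self, user_id, features):
        """Return (version, prediction) so outcomes can be compared per version."""
        version = self._pick_version(user_id)
        return version, self._models[version](features)


# Usage: two toy models split roughly 90/10 between users.
runner = ModelRunner()
runner.register("v1", lambda f: f["x"] + 1.0, traffic_share=0.9)
runner.register("v2", lambda f: f["x"] + 2.0, traffic_share=0.1)
version, score = runner.predict("user-42", {"x": 10.0})
```

The rest of the software only ever calls runner.predict(), so a retrained model can be registered under a new version without touching any other code.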


Technology selection

Each technology used should be evaluated on the following criteria:
  • Fit for purpose. If the technology can't do the job, it is useless.
  • Will the technology be supported and developed in the future? If it is a commercial product, is it likely that the company stays afloat? If it is an open source product, will the developer community still exist in the future?
  • Is the technology ready for production? Being a pioneer is tough, so check whether the technology is used in production elsewhere.
  • Ease of finding developers. If you can't hire developers today and tomorrow, the technology should not be chosen.
  • Could the technology choice be a pull factor when looking for team members? This might work sometimes but is also a very risky tactic, since today's hot technology is often tomorrow's niche legacy with a small developer community.
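The criteria above can be turned into a simple checklist. The sketch below treats "fit for purpose" and "ease of finding developers" as blockers, since the text says a technology failing them is useless or should not be chosen; treating the remaining criteria as risks to discuss is an assumption of this example, as are all the names in it.

```python
from dataclasses import dataclass


@dataclass
class TechnologyAssessment:
    name: str
    fit_for_purpose: bool
    likely_supported_in_future: bool
    used_in_production_elsewhere: bool
    developers_available: bool


def evaluate(assessment):
    """Split failed criteria into blockers (reject) and risks (discuss)."""
    blockers, risks = [], []
    if not assessment.fit_for_purpose:
        blockers.append("cannot do the job")
    if not assessment.developers_available:
        blockers.append("developers are hard to hire")
    if not assessment.likely_supported_in_future:
        risks.append("uncertain future support")
    if not assessment.used_in_production_elsewhere:
        risks.append("not proven in production")
    return {
        "name": assessment.name,
        "acceptable": not blockers,
        "blockers": blockers,
        "risks": risks,
    }


# Hypothetical candidate: does the job but is new and hard to staff.
candidate = TechnologyAssessment(
    name="HotNewFramework",
    fit_for_purpose=True,
    likely_supported_in_future=True,
    used_in_production_elsewhere=False,
    developers_available=False,
)
verdict = evaluate(candidate)
```

The pull-factor criterion is left out on purpose, because the text describes it as a risky tactic rather than something to score.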
Today most products are built on cloud services. Often there is a temptation to aim for cloud independence so that cloud providers can be made to compete against each other. In practice cloud independence means dockerizing most functionality, and therefore few of the cloud computing goodies (serverless and managed services) can be used. In most cases cloud independence is a very expensive proposition and should not be considered unless the project has very deep pockets.


The dreaded deadlines

Developers are notoriously reluctant to give firm deadlines, whereas business people seem to get fixated on them. Both sides of the argument are valid: you can't have a business case without cost estimates (= deadlines), but deadlines in software projects are somewhat of a joke. It is just really hard to estimate the amount of work needed in software engineering.

In the worst case work estimates are just fairy tales people pretend to believe in, but in the best case they become a self-fulfilling prophecy. Yet once more the way out is efficient and clear communication between the implementation team and the ones paying for the product development.

 

Wrap-up

An AI-driven product project is very similar to any other software product project. The biggest differences are
  • Stronger emphasis on data
  • Machine learning engine of some sort
  • Higher technical risk
In most software products (or products with a large software component) the biggest risk is always the market risk. Therefore most AI products are not inherently riskier than other products if AI-specific matters are managed properly. The notable exception is moonshot projects, but in these the risk is balanced by a vastly larger market potential.