maanantai 16. huhtikuuta 2018

Figuring out an obscure database

Challenge

In many data-driven products one must dig into an arcane operational database to access the necessary training data, or the operational database must be turned into an analytical database. Implementation details vary from project to project, but practically always one is confronted with an arcane operational database with obsolete documentation at best and usually none at all.

At first this might seem like a daunting task. How is it possible to decipher the meaning of tens and tens of tables? But fear not, here is how the trick is done.

Learning cycle

Start by asking why the database exists, because the use case of the application using the database is bound to show up in the database structure somehow. The best source for this information is whoever works with the database daily.

Draw a physical Entity-Relationship (ER) diagram of the database using the database DDL scripts. If you have access to the database you are almost guaranteed to be able to do this. Sometimes you only have access to a database dump without valid constraints, but even then you can make educated guesses using table names, field names, data types and table structure.
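If you work in Python, the raw material for the ER diagram can be pulled straight from a live database. Here is a minimal sketch using SQLAlchemy's inspector; the connection string and dialect are placeholders, adjust them to your own database.

from sqlalchemy import create_engine, inspect

# Placeholder connection string; use the dialect/driver of your actual database.
engine = create_engine("postgresql://user:password@dbhost/operational_db")
insp = inspect(engine)

# Dump tables, columns and foreign keys as input for a physical ER diagram.
for table in insp.get_table_names():
    columns = [(c["name"], str(c["type"])) for c in insp.get_columns(table)]
    fks = [(fk["constrained_columns"], fk["referred_table"]) for fk in insp.get_foreign_keys(table)]
    print(table)
    print("  columns:", columns)
    print("  foreign keys:", fks)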

Most operational databases use common database design patterns (star schema, bridge table, etc.), so learn how these work and mapping business concepts to database tables becomes much easier. The physical ER diagram will help tremendously with this part. Remember, though, that even if the database was born with a solid structure, by the time you come around there can be any number of creative and not so solid features, so a flexible mind is mandatory.

The preceding steps offer a high-level understanding of the database but contribute very little to understanding how to compute anything from it.

In almost all cases the application has some kind of UI; e.g. ice cream distribution in an ice cream manufacturing company has some business user UI. Based on the database design patterns, the ER diagram and a little bit of intuition gained from discussions with business users, you can guess how to map the primary key of some central table of the database to the UI. When you compare what you see in the database to what the application wants to show to the user, you can create the first valuable hypothesis on how the database tables should be interpreted.

Now you have gained a little bit of understanding of the data, so use your newfound knowledge to ask more relevant questions from business users, understand the ER diagram better and so forth. Essentially you start with a very vague understanding of the database and iteratively refine your understanding until you can compute the necessary values using the database.

Tricks of the trade

  • Use mainly production data since TEST (and QA too..) can be just about anything. Without any access to PROD data the probability of success is greatly diminished.
  • Change TEST data to confirm hypotheses, e.g. change the database state by making a new ice cream delivery order and check that the database looks like it should after the new order.
  • Sometimes you want to access the end results of the application and compare them to the database content. Quite often companies store the most relevant parts of the database information also in a different format, e.g. an ice cream manufacturing company might store delivery records in PDF files or in paper records for accounting purposes since they must be able to present that data to regulatory agencies on request.
  • Beware that TEST data and PROD data can have funny differences, e.g. IDs used to map ice creams to ice cream descriptions can be different in TEST and in PROD.
  • If you can access the application source code you should take a look. Be warned though, it can take a long time to understand the relevant parts of a complex legacy application, but sometimes you have no choice.
  • Quite often some logic is coded in the first character of a string or there is some other black magic involved. This creates a sort of hidden logic in the database that is not available in ER diagrams but can be spotted by inspecting the data (see the sketch after this list).
  • Don't assume that database content should be interpreted in the same way across different time spans. If the company's business has changed then the application has changed too, and seemingly similar products may not behave the same. A similar wholesale ice cream package may not look the same each year in the database.
  • You are solving a puzzle, so combine intuition with logic.
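As an example of hunting for that kind of hidden logic, here is a minimal pandas sketch; the table, the columns and the connection object con are hypothetical.

import pandas as pd

# Hypothetical table and columns; `con` is assumed to be an open database connection.
df = pd.read_sql("SELECT product_code, price FROM delivery_item", con)

# Distribution of the leading character: a small or skewed set of values hints at hidden logic.
print(df["product_code"].str[0].value_counts())

# Do records behave differently depending on the prefix?
print(df.groupby(df["product_code"].str[0])["price"].describe())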

keskiviikko 5. huhtikuuta 2017

Anatomy of Data Science Project

In data science all the buzz and fuzz is about cool algorithms, exciting results and profits. However, to really gain value from data science it is necessary to understand the mundane day-to-day work from the project work angle too. As the adage goes, all models are wrong but some are useful, and the same applies to mental frameworks for handling data science projects too.

This is my framework, what is yours?

Value Cycle


You cannot hope to get it right in one go. You can try, but you will run out of time, out of budget, out of sponsors or out of ideas. The key is to deliver value continuously, and the only way to do that is to make the modelling process incremental. Start with a really simple model and also use exploratory data analysis to produce value for stakeholders.

Data will surprise you, and an iterative approach allows you to be surprised just a little bit each time instead of being knocked out in one blow.

Key takeaways: Iterate fast and make sure every iteration ends in a tangible result. Use iterations to learn the ins and outs of the data.

Problem definition


The project starts, the team and all the stakeholders are excited. We are going to figure it out, we are going to make things real. Let's rock and roll!

Hold your horses for a while though. No matter how clear the objective seems and no matter how straightforward the project sounds, this is the time to stop and ask the critical questions:
  • Who is the end customer?
  • What is the value creation logic for the end customer?
  • What value does data science add here; can you solve the problem on a whiteboard instead?
  • Why now and what are the hard constraints (e.g. GDPR, deadlines)?
  • Is there a way to deploy the model?
  • What data sources are available?

Key takeaways: Understand what the product owner needs (not only what the product owner means), make sure you understand who the end customer is and what the value creation logic is.

Data collection and wrangling  


The ditch-digging phase: a lot of perspiration and little inspiration. Chances are that data from various sources is imported several times during the project, and you should also make the whole project reproducible. The best way to achieve this is to automate data loading and wrangling as much as possible and write short README files when applicable.

There is no need for extensive documentation, but write something, as there is always a plenitude of hidden assumptions in the data. When debugging the models you are forced to check the details of data loading and wrangling over and over again, so don't fall for the temptation to do data loading and cleaning manually.

Key takeaways: Automate, document (don't overdo it)

Exploratory data analysis


Plot this and check that. Model performance depends on the data, and therefore you must know the data to choose a feasible model. Also, this is a great phase to detect inconsistencies and missing values in the data. If the data is not representative, you have a good chance to detect it here too.

Often you can find useful insights in this phase. This is not guaranteed, though, and depends on how well the data has been analyzed already. The value of an insight doesn't depend on how much work it required, but creating an insight is hard if the data set has already been analyzed through and through.

Key takeaways: Intuition of data, insights

Optimization proxy selection


The great irony of data science (especially in machine learning) is that it is all about optimization but you almost never optimize the real thing.

Let's think about a recommendation engine for a moment. What you really care about is providing interesting and surprising recommendations to the customer. This is, sadly, an impossible objective for any machine learning algorithm, and the data scientist must find a credible and technologically feasible proxy (e.g. ratings of movies per customer) for the end customer's true need.

Key takeaways: Make optimization proxy explicit.

Model development


This is the part of data science everybody is talking about, and all the books are about this, so let's not go into technical details here.

Sometimes you use different tool chains in different iterations of the value cycle (e.g. R -> Python -> Scala -> ...), and therefore you should uncouple data preparation from modelling if possible. Of course there will be quite a lot of coupling, but try to keep it to a minimum at least in early iterations and stabilize the architecture/technology stack later as the project matures. Remember that using tools that fit the problem is much easier than shoehorning the problem into your favourite tool chain.
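One lightweight way to keep that coupling down, sketched here with made-up file and column names, is to let the data preparation step write a self-describing file that any modelling stack can read.

import pandas as pd

# Hypothetical raw extract and feature definitions.
raw = pd.read_csv("orders.csv", parse_dates=["order_date"])
features = (raw
            .groupby("customer_id")
            .agg(total_spend=("amount", "sum"),
                 order_count=("order_id", "count"),
                 last_order=("order_date", "max"))
            .reset_index())

# The modelling step, whatever language it is written in this iteration, only sees this file.
features.to_parquet("features.parquet")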

In the first iteration you should build a baseline model that can be deployed to production. Make it simple and make it clear. In later iterations you refine the model or choose a more complex approach altogether. The only exception to this rule is a moonshot project (not clear if even doable), where there is no point even talking about production before the biggest technical risks are solved.

The faster you can train the model, the better. You will run into many gotchas during model development, and the faster you train the model, the faster you can sort them out. On the other hand, many techniques to parallelize model training also add complexity and make training less transparent (e.g. running a cluster vs. running on one huge AWS instance). Which way to go depends on the case, but make it a conscious decision.

Key takeaways: Baseline model first, be tools agnostic, think about your productivity

Out-of-sample test


Time to see how predictive your predictive model really is and to decide if you can use it in real life. An out-of-sample test means that you run your model on data that has not been used to train the model. In data science you generalize using training data, and the out-of-sample test tells how well you succeeded.
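A minimal scikit-learn sketch of the idea, on synthetic data so it runs as-is; the held-out part is never touched during training.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for your real feature matrix and labels.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Out-of-sample performance is what the go-live decision should be based on.
print("out-of-sample AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))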

Key takeaways: Use the out-of-sample test to make the go-live decision.

Deploying model to production


There are two ways to turn an insight into tangible benefit. The indirect route is essentially consulting, where you advise someone who has the means to make a difference. A classical example of the indirect route is optimizing a marketing budget. The direct route means that you create (or work with someone who creates) an automated system that provides insights (say, recommendations) to the end customer.

Key takeaways: Figure out a way to turn the model into real-life benefits.

Real life test


The moment of truth; everything before this is just a dress rehearsal. There is no guarantee whatsoever that the customer cares for your optimization proxy. You worked hard, your hair turned gray and you started to look like an old and feeble man while developing the model, but does the customer really care?

This is the first time your end customer voices their opinion, so try to get here as soon as possible. In fact, does the customer even care for the product in which the model is embedded?

Measure improvement against the baseline instead of an arbitrary target. The point is to improve performance over the baseline or create new business opportunities instead of trying to hit an arbitrary target (which may be way too low or way too high).

Key takeaways: Get here fast, measure improvements relative to baseline model.

Data scientist survival guide


It is foolhardy to forget software development best practices. Not having extensive automated tests is going to backfire in data science just as surely as in any other software project. Static code analysis and linters are still a great (and almost free) way to improve code quality, and communication with stakeholders is just as important as in any other project.

There is a plenitude of free packages available online, but here you should use the same caution as in software projects. Check the licence, check if the package is still updated (check the commit history) and use your judgement to determine if the package will be updated in the future.

However, there is more to the story. Data science projects are inherently risky. No matter how many algorithms you try (or invent), there is always a chance that the data just isn't representative (relevant to the problem) or maybe there is no signal in the data. Sooner or later you will run into this, but almost always there are more degrees of freedom than you initially see. Maybe you can buy some data, collect more data or try some totally orthogonal approach? These allow you to turn looming failure into a great, if a bit postponed, success. There are always more solutions than you think, trust me on this one.

Copy all the good parts from software engineering and add a strong dose of Sherlock Holmes' wits. A friend of mine once described data science as a no-holds-barred approach to solving a crossword puzzle. Everything is allowed except using the test set in the training phase.

Key takeaways: Be resourceful and organized.

Product owner survival guide


Data science projects have a tendency to become quite complicated quite fast. The data sources come in all forms and sizes, and the algorithms are exotic and full of misleading acronyms. It is quite easy to get lost in the endless jargon and incomprehensible gibberish.

However, most of this is just implementation details and actually not that important to you. As a product owner you should really be concerned about only a few things:
  • Is the team solving the right question (problem definition phase)?
  • Is the out-of-sample test done and are the results encouraging (go-live decision)?
  • Are the results in the production system measured and how big is the improvement compared to the baseline (real-life test)?
  • Are there deliverables from each iteration of the value creation cycle (is there real progress)?

Key takeaways: Solve the right question, out-of-sample test, real-life test

Summary


Solve the right question, work iteratively, check your results against baseline and deploy to production as fast as possible.

Not all data science projects fit this framework, but all projects have at least some elements from it. Actually, most projects have all the elements in some sense.

Data science is fun!

keskiviikko 15. kesäkuuta 2016

Value creation in data science

Creating value in data science is hard work and requires a down-to-earth attitude. One must understand the business model thoroughly and provide actionable insights, or better yet provide an end-to-end working system that captures value and serves it to the end customer. Note that the end customer may be e.g. a consumer, an employee or a company.

What is real value?


A genuinely good thing. Not great because you say it is great, not credible because of your reputation nor supported because it advances somebody's personal agenda, but because it helps a corporate customer to solve a problem that prevents business growth (in existing or new business) or allows a consumer to enjoy life more. Real value makes a measurable difference and improves lives.

Kinds of insights


A good insight is profitable and actionable. Let's break this down into its components and examine them separately.

A profitable insight is a plan or a non-trivial fact that provides value to the customer if acted upon. You must also estimate the size of the opportunity to decide if it is worth pursuing.

An actionable insight is doable in practice. That is, you can reach the end customer in some way and provide to the end customer the value created by the insight.

Insight "People who buy carrots also buy jeans" is profitable because in principal it allows you to bundle these products non-trivially together but actionable only if there is a way to serve this bundle to end customer, e.g. through shelf placement in retail store or by mobile application.

Ways to create value out of an insight


An insight can be valuable either directly or indirectly. Advising on trends or segmenting the market tends to fall into the indirect category, whereas a recommendation system or predictive maintenance is a direct way to create value out of an insight.

When creating value directly one implements a system that serves value directly to the end customer, e.g. by providing personalized, meaningful and non-obvious leisure time activity recommendations through Facebook.

When creating value indirectly, one must provide advice to somebody else who has the means to make the end customer's life better, e.g. providing advice to a car dealer on car accessories that make the end customer's driving safer and more enjoyable.

The direct way to provide value has the benefit that it can usually be measured and iterated in a rapid manner, whereas the indirect way to provide value tends to have a much longer feedback loop.

How to discover profitable and actionable insights?


Start from the basics. Plot this, double check that, check for data consistency, talk to all stakeholders and make sure you got the basics right. You cannot hope to gain valuable insights if you don't understand the business both quantitatively and qualitatively. 

Max out business intelligence tools and existing infrastructure. Many insights can be found using business intelligence tools, and they can also be used to double check and clarify insights found with more advanced tools like Spark.

Be tools agnostic. If the job requires R then use R, and if the job requires Spark then use Spark. As a rule it is easier to learn to use multiple tools than to try to squeeze the job into the form that your favourite tool chain supports. This is especially important if you change topics in a rapid manner and build on top of existing libraries and packages. Remember that the best tool varies on a case-by-case basis.

Be practical and never forget that your job is to provide value. If the problem can be solved using a whiteboard and a spreadsheet, all the better, because you just saved everybody's time and money. Then again, if advanced tools are needed then they must be used, and it is up to you to overcome any technical hindrance that you might run into.

maanantai 28. joulukuuta 2015

Outlier detection using Spark

Introduction

Detecting outliers is very important e.g. in fraud detection, in medicine and in intrusion detection systems. One important application is to determine whether a particular test result should lead to medical care or extensive clinical testing. All of these applications require scalable outlier detection algorithms and a computational platform, as datasets can be very big and include both numerical and categorical values.

It is hard to define exactly what an outlier is, but it seems to be any observation or item that is somehow distinctly different from other observations or items. There are several algorithms to detect outliers, but let's take a look here at one of the more straightforward ones. The attribute value frequency (AVF) algorithm scores observations based on the frequencies of their attribute values. Observations that receive the lowest scores are assumed to be outliers, with the hyperparameter $k$ determining the number of outliers.

Spark is an easy-to-use distributed computing platform using Resilient Distributed Datasets (RDD), allowing efficient iterative algorithms. Spark supports the Java, Python and Scala programming languages, and AWS supports creating Spark clusters with only a couple of mouse clicks. For more information on how Spark achieves its efficiency please see e.g. Spark: Cluster Computing with Working Sets or Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing.

To develop the algorithm and test its scalability I wrote tools to generate arbitrarily sized test datasets. The tools and the Spark program for AVF can be found here. Using a generated arbitrary dataset is a good way to make sure that your algorithm works as intended.

Attribute Value Frequency algorithm

Attribute value frequency algorithm computes frequencies of the attribute values using

\begin{equation} AVFscore(x_{i}) = \frac{1}{m}\sum_{l=1}^{m}f(x_{il}), \end{equation} where $m$ is the number of attributes, $x_{il}$ is the $l$-th attribute value of $x_{i}$ and $f(x_{il})$ is the number of times the $l$-th attribute value of $x_{i}$ appears in the dataset.

AVF algorithm pseudocode:
Input: dataset D, number of outliers k
Output: k detected outliers

Label all data points as non-outliers
calculate the frequency of each attribute value
foreach point x(i) in D
  AVFscore(i) = compute AVF score for point x(i)
end foreach
return the k points with minimum AVFscore


Using Spark this can be expressed as follows.
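A minimal PySpark sketch of the idea (not the exact program from the repository linked above); it assumes each input line is a comma-separated list of categorical attribute values, and the file path and k are placeholders.

from pyspark import SparkContext

sc = SparkContext(appName="AVF")
k = 10  # number of outliers to report

# Each record becomes a tuple of categorical attribute values.
data = sc.textFile("s3://my-bucket/avf-input.csv").map(lambda line: tuple(line.strip().split(",")))

# Frequency of every (attribute index, attribute value) pair.
freqs = (data.flatMap(lambda row: [((l, v), 1) for l, v in enumerate(row)])
             .reduceByKey(lambda a, b: a + b)
             .collectAsMap())
freqs_bc = sc.broadcast(freqs)

# AVF score = mean frequency of the record's attribute values; low scores indicate outliers.
def avf_score(row):
    f = freqs_bc.value
    return sum(f[(l, v)] for l, v in enumerate(row)) / float(len(row))

outliers = data.map(lambda row: (avf_score(row), row)).takeOrdered(k, key=lambda t: t[0])
print(outliers)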

Running spark on AWS

An excellent tutorial on submitting Spark jobs to AWS EMR using the CLI can be found here.

First one needs to create a Spark cluster and then add a step to run the code. The code to be run needs to be put to S3, and it is advisable to also write the results to S3.
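The tutorial above uses the CLI; the same step can also be added programmatically, for example with boto3 as sketched below. The cluster id, region and S3 paths are placeholders.

import boto3

emr = boto3.client("emr", region_name="eu-west-1")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXX",  # id of the running EMR cluster
    Steps=[{
        "Name": "AVF outlier detection",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://my-bucket/avf.py",
                     "s3://my-bucket/input/", "s3://my-bucket/output/"],
        },
    }],
)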

In this case there is no need to install custom software to the cluster. However, using the public DNS name and the ssh key (selected when the cluster was created), you can log in directly to the master or slave nodes to install needed software like scikit-learn or opencv. Note that you must first allow ssh connections (port 22) from your IP address to the nodes. If you wish to allow connections from everywhere, set the IP address to '0.0.0.0/0'.

Conclusions

Spark allows straightforward implementations of certain classes of outlier detection algorithms. AWS provides an easy-to-use environment to scale your computations to arbitrary size dynamically. There are some hiccups when setting up the cluster in AWS, but mostly it is an enjoyable experience.

lauantai 5. syyskuuta 2015

Inverted pendulum - simulation and control






Check out the SIMULATION! Swipe the screen horizontally to move the pendulum.


Introduction 

The inverted pendulum is a classic problem in control theory and therefore well suited for an exercise combining mathematical modelling, simulation, control theory and web technologies for presentation purposes.

Just about all suitable controllers have been applied to the inverted pendulum problem, since it has become the challenge all new controllers must be able to conquer. Model Predictive Control (MPC) uses a mathematical model of the phenomenon to predict how the system is going to evolve and what control input should be applied to achieve a desired end state. An MPC controller computes an optimal control sequence and applies the first control input, then computes the optimal control sequence again and applies the first control input again, and so forth. You are encouraged to take a look at the freely available excellent lecture notes.

In all projects the key to success is rigorous scoping and prioritization. This project consists of making a simulation model and a linear model predictive controller in C++ and using a Node.js addon to glue those to a web frontend. The simulation is visualized by drawing the inverted pendulum in 3D using the three.js WebGL library.

The code is available at Bitbucket and a running example is available here. In the example you can whack the pendulum using a mouse (press the left button down, move the mouse horizontally and release the left button) or a swipe on a mobile device and see how the controller stabilizes the system.

Simulation model

The first step is to create a reasonably accurate simulation model of the inverted pendulum. In principle this could be done using the standard Newtonian physics you learned at school, F=ma and so forth, but handling constraints is a bit tricky in the Newtonian formalism. For example, the inverted pendulum should not fall through the ground nor rise above it. Ensuring this in the Newtonian formalism requires careful consideration of all the relevant forces that cancel out in the actual simulation, keeping the inverted pendulum on the ground.

However, it is possible to handle constraints using Lagrangian mechanics, which uses generalized coordinates and forces. This is a slightly confusing sentence if your background is not in physics or control theory, but it will become very clear by the end of this article.

In Lagrangian mechanics you need to express kinetic and potential energy using generalized coordinates. In this case kinetic energy is
\begin{equation} K = \frac{1}{2}M\dot{x}^2 + \frac{1}{2}m\big[ (\dot{x} - L\dot{\varphi}\sin{\varphi})^2 + (\dot{y} + L\dot{\varphi}\cos{\varphi})^2 \big] \end{equation}
and potential energy is \begin{equation} V = mgL\sin{\varphi} \end{equation} The Lagrangian is \begin{equation} \begin{split} L = &K - V \\ = & \frac{1}{2}(M+m)\dot{x}^2 + \frac{1}{2}mL^{2}\dot{\varphi}^{2} - mL\sin{\varphi}\dot{x}\dot{\varphi} - mgL\sin{\varphi} \\ \end{split} \end{equation} since $\dot{y}=0$. The Lagrangian can be used to derive the differential equations of the system using the generalized equations of motion
\begin{equation} \label{equations_of_motion} \frac{d}{dt} \frac{\partial L}{\partial \dot{q_{i}}} - \frac{\partial L}{\partial q_{i}}= F_{i}, \end{equation} where $F_{i}$ is generalized force. Let's apply generalized equations of motion (\ref{equations_of_motion}) for $x$ coordinate
\begin{equation} \begin{split} \frac{\partial L}{\partial \dot{x}} = & \: (M+m)\dot{x} - mL\dot{\varphi}\sin{\varphi} \\ \frac{\partial L}{\partial x} = & \: 0\\ \frac{d}{dt}\bigg(\frac{\partial L}{\partial \dot{x}}\bigg) = & \: (M+m)\ddot{x} - mL\ddot{\varphi}\sin{\varphi} - mL\dot{\varphi}^{2}\cos{\varphi}\\ \end{split} \end{equation} and therefore \begin{equation} (m+M)\ddot{x} - mL\ddot{\varphi}\sin{\varphi} - mL\dot{\varphi}^{2}\cos{\varphi} = F_{lin}, \end{equation} where $F_{lin}$ is in this case just a combination of the control force ($F_{c}$) pushing the cart left and right and linear friction, defined as \begin{equation} F_{lin} = F_{c} - \mu_{x}'\dot{x}, \end{equation} where $\mu_{x}'$ is the linear friction coefficient including the physical properties of the ground and the cart. Let's next consider the coordinate $\varphi$ \begin{equation} \begin{split} \frac{\partial L}{\partial \dot{\varphi}} = & \: mL^{2}\dot{\varphi} - mL\dot{x}\sin{\varphi} \\ \frac{\partial L}{\partial \varphi} = & \: -mL\dot{x}\dot{\varphi}\cos{\varphi} - mgL\cos{\varphi}\\ \frac{d}{dt}\bigg(\frac{\partial L}{\partial \dot{\varphi}}\bigg) = & \: mL^{2}\ddot{\varphi} - mL\ddot{x}\sin{\varphi} -mL\dot{x}\dot{\varphi}\cos{\varphi}\\ \end{split} \end{equation} and therefore \begin{equation} L\ddot{\varphi}-\ddot{x}\sin{\varphi} + g\cos{\varphi} = \frac{F_{\varphi}}{mL} = F_{rot}, \end{equation} where $F_{rot}$ is the scaled generalized force, defined as \begin{equation} F_{rot} = F_{rotnoise} - \mu_{\varphi}'\dot{\varphi}, \end{equation} where $\mu_{\varphi}'$ is the rotational friction coefficient including the hinge's physical properties and $F_{rotnoise}$ is a rotational disturbance which can be intentional, say user input, or noise, e.g. due to wind.

Next we need to uncouple $\ddot{x}$ and $\ddot{\varphi}$ which is easily done by solving them from the systems of equations we just derived. After a little bit of arithmetic one gets \begin{equation} \frac{d}{dt} \left[ {\begin{array}{c} \dot{\varphi}\\ \varphi\\ \dot{x}\\ x \end{array} } \right] = \left[ {\begin{array}{c} \frac{F_{lin}\sin{\varphi}+mL\dot{\varphi}^{2}\cos{\varphi}\sin{\varphi} - (m+M)g\cos{\varphi} + (m+M)F_{rot}}{L(m+M-m\sin^{2}{\varphi})}\\ \dot{\varphi}\\ \frac{F_{lin} + F_{rot}m\sin{\varphi} -mg\sin{\varphi}\cos{\varphi}+mL\dot{\varphi}^{2}\cos{\varphi}}{m + M - m\sin^{2}{\varphi}}\\ \dot{x} \end{array} } \right] \label{statePropagationEq} \end{equation}
These equations can be solved easily using the classical Runge-Kutta method. The friction terms serve a dual purpose. Firstly, they are physically justified. Secondly, they make the system bleed out energy. Due to numerical inaccuracies and approximations, Runge-Kutta solvers tend to gather some extra energy in a system like this, so bleeding out some energy helps to keep the system stable.
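The actual simulation runs in C++, but as an illustration, here is a minimal NumPy sketch of the state propagation equation above with a classical fourth-order Runge-Kutta step; the parameter values are made up.

import numpy as np

# State s = [phi_dot, phi, x_dot, x]; parameters are illustrative, not the values used in the demo.
m, M, L, g = 0.2, 1.0, 0.5, 9.81
mu_x, mu_phi = 0.1, 0.05

def rhs(s, F_c, F_rotnoise=0.0):
    phi_dot, phi, x_dot, _ = s
    F_lin = F_c - mu_x * x_dot                  # control force plus linear friction
    F_rot = F_rotnoise - mu_phi * phi_dot       # disturbance plus rotational friction
    den = m + M - m * np.sin(phi) ** 2
    phi_ddot = (F_lin * np.sin(phi) + m * L * phi_dot**2 * np.cos(phi) * np.sin(phi)
                - (m + M) * g * np.cos(phi) + (m + M) * F_rot) / (L * den)
    x_ddot = (F_lin + F_rot * m * np.sin(phi) - m * g * np.sin(phi) * np.cos(phi)
              + m * L * phi_dot**2 * np.cos(phi)) / den
    return np.array([phi_ddot, phi_dot, x_ddot, x_dot])

def rk4_step(s, F_c, dt):
    k1 = rhs(s, F_c)
    k2 = rhs(s + 0.5 * dt * k1, F_c)
    k3 = rhs(s + 0.5 * dt * k2, F_c)
    k4 = rhs(s + dt * k3, F_c)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

s = np.array([0.0, np.pi / 2 + 0.05, 0.0, 0.0])   # slightly off the upright position
for _ in range(1000):
    s = rk4_step(s, F_c=0.0, dt=0.01)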

Model predictive control

Model predictive control comes in two variants. Linear model predictive control considers only linear systems, whereas nonlinear model predictive control considers nonlinear systems. Linear model predictive control leads to a convex optimization problem that can be solved easily, whereas the optimization problem in nonlinear model predictive control may be riddled with troubles such as local optima and is generally hard to solve. One of the great benefits of model predictive control is that you can explicitly limit the maximum and minimum values of the control input. That is, the optimal sequence of control inputs is different based on the physical limits of the actuators. In addition you can also set limits to the allowed states (say, the speed of the cart).

The linear model predictive control problem tries to find the sequence of control inputs (items of the vector u) such that \begin{equation} \begin{split} J(k) = & \sum_{i=1}^{N}\big[ x^{T}(k+i|k)Qx(k+i|k) + u^{T}(k+i|k)Ru(k+i|k) \big]\\ &+ x^{T}(k+N|k)\bar{Q}x(k+N|k) \end{split} \label{Jk} \end{equation} is minimized. $N$ is the length of the prediction horizon, and the matrices $Q$, $R$ and $\bar{Q}$ can be used to give different terms and variables a proper weight in the optimization problem. These matrices offer some degrees of freedom to the control problem and can be used to tune the controller. In numerical computations matrices are preferred and an equivalent form of ($\ref{Jk}$) can be used \begin{equation} J(k) = \mathbf{u}^{T}(k)H\mathbf{u}(k) + 2x^{T}F^{T}\mathbf{u}(k) + x^{T}(k)Gx(k) \label{Jkmatrix} \end{equation} The first term is the control input cost punishing excessive control inputs. The second term is the cost due to system state changes caused by the control input, and the last term is the cost due to free propagation of the system, as it doesn't depend on the control input. The matrices are \begin{equation} \begin{split} H = & C^{T}\tilde{Q}C + \tilde{R} \\ G = & M^{T}\tilde{Q}M + Q \\ F = & C^{T}\tilde{Q}M, \\ \end{split} \end{equation} where \begin{equation} M = \left[ {\begin{array}{c} A\\ A^{2} \\ \vdots \\ A^{N} \end{array} } \right] \end{equation} \begin{equation} C = \left[ {\begin{array}{cccc} B & 0 & \cdots & 0\\ AB & B & \cdots & 0\\ \vdots& \vdots & \ddots & \\ A^{N-1}B & A^{N-2}B & \cdots & B \end{array} } \right] \end{equation} \begin{equation} \tilde{Q} = \left[ {\begin{array}{cccc} Q & 0 & \cdots & 0\\ 0 & \ddots & & \vdots\\ \vdots& & Q & 0\\ 0 & \cdots & 0 & \bar{Q} \end{array} } \right] \end{equation} and \begin{equation} \tilde{R} = \left[ {\begin{array}{cccc} R & 0 & \cdots & 0\\ 0 & \ddots & & \vdots\\ \vdots& & R & 0\\ 0 & \cdots & 0 & R \end{array} } \right] \end{equation} Many solvers require the derivative of the cost function with respect to the optimized parameter. Taking the derivative of ($\ref{Jkmatrix}$) with respect to $\mathbf{u}$ leads to \begin{equation} \frac{dJ(k)}{du} = 2\mathbf{u}^{T}(k)H + 2x^{T}F^{T} \end{equation} For the details of the derivation see the lecture notes mentioned above.
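The controller in this project builds these matrices in C++ with dlib, but the construction is easy to sketch in NumPy for given discrete-time $A$, $B$, horizon $N$ and weights $Q$, $R$, $\bar{Q}$.

import numpy as np

def mpc_cost_matrices(A, B, Q, R, Q_bar, N):
    # Build H, F and G of J(k) = u'Hu + 2x'F'u + x'Gx for prediction horizon N.
    n, p = B.shape
    # M stacks A, A^2, ..., A^N; C is block lower triangular with A^(i-j)B blocks.
    M = np.vstack([np.linalg.matrix_power(A, i) for i in range(1, N + 1)])
    C = np.zeros((N * n, N * p))
    for i in range(N):
        for j in range(i + 1):
            C[i * n:(i + 1) * n, j * p:(j + 1) * p] = np.linalg.matrix_power(A, i - j) @ B
    # Block-diagonal weights with the terminal weight Q_bar in the last block.
    Q_tilde = np.kron(np.eye(N), Q)
    Q_tilde[-n:, -n:] = Q_bar
    R_tilde = np.kron(np.eye(N), R)
    H = C.T @ Q_tilde @ C + R_tilde
    G = M.T @ Q_tilde @ M + Q
    F = C.T @ Q_tilde @ M
    return H, F, G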

There are two glaring issues here. Firstly, the differential equations of the inverted pendulum are nonlinear, while ($\ref{Jkmatrix}$) applies only to linear systems. Secondly, the MPC control problem is presented in terms of difference equations instead of differential equations. Let's tackle these issues one by one. The inverted pendulum can be considered linear as long as it is almost in the upright position. Applying the linearization formula

\begin{equation} \bar{f}(\bar{x}) = \bar{f}(\bar{x_{0}}) + \nabla \bar{f}|_{x_{0}}(\bar{x}-\bar{x}_{0}) \end{equation}

to ($\ref{statePropagationEq}$) at $ x_{0}=[0, \pi/2, 0, 0]^{T}$ leads to a system of linear differential equations
\begin{equation} \dot{x} = Ax(t) + Bu(t), \end{equation} where \begin{equation} A = \left[ {\begin{array}{cccc} -\frac{\mu_{\varphi}(m+M)}{LM} & \frac{(m+M)g}{LM} & -\frac{\mu_{x}}{LM} & 0\\ 1 & 0 & 0 & 0\\ -\frac{\mu_{\varphi}m}{M} & \frac{mg}{M} & -\frac{\mu_{x}}{M} & 0\\ 0 & 0 & 1 & 0 \end{array} } \right] \end{equation} and \begin{equation} B = \left[ {\begin{array}{c} \frac{1}{LM}\\ 0 \\ \frac{1}{M}\\ 0 \end{array} } \right] \end{equation} $x(t)$ is state vector (note that this is not x coordinate) and $u(t)$ is control input. In this case only available control input is to push cart left or right so $u(t)$ is just $F_{c}$. Also, it's assumed that $F_{rotnoise}$ is zero centered and it's ignored in the prediction model. Let’s now transform these to difference equations using formulas

\begin{equation} A_{discrete} = e^{AT} \end{equation} and \begin{equation} B_{discrete} = \bigg( \int_{0}^{T}e^{At}dt \bigg)B, \end{equation} where $T$ is the sampling interval.

The first matrix can be evaluated as a power series since
\begin{equation} e^{X} = \sum_{k=0}^{\infty}\frac{1}{k!}X^{k} \end{equation}
However, the second expression is trickier, since matrix A is singular in this case and you cannot use the trivial integration formula from the scalar case. Let's derive a power series form for it as well:

\begin{equation} \begin{split} \int_{0}^{T}e^{At}dt = & \int_{0}^{T} \sum_{k=0}^{\infty}\frac{1}{k!}(At)^{k}dt \\ = & \sum_{k=0}^{\infty}\frac{1}{k!}A^{k}\int_{0}^{T}t^{k}dt \\ = & \sum_{k=0}^{\infty}\frac{1}{k!}A^{k}\bigg[\frac{1}{k+1}t^{k+1}\bigg]_{0}^{T} \\ = & \sum_{k=0}^{\infty}\frac{1}{k!}A^{k}\frac{1}{k+1}T^{k+1} \\ = & T\sum_{k=0}^{\infty}\frac{1}{(k+1)!}\big(AT\big)^k \end{split} \end{equation}
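As a sanity check, both series are easy to evaluate numerically, e.g. with a short NumPy function; the truncation length below is an arbitrary choice.

import numpy as np
from math import factorial

def discretize(A, B, T, n_terms=20):
    # A_d = e^(AT) and B_d = (integral_0^T e^(At) dt) B, both via truncated power series.
    n = A.shape[0]
    A_d = np.zeros((n, n))
    S = np.zeros((n, n))          # approximates sum_k (AT)^k / (k+1)!
    AT_power = np.eye(n)          # (AT)^k, starting from k = 0
    for k in range(n_terms):
        A_d += AT_power / factorial(k)
        S += AT_power / factorial(k + 1)
        AT_power = AT_power @ (A * T)
    B_d = T * S @ B
    return A_d, B_d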
Solving optimization problems is a fascinating but quite involved topic, so let's scope the implementation of a constrained optimization problem solver out by using an existing library. The dlib library offers a wide range of optimizers as well as classes for matrix arithmetic.

Software stack 

In one sense software engineering is all about stacking libraries on top of each other to create a fully functional system. The more one can use ready-made libraries and components, the more quickly one can finish the project. Hence the name software stack and the importance of getting it right.


The browser frontend uses three.js for 3D visualization of the inverted pendulum. Three.js is an easy-to-use abstraction of the WebGL interface. Visualization of the simulation needs to be near real time, so a websocket is used for communication between the browser frontend and the server backend, which is implemented using Node.js with Express.js. The actual simulation is implemented using standard C++ threads which don't share a lot of state. One thread runs the actual simulation and keeps up the simulation state while the other thread runs the MPC controller.

As shown above, the MPC controller is essentially just a real-time optimization problem solver whose input is the system state and whose output is the control value needed to solve the control problem. In an actual controller one usually also has an estimation block, but here it is assumed that the controller is able to measure the system state directly. Noise is added to the controller's state measurement to make the simulation more interesting.

The actual optimization problem is solved using the dlib library, which is a templated multipurpose optimization and machine learning library implemented in C++. The simulation is glued to the frontend using a Node.js addon that exposes V8 objects to JavaScript.

Gluing dlib to Node.js was fairly easy. By default run-time type information (RTTI) and exceptions are turned off in Node.js addon builds. Both of these are needed by dlib, but you can easily turn them on in the binding.gyp file.
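A sketch of the relevant part of binding.gyp, assuming a target called "pendulum" and a single source file; the cflags_cc! entries remove the defaults that node-gyp injects to disable RTTI and exceptions, and the xcode_settings block is the macOS equivalent.

{
  "targets": [
    {
      "target_name": "pendulum",
      "sources": [ "addon.cpp" ],
      "cflags_cc!": [ "-fno-rtti", "-fno-exceptions" ],
      "cflags_cc": [ "-frtti", "-fexceptions" ],
      "xcode_settings": {
        "GCC_ENABLE_CPP_RTTI": "YES",
        "GCC_ENABLE_CPP_EXCEPTIONS": "YES"
      }
    }
  ]
}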

Conclusions

Model predictive control allows fine-tuned and efficient control of a complex system while also taking into account the physical limitations of actuators. Web technologies allow you to present your work in an interactive manner without forcing the reader to install a multitude of arcane libraries and navigate through the maze of issues due to slightly varying system environments and setups.

Control theory is cool by itself, and in addition the mathematical foundations of many branches of data science have a lot in common with the tools used in model predictive control.