I will post for next couple of days on some interesting interdisciplinary topics around Computational statistics and Computational Finance. This will be the last week of posts from The Information Age for this year. Next year I hope to come back to the daily information technology, computer science, data science related fields’ series of posts with enhanced commitment and quality.

For today I decided to share a post that caught my eye once again in the Revolutions Blog. This Blog is Microsoft’s official blog about the R programming language and its applications in various settings, namely, big data analytics, data science, machine learning, statistical learning and computational statistics. This post is more about computational statistics and numerical optimization in particular, and makes for an excellent reading for the way it mixes introductory conceptual remarks about numerical optimization with a new R package for dealing with numerical optimization it presents developed by the researcher Dick Shumacher. The R package is available in the author’s GitHub profile page and it is named ompr, which stands for Optimization Modelling Package.

### Mixed Integer Programming in R with the ompr package

Numerical optimization is an important tool in the data scientist’s toolbox. Many classical statistical problems boil down to finding the highest (or lowest) point on a multi-dimensional surface: the base R function optim provides many techniques for solving such maximum likelihood problems.

Counterintuitively, numerical optimizations are easiest (though rarely actually easy) when all of the variables are continuous and can take any value. When integer variables enter the mix, optimization becomes much, much harder. This typically happens when the optimization is constrained by a limited selection of objects, for example packages in a weight-limited cargo shipment, or stocks in a portfolio constrained by sector weightings and transaction costs. For tasks like these, you often need an algorithm for a specialized type of optimization: Mixed Integer Programming.

For problems like these, Dirk Schumacher has created the ompr package for R. This package provides a convenient syntax for describing the variables and constraints in an optimization problem. For example, take the classic “knapsack” problem of maximizing the total value of objects in a container subject to its maximum weight limit. In mathematical form, the problem can be described as an objective (quantity to be maximized) and a constraint:

max n∑[i=1] vixi

subject to n∑[i=1] wixi ≤ W, xi ∈ {0,1},i=1,…,n

In ompr, the same problem is naturally expressed in R as follows:

model <- MIPModel() %>% add_variable(x[i], i = 1:n, type = "binary") %>% set_objective(sum_expr(v[i] * x[i], i = 1:n)) %>% add_constraint(sum_expr(w[i] * x[i], i = 1:n) <= W)

To actually solve the problem, you need to provide a “backend” solver algorithm to ompr. It’s designed to integrate with any solver, and currently works with the ROI (R Optimization Infrastructure) package. ROI in turn provides a number of solver algorithms including GLPK, the GNU Linear Programming Kit, which you can use to solve problems like this.

Dirk provides a number of worked examples of the ompr package in use. Examples include the Traveling Salesman problem, solving Sodoku puzzles, and selecting optimal warehouse locations to minimize delivery costs to customers.

*Post Script (PS): After knowing that Dirk Shumacher is a Berliner, I would like to join the voices of condemnation to the recent tragical attack in the German capital city, which is one of the most important cities in the World and the capital of the most important European country in our time.*

*featured image: *