Wednesday, February 21, 2018

Modeling vs Programming, Python vs GAMS

In [1] a question is posed about formulating an LP model dealing with a shortest path problem. The suggested solution is programmed in Python. This is a good example where coding even in a high-level programming language is less compact than a formulation in a modeling language. Lots of code is needed to shoe-horn the model and data into an LP data structure. We see that a standard matrix form LP:

\[\begin{align} \min\>&c^Tx\\ &Ax=b\\&x\ge 0\end{align}\]

as data structure is not always a good abstraction level. In the GAMS model below, we actually use a model that is closer to the problem at hand:

\[\begin{align} \min\> & z = \sum_{(i,j) \in A} d_{i,j} x_{i,j} \\& \sum_{(j,i) \in A} x_{j,i} + b_i = \sum_{(i,j) \in A} x_{i,j} & \forall i\\& x_{i,j}\ge 0 \end{align}\]

Here \(A\) is the set of (directed) arcs. I.e., we lower the abstraction level. This looks like a bad idea, but it makes the "step" we need to make from problem to code much smaller.

Quantitatively, we need between 2 and 3 times as many lines of code in Python as in the corresponding GAMS model. Python is well known for its compactness: no need for declarations and a rich library of built-in functions for data manipulation lead to fewer lines of code than in many other languages. The GAMS model devotes quite a few lines to declarations. This requires some extra typing, but it provides type and domain checks that help with larger, more complex models. Even so, we need much more code in the Python model than in the GAMS model.

Besides compactness, I also think the Python version is much more obfuscated: it is difficult to even recognize it as a shortest path network LP model. The GAMS model stays very close to the mathematical model. Readability is probably the most important feature when implementing maintainable, real optimization models, and in the Python code this got lost. Note that this shortest path LP is a very simple model; the observations above are even more relevant for the larger, more complex models we see in practice. Finally, of course there are many Python modeling frameworks that provide better tools to create LP models.

I often do not understand the attractiveness of solving LP models using matrix-oriented APIs. These types of APIs are found in Python and are prevalent in R and Matlab [2]. Only for some very structured models is this interface really easy to use. The debate between modeling systems and matrix generators actually goes back a long time [3].

Python code

Code is taken from [1].

import numpy as np
from scipy.optimize import linprog

""" DATA"""
edges = [('A', 'B', 8),
         ('A', 'F', 10),
         ('B', 'C', 4),
         ('B', 'E', 10),
         ('C', 'D', 3),
         ('D', 'E', 25),
         ('D', 'F', 18),
         ('E', 'D', 9),
         ('E', 'G', 7),
         ('F', 'A', 5),
         ('F', 'B', 7),
         ('F', 'C', 3),
         ('F', 'E', 2),
         ('G', 'D', 2),
         ('G', 'H', 3),
         ('H', 'A', 4),
         ('H', 'B', 9)]
s, t = 'G', 'C'

""" Preprocessing """
nodes = sorted(set([i[0] for i in edges]))  # assumption: each node has an outedge
n_nodes = len(nodes)
n_edges = len(edges)

edge_matrix = np.zeros((len(nodes), len(nodes)), dtype=int)
for edge in edges:
    i, j, v = edge
    i_ind = nodes.index(i)  # slow
    j_ind = nodes.index(j)  # slow
    edge_matrix[i_ind, j_ind] = v

nnz_edges = np.nonzero(edge_matrix)
edge_dict = {}
counter = 0
for e in range(n_edges):
    a, b = nnz_edges[0][e], nnz_edges[1][e]
    edge_dict[(a,b)] = counter
    counter += 1

s_ind = nodes.index(s)
t_ind = nodes.index(t)

""" LP """
bounds = [(0, 1) for i in range(n_edges)]
c = [i[2] for i in edges]

A_rows = []
b_rows = []
for source in range(n_nodes):
    out_inds = np.flatnonzero(edge_matrix[source, :])
    in_inds = np.flatnonzero(edge_matrix[:, source])

    rhs = 0
    if source == s_ind:
        rhs = 1
    elif source == t_ind:
        rhs = -1

    n_out = len(out_inds)
    n_in = len(in_inds)

    out_edges = [edge_dict[a, b] for a, b in np.vstack((np.full(n_out, source), out_inds)).T]
    in_edges = [edge_dict[a, b] for a, b in np.vstack((in_inds, np.full(n_in, source))).T]

    A_row = np.zeros(n_edges)
    A_row[out_edges] = 1
    A_row[in_edges] = -1

    A_rows.append(A_row)
    b_rows.append(rhs)

A = np.vstack(A_rows)
b = np.array(b_rows)
res = linprog(c, A_eq=A, b_eq=b, bounds=bounds)
print(res)

The reason this model is less than obvious is that we need to map an irregular collection of arcs \((i,j)\) to a one-dimensional decision variable \(x_k\). If the graph were dense we could use a simple calculation: \(k := i + j \cdot n\). But in this case our graph is sparse, so the mapping \((i,j) \rightarrow k\) is not straightforward. In addition we need to be able to find all arcs going into \(i\) or leaving \(i\). Some well-chosen dicts can help here.
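To make this concrete, here is a small sketch (not taken from [1]) of the bookkeeping described above: a dict maps each arc \((i,j)\) to a flat variable index \(k\), and two more dicts collect the arcs leaving and entering each node, which is exactly what is needed to build a node balance row.

```python
# Sketch of the (i,j) -> k mapping and in/out arc lists; abbreviated data.
edges = [('A', 'B', 8), ('A', 'F', 10), ('B', 'C', 4)]

arc_index = {(i, j): k for k, (i, j, _) in enumerate(edges)}
out_arcs, in_arcs = {}, {}
for k, (i, j, _) in enumerate(edges):
    out_arcs.setdefault(i, []).append(k)
    in_arcs.setdefault(j, []).append(k)

# node balance row for node 'A': +1 for leaving arcs, -1 for entering arcs
row = [0] * len(edges)
for k in out_arcs.get('A', []):
    row[k] = 1
for k in in_arcs.get('A', []):
    row[k] = -1
```

With these dicts in place, the loop over nodes that assembles the \(A\) matrix no longer needs any `nodes.index` searches or dense intermediate matrices.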

GAMS Model


set i 'nodes' /A*H/;
alias(i,j);
parameters
  d(i,j) 'distances' /
    A.(B 8, F 10)
    B.(C 4, E 10)
    C.D 3
    D.(E 25, F 18)
    E.(D 9, G 7)
    F.(A 5, B 7, C 3, E 2)
    G.(D 2, H 3)
    H.(A 4, B 9) /
  inflow(i) 'rhs' /
    G 1
    C -1
   /;

set a(i,j) 'arcs exist if we have a distance';
a(i,j) = d(i,j);

positive variables x(i,j) 'flow';
variable z 'obj';

equations
   obj   'objective'
   nodebal(i) 'node balance'
;

obj.. z =e= sum(a,d(a)*x(a));
nodebal(i)..  sum(a(j,i), x(j,i)) + inflow(i) =e= sum(a(i,j), x(i,j));

model m /all/;
solve m minimizing z using lp;
display x.l;

The operations that were so problematic in Python are actually straightforward in GAMS. Just write the equations as stated in the mathematical formulation. The set a implements the set of arcs \(A\). This allows us to have a quite natural expression of the flow conservation constraints.

The GAMS model is actually well suited to solve large sparse instances. To test this we can generate random data:

set i 'nodes' /node1*node1000/;
alias(i,j);
parameters
  d(i,j) 'distances'
  inflow(i) 'rhs' /
    node1 1
    node1000 -1
   /;
d(i,j)$(uniform(0,1) < 0.05) = uniform(1,100);


This will give us a network with 1,000 nodes and about 50,000 arcs. We don't need to change any of the data representation as all parameters and (multi-dimensional) sets are automatically stored using sparse data structures. This model with 1,000 equations and 50k variables takes 0.3 seconds to generate and 0.3 seconds to solve (it is a very easy LP problem). Note that the solver called in the above Python code is not able to handle a model of this size. The Python code takes 5.5 seconds to generate the LP problem; then we are stuck in the LP solver for hours. Dense LP solvers are really only for very small problems.
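As an aside (not part of the original comparison): newer SciPy releases ship the HiGHS solvers, which do accept sparse constraint matrices, so a sparse Python version along these lines is possible. All data here is randomly generated for illustration; a direct source-to-sink arc is added so the instance is guaranteed feasible.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse import lil_matrix

rng = np.random.default_rng(123)
n, density = 200, 0.05
s, t = 0, n - 1

# random sparse arc list; a direct s->t arc guarantees feasibility
arcs = [(i, j) for i in range(n) for j in range(n)
        if i != j and rng.random() < density]
if (s, t) not in arcs:
    arcs.append((s, t))
cost = rng.uniform(1, 100, size=len(arcs))

# node-arc incidence matrix, stored sparse
A = lil_matrix((n, len(arcs)))
for k, (i, j) in enumerate(arcs):
    A[i, k] = 1.0    # arc leaves node i
    A[j, k] = -1.0   # arc enters node j
b = np.zeros(n)
b[s], b[t] = 1.0, -1.0   # ship one unit from s to t

res = linprog(cost, A_eq=A.tocsr(), b_eq=b, bounds=(0, None), method="highs")
print(res.status, res.fun)
```

This removes the dense-solver bottleneck, though the index bookkeeping the text complains about is of course still there.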


References


  1. Using SciPy's minimize to find the shortest path in a graph, https://stackoverflow.com/questions/48898447/using-scipys-minimize-to-find-the-shortest-path-in-a-graph
  2. Matlab vs GAMS: linear programming, http://yetanothermathprogrammingconsultant.blogspot.com/2016/09/matlab-vs-gams-linear-programming.html
  3. R. Fourer, Modeling Languages versus Matrix Generators for Linear Programming. ACM Transactions on Mathematical Software 9 (1983) 143-183.

Wednesday, February 14, 2018

Van der Waerden Puzzle

In [1] a puzzle is mentioned:

Find a binary sequence \(x_1, \dots , x_8\) that has no three equally spaced \(0\)s and no three equally spaced \(1\)s

I.e. we want to forbid patterns like \(111\), \(1x1x1\), \(1xx1xx1\) and \(000\), \(0x0x0\), \(0xx0xx0\). It is solved as a SAT problem in [1], but we can also write this down as a MIP. Instead of just 0s and 1s we make it a bit more general: any color from a set of colors \(c\) can be used (this allows us to use more than two colors, and we can make nice pictures of the results). A MIP formulation can look like the two-liner:

\[\begin{align}& x_{i,c} + x_{i+k,c} + x_{i+2k,c} \le 2 && \forall i,k,c  \text{ such that } i+2k \le n\\ & \sum_c x_{i,c}=1 && \forall i\end{align} \]

where \(x_{i,c} \in \{0,1\}\) indicating whether color \(c\) is selected for position \(i\). For \(n=8\) and two colors we find six solutions:


For \(n=9\) we cannot find any solution. This is sometimes denoted as the Van der Waerden number \(W(3,3)=9\).
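Both claims are small enough to verify by brute force enumeration (a quick check, not from the original post):

```python
from itertools import product

def has_3_equally_spaced(seq):
    # True if some color appears at three equally spaced positions i, i+k, i+2k
    n = len(seq)
    return any(seq[i] == seq[i + k] == seq[i + 2 * k]
               for k in range(1, n)
               for i in range(n - 2 * k))

def count_solutions(n, colors=2):
    # count sequences with no three equally spaced identical colors
    return sum(not has_3_equally_spaced(s)
               for s in product(range(colors), repeat=n))

print(count_solutions(8), count_solutions(9))  # six solutions, then none
```

Of course this \(O(c^n)\) enumeration only works for tiny instances; that is exactly where the MIP/SAT formulations take over.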

For small problems we can enumerate these by adding cuts and resolving. For slightly larger problems the solution pool technique in solvers like Cplex and Gurobi can help.

For three colors \(W(3,3,3)=27\), i.e. for \(n=27\) we cannot find a solution without three equally spaced colors. For \(n=26\) we can enumerate all 48 solutions:


Making this problem only a little bit bigger brings our model to a screeching halt. E.g. \(W(3,3,3,3)=76\): I could not find solutions for \(n=75\) within 1,000 seconds, let alone enumerate them all.

MIP vs CP formulation


In the MIP formulation we used a binary representation of the decision variables \(x_{i,c}\). If we use a Constraint Programming solver (or an SMT solver), we can use integer variables \(x_i\) directly.

Binary MIP formulation:
\[\begin{align}\min\>&0\\& x_{i,c} + x_{i+k,c} + x_{i+2k,c} \le 2 &&  \forall  i+2k \le n\>\forall c \\ & \sum_c x_{i,c}=1 && \forall i\\ & x_{i,c} \in \{0,1\} \end{align} \]

Integer CP formulation:
\[ \begin{align} &x_{i} \ne x_{i+k} \vee x_{i+k} \ne x_{i+2k} && \forall  i+2k \le n\\ & x_i \in \{1,\dots,nc\} \end{align} \]


Python/Z3 version


This is to illustrate the integer CP formulation. Indexing in Python starts at 0 so we have to slightly adapt the model for this:

from z3 import *

n = 26
nc = 3

# integer variables X[i]
X = [ Int('x%s' % i ) for i in range(n) ]

# lower bounds X[i] >= 1
Xlo = And([X[i] >= 1 for i in range(n)])
# upper bounds X[i] <= nc
Xup = And([X[i] <= nc for i in range(n)])
# forbid equally spaced colors 
Xforbid = And( [ Or(X[i] != X[i+k+1], X[i+k+1] != X[i+2*(k+1)] ) for i in range(n) \
               for k in range(n) if i+2*(k+1) < n])
# combine all constraints
Cons = [Xlo,Xup,Xforbid]
       
# find all solutions    
s = Solver()
s.add(Cons)
k = 0
res = []
while s.check() == sat:
    m = s.model()
    k = k + 1
    sol = [ m.evaluate(X[i]) for i in range(n)]
    res = res + [(k,sol)]
    forbid = Or([X[i] != sol[i] for i in range(n) ])
    s.add(forbid)     
print(res)

Here we use And, Or and != to model our Xforbid constraint. We could have used the same binary variable approach as done in the integer programming model. I tested that also: it turned out to be slower than using integers directly.

Performance


Here are some timings for generating these 48 solutions for the above example:

Model/Solver | Statistics | Solution Time
Binary formulation with Cplex (solution pool) | 494 rows, 78 binary variables, 51,092 simplex iterations, 3,721 nodes | 1.6 seconds
Integer formulation with Z3 | 26 integer variables; 693 boolean variables and 5,158 clauses (first model); 871 boolean variables and 13,990 clauses (last model) | 5.5 seconds
Binary formulation with Z3 | 78 binary variables; 860 boolean variables and 2,974 clauses (first model); 1,015 boolean variables and 6,703 clauses (last model) | 137 seconds


The binary MIP formulation is not doing poorly at all when solved with a state-of-the-art MIP solver. Z3 really does not like the same binary formulation: we need to use the integer formulation instead to get good performance. Different formulations of the same problem can make a substantial difference in performance.


Van der Waerden (1903-1996)


References


  1. Donald E. Knuth, The Art of Computer Programming, Volume 4, Fascicle 6, Satisfiability, 2015
  2. Van der Waerden Number, https://en.wikipedia.org/wiki/Van_der_Waerden_number


Saturday, February 10, 2018

Singular system of nonlinear equations

For a project I am faced with solving large, sparse systems of nonlinear equations\[F(x)=0\] In some cases I get a singular system: the Jacobian at the feasible point is not invertible. These models are also solved using a simple iterative scheme, which tolerates such a condition. How would standard solvers handle this?

I generated a small test problem of the form \[\begin{align}&F(x)=0\\&G(y)=0\end{align}\] where \(F(x)=0\) has no issues, but \(G(y)=0\) is singular. For the singular part I simply used:\[\begin{align}&y_1+y_2=1\\&y_1+y_2=1\end{align}\] Let's see what feedback we get:

Solver | Model type | Model Status | Feedback/Notes
Conopt | CNS | Locally Infeasible | ** Error in Square System: Pivot too small. (errors/warnings flagged on equation g(k1) and variable y(k1))
Ipopt | CNS | Solved | Feasible solution
Knitro | CNS | Solved | Feasible solution
Miles | MCP | Intermediate Infeasible | Failure to converge
Minos | CNS | Unbounded | EXIT - The problem is unbounded (or badly scaled).
Minos | CNS+basis | Solved | Feasible solution
Path | CNS | Solved | Feasible solution
Path | MCP | Optimal | Feasible solution
Snopt | CNS | Solved | Feasible solution

Some solvers will report a feasible solution (without a further hint). Conopt does not, but gives a good indication where things go wrong. We are not given the set of dependent equations, but Conopt at least points us clearly to one culprit.

Not related to my problem, but another question is: what if the system is singular and no solution exists? E.g. use: \[\begin{align}&y_1+y_2=1\\&y_1+y_2=2\end{align}\] Conopt gives the same results as above. The other solvers will typically report "infeasible". In most cases a solution corresponding to a phase I "min sum of infeasibilities" objective is provided, where \(x\) is feasible and \(y\) is infeasible. Some solvers just return a solution with many infeasibilities.

Monday, February 5, 2018

Fun with a quadratic model


In [1] a type of assignment problem is described:

I'd like to match n ~= 100 people to m ~= 5 teams, subject to a couple of constraints. For each of the n people, I have a vector of k numbers that correspond to various skill ratings. Each person also has a preference rating for each of the 5 teams, as well as a list of people they'd like to work with and a list of people they would like not to work with. 
Given some criteria for a 'good' team (for example, some linear combination of skill levels of all its team members, everyone is happy with the team they're on (given their preferences), etc.) that can be used for evaluation, how should I design an algorithm to match the people to teams?
The foundation for an optimization model for this problem can be an assignment structure with:

\[x_{p,t} = \begin{cases} 1 & \text{ if person $p$ is assigned to team $t$}\\  0 & \text{ otherwise}\end{cases}\]

The assignment constraints can look like: \[\begin{align} &\sum_t x_{p,t} \le 1 & \forall p\\&\sum_p x_{p,t} = \mathit{demand}_{t}& \forall t \end{align} \] The problem can be viewed as a multi-criteria optimization problem with objectives: \[\begin{align} &\max\> f_{skill} = \sum_{p,s,t} \mathit{skill}_{p,s} x_{p,t} && \text{skill sets} \\ &\max\> f_{\mathit{preft}} = \sum_{p,t} \mathit{prefteam}_{p,t} x_{p,t} && \text{team preferences}\\&\max\>f_{\mathit{prefw}} = \sum_{p,p',t|p' >p} \mathit{prefworker}_{p,p'} x_{p,t} x_{p',t} && \text{worker preferences}\end{align}\] where \(s\) is the set of different skills. A value \( \mathit{prefworker}_{p,p'}\lt 0 \) indicates workers \(p\) and \(p'\) dislike each other. Note that in the quadratic expression we sum only over \(p' \gt p\) to avoid double counting. A typical way to handle competing objectives is to form a weighted sum: \[\max\> z = \sum_k w_k f_k\] So our first model M1 can look like: \[\bbox[lightcyan,10px,border:3px solid darkblue]{ \begin{align} \max\> &z = \sum_k w_k f_k\\ & f_{skill} = \sum_{p,s,t} \mathit{skill}_{p,s} x_{p,t}\\ & f_{\mathit{preft}} = \sum_{p,t} \mathit{prefteam}_{p,t} x_{p,t} \\ & f_{\mathit{prefw}} = \sum_{p,p',t|p' >p} \mathit{prefworker}_{p,p'} x_{p,t} x_{p',t} \\ &\sum_t x_{p,t} \le 1 \\ &\sum_p x_{p,t} = \mathit{demand}_{t}\\ &x_{p,t} \in \{0,1\} \end{align}}\]


  • M1: Not many solvers can handle this problem as stated: Cplex is the major exception.
  • M2: We can convince Gurobi and Xpress to solve this by changing the quadratic equality constraint into a \(\le\) constraint. (Interestingly, Cplex has a much tougher time with this version than with M1.)
  • M3: This model is formed by substituting out the \(f\) variables. We end up with an MIQP model.
  • M4: As suggested by Bob Fourer below in the comments, we can simplify the quadratic term in the objective \[\sum_{p,p',t|p' >p} \mathit{prefworker}_{p,p'} x_{p,t} x_{p',t} \] by \[\begin{align} & \sum_{p,t} x_{p,t} y_{p,t}\\& y_{p,t} = \sum_{p' > p} \mathit{prefworker}_{p,p'} x_{p',t}\\& y_{p,t} \text{ free}   \end{align}\] Note that this is a MIQP model, but with a simpler Q matrix than M3.
  • M5: We can formulate this problem as a straight linear MIP by introducing binary variables \(y_{p,p',t}=x_{p,t} x_{p',t}\). This multiplication can be written as: \[\begin{align}&y_{p,p',t}\le x_{p,t}\\ &y_{p,p',t}\le x_{p',t}\\ &y_{p,p',t}\ge x_{p,t} + x_{p',t} -1\\ & f_{\mathit{prefw}} =  \sum_{p,p',t|p' >p} \mathit{prefworker}_{p,p'} y_{p,p',t}\\ & y_{p,p',t} \in \{0,1\} \end{align}\] This formulation removes any non-convexity from the problem.
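The product linearization used in M5 is easy to check exhaustively (a small sketch, not part of the original post): for binary \(x_{p,t}\), \(x_{p',t}\), the three inequalities pin \(y_{p,p',t}\) to exactly the product.

```python
from itertools import product

def feasible_y(x1, x2):
    # all binary y satisfying y <= x1, y <= x2, y >= x1 + x2 - 1
    return [y for y in (0, 1)
            if y <= x1 and y <= x2 and y >= x1 + x2 - 1]

# for every binary combination the only feasible y is the product x1*x2
for x1, x2 in product((0, 1), repeat=2):
    assert feasible_y(x1, x2) == [x1 * x2]
print("linearization is exact")
```

Because \(y\) is forced to the product for every binary assignment, the MIP objective value agrees with the quadratic one, which is why M5 reproduces the same optimum as M1-M4.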

In the table below I show how different solvers react on these different models. If you want your model to solve with as many solvers as possible, it is a good idea to reformulate this as a MIP model.

Model | Solver | Results
M1 (MIQCP) | Cplex | OK
M1 (MIQCP) | Gurobi | Gurobi can only solve QCP with =L= or =G= quadratic rows. Could not load problem into Gurobi database (1)
M1 (MIQCP) | Mosek | Nonconvex model detected. Constraint 'objprefworker'(2) is a nonlinear equality. (*1290*)
M1 (MIQCP) | Xpress | Quadratic equality constraint not supported: objprefworker. ERROR: Could not load problem into XPRESS optimizer.
M2 (MIQCP) | Cplex | OK
M2 (MIQCP) | Gurobi | OK
M2 (MIQCP) | Mosek | Constraint 'objprefworker'(2) is not convex. Q should be positive semidefinite in a constraint with finite upper bound. (*1293*)
M2 (MIQCP) | Xpress | OK with warning: ?900 Warning: The quadratic part of row 'R3' defines a nonconvex region. Please check your model or use Xpress-SLP. Could not solve final fixed LP. We report the unfixed solution.
M3 (MIQP) | Cplex | OK
M3 (MIQP) | Gurobi | OK
M3 (MIQP) | Mosek | The quadratic coefficient matrix in the objective is not negative semidefinite as expected for a maximization problem. (*1296*)
M3 (MIQP) | Xpress | OK
M4 (MIQP) | Cplex | OK
M4 (MIQP) | Gurobi | OK
M4 (MIQP) | Mosek | The quadratic coefficient matrix in the objective is not negative semidefinite as expected for a maximization problem. (*1296*)
M4 (MIQP) | Xpress | ?899 Warning: The quadratic objective is not convex. Please check model or use Xpress-SLP. Problem is integer infeasible
M5 (MIP) | Cplex | OK
M5 (MIP) | Gurobi | OK
M5 (MIP) | Mosek | OK
M5 (MIP) | Xpress | OK

Some of the messages are a bit over the top. If an LP and Q matrix is called a 'database', what should we call a real database?

Often I make the argument that a MIP can perform better than a MIQP/MIQCP model. Here is another reason why such a reformulation may make sense: different solvers have different rules about what type of MIQCP/MIQP models they can solve. A formulation that works for one solver is no guarantee it will work for another solver. With a MIP formulation this is a little bit more predictable.

Performance


To show the unexpectedly large variation in performance of the models M1 through M5 we reproduce the Cplex results on a small randomly generated data set.

Model | Rows | Columns (discrete) | Iterations | Nodes | Seconds | Objective value
M1 | 109 | 504 (500) | 3,420 | 0 | 3.16 | 3178
M2 | 109 | 504 (500) | — | — | > 10,000 (time interrupt) | —
M3 | 106 | 501 (500) | 2,977 | 0 | 4.13 | 3178
M4 | 606 | 1,001 (500) | 25,203 | 1,504 | 6 | 3178
M5 | 74,359 | 25,254 (25,250) | 19,983 | 53 | 65.92 | 3178

Model M2 takes a very long time: I was unable to solve it to optimality in runs of several hours. This is somewhat surprising: the difference between models M1 and M2 is just the relation in the constraint \[f_{\mathit{prefw}}\left\{\begin{matrix}=\\ \le\end{matrix}\right\}\sum_{p,p',t|p' >p} \mathit{prefworker}_{p,p'} x_{p,t} x_{p',t}\] Some models are very sensitive to minor differences in the formulation.

References


  1. Matching people with skills and preferences to groups, https://stackoverflow.com/questions/48482550/matching-people-with-skills-and-preferences-to-groups

Friday, February 2, 2018

Solving Sparse Systems of Equations with an optimizer: a basis trick


Sparse Systems of Linear Equations


Solving a sparse system of linear equations\[Ax=b\] is a task an LP solver should be quite well-equipped to do. A typical Simplex method will contain a rather sophisticated sparse LU factorization routine. Formulating this as a standard LP with a dummy objective leads to somewhat disappointing performance (see the table below). Here we try a very sparse system of \(n=9,801\) variables and equations.

One reason a simplex method is slow here is that the initial basis is chosen rather poorly. We can help the solver with this. We know that the basis status of the final solution is as follows:

  • All rows are nonbasic
  • All columns are basic
This suggests we can construct an advanced basis that reflects this prediction, and tell the LP solver to use it. In GAMS it is very easy to set up such a basis. A non-zero marginal indicates the corresponding row or column is non-basic. We can just use an assignment statement:

   equation e(i,j);
   e.m(i,j) = 1;

The variables just have their default marginal value of zero. The results are not something to sneeze at:


Solver | Model | Iterations | Seconds
Cplex | LP | 4,358 | 27
Cplex | LP+basis | 0 | 6.5

When we pass on an advanced basis the solver needs no simplex iterations (i.e. basis changes): it just factorizes the basis matrix and it is done.

Sparse Systems of Nonlinear Equations


When dealing with a large system of nonlinear equations \[F(x)=0\] we can do a similar thing. It helps only for algorithms that are simplex based such as MINOS. Here we solve a standard benchmark problem due to Broyden:

Broyden's tridiagonal problem

This is a quadratic problem with \(n=10,000\) equations and variables. Solvers like Cplex or Gurobi are not able to solve it. The model type CNS stands for Constrained Nonlinear System: a square system of non-linear equations (with some possible bounds) without an objective. With MINOS we see:

Solver | Model | Iterations | Seconds
Minos | CNS | The problem is unbounded (or badly scaled) | —
Minos | CNS+basis | 0 | 0.2
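The equations behind "Broyden's tridiagonal problem" (the image is not reproduced here) are usually stated as \(f_i(x) = (3-2x_i)x_i - x_{i-1} - 2x_{i+1} + 1 = 0\) with \(x_0 = x_{n+1} = 0\). A small SciPy sketch (not the GAMS/MINOS setup from the post, and a much smaller instance) solves it with a Newton-type method from the customary starting point \(x_i = -1\):

```python
import numpy as np
from scipy.optimize import fsolve

def broyden_tridiagonal(x):
    # f_i = (3 - 2*x_i)*x_i - x_{i-1} - 2*x_{i+1} + 1, with x_0 = x_{n+1} = 0
    f = (3.0 - 2.0 * x) * x + 1.0
    f[1:] -= x[:-1]         # subtract x_{i-1}
    f[:-1] -= 2.0 * x[1:]   # subtract 2*x_{i+1}
    return f

n = 500
x0 = -np.ones(n)            # customary starting point
sol = fsolve(broyden_tridiagonal, x0)
print(np.linalg.norm(broyden_tridiagonal(sol)))
```

As in the GAMS experiment, a Newton-type method converges very quickly on this problem once it starts from a reasonable point.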

We see again that a good starting basis can help: in this case it makes the difference between being able to solve the problem or not. With an advanced basis the solver does not need any simplex-type iterations and can just use a Newton algorithm, which converges extremely fast.

References


  1. Erwin Kalvelagen, Solving Systems of Linear Equations with GAMS, http://amsterdamoptimization.com/pdf/lineq.pdf
  2. C. G. Broyden, A Class of Methods for Solving Nonlinear Simultaneous Equations,  Mathematics of Computation, Vol. 19, No. 92 (Oct., 1965), pp. 577-593

Monday, January 29, 2018

Linearizing nested absolute values

In [1] an interesting question is posted: how to model a nested absolute value \[ \min \left| \left( |x_1-a_1| \right) - \left( |x_2-a_2|\right) \right| \] A standard way to model absolute values in the objective function of a linear minimization model \[\min |x|\] is: \[\begin{align}\min\>&y\\&-y \le x \le y\end{align}\] It would be tempting to apply this here: \[\begin{align}\min\>&z\\&-z\le y_1-y_2 \le z\\& -y_i \le x_i - a_i \le y_i \end{align}\] The easiest way to show that this formulation is wrong is a counterexample. For this I added a few extra conditions to the model:\[\begin{align}\min\>&z\\&-z\le y_1-y_2 \le z\\& -y_i \le x_i - a_i \le y_i\\&x_1=x_2\\&x_1,x_2 \in [2,5] \end{align}\] When using \(a_1 = -1\), \(a_2=2\), I see as solution:

----     37 PARAMETER a  

i1 -1.000,    i2  2.000


----     37 VARIABLE x.L  

i1 2.000,    i2 2.000


----     37 VARIABLE y.L  

i1 3.000,    i2 3.000


----     37 VARIABLE z.L                   =        0.000  

We can see the value \(y_2=3\) is incorrect: \(|x_2-a_2|=|2-2|=0\). The underlying reason is of course that we are not minimizing the second term \(|x_2-a_2|\), so nothing forces \(y_2\) down to it.

A correct formulation will need extra binary variables (or something related such as a SOS1 construct):\[\begin{align}\min\>&z\\&-z\le y_1-y_2 \le z\\& y_i \ge x_i - a_i \\& y_i \ge -(x_i - a_i)\\ & y_i \le x_i - a_i + M\delta_i\\& y_i \le -(x_i - a_i) + M(1-\delta_i)\\&\delta_i \in \{0,1\}  \end{align}\]

Background


The function \(z = \max(x,y) \) can be linearized as: \[\begin{align}&z \ge x\\&z \ge y\\ &z \le x + M\delta\\&z \le y + M(1-\delta)\\&\delta \in \{0,1\}\end{align}\] The function \(z=|x|\) can be interpreted as \(z=\max(x,-x)\).
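A quick exhaustive check of this \(\max\) linearization over a small integer grid (illustrative only, with an arbitrary big-M of 100):

```python
from itertools import product

M = 100  # big-M, valid as long as |x|, |y| stay well below it

def z_values(x, y):
    # all (z, delta) satisfying the four big-M constraints
    return [(z, d) for z in range(-10, 11) for d in (0, 1)
            if z >= x and z >= y and z <= x + M * d and z <= y + M * (1 - d)]

# in a minimization the smallest feasible z is picked, and that equals max(x,y)
for x, y in product(range(-3, 4), repeat=2):
    assert min(z for z, _ in z_values(x, y)) == max(x, y)
print("max() linearization verified on a grid")
```

Note that only the lower bounds \(z \ge x\), \(z \ge y\) hold for every \(\delta\); the upper bounds need the binary \(\delta\) to select which of the two is active, which is exactly why the nested absolute value needs binary variables.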

Results


The example model now will show:


----     68 VARIABLE x.L  

i1 2.000,    i2 2.000


----     68 VARIABLE y.L  

i1 3.000


----     68 VARIABLE z.L                   =        3.000  

----     68 VARIABLE d.L  

                      ( ALL       0.000 )


I think we need binary variables for both terms \(y_i\). For this numerical example it seems we could use one only for the second term \(y_2\) and handle \(y_1\) as before, but I believe that is just luck. A correct formulation will use binary variables for both inner absolute values.

Linearizing absolute values is not always that easy.

References


Tuesday, January 23, 2018

Knuth and SAT

Presentation by Knuth about SAT problems and solvers (and also a little bit about integer programming). Not new, but still interesting.


Section 7.2.2.2 of TAOCP is about SAT; Knuth ran out of extra space for more sub-numbering:



Looks like Donald Knuth got a bit sidetracked by this subject (his program SAT13 seems to indicate he implemented at least 13 SAT solvers). He had the same thing when starting on TAOCP: he did not like any of the available typesetting software, so he implemented TeX. Then he did not like the fonts, so he implemented Metafont. There is a similar story about Bill Orchard-Hays: he needed to implement an LP solver on a different machine. In those times everything was coded in assembly. Well, he did not like the assembler, so he started by implementing a proper macro-assembler. Then he did not like the linker, so he implemented a new linker. After that it just took a few weeks to implement the LP solver. (I am not sure this story is 100% true, but it could well be.)

Knuth mentions integer programming and Cplex when discussing pictures like:


References


  1. Donald E. Knuth, The Art of Computer Programming, Volume 4, Fascicle 6, Satisfiability, 2015
  2. William Orchard-Hays, Advanced Linear Programming Computing Techniques, 1968.

Thursday, January 18, 2018

CBC and semi-continuous variables

The Coin CBC MIP solver [5] does not seem to have a way to specify semi-continuous variables in either an LP file or an MPS file [1] (slightly edited):

Hi,

I am using version 2.9.9 of cbc in ubuntu 17.10 docker image.  My test.lp file has following content:

Maximize
 obj: x1 + 2 x2 + 3 x3 + x4
Subject To
 c1: - x1 + x2 + x3 + 10 x4 <= 20
 c2: x1 - 3 x2 + x3 <= 30
 c3: x2 - 3.5 x4 = 0
Bounds
  0 <= x1 <= 40
  2 <= x4 <= 3
General
 x4
Semis
 x1 x2 x3

When trying with the semis section I get an error "terminate called after throwing an instance of 'CoinError' Aborted"

On a Mac I get: libc++abi.dylib: terminating with uncaught exception of type CoinError. Abort trap: 6

However if I comment out Semis it works fine. I was hoping that Semis are supported. Am I doing something wrong?

My command is : cbc -presolve on -import test.lp solve solu out.txt

On further analysis I found out that when I type "import test.lp" at the cbc prompt, it fails and shows the same error. I tried MPS format as well; there I got:

Bad image at line 19 < SC BND1 YTWO 3 >

Contents of mps file are :

NAME          TESTPROB
ROWS
 N  COST
 L  LIM1
 G  LIM2
 E  MYEQN
COLUMNS
    XONE      COST                 1   LIM1                 1
    XONE      LIM2                 1
    YTWO      COST                 4   LIM1                 1
    YTWO      MYEQN               -1
    ZTHREE    COST                 9   LIM2                 1
    ZTHREE    MYEQN                1
RHS
    RHS1      LIM1                 5   LIM2                10
    RHS1      MYEQN                7
BOUNDS
 UP BND1      XONE                 4
 SC BND1      YTWO                 3
ENDATA

Some notes:

  • The LP file parser crashes hard. Preferably, a parser should never crash on erroneous input and always produce a (good) error message,
  • The MPS file reader does not crash, but the error message "bad image" seems a bit strange to me. (What has this to do with an image? In prehistoric times an "image" was sometimes used as the equivalent of "file"; "bad file" would still be somewhat of a bad message.)
  • CBC actually supports semi-continuous variables.
  • Semi-continuous behavior can be modeled with binary variables [3]: \[\begin{align} &\delta \cdot \ell \le x \le \delta \cdot u\\&\delta \in \{0,1\}\\&x \in [0,u]\\&\ell > 0\end{align}\]
  • If you have good upper bounds, this may actually solve faster (MIP solvers do like binary variables).
  • Semi-continuous variables with a zero lower bound do not make much sense.
  • The LP and MPS file in this post do not represent the same problem.
  • Why did nobody complain about this before?
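The binary-variable workaround from the bullet list above is easy to check on a few points (an illustrative sketch, using the bounds \(\ell=2\), \(u=3\) that appear in the LP file's x4):

```python
# With delta binary, the constraints  delta*l <= x <= delta*u  allow
# exactly x = 0 or l <= x <= u: the semi-continuous set.
l, u = 2.0, 3.0

def sc_feasible(x):
    # is there a binary delta making x feasible?
    return any(d * l <= x <= d * u for d in (0, 1))

assert sc_feasible(0.0)        # x = 0 is allowed (delta = 0)
assert not sc_feasible(1.0)    # the gap between 0 and l is excluded
assert sc_feasible(2.5)        # l <= x <= u is allowed (delta = 1)
assert not sc_feasible(3.5)    # above u is excluded
print("semi-continuous linearization behaves as described")
```

This also makes the note about \(\ell > 0\) concrete: with \(\ell = 0\) the excluded gap disappears and the variable is just an ordinary bounded variable.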
The answer by John Forrest [2] is funny:

Semi continuous variables are in Cbc (as a simple version of lotsizing variables).

However there is no way of inputting them using mps or lp files.

Having invented semi-continuous variables (probably before you were born!), I feel some responsibility to make them easier to use in Cbc. I will get them working with mps files in trunk.

 John Forrest


References

Monday, January 15, 2018

Least squares as QP: convexity issues

This is a post about theory vs practice. Theory says: least squares problems are convex. In practice we can encounter cases where this does not hold. The background is a post [1] about a non-negative least squares problem being solved using Cplex's Matlab interface (function cplexlsqnonneglin [2]). It fails. We will fix this by modeling: a somewhat non-intuitive reformulation will help us solve this problem.

Model 1


The model this function tries to solve is:
\[\begin{align}\min\>&||Cx-d||_2^2\\&x\ge 0\end{align}\]
Unfortunately we see:


Number of nonzeros in lower triangle of Q = 861
Using Approximate Minimum Degree ordering
Total time for automatic ordering = 0.00 sec. (0.02 ticks)
Summary statistics for factor of Q:
  Rows in Factor            = 42
  Integer space required    = 42
  Total non-zeros in factor = 903
  Total FP ops to factor    = 25585
CPLEX Error  5002: objective is not convex.
QP with an indefinite objective can be solved
to local optimality with optimality target 2,
or to global optimality with optimality target 3.
Presolve time = 0.03 sec. (0.06 ticks)
Barrier time = 0.03 sec. (0.06 ticks)

This looks like a numerical or tolerance issue: least-squares problems should be convex. More on this further down.

Model 2


I often propose a different model:
\[\begin{align}\min\>&||r||_2^2\\&r=Cx-d\\&x\ge 0, r\text{ free}\end{align}\]
This looks like a bad idea: we add variables \(r\) and extra constraints. However this model has a big advantage: the \(Q\) matrix is much better behaved. It is basically a (very well-scaled) diagonal matrix. Using this model we see:


Tried aggregator 1 time.
QP Presolve eliminated 1 rows and 1 columns.
Reduced QP has 401 rows, 443 columns, and 17201 nonzeros.
Reduced QP objective Q matrix has 401 nonzeros.
Presolve time = 0.02 sec. (1.21 ticks)
Parallel mode: using up to 8 threads for barrier.
Number of nonzeros in lower triangle of A*A' = 80200
Using Approximate Minimum Degree ordering
Total time for automatic ordering = 0.00 sec. (3.57 ticks)
Summary statistics for Cholesky factor:
  Threads                   = 8
  Rows in Factor            = 401
  Integer space required    = 401
  Total non-zeros in factor = 80601
  Total FP ops to factor    = 21574201
 Itn      Primal Obj        Dual Obj  Prim Inf Upper Inf  Dual Inf          
   0   3.3391791e-01  -3.3391791e-01  9.70e+03  0.00e+00  4.20e+04
   1   9.6533667e+02  -3.0509942e+03  1.21e-12  0.00e+00  1.71e-11
   2   6.4361775e+01  -3.6729243e+02  3.08e-13  0.00e+00  1.71e-11
   3   2.2399862e+01  -6.8231454e+01  1.14e-13  0.00e+00  3.75e-12
   4   6.8012056e+00  -2.0011575e+01  2.45e-13  0.00e+00  1.04e-12
   5   3.3548410e+00  -1.9547176e+00  1.18e-13  0.00e+00  3.55e-13
   6   1.9866256e+00   6.0981384e-01  5.55e-13  0.00e+00  1.86e-13
   7   1.4271894e+00   1.0119284e+00  2.82e-12  0.00e+00  1.15e-13
   8   1.1434804e+00   1.1081026e+00  6.93e-12  0.00e+00  1.09e-13
   9   1.1163905e+00   1.1149752e+00  5.89e-12  0.00e+00  1.14e-13
  10   1.1153877e+00   1.1153509e+00  2.52e-11  0.00e+00  9.71e-14
  11   1.1153611e+00   1.1153602e+00  2.10e-11  0.00e+00  8.69e-14
  12   1.1153604e+00   1.1153604e+00  1.10e-11  0.00e+00  8.96e-14
Barrier time = 0.17 sec. (38.31 ticks)

Total time on 8 threads = 0.17 sec. (38.31 ticks)
QP status(1): optimal
Cplex Time: 0.17sec (det. 38.31 ticks)

Optimal solution found.
Objective :           1.115360

This reformulation is not only useful for Cplex. Some other solvers show the same behavior: the first model is declared non-convex, while the second model solves just fine. Yet other solvers handle both models without trouble: they add a tiny value to the diagonal elements of the \(Q\) matrix to make it numerically positive semi-definite [3].
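The effect of such a diagonal perturbation is easy to demonstrate. The sketch below uses a deliberately ill-conditioned Vandermonde matrix (my own construction, not the data from the post):

```python
import numpy as np

# A Vandermonde matrix with monomial columns t^0..t^14 is notoriously
# ill-conditioned, so Q = C'C may be numerically indefinite.
t = np.linspace(0, 1, 100)
C = np.vander(t, 15, increasing=True)
Q = C.T @ C

# A shift that is tiny relative to Q's scale (same idea as Gurobi's PSDTol)
# pushes any slightly negative eigenvalues back above zero.
n = Q.shape[0]
shift = 1e-6 * np.trace(Q) / n
np.linalg.cholesky(Q + shift * np.eye(n))  # succeeds on the shifted matrix
print("Cholesky of shifted Q succeeded")
```

The shift changes the objective only marginally, which is why solvers can get away with applying it silently.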



A little background in numerics


The \(Q\) matrix is formed by \(C^TC\). We can show that \(C^TC\) is always positive semi-definite. This follows from: \[x^T(C^TC)x=(Cx)^T(Cx)=y^Ty\ge 0\] However the inner product \(Q=C^TC\) can accumulate some small floating point errors, causing \(Q\) to be no longer positive semi-definite. When I calculate \(Q=C^TC\) and print its eigenvalues I see:


 [1]  6.651381e+03  4.436663e+02  3.117601e+02  2.344391e+02  1.377582e+02  4.842287e+01
 [7]  2.471843e+01  4.970353e+00  4.283370e+00  3.878515e+00  9.313360e-01  4.146773e-01
[13]  2.711321e-01  2.437365e-01  2.270969e-01  8.583981e-02  2.784610e-02  2.047456e-02
[19]  1.670033e-02  8.465492e-03  3.948085e-03  2.352994e-03  1.726044e-03  1.287873e-03
[25]  1.095779e-03  2.341448e-04  1.776231e-04  1.266256e-04  4.064795e-05  2.980187e-05
[31]  1.056841e-05  2.409130e-06  2.039620e-06  5.597181e-07  1.367130e-07  2.376693e-08
[37]  3.535393e-09  1.355988e-11  2.996032e-14 -1.468230e-13 -1.622616e-13 -2.217776e-13

The presence of negative eigenvalues indicates that this computed matrix \(Q\) is indeed not positive semi-definite. Another verification is to try to form a Cholesky factorization [4]; this will fail if the matrix is not numerically positive definite.
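The original data is not shown here, but the same effect can be reproduced with any badly conditioned \(C\). A sketch with my own synthetic example (two nearly identical columns):

```python
import numpy as np

# Synthetic stand-in for C: two nearly identical columns make C'C
# ill-conditioned, so its smallest computed eigenvalue sits at rounding level.
rng = np.random.default_rng(0)
a = rng.standard_normal(50)
C = np.column_stack([a, a + 1e-9 * rng.standard_normal(50),
                     rng.standard_normal(50)])
Q = C.T @ C

print("smallest eigenvalue:", np.linalg.eigvalsh(Q)[0])
try:
    np.linalg.cholesky(Q)
    print("Cholesky succeeded: Q is numerically positive definite")
except np.linalg.LinAlgError:
    print("Cholesky failed: Q is not numerically positive definite")
```

Whether the Cholesky factorization succeeds here depends on rounding; that fragility is exactly the point.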


This type of factorization on the \(Q\) matrix is typically used inside Quadratic Programming solvers. Finally, it is noted that it is also possible that \(Q\) is actually positive definite but both tests (eigenvalues and Cholesky decomposition) fail to see this due to floating point errors inside their computations. Of course, for all practical purposes that means the matrix is not numerically positive definite.

In Ordinary Least Squares (OLS) it is well known that solving the normal equations is numerically unstable. I believe this issue is related: the same inner product \(C^TC\) is formed there.


André-Louis Cholesky (1875-1918)

Performance


This reformulation can also help with performance. For an example see [5].

References


  1. https://stackoverflow.com/questions/48164626/linear-combination-of-curves-to-match-a-single-curve-with-integer-constraints
  2. Cplex, cplexlsqnonneglin, https://www.ibm.com/support/knowledgecenter/SSSA5P_12.8.0/ilog.odms.cplex.help/refmatlabcplex/html/cplexlsqnonneglin-m.html
  3. Gurobi, PSDTol, http://www.gurobi.com/documentation/7.5/refman/psdtol.html
  4. Cholesky decomposition, https://en.wikipedia.org/wiki/Cholesky_decomposition
  5. Constrained Least Squares solution times II, http://yetanothermathprogrammingconsultant.blogspot.com/2016/11/constrained-least-squares-solution.html

Saturday, January 13, 2018

Solving a facility location problem as an MIQCP

Facility location models [1] are one of the most studied areas in optimization. It is still interesting to build a model from scratch.

The question is [2]:

Given \(m\) customer locations, find the minimum number of warehouses \(n\) we need such that each customer is within \(\mathit{MaxDist}\) miles from a warehouse.

Model 1 : minimize number of warehouses


The idea I use here, is that we assign a customer to a warehouse and make sure the distance constraint is met. In the following let \(i\) be the customers, \(j\) be the (candidate) warehouses. We can define the binary variable:\[\mathit{assign}_{i,j} = \begin{cases}1 & \text{if customer $i$ is serviced by warehouse $j$}\\0 &\text{otherwise}\end{cases}\]

A high-level distance constraint is:

\[\begin{align} &\mathit{assign}_{i,j}=1 \Rightarrow \mathit{Distance}(i,j) \le \mathit{MaxDist}&&\forall i,j \\ &\sum_j \mathit{assign}_{i,j} =1 && \forall i\\&\mathit{assign}_{i,j} \in \{0,1\} \end{align}\]

The distance is the usual:

\[ \mathit{Distance}(i,j)  = \sqrt{ \left( \mathit{dloc}_{i,x} - \mathit{floc}_{j,x} \right)^2 + \left( \mathit{dloc}_{i,y} - \mathit{floc}_{j,y} \right)^2 }  \]

Here \(\mathit{dloc}, \mathit{floc}\) are the \((x,y)\) coordinates of the demand (customer) and facility (warehouse) locations. The \(\mathit{dloc}\) quantities are constants, as opposed to \(\mathit{floc}\), which are decision variables: the facility locations are calculated by the solver.

This distance function is nonlinear. We can make it more suitable to be handled by quadratic solvers by reformulating the distance constraint as:

\[\left( \mathit{dloc}_{i,x} - \mathit{floc}_{j,x} \right)^2 + \left( \mathit{dloc}_{i,y} - \mathit{floc}_{j,y} \right)^2 \le \mathit{MaxDist}^2 + M (1-\mathit{assign}_{i,j})\]

where \(M\) is a large enough constant.
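A valid value for \(M\) follows from bounding the worst-case squared distance. Assuming all coordinates lie in \([0,100]\) (as in the random instance used later in the post), a minimal sketch:

```python
# All x,y coordinates lie in [0, 100], so the squared distance between any
# customer and any facility location is at most 100^2 + 100^2 = 20000.
MAXDIST = 40
M = 2 * 100**2 - MAXDIST**2  # a valid big-M for the relaxed constraint
print(M)  # 18400
```

With \(\mathit{assign}_{i,j}=0\) the right-hand side becomes \(\mathit{MaxDist}^2+M=20000\), which no squared distance can exceed, so the constraint is inactive as intended.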

The next step is to introduce a binary variable: \[\mathit{isopen}_{j} = \begin{cases}1 & \text{if warehouse $j$ is open for business}\\0 &\text{otherwise}\end{cases}\] We need \(\mathit{isopen}_j=0 \Rightarrow \mathit{assign}_{i,j} =0\). This can be modeled as: \[\mathit{assign}_{i,j}\le \mathit{isopen}_j\] The objective is to minimize the number of open warehouses:\[\min n = \sum_j \mathit{isopen}_j\]

A complete model can look like:


The problem is a convex MIQCP: a Mixed-Integer Quadratically Constrained Program. This model type is supported by solvers like Cplex and Gurobi. The last constraint, order, makes sure the open facilities are the first ones in the list of candidate facilities; it is also a way to reduce symmetry in the model.

To test this, I generated 50 random points with \(x,y\) coordinates between 0 and 100. The maximum distance is \(\mathit{MaxDist}=40\). The optimal solution is:


----     60 VARIABLE n.L                   =        3.000  number of open facilities

----     60 VARIABLE isopen.L  use facility

facility1 1.000,    facility2 1.000,    facility3 1.000


----     63 PARAMETER locations  

                    x           y

facility1      43.722      25.495
facility2      31.305      59.002
facility3      67.786      51.790

It is noted that the only relevant result in the solution is \(n=3\) (the number of open facilities). The chosen location for the warehouses and the assignment is not of much value. We can see this from a picture of the solution, in particular the variables floc and assign:

Misinterpretation of the solution

Model 2: Optimal location of warehouses


We can solve a second model that, given \(n=3\) warehouses, obtains a solution that:

  • assigns customers to their closest warehouse
  • finds warehouse locations such that the distances are minimized 
If we minimize the sum of the distances:
\[\min \sum_{i,j} \left[ \mathit{assign}_{i,j}\cdot \sqrt{\sum_c  \left( \mathit{dloc}_{i,c} - \mathit{floc}_{j,c} \right)^2 }\right]\]
we achieve both goals in one swoop. Of course this is a nasty nonlinear function. We can make this a little bit more amenable to being solved with standard solvers:
\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min &\sum_{i,j} d_{i,j}\\ &d_{i,j} \ge  \sum_c  \left( \mathit{dloc}_{i,c} - \mathit{floc}_{j,c} \right)^2 - M (1-\mathit{assign}_{i,j})&&\forall i,j\\&\sum_j \mathit{assign}_{i,j} =1 && \forall i\\&\mathit{assign}_{i,j} \in \{0,1\}\\&0\le d_{i,j}\le\mathit{MaxDist}^2 \end{align}}\] Note that now \(j = 1,\dots,n=3\). This is again a convex MIQCP model.

To be a little more precise: with this model we minimize the sum of the squared distances, which is slightly different. The objective that models the true distances would look like \[\min\sum_{i,j} \sqrt{d_{i,j}}\] This would make the model no longer quadratic, so we would need a general purpose MINLP solver (or some piecewise linear formulation). With the linear objective \(\min \sum_{i,j} d_{i,j}\) we place a somewhat higher weight on large distances. This is different from the first model. There we also used the trick of minimizing squared distances, but in that case it did not change the meaning of the model: it gives the same optimal solution. In this second model, minimizing the sum of squared distances gives slightly different results from minimizing the sum of the distances themselves.
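The difference between minimizing squared distances and actual distances is easy to see in one dimension: the sum of squares is minimized by the mean, while the sum of absolute distances is minimized by the median. A tiny illustration (my own example):

```python
import numpy as np

pts = np.array([0.0, 1.0, 10.0])
best_sq = pts.mean()       # minimizes sum (x - p)^2 -> 11/3
best_abs = np.median(pts)  # minimizes sum |x - p|   -> 1.0
print(best_sq, best_abs)
```

The outlier at 10 pulls the squared-distance minimizer strongly, which is exactly the "higher weight on large distances" effect mentioned above.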

When we solve this model we see a different location for the warehouses:


----     78 PARAMETER locations  

                    x           y

facility1      24.869      18.511
facility2      72.548      46.855
facility3      24.521      74.617

The plot looks much better now:

Optimal Warehouse Locations with Minimized Distances


There are of course many other ways to attack this problem:

  • Instead of using two models we can combine them into a multi-objective formulation \(\min\> w_1 n + w_2 \sum_{i,j} d_{i,j}\). Use a large weight \(w_1\) to give priority to the first objective.
  • Instead of minimizing the sum of the distances we can choose to minimize the largest distance.
  • Instead of Euclidean distances we can use a Manhattan distance metric. This will give us a linear MIP model.
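For instance, the single-facility Manhattan variant is a pure LP: each absolute value linearizes into two inequalities. A sketch using scipy's linprog on my own tiny example (not the 50-point instance):

```python
import numpy as np
from scipy.optimize import linprog

# Place one facility minimizing total Manhattan distance to three customers.
pts = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 6.0]])
n = len(pts)

# Variables: x, y, u_1..u_n, v_1..v_n with u_i >= |x - p_i|, v_i >= |y - q_i|.
c = np.r_[0.0, 0.0, np.ones(2 * n)]  # minimize sum of u_i + v_i
A_ub, b_ub = [], []
for i, (p, q) in enumerate(pts):
    row = np.zeros(2 + 2 * n); row[0] = 1.0; row[2 + i] = -1.0
    A_ub.append(row); b_ub.append(p)        #  x - p_i <= u_i
    row = np.zeros(2 + 2 * n); row[0] = -1.0; row[2 + i] = -1.0
    A_ub.append(row); b_ub.append(-p)       #  p_i - x <= u_i
    row = np.zeros(2 + 2 * n); row[1] = 1.0; row[2 + n + i] = -1.0
    A_ub.append(row); b_ub.append(q)        #  y - q_i <= v_i
    row = np.zeros(2 + 2 * n); row[1] = -1.0; row[2 + n + i] = -1.0
    A_ub.append(row); b_ub.append(-q)       #  q_i - y <= v_i

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * (2 + 2 * n))
print(res.x[:2], res.fun)  # optimum: the coordinate-wise medians, total 10
```

The multi-facility version additionally needs the binary assign variables, but the distance part stays linear, giving the MIP mentioned above.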


References 


  1. Zvi Drezner, Horst W. Hamacher, eds., Facility Location: Applications and Theory, Springer, 2001.
  2. Facility Location - Algorithm to Minimize facilities serving customers with distance constraint, https://stackoverflow.com/questions/48213881/facility-location-algorithm-to-minimize-facilities-serving-customers-with-dist