In this post, we will talk about dual norms: a handy tool in optimization.

Consider the following optimization problem:

\begin{equation} \min_{\mathbf{a}} ||\mathbf{a}||_p \;\;\text{s.t.}\; \mathbf{a}^T\mathbf{b}=c \end{equation}

where $\mathbf{a},\mathbf{b} \in \mathbb{R}^n$ and $c \in \mathbb{R}$; here $\mathbf{b}$ and $c$ are given and we optimize over $\mathbf{a}$.

How do we find $\mathbf{a}^*$? Well, we have two cases.

Case 1. $p=0$. This case is straightforward: if $c\neq 0$, the smallest achievable $||\mathbf{a}||_0$ is $1$, and there is one such 1-sparse optimal solution for each nonzero entry of $\mathbf{b}$. Assuming $b_1 \neq 0$, one optimal solution is:

$\mathbf{a}^*=(c/b_1,0, \ldots, 0)$
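A quick numerical illustration of this case; the specific $\mathbf{b}$ and $c$ below are just example values:

import numpy as np

b = np.array([2.0, -1.0, 3.0])   # example vector with b_1 != 0
c = 5.0
a_star = np.zeros_like(b)
a_star[0] = c / b[0]             # a* = (c/b_1, 0, ..., 0)
print(a_star @ b)                # 5.0 -> the constraint a^T b = c holds
print(np.count_nonzero(a_star))  # 1   -> ||a*||_0 = 1, the minimum possible when c != 0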

Case 2. $p>0$. Here we make use of the concept of dual norms (strictly speaking, the argument below needs $p \geq 1$, so that $||\cdot||_p$ is indeed a norm). Formally, the dual norm $||\cdot||_q$ of the norm $||\cdot||_p$ is defined as

\begin{equation} ||\mathbf{b}||_q = \sup_{\mathbf{a}} \left\{ \mathbf{a}^T\mathbf{b} \;:\; ||\mathbf{a}||_p \leq 1 \right\} \end{equation}

In other words, for arbitrary $\mathbf{a}$ and $\mathbf{b}$, we have

\begin{equation} |\mathbf{a}^T\mathbf{b}| \leq ||\mathbf{a}||_p\,||\mathbf{b}||_q \end{equation}

where $p$ and $q$ are such that $\frac{1}{p}+\frac{1}{q}=1$, or equivalently $p= \frac{q}{q-1}$. This follows from Young's inequality for products, as we briefly show below.

From Young's inequality, we have, for all $i \in [n]$,

\begin{equation} |a_i|\,|b_i| \leq \frac{|a_i|^p}{p} + \frac{|b_i|^q}{q} \end{equation}

where equality holds only if $|a_i|^p=|b_i|^q$. Summing over all $i$, we get

\begin{equation} |\mathbf{a}^T\mathbf{b}| \leq \sum_{i=1}^{n} |a_i|\,|b_i| \leq \frac{||\mathbf{a}||_p^p}{p} + \frac{||\mathbf{b}||_q^q}{q} \end{equation}

Now, let $||\mathbf{a}||_p = ||\mathbf{b}||_q = 1$ and choose $\mathbf{a}$ with $|a_i|^p = |b_i|^q$ and $\operatorname{sign}(a_i) = \operatorname{sign}(b_i)$ for all $i$; then both inequalities are tight and we have

\begin{equation} \mathbf{a}^T\mathbf{b} = \frac{1}{p} + \frac{1}{q} = 1 = ||\mathbf{a}||_p\,||\mathbf{b}||_q \end{equation}

so the bound is attained. For vectors that are not unit-norm, the inequality follows by applying the above to $\mathbf{a}/||\mathbf{a}||_p$ and $\mathbf{b}/||\mathbf{b}||_q$.
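Here is a quick numerical sanity check of this inequality and of its equality condition; the dimension, exponent, and random vectors below are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(0)
p, q = 3.0, 1.5   # conjugate exponents: 1/3 + 1/1.5 = 1
a = rng.standard_normal(10)
b = rng.standard_normal(10)

# Hölder's inequality: |a^T b| <= ||a||_p ||b||_q
print(abs(a @ b) <= np.linalg.norm(a, p) * np.linalg.norm(b, q))   # True

# Equality when |a_i|^p = |b_i|^q (up to scaling) and the signs of a_i and b_i agree
a_eq = np.sign(b) * np.abs(b) ** (q - 1)
print(np.isclose(a_eq @ b, np.linalg.norm(a_eq, p) * np.linalg.norm(b, q)))  # True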

Having described dual norms, let's go back to our case $p>0$ of Eq. (1). From the definition of the dual norm (equivalently, Hölder's inequality), every feasible $\mathbf{a}$ satisfies $|c| = |\mathbf{a}^T\mathbf{b}| \leq ||\mathbf{a}||_p\,||\mathbf{b}||_q$, and since the inequality can be made tight, it is straightforward to see that

\begin{equation} ||\mathbf{a}^*||_p = \frac{|c|}{||\mathbf{b}||_q} \end{equation}

We found the objective function value at the optimal solution, but how about $\mathbf{a}^*$ itself? Well, we know that $\mathbf{a}^{*T}\mathbf{b}=c$ and $||\mathbf{a}^*||_p = \frac{|c|}{||\mathbf{b}||_q}$.

So $\mathbf{a}^*$ must be such that Hölder's inequality holds with equality:

\begin{equation} |\mathbf{a}^{*T}\mathbf{b}| = ||\mathbf{a}^*||_p\,||\mathbf{b}||_q \end{equation}

From the equality condition of Young's inequality and the definition of $||\cdot||_q$, this means $|a_i^*|^p \propto |b_i|^q$, i.e. $|a_i^*| \propto |b_i|^{q-1}$ (recall $q/p = q-1$), with $\operatorname{sign}(a_i^* b_i) = \operatorname{sign}(c)$ for all $i$. Fixing the scale via the constraint $\mathbf{a}^{*T}\mathbf{b}=c$, we get, for all $i \in [n]$,

\begin{equation} a_i^* = \frac{c}{||\mathbf{b}||_q^{q}}\,\operatorname{sign}(b_i)\,|b_i|^{q-1} \end{equation}

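As a quick sanity check, substituting this $\mathbf{a}^*$ back into the constraint and the objective recovers both $c$ and the optimal value derived above (using $(q-1)p = q$ and $q/p = q-1$):

\begin{equation} \mathbf{a}^{*T}\mathbf{b} = \frac{c}{||\mathbf{b}||_q^{q}}\sum_{i=1}^{n} |b_i|^{q-1}\,|b_i| = c, \qquad ||\mathbf{a}^*||_p = \frac{|c|}{||\mathbf{b}||_q^{q}}\Big(\sum_{i=1}^{n} |b_i|^{(q-1)p}\Big)^{1/p} = \frac{|c|}{||\mathbf{b}||_q^{q}}\,||\mathbf{b}||_q^{q-1} = \frac{|c|}{||\mathbf{b}||_q} \end{equation}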
Let’s see this in code.

# problem setup
import numpy as np
import cvxpy as cp  # the CVXOPT solver backend below also requires the cvxopt package

n = 10
b = np.random.randn(n)
c = np.random.randn()
p = 3

def solve_cvxopt(n, b, c, p):
    # Solve Eq. (1) numerically with cvxpy.
    if p == 0:
        print("p = 0 is not supported!")
        return None
    a = cp.Variable(shape=n)
    obj = cp.Minimize(cp.norm(a, p))
    constraint = [cp.sum(cp.multiply(a, b)) == c]
    prob = cp.Problem(obj, constraint)
    # ECOS and SCS solvers fail to converge before
    # the iteration limit. Use CVXOPT instead.
    prob.solve(solver=cp.CVXOPT, verbose=False)
    return a.value

def solve_dual(n, b, c, p):
    # Closed-form solutions derived above.
    if p == 0:
        # Put all the "mass" on the first nonzero entry of b.
        a_opt = np.zeros(n)
        for i, _b in enumerate(b):
            if abs(_b) > 0:
                a_opt[i] = c / b[i]
                return a_opt
    elif p == 1:
        # The dual of the l1 norm is the l-infinity norm: put all the mass
        # on the entry of b with the largest magnitude.
        a_opt = np.zeros(n)
        idx = np.argmax(abs(b))
        a_opt[idx] = c / b[idx]
        return a_opt
    else:
        # General case: a_i = c / ||b||_q^q * sign(b_i) * |b_i|^(q-1),
        # where q is the conjugate exponent of p.
        q = 1 if p == np.inf else p / (p - 1)
        den = np.linalg.norm(b, ord=q) ** q
        a_opt = c / den * np.sign(b) * (abs(b) ** (q - 1))
        return a_opt

x_opt = solve_cvxopt(n, b, c, p)
x_dual = solve_dual(n, b, c, p)
print(np.linalg.norm(x_opt, p))   # 0.06388941220368444
print(np.linalg.norm(x_dual, p))  # 0.06388941212687518
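We can run the same kind of check for the other values of $p$ handled in solve_dual, continuing with the $\mathbf{b}$ and $c$ defined above and comparing against the theoretical optimum $|c|/||\mathbf{b}||_q$:

# sanity check: constraint satisfaction and optimal value for several p
for p_test in [1, 2, 4, np.inf]:
    a = solve_dual(n, b, c, p_test)
    if p_test == 1:
        q_test = np.inf
    elif p_test == np.inf:
        q_test = 1
    else:
        q_test = p_test / (p_test - 1)
    print(p_test,
          np.isclose(a @ b, c),                      # constraint a^T b = c holds
          np.isclose(np.linalg.norm(a, ord=p_test),  # ||a*||_p matches |c| / ||b||_q
                     abs(c) / np.linalg.norm(b, ord=q_test)))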