For a reference please look inside the book of Joan Daeman, Vincent Rijmen, the design of Rijndael.
(Probably there will be a match to a closer point while searching for Rijndael.pdf in google.)

The chapter 3 explains the situation better then it can be done here, and 3.4.3 is dedicated to the MixColumnStep, the matrix in (3.12) is the same as posted. In order to introduce the data in sage, to have an other, alternative approach to the excellent and quick one of vdelecroix, i will work in the polynomial ring over the field $F = \mathbb F_{2^8}=\mathbb F_2(a)$ given by:
$$ R = F[X]/(X^4+1)\ , $$
and let $x$ be the image of $X$ when taken modulo $X^4+1=(X+1)^4$. See *loc. cit.* for the equivalence of the description. Then the matrix multiplication is translated in a multiplication with the polynomial
$$ c = \mathtt{03}\cdot x^3 + \mathtt{01}\cdot x^2 + \mathtt{01}\cdot x + \mathtt{02}\ ,$$
its inverse being
$$ d = \mathtt{0B}\cdot x^3 + \mathtt{0D}\cdot x^2 + \mathtt{09}\cdot x + \mathtt{0E}\ .$$
In sage we can initialize this framework as folllows:

```
R.<A> = PolynomialRing( GF(2) )
F.<a> = GF( 2**8, modulus = A^8 + A^4 + A^3 + A + 1 )
S.<X> = PolynomialRing( F )
J = S.ideal( X^4+1 )
Q.<x> = S.quotient( J )
c = (a+1)*x^3 + x^2 + x + a
d = 1/c
```

We do not need $d$ (immediately), but it is a good check to see how things work, it is

```
sage: d
(a^3 + a + 1)*x^3 + (a^3 + a^2 + 1)*x^2 + (a^3 + 1)*x + a^3 + a^2 + a
```

which matches the book value. Now we consider the branch number of the application which is the multiplication with $c$ of a polynomial $p$ from $Q$. The weight $w$ of a polynomial is the number of its non-zero coefficients, in translation. The branch number of $c$, in my notation here $$B(c) $$ is then the minimal number among all
$$ w(p) + w(cp)\ ,\ p\ne 0\ . $$
Since we have "singletons", e.g. the polynomials $1,x,x^2,x^3$ (multiplied by constants in $F^\times$) taking $p$ to be one of them we have
$$ B(c)\le B(1)+B(c\cdot 1)=1+4=5\ .$$
To show that $5$ is indeed the minimal value, we have to prove it by structure, or make the brute force check.

Since it is simpler to check, while in the same time using sage, let us check. A smaller value would come from realizations of $w(p)+w(cp)$ of the shape
\begin{align}
&1+3\\
&2+2\\
&3+1\\
&1+2\\
&2+1\\
&1+1\\
\end{aligned}
so we will check if it is possible to realize one of the above cases. The case of weight one for one of the summands, the singleton case, is already checked, since a "singleton" can be reduced linearly to one of the polynomials $1, x, x^2, x^3$, and it is clear that multiplication with $c$ and/or $d=1/c$ does not provide a smaller value, $w(c)=w(d)=4$.

There is one more case left in the list, this is $2+2$, we have to eliminate it. Let us start with a polynomial $p$ of weight $2$. Because $w(p) = w(xp) = w(x^2p)=w(x^3p)$, we can assume that its constant coefficient is not zero, and further we can norm it to $1$. (We can maybe further restrict to special cases, but this is enough.) The question is now if any of the polynomials
$$ c(1+fx^k)\ , \ f\in F^\times\ ,\ k\in{1,2,3} $$ has weight two (or smaller).
The check is simple:

```
def w(p):
return len( p.lift().coefficients() )
for f in F:
if f.is_zero():
continue
for k in [1..3]:
p = f*x^k + 1
if w( c*p ) <= 2:
print "Weight %s for %s" % ( w(c*p), p )
```

and there is no print. To still check that there is no error in between, let us alternatively print the weights $\le 3$ and the corresponding $p$-polynomials:

```
weight3 = [ f*x^k + 1 for f in F for k in [1..3] if not f.is_zero() and w(c*(f*x^k + 1)) <= 3 ]
for p in weight3:
print "Weight %s for cp when p = %s" % ( w(c*p), p )
```

Results:

```
Weight 3 for cp when p = (a + 1)*x + 1
Weight 3 for cp when p = (a + 1)*x^2 + 1
Weight 3 for cp when p = (a^7 + a^6 + a^5 + a^4 + a^2 + a + 1)*x + 1
Weight 3 for cp when p = a*x^2 + 1
Weight 3 for cp when p = a*x^3 + 1
Weight 3 for cp when p = (a^7 + a^3 + a^2 + 1)*x + 1
Weight 3 for cp when p = (a^7 + a^3 + a^2 + 1)*x^2 + 1
Weight 3 for cp when p = (a^7 + a^3 + a^2)*x^3 + 1
Weight 3 for cp when p = (a^7 + a^6 + a^5 + a^4 + a^2 + a)*x^2 + 1
Weight 3 for cp when p = (a^7 + a^6 + a^5 + a^4 + a^2 + a)*x^3 + 1
Weight 3 for cp when p = x + 1
Weight 3 for cp when p = x^3 + 1
```

(Else we get of course the weight four.)

We can help you in the implementation of you needs in Sage, but we can not decide for you what you need or want to achieve.

I have since found the branch number of a matrix is: $B(

A) = min[weight(a) + weight(aA)], but I cannot find an example of this. I believe the weight of a matrix is the number of its non-zero components. Even so, I am not sure how to calculate it without a clear example.The weight is defined on

vectors(as in your formula!) and not on matrices. It is generally called Hamming weight. On the other hand, we are missing information to understand the matrix in your question. What does mean 02 and 00000010? What is the field of definition of your matrix? In order to apply the matrix to a vector you need to make sense of a multiplication.I see. Well, the MDS matrix shown multiplies column vectors of four elements, all of which are bytes (as are the entries in the matrix) in the field $GF(2^8)$ and using an irreducible polynomial $p(x) = X^8 + X^4 + X^3 + X + 1$. An example column vector might be $(a3\ 25\ c4\ 06)^T$. Therefore multiplying by the first entry in the MDS matrix, $(02)$, gives: $02 \times a3 = X (X^7 + X^5 + X^2 + X) = X^6 + X^4 + X^3 + X^2 + 1 $ or $5d$ (after reduction by p(x)).