Optimization technique (OT), also called mathematical optimization or mathematical programming, is a powerful set of tools to maximize or minimize an objective of a problem.
OT aims to select a best element, with respect to some criterion, from some set of available alternatives. Generally, it is divided into two subfields: (i) discrete optimization and (ii) continuous optimization.
To find the minimum of a function in our school days we used to differentiate the function, i.e., find the gradient of the function and equate it to zero. We normally didn't apply gradient descent. So why do we use other methods, such as iterative techniques, when solving these problems on computers?
- The first way is by using a graph. You can find the maxima and minima visually by plotting the equation and reading the values off the graph. But this gets complicated when the function is complex.
In a smoothly changing function, a maximum or minimum is always where the function flattens out (except for a saddle point).
- Where does it flatten out? Where the slope is zero.
- Where is the slope zero? The derivative tells us!
- Example:
$h = 3 + 14t - 5t^2$, find the maximum of this function. Using derivatives we can find the slope of the function:
$\frac{dh}{dt} = 14 - 10t$. Now find where the slope is zero, so set the derivative equal to zero: $14 - 10t = 0 \Rightarrow t = 1.4$. The slope is zero at $t = 1.4$, and $h$ at this point is $h = 3 + 14 \times 1.4 - 5 \times 1.4^2 = 12.8$.
- Second derivative test: $\frac{d^2h}{dt^2} = -10 < 0$, so this is a maximum ("Second derivative: less than 0 is a maximum, greater than 0 is a minimum").
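As a quick numerical check (a minimal sketch in NumPy, not part of the original worked example), we can evaluate $h(t)$ on a fine grid and confirm that the maximum sits near $t = 1.4$, $h \approx 12.8$:

```python
import numpy as np

# Evaluate h(t) = 3 + 14t - 5t^2 on a fine grid and locate its maximum.
t = np.linspace(0, 3, 10_001)
h = 3 + 14 * t - 5 * t**2

i_max = np.argmax(h)
print(f"t at maximum ~ {t[i_max]:.3f}")   # ~ 1.400
print(f"h at maximum ~ {h[i_max]:.3f}")   # ~ 12.800
```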
From a mathematical perspective, it seems odd. Gradient descent relies on knowledge of the gradient to 'slide down the slope', yet here we just find the minimum analytically by directly solving for where the gradient is 0. So why then, when we move to the domain of computers, do we suddenly switch tactics and use iterative techniques? Also, why do we use gradient descent instead of, for example, Newton's method, the other popular iterative optimization technique?
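To make the iterative alternative concrete, here is a minimal gradient descent sketch (my own illustrative code, with a learning rate and starting point chosen arbitrarily). It minimizes $f(t) = 5t^2 - 14t - 3$, which is just $-h(t)$, so it converges to the same $t = 1.4$ found analytically above:

```python
# Minimize f(t) = 5t^2 - 14t - 3 (i.e., maximize h(t)) with plain gradient descent.
def grad(t):
    return 10 * t - 14          # derivative of f(t)

t = 0.0                         # arbitrary starting point
learning_rate = 0.05

for step in range(200):
    t -= learning_rate * grad(t)   # step against the gradient

print(f"t after gradient descent ~ {t:.4f}")   # ~ 1.4000
```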
Now we take a look at the gradient. The gradient captures all partial derivatives of a multivariable function. In the derivatives above we talked about a function of only one scalar variable; now consider a function of two variables, $f(x, y)$.
As we have two variables, we have to find partial derivatives, since we have a multivariable function. For a function $f(x, y)$, the gradient is the vector of its partial derivatives:
$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$
Why is the gradient useful? For a function of several variables, the gradient at a point gives the direction of steepest ascent, and its negative gives the direction of steepest descent, which is exactly the direction gradient descent follows.
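Since the original two-variable example was not shown, here is an illustrative sketch of my own, assuming the function $f(x, y) = x^2 + 3y^2$: it approximates the gradient with central finite differences and compares it with the analytic partial derivatives $(2x, 6y)$:

```python
import numpy as np

def f(x, y):
    # illustrative two-variable function (an assumption, not the one from the text)
    return x**2 + 3 * y**2

def numerical_gradient(f, x, y, eps=1e-6):
    # central finite differences for the two partial derivatives
    df_dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    df_dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return np.array([df_dx, df_dy])

x, y = 1.0, 2.0
print("numerical:", numerical_gradient(f, x, y))   # ~ [ 2. 12.]
print("analytic: ", np.array([2 * x, 6 * y]))      # [ 2. 12.]
```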
In NumPy it is important to understand the difference between 1D and 2D arrays.
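A short sketch of the distinction (the values are my own example):

```python
import numpy as np

a = np.array([1, 2, 3])          # 1D array: shape (3,), a plain vector
B = np.array([[1, 2, 3]])        # 2D array: shape (1, 3), a row matrix
C = np.array([[1], [2], [3]])    # 2D array: shape (3, 1), a column matrix

print(a.shape, a.ndim)           # (3,) 1
print(B.shape, B.ndim)           # (1, 3) 2
print(C.shape, C.ndim)           # (3, 1) 2
```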
Norm/length of an array or vector: the norm of a vector can be obtained by squaring each element of the vector, summing them, and taking the square root of the sum, i.e. $\|x\| = \sqrt{\sum_i x_i^2}$.
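A minimal sketch, using an example vector of my own choosing, that computes the norm "by hand" and with NumPy's built-in 'np.linalg.norm':

```python
import numpy as np

x = np.array([3.0, 4.0])

# by hand: square each element, sum, take the square root
manual = np.sqrt(np.sum(x**2))

# same thing with NumPy's built-in
builtin = np.linalg.norm(x)

print(manual, builtin)   # 5.0 5.0
```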
Dot product or inner product: it can be defined for 1D arrays of the same length. The dot product of two 1D arrays is always a scalar, $x \cdot y = \sum_i x_i y_i$. For example, see the sketch below.
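(The example vectors here are my own, since the original example was not shown.)

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# dot product: elementwise multiply, then sum -> a single scalar
print(np.dot(x, y))     # 32
print(x @ y)            # 32, same thing with the @ operator
```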
Matrix multiplication: the matrix product of an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$ is defined only when the inner dimensions match, and the result is an $m \times p$ matrix. In NumPy it is written 'A @ B' or 'np.matmul(A, B)' (note that 'A * B' is elementwise, not matrix, multiplication).
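A small sketch with matrices of my own choosing, showing how the shapes combine:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # shape (3, 2)
B = np.array([[1, 0, 2],
              [0, 1, 3]])       # shape (2, 3)

C = A @ B                        # matrix product: (3, 2) @ (2, 3) -> (3, 3)
print(C.shape)                   # (3, 3)
print(C)
```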
Matrix inverse: Consider a square matrix $A$. Its inverse $A^{-1}$ is the matrix that satisfies $A A^{-1} = A^{-1} A = I$, where $I$ is the identity matrix. In NumPy it is computed with 'np.linalg.inv(A)'.
Does the inverse exist? Compute the determinant: when the determinant of a square matrix is zero (a singular matrix), the inverse does not exist. In Python we can find the determinant with 'np.linalg.det(A)'.
Matrix transpose: In Python we can take the transpose with 'A.transpose()' or 'A.T'.
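A small sketch tying the determinant, inverse, and transpose together (the matrix is my own example):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

print(np.linalg.det(A))                    # 10.0 -> nonzero, so A is invertible
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True: A @ A_inv is the identity
print(A.T)                                 # transpose: rows become columns
```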
Pseudo inverse: The easiest way to understand the pseudo inverse is to look at the linear system $Ax = b$. When $A$ is not square (or is singular), $A^{-1}$ does not exist, but the pseudo inverse $A^{+}$ still gives the least-squares solution $x = A^{+} b$. In NumPy it is computed with 'np.linalg.pinv(A)'.
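A minimal sketch, assuming a small overdetermined system of my own choosing, showing that the pseudo inverse gives the same answer as NumPy's least-squares solver:

```python
import numpy as np

# Overdetermined system: more equations than unknowns, so no exact solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x = np.linalg.pinv(A) @ b        # least-squares solution via the pseudo inverse
print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True: same as lstsq
```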
Vector norm: A vector is a 1D array. The informal definition of a norm is that it measures the size of a vector. The formal definition is that a norm is a function that maps a vector to a non-negative value (zero only for the zero vector).
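To make this concrete, 'np.linalg.norm' accepts an 'ord' argument for different norms (the vector here is my own example):

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(v, ord=1))        # 7.0 : L1 norm, sum of absolute values
print(np.linalg.norm(v, ord=2))        # 5.0 : L2 (Euclidean) norm
print(np.linalg.norm(v, ord=np.inf))   # 4.0 : max absolute value
```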