Method of Gradient Descent

Similar to the Method of Gradient Ascent. Also see Gradient Ascent.

Consider the function

with initial estimate

Calculate the gradient vector

and the Hessian matrix
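As a minimal sketch (the quadratic f(x1, x2) = x1**2 + 2*x2**2 - 2*x1*x2 - 2*x2 and the starting point x0 = [0, 0] below are illustrative placeholders, not the example's actual function), the gradient vector and Hessian matrix can be computed symbolically with sympy:

from sympy import symbols, diff, hessian, Matrix

x1, x2 = symbols('x1 x2')

# Placeholder function and initial estimate (assumptions for illustration only)
f = x1**2 + 2*x2**2 - 2*x1*x2 - 2*x2
x0 = [0, 0]

# Gradient vector: first partial derivatives with respect to x1 and x2
grad = Matrix([diff(f, x1), diff(f, x2)])

# Hessian matrix: second partial derivatives
H = hessian(f, (x1, x2))

print(grad.subs({x1: x0[0], x2: x0[1]}))  # gradient evaluated at x0: Matrix([[0], [-2]])
print(H)                                  # Matrix([[2, -2], [-2, 4]]), positive definite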

Look at the plot

Do one step of gradient descent:
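In general form, one step of steepest descent with exact line search updates the current estimate x_k by

\[
x_{k+1} = x_k - \alpha_k \nabla f(x_k), \qquad
\alpha_k = \arg\min_{\alpha \ge 0} f\bigl(x_k - \alpha \nabla f(x_k)\bigr),
\]

which is exactly the computation the code below automates.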

The following code implements one step of the general gradient descent algorithm for any sympy function of two variables that has a minimum.

from sympy import symbols, diff, solve

def gradient_descent(func, x0):
    """
    Performs one step of gradient descent with optimal alpha calculation
    for a given sympy function of x1 and x2, starting from x0.

    Parameters:
    - func: sympy expression in the two symbols x1 and x2.
    - x0: initial point as a list or tuple of two numbers [x1, x2].

    Returns:
    - New point as a list [x1, x2] after one step of gradient descent.
    - Optimal alpha used for the step.
    """
    # Define symbols
    x1, x2, alpha = symbols('x1 x2 alpha')

    # Calculate the gradient of the function
    grad = [diff(func, var) for var in [x1, x2]]

    # Substitute x0 into the gradient and express the new point as a function of alpha
    grad_at_x0 = [g.subs([(x1, x0[0]), (x2, x0[1])]) for g in grad]
    x1_new = x0[0] - alpha * grad_at_x0[0]
    x2_new = x0[1] - alpha * grad_at_x0[1]

    # Function value along the descent direction, as a function of alpha
    f_alpha = func.subs([(x1, x1_new), (x2, x2_new)])

    # Differentiate with respect to alpha and solve for the optimal step size
    df_dalpha = diff(f_alpha, alpha)
    optimal_alpha_solution = solve(df_dalpha, alpha)

    # Handle the case where no solution for alpha is found
    if not optimal_alpha_solution:
        print("No solution found for alpha. Check the function's form.")
        return x0, None  # Return the original point and None for alpha

    # Take the first stationary point of f(alpha); for a quadratic objective this
    # is the unique minimizer along the search direction
    optimal_alpha = optimal_alpha_solution[0]

    # Calculate the new point using the optimal alpha
    x1_optimal = x1_new.subs(alpha, optimal_alpha)
    x2_optimal = x2_new.subs(alpha, optimal_alpha)

    return [x1_optimal, x2_optimal], optimal_alpha

For the function considered, this code returns the optimal alpha, which yields the next estimate. Continue doing gradient descent steps until the iterates converge to the approximate global minimum.
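As a usage sketch (again using the placeholder function and starting point from above, which are illustrative assumptions rather than the original example), the step can be repeated until consecutive estimates stop changing noticeably:

from sympy import symbols

x1, x2 = symbols('x1 x2')
f = x1**2 + 2*x2**2 - 2*x1*x2 - 2*x2   # placeholder function (assumption)
x = [0, 0]                             # placeholder initial estimate (assumption)

for _ in range(50):
    x_new, opt_alpha = gradient_descent(f, x)
    if opt_alpha is None:
        break
    # Work with floats so the exact rational iterates do not grow without bound
    x_new = [float(v) for v in x_new]
    # Stop once consecutive estimates are numerically indistinguishable
    if (x_new[0] - x[0])**2 + (x_new[1] - x[1])**2 < 1e-12:
        x = x_new
        break
    x = x_new

print(x)  # close to the minimizer [1, 1] of the placeholder function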