gradient Accumulation Step
Parameters
value
Gradient accumulation means running a configured number of "GradAccumulationStep" steps without updating the model weights while accumulating the gradients of those steps, and then using the accumulated gradients to compute the weight updates. Must be a positive integer.