
My view at the moment is that the command sequence essentially acts like a first-order system, responding exponentially to a step disturbance. The transient response of this system is governed by the size of the optimization step, so as long as the period of any dynamic disturbance is sufficiently longer than the rise time, the algorithm shouldn't have a problem rejecting it.
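To make that concrete, here's a minimal sketch (not the actual simulation) of the analogy, assuming a quadratic objective f(u) = 0.5*k*(u - u_star)^2 whose optimum u_star jumps at iteration 0; the curvature k and step size alpha are made-up numbers. The error shrinks geometrically by a factor of (1 - alpha*k) per iteration, which plays the role of the first-order time constant.

    k = 2.0          # assumed curvature of the quadratic objective
    alpha = 0.1      # optimization step size (plays the role of the control gain)
    u = 0.0          # current command
    u_star = 1.0     # optimum after a step disturbance at iteration 0

    for n in range(30):
        grad = k * (u - u_star)   # exact gradient of the assumed quadratic
        u = u - alpha * grad      # u_{n+1} = (1 - alpha*k)*u_n + alpha*k*u_star
        print(n, u)               # error decays like (1 - alpha*k)**n, i.e. exponentially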
This analogy isn't exact, since the gradient must be estimated at each step, and the step size is only constant if a normalized gradient and constant gain are used, which isn't always the case. Still, the problem basically boils down to finding a control gain large enough to provide convergence but small enough to remain stable. Moreover, since the objective function is basically quadratic, there's probably some way to quantify the optimal step size, but I'm too lazy at the moment to go very far down that road.
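For what it's worth, on an ideal quadratic with an exact gradient the error contracts by |1 - alpha*k| per step, so alpha = 1/k would be the best choice; with an estimated, normalized gradient it becomes a tuning exercise. The sketch below (all names and numbers assumed, not the real code) uses a central-difference gradient estimate and a unit-magnitude step with constant gain to chase a slowly drifting optimum: too small a gain and the command lags the drift, too large and it limit-cycles around the optimum.

    import math

    def f(u, t):
        u_star = math.sin(2 * math.pi * t / 200.0)   # slowly drifting optimum (assumed)
        return 0.5 * (u - u_star) ** 2

    gain = 0.02      # constant gain applied to the normalized gradient
    du = 1e-3        # finite-difference perturbation for the gradient estimate
    u = 0.0

    for t in range(1000):
        grad = (f(u + du, t) - f(u - du, t)) / (2 * du)   # central-difference estimate
        u -= gain * math.copysign(1.0, grad)              # normalized (unit-magnitude) step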
The current simulation is very simple, using a single channel with no poke matrices or actuator nonlinearities. The next step will be to gradually introduce these features until I have a realistic model. If this works at all, it might be worth looking into other first-order algorithms with even faster convergence properties, e.g. quasi-Newton methods or Nesterov's accelerated gradient.
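As a point of reference, Nesterov's accelerated gradient just adds a momentum term and evaluates the gradient at a lookahead point; here's a minimal sketch on the same assumed quadratic as above (curvature, step size, and momentum coefficient all made up):

    k, alpha, mu = 2.0, 0.1, 0.8   # assumed curvature, step size, momentum coefficient
    u, v = 0.0, 0.0                # command and momentum state
    u_star = 1.0                   # optimum after a step disturbance

    for n in range(30):
        grad = k * ((u + mu * v) - u_star)   # gradient at the lookahead point
        v = mu * v - alpha * grad            # momentum update
        u = u + v                            # command update
        print(n, u)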
