Use memcpy to make the splitvertical gradient much faster: O(log n) memcpy calls are much quicker than setting a pointer value n times
Here are some profiling results. splitvertical1 is the original code, splitvertical2 is the same code with slightly improved locality, and splitvertical3 is the new O(log n) memcpy code.
I also tested this with 'time', drawing 1000 gradients: the new code used approximately half the user time and finished 10 seconds sooner. So yeah, it's magical and works well.
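
For anyone wondering what "log n memcpy's" means, here is a rough sketch of the doubling idea. This is not the actual splitvertical code; fill_row_doubling and the uint32_t pixel type are made up for illustration. Write one pixel, then keep copying the already-filled prefix onto the space right after it, so the filled region doubles with every memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the real code): fill dst[0..n) with
 * `color` using about log2(n) memcpy calls instead of n stores. */
static void fill_row_doubling(uint32_t *dst, size_t n, uint32_t color)
{
    size_t filled;

    if (n == 0)
        return;

    dst[0] = color;              /* seed: one pixel written by hand */
    filled = 1;

    while (filled < n) {
        /* Copy as much of the filled prefix as still fits. */
        size_t chunk = filled;
        if (chunk > n - filled)
            chunk = n - filled;

        /* Source [0, chunk) and destination [filled, filled + chunk)
         * never overlap because chunk <= filled, so memcpy is safe. */
        memcpy(dst + filled, dst, chunk * sizeof *dst);
        filled += chunk;
    }
}
```

The per-pixel loop does n stores; this does one store plus roughly log2(n) memcpy calls, and memcpy moves large blocks far more efficiently than a pixel-at-a-time loop.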