The average is usually a recency-weighted one, which weights recent payoffs more heavily than earlier ones. Typically,
S(t+1) <-- S(t) + b(P - S(t)),
where b is a "learning rate" something like 0.1, P is the current payoff, and S(t) and S(t+1) are the current and adjusted strengths, respectively.
Notice that the equation tends to push the strength in the direction of the current payoff. In fact, if the payoff is constant for many updates in a row, you can see that the strength converges to it.
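As a minimal sketch of the update rule above (not taken from any particular classifier-system implementation), the following Python applies the same adjustment repeatedly with a constant payoff. The function names `update_strength` and `run_updates`, the starting strength of 0, and the payoff of 100 are all illustrative assumptions; only the formula and the example learning rate of 0.1 come from the text.

```python
def update_strength(strength, payoff, rate=0.1):
    """One step of S(t+1) <-- S(t) + b(P - S(t)); `rate` plays the role of b."""
    return strength + rate * (payoff - strength)


def run_updates(initial, payoffs, rate=0.1):
    """Apply the update once per payoff and return every strength along the way."""
    strength = initial
    history = [strength]
    for payoff in payoffs:
        strength = update_strength(strength, payoff, rate)
        history.append(strength)
    return history


# Hypothetical scenario: strength starts at 0 and the payoff is always 100.
history = run_updates(0.0, [100.0] * 50)
first_step = history[1]   # strength after one update
final = history[-1]       # strength after fifty updates
```

With a constant payoff P, each update shrinks the gap between the strength and P by a factor of (1 - b), so after n updates the gap is (1 - b)^n times the initial gap. That is why the strength approaches the payoff, and also why a larger b makes it follow recent payoffs more quickly.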
What exactly is the "strength" of a classifier?