
I wanted a test landscape that (1) had many different payoff levels, but such that (2) for each payoff level there were a number of different input/action combinations having that payoff. Requirement (1) came from the fact that many learning situations, e.g., sequential problems using discounted payoff, have payoff landscapes with many different levels, and I wanted a single-step test bed with that property. Requirement (2) came from wanting to test the generalization power of XCS hypothesized in Section 4.1, which called for a problem that would permit generalizations.

These requirements are satisfied by the "layered", or ramp-like, landscape, whose results are shown in Figure 3. I also tried two other landscapes, including the 1/0 (or 1000/0) landscape you ask about.

Here are compact descriptions of all three landscapes (the ramp is first):

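```lisp
;; Payoff lists for the 6-multiplexer landscapes. Each entry is a pair
;; (correct-action-payoff incorrect-action-payoff) for one input schema.
;; DEFPARAMETER is used instead of DEFCONSTANT: the list built at compile
;; time and the one built at load time are not EQL, so DEFCONSTANT on a
;; list can signal a redefinition error in some implementations (e.g. SBCL).

;; Ramp ("layered") landscape
(defparameter *6m-ppmap* '((1000.0 700.0)
                           (900.0 600.0)
                           (800.0 500.0)
                           (700.0 400.0)
                           (600.0 300.0)
                           (500.0 200.0)
                           (400.0 100.0)
                           (300.0 0.0)))

;; Crossed landscape: correct-action payoffs fall as incorrect ones rise
(defparameter *6m-xmap* '((1000.0 300.0)
                          (900.0 400.0)
                          (800.0 500.0)
                          (700.0 600.0)
                          (600.0 700.0)
                          (500.0 800.0)
                          (400.0 900.0)
                          (300.0 1000.0)))

;; Payoff-nil ("pn") landscape: 1000 for correct, 0 for incorrect
(defparameter *6m-pn* '((1000.0 0.0)
                        (1000.0 0.0)
                        (1000.0 0.0)
                        (1000.0 0.0)
                        (1000.0 0.0)
                        (1000.0 0.0)
                        (1000.0 0.0)
                        (1000.0 0.0)))
```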

Each list gives, in order, the (correct-action, incorrect-action) payoff pair for the corresponding input schema in the following list:

11***1
11***0
10**1*
10**0*
01*1**
01*0**
001***
000***
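Requirement (2) can be checked against this list: each schema fixes three of the six bits, so it covers 8 of the 64 inputs, and together the eight schemas partition the input space. The following is a minimal sketch, assuming inputs are written as strings of #\0 and #\1; the names *6m-schemas*, schema-matches-p, schema-index and schema-coverage are hypothetical, not taken from the XCS code.

```lisp
;; Hypothetical names. The eight 6-multiplexer schemas, in payoff-list order.
(defparameter *6m-schemas*
  '("11***1" "11***0" "10**1*" "10**0*" "01*1**" "01*0**" "001***" "000***"))

(defun schema-matches-p (schema input)
  "True if INPUT (a string of #\\0/#\\1) matches SCHEMA; #\\* matches either bit."
  (and (= (length schema) (length input))
       (every (lambda (s c) (or (char= s #\*) (char= s c))) schema input)))

(defun schema-index (input &optional (schemas *6m-schemas*))
  "Position of the first schema in SCHEMAS that matches INPUT, or NIL."
  (position-if (lambda (schema) (schema-matches-p schema input)) schemas))

(defun all-6-bit-inputs ()
  "All 64 six-bit inputs as zero-padded binary strings."
  (loop for n below 64
        collect (format nil "~6,'0B" n)))

(defun schema-coverage ()
  "Number of inputs covered by each schema. Signals an error if any input
is matched by anything other than exactly one schema."
  (let ((counts (make-list (length *6m-schemas*) :initial-element 0)))
    (dolist (input (all-6-bit-inputs) counts)
      (assert (= 1 (count-if (lambda (s) (schema-matches-p s input))
                             *6m-schemas*)))
      (incf (nth (schema-index input) counts)))))
```

Evaluating (schema-coverage) returns a list of eight 8s, i.e., every payoff level in a landscape is shared by eight different inputs for each action.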

For example, in 6m-ppmap the input 110101 matches the first schema, 11***1, so the payoff would be 1000 for the correct (Boolean multiplexer) action, 1, and 700 for the incorrect action, 0.
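The lookup in this example can be sketched as follows, assuming the same string encoding of inputs and using the ramp values of 6m-ppmap. The names *ramp*, *schemas-6m*, multiplexer-6, matching-schema and payoff are hypothetical. The sketch computes the correct action directly from the multiplexer definition (the first two bits address one of the last four bits) rather than reading it off the schema.

```lisp
;; Hypothetical names; data copied from the ramp list and schema list above.
(defparameter *schemas-6m*
  '("11***1" "11***0" "10**1*" "10**0*" "01*1**" "01*0**" "001***" "000***"))

(defparameter *ramp*
  '((1000.0 700.0) (900.0 600.0) (800.0 500.0) (700.0 400.0)
    (600.0 300.0) (500.0 200.0) (400.0 100.0) (300.0 0.0)))

(defun multiplexer-6 (input)
  "Correct action (0 or 1): the first two bits, read as a binary number,
select which of the remaining four bits is the answer."
  (let ((address (parse-integer input :end 2 :radix 2)))
    (digit-char-p (char input (+ 2 address)))))

(defun matching-schema (input)
  "Index of the schema in *SCHEMAS-6M* that INPUT matches (#\\* is a wildcard)."
  (position-if (lambda (schema)
                 (every (lambda (s c) (or (char= s #\*) (char= s c)))
                        schema input))
               *schemas-6m*))

(defun payoff (input action &optional (landscape *ramp*))
  "Payoff for taking ACTION on INPUT: the first element of the schema's pair
if ACTION is the multiplexer's answer, the second element otherwise."
  (destructuring-bind (correct incorrect)
      (nth (matching-schema input) landscape)
    (if (= action (multiplexer-6 input)) correct incorrect)))
```

With these definitions, (payoff "110101" 1) returns 1000.0 and (payoff "110101" 0) returns 700.0, matching the example.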

The second payoff list, 6m-xmap, has two "crossed" payoff ramps: the correct-action payoff falls from 1000 to 300 while the incorrect-action payoff rises from 300 to 1000. The third list, 6m-pn, is the "standard" 1000/0 landscape you asked about ("pn" stands for "payoff-nil"). XCS's performance is approximately the same on all three landscapes.
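One way to compare the three landscapes with respect to requirement (1) is to count their distinct payoff levels. A short sketch, with the three lists copied into a hypothetical association list *landscapes* and a hypothetical function payoff-levels:

```lisp
;; Hypothetical names. Each entry is (NAME . list-of-payoff-pairs).
(defparameter *landscapes*
  '((ramp    (1000.0 700.0) (900.0 600.0) (800.0 500.0) (700.0 400.0)
             (600.0 300.0) (500.0 200.0) (400.0 100.0) (300.0 0.0))
    (crossed (1000.0 300.0) (900.0 400.0) (800.0 500.0) (700.0 600.0)
             (600.0 700.0) (500.0 800.0) (400.0 900.0) (300.0 1000.0))
    (pn      (1000.0 0.0) (1000.0 0.0) (1000.0 0.0) (1000.0 0.0)
             (1000.0 0.0) (1000.0 0.0) (1000.0 0.0) (1000.0 0.0))))

(defun payoff-levels (pairs)
  "Sorted list of the distinct payoff values (correct or incorrect) in PAIRS."
  (sort (remove-duplicates (loop for (c i) in pairs collect c collect i)
                           :test #'=)
        #'<))

(defun level-count (name)
  "Number of distinct payoff levels in the landscape called NAME."
  (length (payoff-levels (cdr (assoc name *landscapes*)))))
```

The ramp has 11 levels (0 to 1000 in steps of 100), the crossed landscape 8 (300 to 1000), and the payoff-nil landscape only 2.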

In particular, the curves for the 1000/0 landscape are very similar to those of Figure 3, but (averaging over 10 runs, as in Figure 3) 100% performance is reached at about 5,000 problems (i.e., approximately 2,500 explore problems), versus 4,000 problems (2,000 explore problems) for the ramp landscape of Figure 3. Thus the 1000/0 problem appears to be slightly harder, assuming the difference is not a statistical fluctuation.

A similar experiment on the second, "crossed" landscape gave results very close to those for the ramp landscape, in both the error and the population-size curves. I could not produce a meaningful performance curve for the crossed landscape, however, because its payoffs do not obey the rule that the "correct" answer always has the larger payoff.
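Where the crossed landscape breaks that rule can be located directly. This sketch, using a hypothetical function crossing-schemas and copies of the ramp and crossed lists, returns the indices of the schemas whose incorrect-action payoff is at least the correct-action payoff:

```lisp
;; Hypothetical names; data copied from the ramp and crossed lists above.
(defparameter *ramp-pairs*
  '((1000.0 700.0) (900.0 600.0) (800.0 500.0) (700.0 400.0)
    (600.0 300.0) (500.0 200.0) (400.0 100.0) (300.0 0.0)))

(defparameter *crossed-pairs*
  '((1000.0 300.0) (900.0 400.0) (800.0 500.0) (700.0 600.0)
    (600.0 700.0) (500.0 800.0) (400.0 900.0) (300.0 1000.0)))

(defun crossing-schemas (pairs)
  "Indices (0-based, in schema-list order) where the incorrect action pays
at least as much as the correct one."
  (loop for (correct incorrect) in pairs
        for k from 0
        when (>= incorrect correct) collect k))
```

For the crossed list this returns (4 5 6 7), i.e., the schemas 01*1**, 01*0**, 001*** and 000***; for the ramp list it returns NIL.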

Why did you use this layered payoff landscape? Can XCS learn a complete mapping given the simple landscape of 1 for correct actions and 0 for incorrect actions?