What is NetQ?
Questions (by section)
Click on the dots (....) after each question to get answers. To ask a new question, see the bottom of the page.
1. Introduction
When you say "traditionally", do you mean most classifier systems are like
that? Which ones do not use strength, or do not use it that way?
....
What do you mean by "payoff"?
....
What do you mean by a classifier's "accuracy"?
....
Are there many people working on classifier systems?
....
2. How to Measure Fitness?
What exactly is the "strength" of a classifier?
....
Won't doing the GA in niches also address the first problem (preventing
strong-niche classifiers from taking over the population), in addition
to the last problem?
....
You seem to be making a value judgment between accurate
and general classifiers, preferring the former. Why is
this? Can't general classifiers be useful, too, to capture
the first-order statistics of a broad range of input
situations?
....
So would having a complete mapping mean that your system will work better
than traditional classifier systems in changing environments
as well, where new classifiers might suddenly be called for?
....
3. Description of XCS
Why do you need an initial prediction estimate? Would it
not be more expedient to say "UNKNOWN"? When the classifier
system first experiences a value, it becomes known and the
initial prediction is set to the experienced value. In this
way the classifier's prediction would move to its 'true'
value more quickly and would be independent of arbitrary initial
values.
....
The only place the prediction error parameter is used is in
the fitness update calculation. Why not simply use the
absolute difference between
P and p sub j in the fitness update?
(L. Booker)
....
3.1 Performance Component
I have a hard time figuring out how the values in the
prediction array are calculated. It would make the paper
easier to understand if you included a walk-through of this
calculation with numbers.
....
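For concreteness, here is a rough sketch of the fitness-weighted averaging the question above refers to; the numbers, data layout, and variable names are illustrative assumptions, not values taken from the paper.

    # Sketch of forming the prediction array from a match set: each entry is the
    # fitness-weighted average of the predictions of the classifiers in the match
    # set that advocate that action. All numbers below are made up.
    match_set = [
        # (action, prediction p_j, fitness F_j)
        (0, 1000.0, 0.80),
        (0,  200.0, 0.10),
        (1,  500.0, 0.50),
    ]

    prediction_array = {}
    for action in {a for a, _, _ in match_set}:
        members = [(p, f) for a, p, f in match_set if a == action]
        total_fitness = sum(f for _, f in members)
        if total_fitness > 0:
            prediction_array[action] = sum(p * f for p, f in members) / total_fitness

    print(prediction_array)
    # action 0: (1000*0.80 + 200*0.10) / (0.80 + 0.10) = 820/0.9 ~ 911.1
    # action 1: 500.0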
In traditional Classifier Systems, strength is also used to
determine which matched classifiers are allowed to post their
messages to M if the size of the message list is less than the
number of matched classifiers. How does XCS handle this?
....
3.2 Reinforcement Component
Could you explain the "MAM" technique, giving an example?
....
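As the MAM technique is usually described, a parameter estimate is set to the straight average of the values seen so far until the classifier has been updated about 1/beta times, after which the ordinary Widrow-Hoff rule takes over. A small sketch, with an assumed beta and made-up samples:

    beta = 0.2  # learning rate (illustrative value)

    def mam_update(current, sample, experience):
        # experience = number of samples seen so far, including this one
        if experience < 1.0 / beta:
            # averaging phase: the estimate equals the mean of the samples so far
            return current + (sample - current) / experience
        # Widrow-Hoff phase: move a fixed fraction beta toward the new sample
        return current + beta * (sample - current)

    estimate, experience = 0.0, 0
    for sample in [1000.0, 1000.0, 0.0, 1000.0, 1000.0, 1000.0]:
        experience += 1
        estimate = mam_update(estimate, sample, experience)
        print(experience, round(estimate, 1))
    # prints 1000.0, 1000.0, 666.7, 750.0 (averages), then 800.0, 840.0 (Widrow-Hoff)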
Subsequent descriptions of the way the error (e sub j) is used
suggest that the value should lie between zero and one,
though I could find no explicit statement confirming that. If
this is the case, then shouldn't the adjustment of (e sub j)
include a normalization of the difference between
P and (p sub j)?
(L. Booker)
....
Using the MAM technique, the first few estimates are an average of the
samples generated so far. However, given the order in which the parameters
are updated (fitness, prediction error, and then prediction) and the way
they depend on each other (fitness on prediction error, prediction error on
prediction), samples are not available for the first update of the prediction
error or the first two updates of the fitness. Only the default values of the
parameters they depend on are available. This implies that averaging of
the values for the prediction error and the fitness must be delayed until
actual samples have been generated for the parameters they depend on. I have
tried running XCS with and without this sort of delay and overall it works
better with it. Is this the correct way to implement the system? If so, why
are fitness, prediction error and prediction updated in that order? Would it
not be easier to update them in the reverse order? That way you would not need
the delay. (T. Kovacs)
....
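To make the ordering in the question above concrete, here is a sketch of the three Widrow-Hoff adjustments applied to an action-set member in the order described (fitness, then prediction error, then prediction); MAM averaging is omitted, and the names and constant are illustrative assumptions.

    beta = 0.2  # learning rate (illustrative)

    def update_action_set_member(cl, P, relative_accuracy):
        # 1. fitness moves toward the relative accuracy, which was derived from
        #    the error estimate as it stood before step 2 below
        cl['fitness'] += beta * (relative_accuracy - cl['fitness'])
        # 2. prediction error moves toward |P - p_j|, using the prediction as it
        #    stood before step 3 below
        cl['error'] += beta * (abs(P - cl['prediction']) - cl['error'])
        # 3. prediction moves toward the received payoff P last
        cl['prediction'] += beta * (P - cl['prediction'])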
3.3 Discovery Component
Does this imply that the only classifiers eligible for deletion
are those in the match set?
....
Why are you interested in allocating classifier resources approximately
equally to the different match sets? I would guess that, mostly due to
XCS's generalization capabilities, different niches could require different
numbers of classifiers to be completely covered. What is your opinion on
this point? (M. Dorigo)
....
When a new classifier is inserted into the current population,
how are its parameters (prediction, fitness, and error)
initialized?
....
What criteria are used to initialize the expected
match set size of a newly created classifier?
....
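To make concrete what the two questions above are asking about, here is a sketch of one possible initialization of a GA offspring's parameters; every choice in it (inheriting the parent's estimates, discounting the fitness, copying the match set size estimate) is an assumption made for illustration, not the paper's prescription.

    FITNESS_DISCOUNT = 0.1  # assumed penalty on an offspring's inherited fitness

    def init_offspring(parent):
        # Purely illustrative; none of these choices are taken from the paper.
        return {
            'condition':      parent['condition'],  # then modified by crossover/mutation
            'action':         parent['action'],
            'prediction':     parent['prediction'],      # inherit the parent's estimate
            'error':          parent['error'],           # inherit the parent's estimate
            'fitness':        parent['fitness'] * FITNESS_DISCOUNT,
            'match_set_size': parent['match_set_size'],  # expected match set size
            'experience':     0,                         # no updates seen yet
        }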
A classifier's probability of deletion is proportional to
its average match set size. How do you set the probability
of those that are never matched and thus, technically, have
a deletion probability of 0? Do you set their match set size
to 1 and, if necessary, multiply by (pop_ave / initial strength)
once the unmatched classifier's initial strength falls below
the fraction of the population's average (item 2, page 9)?
....
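For reference, a sketch of roulette-wheel deletion in which each classifier's vote is its match-set-size estimate, with the low-fitness multiplier the question above mentions; the threshold fraction and the field names are assumptions made for illustration.

    import random

    DELTA = 0.1  # assumed fraction of the mean fitness below which the vote is boosted

    def deletion_vote(cl, mean_fitness):
        vote = cl['match_set_size']
        if cl['fitness'] < DELTA * mean_fitness:
            vote *= mean_fitness / cl['fitness']  # penalize persistently weak classifiers
        return vote

    def select_for_deletion(population):
        mean_fitness = sum(c['fitness'] for c in population) / len(population)
        votes = [deletion_vote(c, mean_fitness) for c in population]
        point = random.uniform(0.0, sum(votes))
        running = 0.0
        for cl, vote in zip(population, votes):
            running += vote
            if running >= point:
                return cl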
3.4 The Fitness Calculation
You state earlier in the paper that different
niches have different payoff levels. This fact is pivotal
to developing classifier fitness based on accuracy. But why
then do you use a constant value of epsilon 0? Would it not
be more appropriate to use a measure more in line with the
predicted value, for example eps0 = 0.05*P (5% of the
prediction)?
....
Your accuracy measure seems to give essentially the same (low)
fitness to all classifiers that make more than a small number of
errors. It's hard to see how the GA can use these poor performers
to help discover more accurate classifiers. My guess is that XCS
learns accurate specific rules first, then uses those to discover
accurate generalizations. Is that right? How would XCS performance
be affected if you used a less severe accuracy criterion (e.g.,
something proportional to the squared error)? (L. Booker)
....
From the description, it seems that fitness (F sub j) is an estimate
of the relative accuracy for the macroclassifier: the average established
by MAM is subsequently altered by the Widrow-Hoff delta rule with
parameter Beta. Relative accuracy (kappa prime sub j) is always a real
number between 0.0 and 1.0, so the F sub j values should also be in that
range, or approach that range from whatever value of F sub I is used. But
in Figures 2 and 13 you show macroclassifiers with fitnesses as high
as 782.0 or 384.0. How come?
....
What are alpha, epsilon, and epsilon0? I can't find them
defined in the paper.
....
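Roughly, epsilon (e sub j) is a classifier's prediction error estimate, epsilon0 is the error threshold below which a classifier is treated as fully accurate, and alpha controls how sharply accuracy falls off above that threshold. The sketch below illustrates an accuracy fall-off of this kind; the exact formula and the constants are assumptions made for illustration, not quotes from the paper.

    import math

    alpha = 0.1       # assumed fall-off constant
    epsilon_0 = 0.01  # assumed error threshold for "fully accurate"

    def accuracy(error):
        if error <= epsilon_0:
            return 1.0
        # accuracy drops off steeply once the error exceeds the threshold
        return math.exp(math.log(alpha) * (error - epsilon_0) / epsilon_0)

    def relative_accuracies(errors):
        kappas = [accuracy(e) for e in errors]
        total = sum(kappas)
        return [k / total for k in kappas]  # these sum to 1 across the action set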
In the first sentence of page 11, you mention that the total of fitness
adjustments to the members of the Previous Action Set is constant, but I don't
understand why. I guess the total is Beta * (1.0 - sum of (F sub j) in the
previous action set), which does not seem to be constant. Since (F sub j)
is an estimate of the relative accuracy, whose sum is 1.0, I suppose the
discussion about mapping equal numbers of classifiers to each condition-action
map is still valid, though. (2/19/02)
....
3.5 Macroclassifiers
The macroclassifier mechanism seems quite odd for non-toy problems.
Even with a genotype length as short as 20, the probability of generating
two identical rules is nearly zero. So what good is it?
(F. Baiardi and A. Giani)
....
Suppose a new (micro) classifier is created.
If there already exists a classifier with the same
condition/action string, then the numerosity counter of the
macroclassifier is incremented by one.
What happens to the other parameters, such as the time stamp used
for the GA?
Also, macroclassifiers diminish the diversity in the population.
Don't you think this can have a negative effect in other
applications?
....
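A small sketch of the insertion step the question above describes: when a new (micro) classifier duplicates an existing condition/action pair, only the numerosity of the existing macroclassifier is incremented; otherwise a new macroclassifier of numerosity 1 is added. Field names are illustrative assumptions.

    def insert(population, new_cl):
        for cl in population:
            if (cl['condition'] == new_cl['condition']
                    and cl['action'] == new_cl['action']):
                cl['numerosity'] += 1  # absorb the duplicate; other parameters unchanged here
                return
        new_cl['numerosity'] = 1
        population.append(new_cl)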
4.1 Generalization Hypothesis
So is this generalization mechanism essentially an intrinsic reward for
generality (as opposed to the more extrinsic G parameter used in your
BOOLE program)? Is that right?
....
4.2.1 The 6-multiplexer
Why did you use this layered payoff landscape? Can XCS learn
a complete mapping given the simple landscape of 1 for correct
actions and 0 for incorrect actions? (L. Booker)
....
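For readers unfamiliar with the task: in the 6-multiplexer the first two bits of the input address one of the four remaining data bits, and the correct action is the value of the addressed bit. A minimal sketch of the target function (the layered payoff values themselves are not reproduced here):

    def multiplexer6(bits):
        # bits: string of six '0'/'1' characters; returns the correct action (0 or 1)
        address = int(bits[:2], 2)       # first two bits select a data bit
        return int(bits[2 + address])    # the addressed data bit is the correct action

    assert multiplexer6("100010") == 1   # address 10 -> third data bit, which is 1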
In Figure 3, the initial values of prediction, error, and fitness are
shown as 10.0, 0.0, and 10.0. Have these been multiplied by 1000
as the fitness was in Figures 2 and 8? (T. Kovacs)
....
4.2.2 The 11-multiplexer
What's the likelihood of discovering and maintaining such a complete
mapping in problems more complex than the boolean multiplexer? In real
problems, discovering good rules (where "good" means able to get a
large payoff) is difficult. Discovering enough meaningful rules, both
good and bad, to form a complete mapping seems quite unlikely, if
you only focus on the problem of maintaining the most accurate rules.
(F. Baiardi and A. Giani)
....
4.3.1 Woods2
What if crossover or mutation generates a sensor code other than
000 (blank), 010 or 011 (rocks), 110 or 111 (food)? For example,
what if you crossed at a point that combined 000 and 011 into
001 and 010? How would you interpret 001? Or is crossover defined
such that it cannot occur within a sensation and is mutation defined
such that it mutates between blank, O, Q, F, G, and #?
....
4.3.2 Experiments in Woods2
Why did you choose, as your exploration/exploitation strategy, to decide at the
beginning of a run whether to explore or to exploit, instead of choosing
at each step, as is more usual in reinforcement learning experiments?
(M. Dorigo)
....
5. Discussion
The empirical evidence suggests that XCS does indeed learn
complete maps of its payoff landscape. What isn't clear is how
XCS makes sure that each action has a representative in each match set.
Are there mechanisms that guarantee this? (L. Booker)
....
In your GA with fitness based on accuracy, why should the offspring
be more accurate? The search you are proposing seems more random
than genetic, because the GA is only used to remove inaccurate rules.
(F. Baiardi and A. Giani)
....
I want to write a program to implement XCS. Can I find any source code
for XCS on the Internet, as there is for SCS? If not, can you tell me
some details about how to implement XCS? I have only a little idea of
how to go about it.
....
5.2 Future Research Directions
Can I add temporary memory to XCS, as you proposed doing in your
paper on ZCS?
....
Q! << Click here to ask a new
question on this paper.