When you say "traditionally", do you mean most classifier systems are like that? Which ones do not use strength, or do not use it that way? ....
What do you mean by "payoff"? ....
What do you mean by a classifier's "accuracy"? ....
Are there many people working on classifier systems? ....
What exactly is the "strength" of a classifier? ....
Won't doing the GA in niches also address the first problem (preventing strong-niche classifiers from taking over the population), in addition to the last problem? ....
You seem to be making a value judgment between accurate and general classifiers, preferring the former. Why is this? Can't general classifiers be useful, too, to capture the first-order statistics of a broad range of input situations? ....
So would having a complete mapping mean that your system will work better than traditional classifier systems in changing environments as well, where new classifiers might suddenly be called for? ....
Why do you need an initial prediction estimate? Would it not be more expedient to say "UNKNOWN"? When the classifier system first experiences a value, it becomes known and the initial prediction is set to the experienced value. In this way the classifier's prediction would move to its 'true' value more quickly and would be independent of arbitrary initial values. ....
The only place the prediction error parameter is used is in the fitness update calculation. Why not simply use the absolute difference between P and p sub j in the fitness update? (L. Booker) ....
I have a hard time figuring out how the values in the prediction array are calculated. It would make the paper easier to understand if you included a walk-through of this calculation with numbers. ....
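A minimal sketch of the calculation as commonly described (a fitness-weighted average of the predictions of the matching classifiers that advocate each action); the classifiers and numbers below are invented for illustration:

```python
# Hypothetical match set: (action, prediction p_j, fitness F_j);
# all names and numbers are invented for illustration.
match_set = [
    ("a0", 1000.0, 0.75),
    ("a0", 200.0, 0.25),
    ("a1", 500.0, 0.5),
]

def system_prediction(match_set, action):
    """Fitness-weighted average of the predictions of the matched
    classifiers advocating `action`; None if no classifier does."""
    num = sum(p * f for a, p, f in match_set if a == action)
    den = sum(f for a, p, f in match_set if a == action)
    return num / den if den > 0 else None

# a0: (1000*0.75 + 200*0.25) / (0.75 + 0.25) = 800 / 1.0 = 800.0
print(system_prediction(match_set, "a0"))  # 800.0
# a1: (500*0.5) / 0.5 = 500.0
print(system_prediction(match_set, "a1"))  # 500.0
```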
In traditional Classifier Systems, strength is also used to determine which matched classifiers are allowed to post their messages to M if the size of the message list is less than the number of matched classifiers. How does XCS handle this? ....
Could you explain the "MAM" technique, giving an example? ....
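For reference, the MAM ("moyenne adaptative modifiée") technique is usually described as taking the plain average of the samples seen so far until a classifier's experience reaches 1/beta, then switching to the Widrow-Hoff rule with rate beta. A minimal sketch (beta and the numbers are illustrative):

```python
BETA = 0.2  # learning rate; value illustrative

def mam_update(current, target, experience):
    """MAM: straight sample average while experience < 1/beta,
    Widrow-Hoff with rate beta thereafter."""
    if experience < 1.0 / BETA:
        return current + (target - current) / experience
    return current + BETA * (target - current)

# With an initial prediction of 10.0 and three payoff samples of 900.0,
# the estimate equals the sample mean after each update; note that the
# very first update (divide by experience = 1) wipes out the arbitrary
# initial value entirely.
p, exp = 10.0, 0
for sample in [900.0, 900.0, 900.0]:
    exp += 1
    p = mam_update(p, sample, exp)
print(p)  # 900.0
```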
Subsequent descriptions of the way the error (e sub j) is used suggest that the value should lie between zero and one, though I could find no explicit statement confirming that. If this is the case, then shouldn't the adjustment of e sub j include a normalization of the difference between P and p sub j? (L. Booker) ....
Using the MAM technique, the first few estimates are an average of the samples generated so far. However, given the order that parameters are updated in (fitness, prediction error and then prediction) and the way they depend on each other (fitness on prediction error, prediction error on prediction), samples are not available for the first update of prediction or the first two updates of fitness. Only the default values of the parameters they depend on are available. This implies that averaging of the values for prediction and fitness must be delayed until actual samples have been generated for the parameters they depend on. I have tried running XCS with and without this sort of delay and overall it works better with it. Is this the correct way to implement the system? If so, why are fitness, prediction error and prediction updated in that order? Would it not be easier to update them in the reverse order? That way you would not need the delay. (T. Kovacs) ....
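One ordering in which no update ever reads an unset value can be sketched as below. This is an assumption about a workable ordering, not the paper's specification: the error update uses the old prediction, then the prediction is updated, then fitness moves toward an externally computed relative accuracy. All parameter values are illustrative.

```python
BETA = 0.2  # learning rate; value illustrative

def update_classifier(p, eps, F, P, kappa_rel):
    """One possible ordering (an assumption, not the paper's spec):
    error from the *old* prediction, then prediction, then fitness
    toward the relative accuracy kappa_rel computed elsewhere."""
    eps = eps + BETA * (abs(P - p) - eps)  # error uses the old p
    p = p + BETA * (P - p)                 # then prediction
    F = F + BETA * (kappa_rel - F)         # then fitness
    return p, eps, F
```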
Does this imply that the only classifiers eligible for deletion are those in the match set? ....
Why are you interested in allocating classifier resources approximately equally to the different match sets? I would guess that, mostly due to XCS generalization capabilities, different niches could require a different number of classifiers to be completely covered. What is your opinion on this point? (M. Dorigo) ....
When a new classifier is inserted in the current population how are the classifier parameters (prediction, fitness and error) initialized? ....
Which criteria were used in initializing the expected match set size of a newly created classifier? ....
A classifier's probability of deletion is proportional to its average match set size. How do you set the probability of those which are never matched and thus, technically, have a deletion probability of 0? Do you set their match set size to 1 and, eventually, multiply by (pop_ave / initial strength) once the unmatched classifier's initial strength falls below the fraction of the population's average (item 2, page 9)? ....
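For comparison, a sketch of roulette-wheel deletion as commonly implemented: a classifier's vote is its action-set-size estimate times its numerosity, boosted when the classifier is experienced but its fitness is well below the population mean. Parameter names and values are illustrative, not the paper's.

```python
import random

DELTA, THETA_DEL = 0.1, 20  # illustrative threshold values

def deletion_vote(cl, mean_fitness):
    """Vote proportional to the action-set-size estimate, boosted for
    experienced classifiers whose per-micro fitness is far below the
    population mean (a common refinement)."""
    vote = cl["as"] * cl["num"]
    if cl["exp"] > THETA_DEL and cl["fit"] / cl["num"] < DELTA * mean_fitness:
        vote *= mean_fitness / (cl["fit"] / cl["num"])
    return vote

def select_for_deletion(pop, rng=random.random):
    """Roulette-wheel selection over the deletion votes."""
    mean_fit = sum(c["fit"] for c in pop) / sum(c["num"] for c in pop)
    votes = [deletion_vote(c, mean_fit) for c in pop]
    point = rng() * sum(votes)
    acc = 0.0
    for cl, vote in zip(pop, votes):
        acc += vote
        if acc > point:
            return cl
```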
You state earlier in the paper that different niches have different payoff levels. This fact is pivotal to developing classifier fitness based on accuracy. But why then do you use a constant value of epsilon 0? Would it not be more appropriate to use a measure more in line with the predicted value, for example eps0 = 0.05*P (5% of the prediction)? ....
Your accuracy measure seems to give essentially the same (low) fitness to all classifiers that make more than a small number of errors. It's hard to see how the GA can use these poor performers to help discover more accurate classifiers. My guess is that XCS learns accurate specific rules first, then uses those to discover accurate generalizations. Is that right? How would XCS performance be affected if you used a less severe accuracy criterion (e.g. something proportional to the squared error)? (L. Booker) ....
From the description, it seems that fitness (F sub j) is an estimate of the relative accuracy for the macroclassifier: the average established by MAM is subsequently altered by the Widrow-Hoff delta rule with parameter beta. Relative accuracy (kappa prime sub j) is always a real number between 0.0 and 1.0, so the F sub j values should also be in that range, or approach that range from whatever value of F sub I is used. But in Figures 2 and 13 you show macroclassifiers with fitnesses as high as 782.0 or 384.0. How come? ....
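To see why F sub j should indeed stay in [0, 1], here is a sketch using the power-law accuracy function from later XCS descriptions (an assumption; the 1995 paper's accuracy formula differs, but the normalization argument is the same): relative accuracies across an action set sum to 1, so each fitness is pushed toward a value in [0, 1].

```python
ALPHA, EPS0, NU, BETA = 0.1, 10.0, 5.0, 0.2  # illustrative values

def accuracy(eps):
    """Power-law accuracy: 1.0 below the eps0 threshold,
    falling off sharply above it."""
    return 1.0 if eps < EPS0 else ALPHA * (eps / EPS0) ** -NU

def update_fitnesses(action_set):
    """Move each fitness toward its classifier's *relative* accuracy.
    Relative accuracies sum to 1 over the action set, so every F_j is
    driven into [0, 1]; larger printed fitnesses would have to be
    scaled values."""
    kappas = [accuracy(c["eps"]) * c["num"] for c in action_set]
    total = sum(kappas)
    for c, k in zip(action_set, kappas):
        c["fit"] += BETA * (k / total - c["fit"])
```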
What are alpha, epsilon, and epsilon0 ? Can't find them defined in the paper. ....
At the first sentence of page 11, you mention that the total of fitness adjustments to the members of the Previous Action Set is constant, but I don't understand why. I guess the total is Beta * (1.0 - sum of (F sub j) in the previous action set), which does not seem to be constant. Since (F sub j) is an estimate of the relative accuracy, whose sum is 1.0, I suppose the discussion about mapping equal numbers of classifiers to each condition-action map is still valid though. (2/19/02) ....
The macroclassifier mechanism seems quite odd in non-toy problems. Even with a genotype length as short as 20, the probability of generating two identical rules is nearly zero. So what good is it? (F. Baiardi and A. Giani) ....
Suppose a new (micro) classifier is created. If there already exists a classifier with same condition/action string then the numerosity counter of the macro classifier is incremented by one. What happens to the other parameters such as the Time-Stamp used for the GA? Also, macro classifiers diminish the diversity in the population. Don't you think this can have a negative effect in other applications? ....
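A minimal sketch of macroclassifier insertion under one plausible reading: when a duplicate condition/action pair exists, only numerosity changes, and the existing macroclassifier's other parameters (including the GA time stamp) are left untouched. This is an assumption, not the paper's specification.

```python
def insert(pop, new_cl):
    """Insert a micro classifier into the population. If a
    macroclassifier with the same condition and action exists, bump
    its numerosity and leave its other parameters (time stamp,
    prediction, error, fitness) unchanged -- an assumed reading.
    Otherwise add the classifier as a new macroclassifier."""
    for cl in pop:
        if cl["cond"] == new_cl["cond"] and cl["act"] == new_cl["act"]:
            cl["num"] += 1
            return
    pop.append(new_cl)
```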
So is this generalization mechanism essentially an intrinsic reward for generality (as opposed to the more extrinsic G parameter used in your BOOLE program)? Is that right? ....
Why did you use this layered payoff landscape? Can XCS learn a complete mapping given the simple landscape of 1 for correct actions and 0 for incorrect actions? (L. Booker) ....
In Figure 3, the initial values of prediction, error, and fitness are shown as 10.0, 0.0, and 10.0. Have these been multiplied by 1000 as the fitness was in Figures 2 and 8? (T. Kovacs) ....
What's the likelihood of discovering and maintaining such a complete mapping in problems more complex than the boolean multiplexer? In real problems, discovering good rules (where "good" means able to get a large payoff) is difficult. Discovering enough meaningful rules, both good and bad, to form a complete mapping seems quite unlikely, if you only focus on the problem of maintaining the most accurate rules. (F. Baiardi and A. Giani) ....
What if crossover or mutation generates a sensor code other than 000 (blank), 010 or 011 (rocks), 110 or 111 (food)? For example, what if you crossed at a point that combined 000 and 011 into 001 and 010? How would you interpret 001? Or is crossover defined such that it cannot occur within a sensation and is mutation defined such that it mutates between blank, O, Q, F, G, and #? ....
Why did you choose as exploration/exploitation strategy to decide at the beginning of a run whether to explore or to exploit instead of choosing at each step, as is more usual in reinforcement learning experiments? (M. Dorigo) ....
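The two strategies the question contrasts can be sketched as follows; the epsilon value and the dictionary form of the prediction array are illustrative:

```python
import random

def choose_action_per_problem(explore_problem, prediction_array):
    """The paper's scheme: decide once, at the start of a problem;
    on explore problems pick a random action, on exploit problems
    pick the action with the highest system prediction."""
    if explore_problem:
        return random.choice(list(prediction_array))
    return max(prediction_array, key=prediction_array.get)

def choose_action_epsilon_greedy(prediction_array, epsilon=0.1,
                                 rng=random.random):
    """The per-step alternative more usual in reinforcement learning:
    explore with probability epsilon on every single step."""
    if rng() < epsilon:
        return random.choice(list(prediction_array))
    return max(prediction_array, key=prediction_array.get)
```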
The empirical evidence suggests that XCS does indeed learn complete maps of its payoff landscape. What isn't clear is how XCS makes sure that each action has a representative in each match set. Are there mechanisms that guarantee this? (L. Booker) ....
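One relevant mechanism is covering. In the 1995 paper covering fires only when the match set is empty; the sketch below shows the stronger variant from later XCS descriptions, which creates a matching classifier for every action not yet represented in the match set. The ACTIONS list and P_HASH value are illustrative.

```python
import random

ACTIONS = [0, 1]  # illustrative binary-action problem
P_HASH = 0.33     # per-bit generalization probability (illustrative)

def cover(match_set, situation, rng=random.random):
    """Add, for each action with no representative in the match set,
    a new classifier that matches `situation` and advocates that
    action, generalizing each bit to '#' with probability P_HASH."""
    present = {cl["act"] for cl in match_set}
    for act in ACTIONS:
        if act not in present:
            cond = "".join("#" if rng() < P_HASH else bit
                           for bit in situation)
            match_set.append({"cond": cond, "act": act, "num": 1})
```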
In your GA with fitness based on accuracy, why should the offspring be more accurate? The search you are proposing seems more random than genetic, because the GA is only used to remove inaccurate rules. (F. Baiardi and A. Giani) ....
I want to write a program to implement XCS. Can I find any source code for XCS on the Internet, as one can for SCS? If not, can you tell me some details about how to implement XCS? I have only a little idea of how to do it. ....
Can I add temporary memory to XCS, as you proposed doing in your paper on ZCS? ....