When a new classifier is inserted into the current population, how are the classifier parameters (prediction, fitness, and error) initialized?

The general philosophy is to choose parameter values so that the new classifier will have a minimal effect on the system, at least initially.
If a classifier is created as a result of the "cover" operation, I set the prediction and error to the population mean prediction and mean error. The fitness is set to one-tenth of the population mean fitness. A very low fitness means that the classifier's contribution to the "system prediction" will initially be very small, since in that calculation its prediction is weighted by its fitness.
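As a rough sketch, covering initialization along these lines might look like the following Python. The `Classifier` dataclass, the function name `init_from_cover`, and the empty-population fallback are illustrative assumptions, not part of the original description.

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str
    action: int
    prediction: float
    error: float
    fitness: float

def init_from_cover(condition, action, population):
    """Initialize a classifier created by covering: prediction and error
    take the population means; fitness is one-tenth of the mean fitness."""
    if not population:
        # Fallback for an empty population (an assumption here; the values
        # echo the arbitrary defaults mentioned below for initial random
        # populations).
        return Classifier(condition, action, 10.0, 0.0, 0.01)
    n = len(population)
    mean_pred = sum(c.prediction for c in population) / n
    mean_err = sum(c.error for c in population) / n
    mean_fit = sum(c.fitness for c in population) / n
    return Classifier(condition, action, mean_pred, mean_err, 0.1 * mean_fit)
```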
The genetic algorithm creates two classifiers whenever it acts. If no crossover occurs, the offspring simply inherit the predictions of the parents. If crossover does occur, each offspring's prediction is set to the mean of the parents' predictions. Again, this is a policy of "minimal disruption". The offspring errors are set to one-fourth of the population mean error, and the fitnesses to one-tenth of the population mean fitness.
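In the same spirit, a sketch of the offspring initialization might look as follows. The signature, the `crossed_over` flag, and the copying of conditions and actions straight from the parents are simplifications of my own; in the real algorithm the offspring conditions come from crossover and mutation.

```python
def init_offspring(parent1, parent2, population, crossed_over):
    """Initialize the two GA offspring: predictions are inherited or, after
    crossover, averaged; error is one-quarter of the population mean error;
    fitness is one-tenth of the population mean fitness."""
    n = len(population)
    mean_err = sum(c.error for c in population) / n
    mean_fit = sum(c.fitness for c in population) / n
    if crossed_over:
        # Crossover occurred: both offspring get the mean parental prediction.
        pred1 = pred2 = (parent1.prediction + parent2.prediction) / 2.0
    else:
        # No crossover: each offspring inherits its parent's prediction.
        pred1, pred2 = parent1.prediction, parent2.prediction
    child1 = Classifier(parent1.condition, parent1.action, pred1,
                        0.25 * mean_err, 0.1 * mean_fit)
    child2 = Classifier(parent2.condition, parent2.action, pred2,
                        0.25 * mean_err, 0.1 * mean_fit)
    return child1, child2
```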
All my more recent experiments have started with an empty population. However, in earlier experiments starting with a randomly generated population, I used 10.0 for the prediction, 0.0 for the error, and 0.01 for the fitness. These values are arbitrary and basically not very important, since the parameters will be rapidly changed as a classifier is actually used, especially under the MAM updating procedure.
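To see why the initial values are washed out so quickly, here is a sketch of a MAM-style parameter update as commonly described for XCS; the learning rate `beta = 0.2` and the function name are illustrative. For roughly the first 1/beta uses the parameter tracks a running average of the targets, so the very first update already replaces whatever initial value was chosen; afterwards the Widrow-Hoff rule with rate beta takes over.

```python
def mam_update(current, target, experience, beta=0.2):
    """MAM ("moyenne adaptative modifiee") update of a single parameter.

    experience: number of updates so far, counting this one (>= 1).
    For the first 1/beta uses the parameter is the running average of the
    targets seen; after that it follows the Widrow-Hoff rule with rate beta.
    """
    if experience < 1.0 / beta:
        return current + (target - current) / experience
    return current + beta * (target - current)
```

For example, whatever the initial prediction, `mam_update(initial, p1, 1)` already returns `p1`.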
I would say, just take care that the combination of parameter initialization and updating leads the system's early estimates to be "best guesses" at the real values. But I have found that, in general, the choice of initial values has only a small effect on system performance.