
My task is to create an ANN with an Evolution Strategies (ES) algorithm as the optimizer (no differentiation). For now, I am just trying to implement this with a Linear ANN. I found a Colab notebook that does this exact thing, but on the sklearn "make_moons" dataset. I tried to incorporate what was in the notebook, and the code runs with no problems, yet it outputs the same accuracy. Usually the first few outputs are different, then it "converges" at 0.0987 on the training set and 0.098 on the test set. Additionally, it takes super long to train. Maybe there are redundant iterations?

Colab Notebook, if you want to check it out:

I tried some StackOverflow recommendations, such as adjusting the hyperparameters (learning rate, hidden units), as well as using Leaky ReLU in case of a "dying ReLU"; none of them worked.


This leads me to believe that the problem is in the ES optimizer.

I am new to PyTorch, so if any glaring malpractices are there, please say so! Here is the relevant part of the code (the model definition, hyperparameters, and the rest of the training setup are in the notebook):

    # imports
    import numpy as np
    import torch
    import torch.nn as nn
    from tqdm import tqdm
    from keras.datasets import mnist
    from sklearn.model_selection import train_test_split

    # data
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    X_train, X_test, y_train, y_test = train_test_split(x, y, train_size=train_size)
    X_train, X_test, y_train, y_test = torch.FloatTensor(X_train), torch.FloatTensor(X_test), torch.LongTensor(y_train), torch.LongTensor(y_test)
    X_train = X_train.reshape(X_train.shape[0], -1)
    X_test = X_test.reshape(X_test.shape[0], -1)

    # flatten the model's parameters into a single "mother" vector
    mother_parameters = model.parameters()
    mother_vector = nn.utils.parameters_to_vector(mother_parameters)
    n_params = nn.utils.parameters_to_vector(model.parameters()).shape[0]

    def fitness_function(solution):
        # solution is a vector of parameters like mother_parameters
        nn.utils.vector_to_parameters(solution, model.parameters())
        # we are maximizing in ES, so take the reciprocal of the loss;
        # now, an increasing fitness means the model is learning
        return 1 / loss_func(y_pred, y_true)

    def jitter(mother_params, state_dict):
        # in ES, our population is a slightly altered version of the mother parameters,
        # so we implement a jitter function
        params_try = mother_params + SIGMA * state_dict
        return params_try

    # now, we calculate the fitness of the entire population
    def calculate_population_fitness(pop, mother_vector):
        ...  # (population loop omitted here)
        nn.utils.vector_to_parameters(mother_vector, model.parameters())
        return fitness

    with torch.no_grad():  # autograd makes it slower + takes more memory; we don't use differentiation in ES
        for iteration in tqdm(range(ITERATIONS)):
            pop = torch.from_numpy(np.random.randn(POPULATION_SIZE, n_params)).float()
            fitness = calculate_population_fitness(pop, mother_vector)
            normalized_fitness = (fitness - torch.mean(fitness)) / torch.std(fitness)
            # update mother vector with the fitness values
            mother_vector = mother_vector + (LR / (POPULATION_SIZE * SIGMA)) * torch.matmul(pop.t(), normalized_fitness)

Answer:

The most obvious problem is that you only evaluate your model once (in the line scores = model(data)) before you start to loop over the population. You need to update and evaluate the model for every perturbation of the "mother" vector. In other words, the function calculate_population_fitness(pop, mother_vector, scores, targets) creates a result that depends only on scores and targets, both of which are constant within your loop over the population. First, try to write the code so that it evaluates a population only a single time (a single generation) without updating anything (see the sketch below).

If that doesn't work, debug your program further before adding the update rule. If the fitness values are all the same, try increasing your sigma until you see different fitness values. Print the std/mean of everything, especially of the fitness.

For an ES, initialization is somewhat similar to sampling the population of the first generation. Good initialization is extremely important, especially as the neural nets get deeper. This is a bit more advanced, though - don't worry about it until you have some learning going.
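To make the first two suggestions concrete, here is a minimal sketch (not from the original post) of evaluating a single generation with a per-perturbation forward pass and no update step. It reuses `model`, `loss_func`, `SIGMA`, `jitter`, and the population tensor from the question's code; the `data`/`targets` arguments and the loop structure are assumptions for illustration:

    # evaluate ONE generation only - no update rule yet (meant to run inside torch.no_grad())
    def calculate_population_fitness(pop, mother_vector, data, targets):
        fitness = torch.zeros(pop.shape[0])
        for i, epsilon in enumerate(pop):
            # perturb the mother vector and load it into the model...
            solution = jitter(mother_vector, epsilon)  # mother_vector + SIGMA * epsilon
            nn.utils.vector_to_parameters(solution, model.parameters())
            # ...then re-run the forward pass for THIS perturbation
            scores = model(data)
            fitness[i] = 1.0 / loss_func(scores, targets)
        # restore the unperturbed parameters before returning
        nn.utils.vector_to_parameters(mother_vector, model.parameters())
        return fitness

    fitness = calculate_population_fitness(pop, mother_vector, X_train, y_train)
    # sanity check: if mean/std show (almost) identical fitness values, increase SIGMA
    print("fitness mean:", fitness.mean().item(), "fitness std:", fitness.std().item())

If those numbers barely move between candidates even with a larger sigma, the bug is upstream of the update rule, which is exactly what the answer suggests checking before adding the update back in.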

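On the initialization point, one way to act on it is to give the mother vector a deliberate starting point (for example Kaiming initialization for a ReLU network) and then treat the first generation as noise sampled around it. A rough sketch, again assuming the `model`, `SIGMA`, and `POPULATION_SIZE` from the question; `init_weights` is a hypothetical helper:

    # give the mother vector a deliberate starting point (hypothetical sketch)
    def init_weights(m):
        if isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")  # suits ReLU-style nets
            nn.init.zeros_(m.bias)

    model.apply(init_weights)
    mother_vector = nn.utils.parameters_to_vector(model.parameters())

    # the first generation is just noise sampled around that starting point
    pop = torch.randn(POPULATION_SIZE, mother_vector.shape[0])
    first_candidates = mother_vector + SIGMA * pop  # one row per population member

This mirrors the last piece of advice above: the deeper the network, the more the quality of that starting point (and therefore of the first sampled generation) matters.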