Genetic Algorithms Demystified

Datetime: 2016-08-23 03:13:36          Topic: Algorithm

Introduction

Genetic Algorithms belong to a larger class of computer-based problem-solving systems called Evolutionary Algorithms, which use computational models that follow the principles of evolution and heredity in their design and implementation. The other variants of Evolutionary Algorithms are Evolutionary Programming and Evolution Strategies. Genetic Algorithms were developed by John Holland, his colleagues, and his students at the University of Michigan in the early 1970s. Genetic Algorithms deal with optimization problems. Inspired by Darwin’s theory of evolution, they employ a repeated process of selection, attrition, and cross-breeding of potential solutions in search of an optimal one.

Applications

The need for optimization is due in part to the scarcity of resources and in part to conflicting desires. As a daily ritual, we would like to choose the commuting arrangement that brings us to our destination with the cheapest fare, the shortest commuting time, and as much comfort as possible. Is that achievable? We know that the answer is “you cannot have your cake and eat it too”. These desires are in conflict with one another and can only be reconciled through optimization. In this commuting case, individual commuters have to search all possible combinations of commuting arrangements – taxis, buses, trains, or a mix of them – then evaluate and compare them in order to reach an optimal solution. This is optimization at work.

On a more serious note, Genetic Algorithms have been used in the following areas with varying degrees of success:

  1. Resource Allocation – Job shop scheduling, time-table planning, and the classical traveling salesperson problem
  2. Design Optimization – Network routing, satellite orbit selection
  3. Machine Learning – Optimize other machine learning systems, such as weights for neural networks, weights for case-based reasoning systems, and rules for classifier systems

Minds-on

We will step through the process of how genetic algorithms deal with an optimization problem while introducing the core concepts of genetic algorithms along the way.

Core Concepts

  1. Problem Modeling
    • Define the problem requirements, that is, what we are optimizing for (e.g. a balanced distribution of containers on a vessel).
    • Translate this problem into the genetic algorithms problem requirements (e.g. assignment of containers to locations in the vessel).
    • Identify the information already present in the problem (e.g. container IDs, weights, location IDs) as well as information that needs to be computed or derived (e.g. center of gravity).
    • Determine the fitness function for evaluating the “goodness” of each candidate solution (e.g. the lower the overall center of gravity of a feasible solution, the better).
    • Encode the solution in a form analogous to biological chromosomes or sequences of DNA (e.g. an array of locations in the vessel, with each location filled with a container ID).
    • Determine the appropriate method to evolve feasible solutions (e.g. permutation of the container IDs assigned to the locations in the array (Figure 1), or permutation of the location indices assigned to the container IDs (Figure 2)).
    Figure 1: Permutation by Container IDs
    Figure 2: Permutation by Location Indices
  2. Evolution Process
    • Start with an initial population of randomly generated candidate solutions (chromosomes).
    • Evaluate each solution and assign it a measure of fitness, using the fitness function, for Selection in the next stage.
    • Select high-fitness individuals from the current population to undergo reproduction – Crossover and Mutation.
    • Combine the features of two parents to form two similar offspring as a way to conserve and exploit good genetic traits. This is known as Crossover.
    • Alter one or more components of a selected chromosome randomly to inject diversity into the population. This is known as Mutation. Mutation helps to break the dominance of the elitist group by giving weaker individuals a chance, albeit a very slim one, to survive.
    • Replace older members of the population with their offspring, forming a new generation.
    • Repeat the process of evaluation, selection, crossover, mutation, and replacement until some stopping criteria are met (a minimal code sketch follows this list).
  3. Stopping Criteria
    • When a pre-determined number of iterations or generations is reached.
    • When a chromosome (solution) that meets or exceeds a target fitness value is found.
    • When all the chromosomes in the population have attained a certain level of uniformity in terms of fitness.
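
The steps above can be wired together into a minimal, generic evolution loop. The Java sketch below uses a toy problem (maximising the number of 1-bits in a binary chromosome) with binary tournament selection, single-point crossover, and bit-inversion mutation; the toy problem, the operator choices, and the parameter values are illustrative assumptions, not the project implementation described later in this article.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of the evolution process described above, applied to a toy
// problem (maximise the number of 1-bits in a 32-gene binary chromosome).
// The problem, operators and parameter values are illustrative assumptions.
public class SimpleGeneticAlgorithm {
    static final int POP_SIZE = 50, GENES = 32, MAX_GENERATIONS = 200;
    static final double CROSSOVER_RATE = 0.7, MUTATION_RATE = 0.01;
    static final Random RNG = new Random();

    public static void main(String[] args) {
        // Initial population of randomly generated candidate solutions.
        List<int[]> population = new ArrayList<>();
        for (int i = 0; i < POP_SIZE; i++) population.add(randomChromosome());

        // Stopping criterion: a fixed number of generations.
        for (int gen = 0; gen < MAX_GENERATIONS; gen++) {
            List<int[]> next = new ArrayList<>();
            while (next.size() < POP_SIZE) {
                // Selection: binary tournament, keep the fitter of two random individuals.
                int[] p1 = tournament(population), p2 = tournament(population);
                // Crossover: combine the two parents with probability CROSSOVER_RATE.
                int[] child = RNG.nextDouble() < CROSSOVER_RATE ? crossover(p1, p2) : p1.clone();
                // Mutation: flip each gene with probability MUTATION_RATE.
                mutate(child);
                next.add(child);
            }
            // Replacement: the offspring form the new generation.
            population = next;
        }
    }

    static int[] randomChromosome() {
        int[] c = new int[GENES];
        for (int i = 0; i < GENES; i++) c[i] = RNG.nextInt(2);
        return c;
    }

    // Fitness function: the "goodness" of a candidate is its number of 1-bits.
    static int fitness(int[] c) {
        int sum = 0;
        for (int gene : c) sum += gene;
        return sum;
    }

    static int[] tournament(List<int[]> pop) {
        int[] a = pop.get(RNG.nextInt(pop.size()));
        int[] b = pop.get(RNG.nextInt(pop.size()));
        return fitness(a) >= fitness(b) ? a : b;
    }

    // Single-point crossover: left part from parent 1, right part from parent 2.
    static int[] crossover(int[] p1, int[] p2) {
        int point = 1 + RNG.nextInt(GENES - 1);
        int[] child = new int[GENES];
        System.arraycopy(p1, 0, child, 0, point);
        System.arraycopy(p2, point, child, point, GENES - point);
        return child;
    }

    // Bit-inversion mutation.
    static void mutate(int[] c) {
        for (int i = 0; i < c.length; i++) {
            if (RNG.nextDouble() < MUTATION_RATE) c[i] = 1 - c[i];
        }
    }
}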

Selection Methods

We shall take a closer look at some of the methods for selecting the new population in genetic algorithms; a code sketch of the first method follows this section.

  1. Fitness Proportionate Selection
    • Calculate the fitness value of each chromosome, f_i for the i-th chromosome.
    • Find the total fitness of the whole population, F.
    • Calculate the probability of selection for each chromosome, p_i = f_i / F.
    • Calculate the cumulative probability for each chromosome, q_i = p_1 + p_2 + … + p_i.
    • Generate a random number r in the range [0..1].
    • If r is less than q_1 (i.e., the cumulative probability of the first chromosome), select the first chromosome; otherwise select the i-th chromosome where q_{i-1} < r ≤ q_i.
    • If a few chromosomes possess overly large fitness values compared to the majority, they will be selected too often. The consequence is too-quick, premature convergence, resulting in a sub-optimal solution.
  2. Tournament Selection
    • Select two individuals from the population.
    • Generate a random number between [0..1], r .
    • If r is less than a pre-determined number T (the tournament size), then select the fitter of the two individuals; otherwise select the weaker one.
    • The two individuals are then returned to the original population and can be selected again.
    • This type of selection tends to favour the fitter one more as the tournament size increases.
  3. Rank Selection
    • Rank the individuals in the population in ascending order according to their fitness values.
    • The weakest one will get a ranking of 1, the next weakest one 2, and so on.
    • Use the ranking instead of the fitness value to calculate the probability of selection for each chromosome.
    • This method will prevent too-quick convergence that could happen with fitness proportionate selection method, but at the expense of slower convergence.

The choice of selection method is problem-dependent and can greatly impact the optimization process and outcome. It may be decided after comparing the outcomes from these methods.
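
As an illustration, a minimal fitness proportionate (roulette wheel) selection routine might look like the sketch below. It assumes non-negative fitness values where higher is better; the population representation around it is hypothetical.

import java.util.List;
import java.util.Random;

// Sketch of fitness proportionate (roulette wheel) selection.
// Assumes non-negative fitness values where higher is better.
class RouletteWheelSelection {
    private final Random rng = new Random();

    // fitness.get(i) is the fitness f_i of the i-th chromosome.
    // Returns the index of the selected chromosome.
    int select(List<Double> fitness) {
        double total = 0.0;                       // F = total fitness of the population
        for (double f : fitness) total += f;

        double r = rng.nextDouble() * total;      // drawing r in [0, F) is equivalent to comparing r in [0, 1) with q_i
        double cumulative = 0.0;
        for (int i = 0; i < fitness.size(); i++) {
            cumulative += fitness.get(i);         // cumulative fitness up to chromosome i
            if (r < cumulative) return i;         // first chromosome whose cumulative slice contains r
        }
        return fitness.size() - 1;                // fallback for rounding at the boundary
    }
}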

Crossover Operators

Crossover and mutation are two core operators of genetic algorithms. Similar to selection methods, there are many ways to perform crossover and mutation. Here we discuss the crossover operation first.

In crossover, segments of two parent chromosomes are randomly chosen and swapped to produce two offspring. Some of the crossover methods are:

  1. Single-point Crossover
    • Select two parent chromosomes for reproduction.
    • Select the position of crossover point randomly.
    • Copy those genes on the left side of the crossover point of parent 1’s chromosome to child 1.
    • Copy those genes on the right side of the crossover point of parent 2’s chromosome to child 1.
    • Child 1 is born.
    • Repeat the same process but swap the roles of the two parents to produce child 2.
    • The above process is illustrated in Figure 3.
    Figure 3: Single-point Crossover
  2. Multi-point Crossover
    • Select two parent chromosomes for reproduction.
    • Select the positions of two (or more) crossover points randomly.
    • Copy those genes outside of the two crossover points of parent 1’s chromosome to child 1.
    • Copy those genes between the two crossover points of parent 2’s chromosome to child 1.
    • Child 1 is born.
    • Repeat the same process but swap the roles of the two parents to produce child 2.
    • The above process is illustrated in Figure 4.
    Figure 4: Multi-point Crossover

Not all chosen chromosome pairs will undergo crossover. We introduce a new parameter called the Probability of Crossover, p_c. Before a crossover takes place, we generate a random number in the range [0..1]; if this number is less than the probability of crossover, crossover takes place, otherwise the pair is left unchanged. The expected number of chromosomes undergoing crossover is therefore p_c × population size. The probability of crossover is also known as the crossover rate.
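
A minimal sketch of single-point crossover with the crossover-rate check described above might look like the following; the integer-array chromosome representation is an assumption made for illustration.

import java.util.Random;

// Sketch of single-point crossover gated by a crossover rate p_c.
// Chromosomes are represented here as int arrays of equal length.
class SinglePointCrossover {
    private final Random rng = new Random();
    private final double crossoverRate;           // probability of crossover, p_c

    SinglePointCrossover(double crossoverRate) { this.crossoverRate = crossoverRate; }

    // Returns two children; if crossover does not take place, the parents are copied unchanged.
    int[][] crossover(int[] parent1, int[] parent2) {
        if (rng.nextDouble() >= crossoverRate) {
            return new int[][] { parent1.clone(), parent2.clone() };
        }
        int point = 1 + rng.nextInt(parent1.length - 1);    // crossover point, never at the ends
        int[] child1 = new int[parent1.length];
        int[] child2 = new int[parent2.length];
        // Child 1: left of the point from parent 1, right of the point from parent 2.
        System.arraycopy(parent1, 0, child1, 0, point);
        System.arraycopy(parent2, point, child1, point, parent2.length - point);
        // Child 2: the same with the parents' roles swapped.
        System.arraycopy(parent2, 0, child2, 0, point);
        System.arraycopy(parent1, point, child2, point, parent1.length - point);
        return new int[][] { child1, child2 };
    }
}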

Mutation Operators

Mutation replaces the values of some randomly chosen genes of a chromosome with arbitrary new values. A popular mutation operator for a binary chromosome is Bit Inversion, which simply flips the values of randomly chosen bits from 1 to 0 or vice versa.

Similar to crossover, mutation does not always take place. It depends on a parameter called the Probability of Mutation, p_m. Before a mutation takes place, we generate a random number in the range [0..1]; if this number is less than the probability of mutation, mutation takes place, otherwise the chromosome is left unchanged. The probability of mutation is also known as the mutation rate.
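
A bit-inversion mutation gated by the mutation rate could be sketched as follows. The sketch applies the rate independently to each gene; applying it once per chromosome and then flipping a single randomly chosen bit is an equally valid variant.

import java.util.Random;

// Sketch of bit-inversion mutation on a binary chromosome.
// Here the mutation rate p_m is applied independently to each gene;
// applying it once per chromosome and flipping one random bit is a variant.
class BitInversionMutation {
    private final Random rng = new Random();
    private final double mutationRate;            // probability of mutation, p_m

    BitInversionMutation(double mutationRate) { this.mutationRate = mutationRate; }

    void mutate(int[] chromosome) {
        for (int i = 0; i < chromosome.length; i++) {
            if (rng.nextDouble() < mutationRate) {
                chromosome[i] = 1 - chromosome[i];   // flip 0 -> 1 or 1 -> 0
            }
        }
    }
}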

The objective of mutation is to inject diversity into the population, prompting the genetic algorithm to explore new solutions and thus lowering the risk of being trapped in a local optimum (Figure 5).

Figure 5: Overcome Local Optimum through Mutation

Performance

The performance of genetic algorithms in problem solving depends on the following factors:

  • The encoding method
  • The selection method
  • The crossover operators
  • The mutation operators
  • The parameter settings, i.e. population size, crossover rate, and mutation rate

When to Use

Genetic algorithms can be applied to solve problems in the following situations:

  • When we do not know a reasonable way to solve the problem.
  • When it is not possible to enumerate all possible solutions (NP-complete problems).
  • But we do know how to differentiate a good solution from a bad one.

Caveats

If genetic algorithms can converge to an optimal solution, they can just as well converge to a poor one. If that occurs, it is most probably owing to poor problem modeling, premature convergence, a poor fitness function, poor parameter settings, or simply bad luck with the random numbers.

Do we know whether the final solution generated by genetic algorithms is optimal or anywhere near optimal? The answer is that we never know. If we knew how to find the optimal solution, we would not need genetic algorithms in the first place. Bear in mind that the problems genetic algorithms are used to solve are typically NP-complete, i.e., very hard problems.

In other words, there is no guarantee that genetic algorithms will find an optimal solution.

Open Source Tools

  • Python
    • Pyevolve – A complete python genetic algorithm framework.
  • PERL
    • AI::Genetic – A pure Perl genetic algorithm implementation.

Hands-on

Enough talking; let us explore the design and implementation of a genetic algorithms project that optimizes a load distribution problem. The project is code-named GALoadDistriMiser. It was implemented in Java, and its execution can be viewed on YouTube by clicking on Figure 6 below:

Figure 6: Click to View GALoadDistriMiser on YouTube

Problem Description

The project is about the optimization of load distribution of 64 packages on a vessel. It is assumed that the holding space for the packages is divided into 64 rectangular spaces (Figure 7).

Figure 7: Stacks of Packages

The problem requirements state that

  1. The distribution of the weights of the packages should be more or less uniform.
  2. It is also necessary to ensure that the lighter packages are placed on top of the heavier ones.

Translating these into genetic algorithm objectives, we have:

  1. Optimize weight distribution horizontally to balance the vessel.
  2. Optimize weight distribution vertically to maintain a low center of gravity.

To make the case study even more challenging, I have added two more requirements:

  1. Optimize the unloading operation at each port of call, i.e. minimize the unloading and re-loading of obstructing packages at each port.
  2. Meet objectives 1 and 2 as much as possible after unloading at each port of call, so that no unnecessary reshuffling is needed. (Assuming no loading of new packages at each destination port.)

Problem Model

The following information is provided:

  • Package IDs.
  • Weights of the 64 packages.
  • Location IDs.

The GALoadDistriMiser problem is represented in a GA model in Figure 8 as follows:

Figure 8: GA Model
  • In the phenotypic space (i.e. the physical world), the candidate solutions are manifested as different configurations of stacks of 64 packages.
  • In the genotypic space (i.e. the genetic algorithms space), each candidate solution is represented as a chromosome of 64 genes identified by package IDs.
  • The Java program evolves candidate solutions by permuting the 64 package IDs (see the sketch after this list).
  • The optimal solution found is then mapped back to the phenotypic space.
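
As an illustration of this genotype, a candidate solution could be held as a permutation of the 64 package IDs, with the array index serving as the location ID. The sketch below is a simplified assumption about the encoding, not the actual JGAP representation used in GALoadDistriMiser.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative encoding: a chromosome is a permutation of 64 package IDs,
// and the index of each gene is interpreted as the location ID on the vessel.
// This is a simplified assumption, not the actual JGAP representation.
class LoadChromosome {
    static final int LOCATIONS = 64;

    // genes[locationId] = packageId placed at that location
    final int[] genes = new int[LOCATIONS];

    // Create a random candidate solution by shuffling the package IDs.
    static LoadChromosome random() {
        List<Integer> packageIds = new ArrayList<>();
        for (int id = 0; id < LOCATIONS; id++) packageIds.add(id);
        Collections.shuffle(packageIds);

        LoadChromosome c = new LoadChromosome();
        for (int location = 0; location < LOCATIONS; location++) {
            c.genes[location] = packageIds.get(location);
        }
        return c;
    }
}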

Fitness Evaluation

To evaluate the quality of each candidate solution (chromosome) against the four optimization objectives, I devised a separate evaluation function for each objective. The overall fitness is the sum of these four evaluation functions; here, the smaller the overall fitness value, the better the solution.

  1. Fitness function for evaluating objective 1:
    • Calculate the sum of the deviations of each vertical stack's average weight (four packages per stack) from the overall average weight of the 64 packages:
    • F_0 = ∑(average weight of each stack of 4 packages − average weight of all 64 packages)
  2. Fitness function for evaluating objective 2: Two options are available:
    • Impose a penalty whenever a heavier package is placed above a lighter one: PENALTY_WEIGHT += 1; or
    • Implement a repair method that sorts the packages in each stack so that they are arranged in ascending order of weight from the top down. If this option is chosen, then PENALTY_WEIGHT = 0.
  3. Fitness function for evaluating objective 3:
    • Impose a penalty whenever a package due for a later port is placed above one due for an earlier port: PENALTY_PORT += 1.
  4. Fitness function for evaluating objective 4:
    • After unloading at each port, calculate the sum of the deviations of each stack's average weight from the overall average weight of the remaining packages:
    • At the n-th port, F_n = ∑(average weight of each stack − average weight of remaining packages).

Finally, the fitness of a candidate solution is F_0 + PENALTY_WEIGHT + PENALTY_PORT + ∑ F_n; the smaller the value, the better.
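
A sketch of how the F_0 term (objective 1) could be computed is shown below. It assumes 16 stacks of four packages, locations numbered stack by stack, and absolute deviations so that positive and negative deviations do not cancel out; the exact formulation in GALoadDistriMiser may differ.

// Sketch of the F_0 term: sum of the deviation of each stack's average
// weight from the overall average weight. Assumes 16 stacks of 4 packages,
// locations numbered stack by stack, and absolute deviations so that
// positive and negative deviations do not cancel out. The actual
// GALoadDistriMiser formulation may differ in detail.
class FitnessSketch {
    static final int STACKS = 16, PER_STACK = 4;

    // genes[location] = packageId; weights[packageId] = weight of that package
    static double weightDeviation(int[] genes, double[] weights) {
        double totalWeight = 0.0;
        for (int location = 0; location < genes.length; location++) {
            totalWeight += weights[genes[location]];
        }
        double overallAverage = totalWeight / genes.length;

        double f0 = 0.0;
        for (int stack = 0; stack < STACKS; stack++) {
            double stackWeight = 0.0;
            for (int level = 0; level < PER_STACK; level++) {
                stackWeight += weights[genes[stack * PER_STACK + level]];
            }
            double stackAverage = stackWeight / PER_STACK;
            f0 += Math.abs(stackAverage - overallAverage);
        }
        return f0;
    }
}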

Selection Method

The Linear Rank Selection method was used to select the parents for reproduction. It works as follows (a minimal sketch follows the steps):

  1. Each chromosome in the population is ranked in increasing order of fitness from 1 to N, where N is the population size; let R_i be the ranking of the i-th chromosome.
  2. Each chromosome is assigned a rank-based score s_i = N − R_i + 1, and the cumulative relative fitness is computed as q_i = (s_1 + s_2 + … + s_i) / (s_1 + s_2 + … + s_N).
  3. Generate a random number r between 0 and 1.
  4. If r is less than q_1, select the first chromosome; otherwise select the i-th chromosome where q_{i-1} < r ≤ q_i.
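
A minimal sketch of this linear rank selection, assuming the rankings from step 1 have already been computed, might look as follows.

import java.util.Random;

// Sketch of linear rank selection. ranks[i] is the rank R_i of the i-th
// chromosome from step 1, and each chromosome receives the rank-based
// score s_i = N - R_i + 1. Selection then works like roulette-wheel
// selection over these scores.
class LinearRankSelection {
    private final Random rng = new Random();

    // Returns the index of the selected chromosome.
    int select(int[] ranks) {
        int n = ranks.length;

        long totalScore = 0;                      // sum of all rank-based scores
        for (int rank : ranks) totalScore += (n - rank + 1);

        double r = rng.nextDouble();              // r in [0, 1)
        double cumulative = 0.0;
        for (int i = 0; i < n; i++) {
            cumulative += (double) (n - ranks[i] + 1) / totalScore;   // q_i
            if (r < cumulative) return i;
        }
        return n - 1;                             // fallback for floating-point rounding
    }
}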

Crossover Operator

Order Crossover operator was used to breed offspring from two parent chromosomes. The following example illustrates the algorithm:

  1. Select two parent chromosomes P1 and P2, each with two cut-points | as follows:
    • P1: (J K L | M N O | Z Q R)
    • P2: (Q J N | L K Z | R M O)
  2. Create a premature child C1 by copying the elements from P1’s mid-section:
    • C1: (_ _ _ | M N O | _ _ _)
  3. From here onwards, only consider P2. Consider the elements from the second cut-point of P2, i.e. R M O. Only copy R to C1, as M and O already exist in C1.
    • C1: (_ _ _ | M N O | R _ _)
  4. Consider P2 from the start, and copy those elements that are not already in C1 to fill out the vacancies after R in C1.
    • C1: (_ _ _ | M N O | R Q J)
  5. Consider P2 from the start, and copy those elements that are not already in C1 to fill out the vacancies from the start of C1.
    • C1: (L K Z | M N O | R Q J)
  6. Child 1 has been delivered.
  7. The second child can similarly be bred by switching the roles of P1 and P2.

The order crossover operator creates children that preserve the order and position of the elements inherited from one parent, as well as the relative order of the remaining elements inherited from the second parent.
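
A sketch of this order crossover is given below, with the cut-points passed in explicitly so that the example above can be reproduced; the int-array chromosome representation is an assumption.

import java.util.HashSet;
import java.util.Set;

// Sketch of the Order Crossover (OX) operator described above. The mid-section
// [cut1, cut2) of parent 1 is copied in place; the remaining positions are
// filled, tail first and then head, with parent 2's genes taken in order
// starting from the second cut-point (wrapping around) and skipping any gene
// already present in the child.
class OrderCrossover {

    static int[] breed(int[] parent1, int[] parent2, int cut1, int cut2) {
        int length = parent1.length;
        int[] child = new int[length];
        Set<Integer> used = new HashSet<>();

        // 1. Copy parent 1's mid-section into the child.
        for (int i = cut1; i < cut2; i++) {
            child[i] = parent1[i];
            used.add(parent1[i]);
        }

        // 2. Fill the remaining positions (cut2..end, then 0..cut1) with
        //    parent 2's genes, scanned from cut2 onwards with wrap-around.
        int fill = cut2 % length;
        for (int offset = 0; offset < length; offset++) {
            int gene = parent2[(cut2 + offset) % length];
            if (used.contains(gene)) continue;     // skip genes already in the child
            child[fill] = gene;
            used.add(gene);
            fill = (fill + 1) % length;
        }
        return child;
    }
}

For the example above, breed(P1, P2, 3, 6), with the letters encoded as integers, produces the child (L K Z | M N O | R Q J).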

Evolution Scheme

The optimization process of the GALoadDistriMiser was designed and coded according to the following scheme:

BEGIN
    INITIALISE random population with following parameters
        POPULATION SIZE (N)
        CROSSOVER RATE (CR)
        MUTATION RATE (MR)
        NUMBER OF EVOLUTIONS (STOPPING CRITERION)

    EVALUATE each candidate based on its fitness value;
    
     DO WHILE (STOPPING CRITERION is not met)

        IF (ADAPTIVE GA option is selected and best fitness value remains the same for 10,000 generations)
            CR = CR * 1.10 (CR capped at 0.7) 
            MR = MR * 1.10 (MR capped at 0.1)
        END IF
        
        IF (REPAIR CENTRE OF GRAVITY option is selected)
            SORT each stack so that heavier packages are placed below the lighter ones
        END IF
    
        1    SELECT 2 parents randomly based on Linear Rank Selection method;
        2    Generate a random number c, if c < CR, THEN breed 2 offspring from these 2 parents using Order Crossover operator ELSE the 2 parents just move over to the intermediate population;
        3    For each member in the intermediate population, generate a random number m, if m < MR, MUTATE this member by randomly swapping 2 genes;
        4    IF size of intermediate population < initial population GOTO 1
        5    EVALUATE all offspring in the intermediate population;
        6    REPLACE the worst offspring with the best solution found so far;
        7    COMPLETE the forming of a new generation;
    LOOP
END
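
The ADAPTIVE GA branch of the scheme could be isolated into a small helper like the sketch below; the 10,000-generation threshold, the 10% increases, and the caps follow the pseudocode above, while the class and field names are illustrative.

// Sketch of the ADAPTIVE GA branch in the scheme above: when the best fitness
// has not improved for a given number of generations, the crossover and
// mutation rates are increased by 10%, up to fixed caps. Class and field
// names are illustrative.
class AdaptiveRates {
    static final int STAGNATION_LIMIT = 10_000;
    static final double CR_CAP = 0.7, MR_CAP = 0.1;

    double crossoverRate = 0.5;
    double mutationRate = 0.02;

    private double bestFitnessSoFar = Double.MAX_VALUE;   // smaller is better
    private int stagnantGenerations = 0;

    // Call once per generation with the best (smallest) fitness found so far.
    void update(double bestFitnessThisGeneration) {
        if (bestFitnessThisGeneration < bestFitnessSoFar) {
            bestFitnessSoFar = bestFitnessThisGeneration;
            stagnantGenerations = 0;
        } else if (++stagnantGenerations >= STAGNATION_LIMIT) {
            crossoverRate = Math.min(crossoverRate * 1.10, CR_CAP);
            mutationRate = Math.min(mutationRate * 1.10, MR_CAP);
            stagnantGenerations = 0;               // restart the stagnation counter
        }
    }
}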

Observations

I have conducted many runs of GALoadDistriMiser under different parameter settings and options in an effort to understand the behaviour and performance of genetic algorithms. Some results are captured in Figure 9.

Figure 9: Performance of GALoadDistriMiser

The main observations are as follows:

  • The evolution process of a genetic algorithm never follows exactly the same path from one run to the next because of its stochastic nature. However, well-designed genetic algorithm models exhibit the same general behaviour: they improve quickly at the initial stage, when population diversity is widespread, but gradually slow down and become sluggish at the later stage, when the population is dominated by near-homogeneous individuals.
  • Owing to the multi-modality and discontinuity of the search space, a better solution may lie right next to an infeasible one. Repairing an infeasible solution, where possible, can therefore lead to the discovery of the next best solution and reduce the search effort.
  • A static configuration of genetic algorithm parameters that works very well at the initial stage of evolution is generally of little use once the population settles into a local optimum, so the genetic algorithm hits a bottleneck after some time. One way to let the genetic algorithm break out of the local optimum and explore new search space is to change the control parameters dynamically based on some criteria, for example by increasing the crossover and mutation rates when there is no significant improvement in fitness value over, say, 10,000 generations.

System Schema

GALoadDistriMiser was developed in Java and integrated with the following components (Figure 10):

  • JGAP (Java Genetic Algorithm Package) was included as the genetic algorithms library;
  • Java Swing provides the GUI for user interaction and for monitoring the progress of the genetic algorithm;
  • MS Access was used for storing genetic algorithm parameters and solutions;
  • MS Excel, linked to MS Access, is a convenient and powerful tool for viewing the evolution log and solutions (from the raw data, one can easily chart graphs and diagrams in Excel).

Figure 10: System Schema of GALoadDistriMiser

Summary

Through the minds-on and hands-on sections, you have learned the fundamental concepts of genetic algorithms and how to put them to good use in the design and implementation of a genetic algorithms project.




