Graph Algorithms
Introduction
The three basic algorithms that search to find a path in a graph from the given origin to the given destination are the Breadth First Search BFS, the Depth First Search DFS and the Shortest Path SP algorithm. All three use the same basic approach and differ only in the manner in which they keep the To-Do information of the nodes to visit next. The elements of the To-Do collection should record the node already visited, that has an edge to the node we could choose to visit next. For the SP algorithm, we also record the distance to the origin, if our routing ends in this edge.
The BFS keeps the To-Do information as a queue, the DFS keeps the To-Do information as a stack, and the SP keeps the To-Do information as a priority queue, choosing at each step to remove the node with the shortest distance to the origin.
For this algorithm to work, we need to represent the graph as a collection of nodes (ArrayList or a HashMap can work) where each node can look up its list of neighbors (and for the SP, it also can determine the distance to each neighbor.
When we visit a node N, we add all of its neighbors M to the To-Do information, i.e we add adges (N -> M) to the To-Do collection. In the case of the SP algorithm, we also record the distance to each node M if we reached it through the node N.
As we remove the elements of the To-Do collection, we add them to the collection of Visited nodes (remembering the information on how each node has been reached). The Visited collection can be an ArrayList or a HashMap.
Here is a brief description of all three algorithms:
Search Algorithms - Stage 1: Path finding
Start with an empty To-Do information. Find in the collection of nodes the start node and add it to the To-Do information, with itself as the node we came from and the distance equal to zero.
Repeat the next two steps until the algorithm fails, or you move to the Backtracking stage.
If the To-Do information is empty then no path has been found. End the algorithm and report the failure.
Otherwise: remove the edge that leads to the node N from the To-Do information. Add the removed the edge to the Visited. (Mark the node N as visited.)
If the node we removed from the To-Do information is the destination move on to the Backtracking stage, with the destination node being the current node.
Add all neighbors M of the node N to the To-Do information as follows:
Do not add M to the To-Do information if M has been already visited (i.e. an edge (X -> M) for some X appears in the Visited).
When adding M to the To-Do information do the following:
For the DFS and BFS do not add, if the To-Do information already contains the node M (i.e. an edge (X -> M) for some X appears in the To-Do).
For the SP, add if the To-Do information does not contain the node M (i.e. an edge (X -> M) for some X appears in the To-Do).
If it already contains the node M check if the new distance is shorter that the one already recorded in the list. If the new distance is shorter, replace the previous entry for M in the To-Do information with the new one.
Mark every node you have added to the To-Do as a fringe node.
Search Algorithms - Stage 2: Backtracking
Mark the current node as a backtrack node.
While the current node does not equal the origin do the following:
remove the edge leading to the current node from the Visited and add it to the final path.
Set the current node to the node that led to the current node.
Mark the current node as a backtrack node.
Stop and print the routing: all nodes in the final path.
A Simple Example
Consider the following simple graph:
A ------ D |
| / | |
| / | |
| / | |
| / | |
| / | |
| / | |
B ------ C |
Looking for a path from A to C we will have:
------------- search step 1 ----------- |
Visited: empty ToDo: (A -> A 0) |
|
------------- search step 2 ----------- |
A marked as visited |
Visited: (A -> A 0) ToDo: (A -> B 0) |
(A -> D 0) |
B, D marked as fringe |
|
------------- search step 3 ----------- |
A, B marked as visited |
Visited: (A -> A 0) ToDo: (A -> D 0) |
(A -> B 0) (D -> C 0) |
B, C marked as fringe |
|
(We did not add (B -> C 0) as C is in ToDo) |
|
------------- search step 4 ------------ |
A, B, D marked as visited |
Visited: (A -> A 0) ToDo: (D -> C 0) |
(A -> B 0) |
(A -> D 0) |
C marked as fringe |
|
------------- search step 5 ----------- |
A, B, D, C marked as visited |
Visited: (A -> A 0) ToDo: |
(A -> B 0) |
(A -> D 0) |
(D -> C 0) |
|
---------- backtrack step 1 ----------- |
C - current, marked as backtrack |
|
---------- backtrack step 2 ----------- |
D - current, marked as backtrack |
Path: (D -> C 0) |
|
---------- backtrack step 3 ----------- |
A - current, marked as backtrack |
Path: (D -> C 0) |
(A -> D 0) |
|
---------- backtrack step 3 ----------- |
A - current: stop, print path: |
A -> D -> C |
Kruskal Algorithm for Minimum Spanning Tree
Finding the minimum spanning tree of a connected graph is the same problem as finding a minimal connected component.
Here is Kruskal algorithm illustrated on a concrete example:
A -----30------- B -----50------- F |
\ / | / |
\ / | / |
50 35 40 50 |
\ / | / |
\ / | / |
\ / | / |
E --15-- C ---25-- D |
We represent the graph as a priority queue of edges with the shortest edge given the highest priority:
(E C 15) |
(C D 25) |
(A B 30) |
(B E 35) |
(B C 40) |
(F D 50) |
(A E 50) |
(B F 50) |
At each step we remove the shortest edge from the priority queue and add it to the spanning tree, provided we do not introduce a cycle.
So we add the edges (E C 15), (C D 25), (A C 30) and (B C 35). When we try to add the edge (B C 40) we see that it would make a cycle, and so this edge is not needed and we discard it.
We then add edges (F D 50). We can stop now, because we have added five edges to a graps with six nodes and it can be easily shown that adding any other edge would make a cycle. Or we can keep removing the remaining edges from the list – seeing at each step that they would make a cycle. Here we have only the edge (B F 50) left and, yes, it would make a cycle.
The only snag in this algorithm is in figuring out whether the new edge creates a cycle with the edges we have already selected. For this we use the Union/Find algorithm.
Its name describes what it does. It allows us to keep track of a set of disjoint subsets of some set, find whether two objects are in the same subset, and construct a union of two subsets within this collection of subsets.
Union/Find Algorithm
Our goal is to keep track of which nodes are already connected with the edges we have added. At the beginning there are no edges, and every node is only connected to itself. After a few edges have been added, the nodes in each separate segment of the graph, i.e. the elements of a distinct subset all have links to other element of their set, with one of them being the representative with no further link.
So, if our graph consists of two segments with nodes 1 2 5 6 and 3 4 the link structure will look like an upside-down tree like this:
5 3 |
^ ^ |
/ \ | |
2 1 4 |
^ |
| |
6 |
i.e., node 6 is linked to node 2, nodes 2 and 1 are linked to node 5 and nod 4 is linked to node 3. Nodes 5 and 3 have no links - they are the representatives of the set of all nodes linked to it.
If we now connect nodes 2 and 4, we first find the representatives of each of them, and if they are different, we add a link from one representative to another. The order does not matter. It is clear, that if we add a link from the node 5 to 3 then the node 3 will become the representative of all nodes and the whole structure represents a single set.
Implementation details
We represent the nodes as an ArrayList of integers, each element representing one node. The value of each entry will be a link to some node that this one is connected to – either through a direct edge, or being connected to some component that contains the other node. At the start all links are -1, indicating this element is the representative of a subset.
Let us recall our earlier exampple:
A -----30------- B -----50------- F |
\ / | / |
\ / | / |
50 35 40 50 |
\ / | / |
\ / | / |
\ / | / |
E --15-- C ---25-- D |
We represent the graph as a priority queue of edges with the shortest edge given the highest priority:
(E C 15) |
(C D 25) |
(A B 30) |
(B E 35) |
(B C 40) |
(F D 50) |
(A E 50) |
(B F 50) |
We create a mapping from nodes to nodes that shows for each node a node it is connected to - either directly, or though some path. For each connected componenet, one node srves as the representative of the connected component and is not linked to any other node. At the beginning, there are no edges, and each node is a representative of the singleton component - itself.
+---+---+---+---+---+---+ |
Node: | A | B | C | D | E | F | |
+---+---+---+---+---+---+ |
Link: | * | * | * | * | * | * | |
+---+---+---+---+---+---+ |
We add edge (E C 15), i.e. nodes 2 and 4 are now connected:
Tree shape: |
+---+---+---+---+---+---+ E |
Node: | A | B | C | D | E | F | ^ |
+---+---+---+---+---+---+ | |
Link: | * | * | E | * | * | * | C |
+---+---+---+---+---+---+ |
We add edge (C D 25), i.e. nodes 2 and 3 are now connected and 2 is the representative for both of them.
Tree shape: |
+---+---+---+---+---+---+ E |
Node: | A | B | C | D | E | F | ^ |
+---+---+---+---+---+---+ / \ |
Link: | * | * | E | E | * | * | C D |
+---+---+---+---+---+---+ |
We add edge (A B 30), i.e. nodes 0 and 1 are now connected:
Tree shape: |
+---+---+---+---+---+---+ A E |
Node: | A | B | C | D | E | F | ^ ^ |
+---+---+---+---+---+---+ | / \ |
Link: | * | A | E | E | * | * | B C D |
+---+---+---+---+---+---+ |
We now have three connected components: Nodes B and A are in one of them, node F is a singleton, and nodes C, D, and E are in the third component.
The graph looks like this:
A -----30------- B F |
|
|
|
|
|
|
E --15-- C ---25-- D |
We add edge (B E 35), i.e. nodes 1 and 2 are now connected. That means we add a link from the representative for B, node 0 to the representative for the node E, node 2:
Tree shape: |
+---+---+---+---+---+---+ E |
Node: | A | B | C | D | E | F | ^ |
+---+---+---+---+---+---+ / | \ |
Link: | C | A | E | E | * | * | A C D |
+---+---+---+---+---+---+ ^ |
| |
B |
We still have two components. We try to add the edge (B C 40) to the graph, but the representative for node C is the same as the representative for the node B: node C itself, or node 2. Adding the edge would connect two nodes that already have a path between them, and so it would create a cycle. We discard this edge.
Adding the edge (F D 50) will be the last step - after this we have only one representative and all nodes are connected to each other:
Tree shape: |
F |
+---+---+---+---+---+---+ ^ |
Node: | A | B | C | D | E | F | | |
+---+---+---+---+---+---+ E |
Link: | C | A | E | E | F | * | ^ |
+---+---+---+---+---+---+ / | \ |
A C D |
^ |
| |
B |
Algorithm Details
The algorithm consists of two functions:
Node findRepresentative(Node n) that produces the representative nRep of the node n
and
void union(int n1, int n2) that finds the representatives n1Rep and n2Rep of the two nodes, and if they are not the same, sets the value of the link for the node n1Rep to n2Rep.
Additionally, one may include a function that counts the number of subset representatives, i.e. the number of nodes whose links are not referencing another node.