CS122 Lecture: Graphs Last Revised 4/7/98
Materials: DFS and BFS Transparencies
I. Introduction
A. The general trend in our discussion has been to move from the simplest,
most specific data structures to increasingly flexible and general kinds
of structures. Thus, we have moved from primitive structures through
sequential structures to a particular form of branching structure, the
tree. We now focus on the most general kind of branching structure,
the graph. So general is this structure that all of the others we have
studied turn out to be just special kinds of graphs. Even apart from
this consideration, graphs are probably the most widely used of all
mathematical structures. We introduce them briefly now, focusing on
terminology and a few key operations. They are covered in much greater
depth in CS321.
B. Formally, a graph consists of a set of VERTICES (often denoted V) and a
set of EDGES (often denoted E) which connect the vertices. Each edge
is, in fact, a (possibly ordered) pair of vertices.
ex: A ----- B ---- C
     \ \______ D _/
      \       /
       \_ E _/
V = { A, B, C, D, E }
E = { (A,B), (A,D), (A,E), (B,C), (C,D), (D,E) }
1. In an undirected graph, the order of the vertices in each pair does
not matter. The above example has been drawn as an undirected
graph - hence the edges could just as well be listed as:
E = { (B,A), (D,A), (E,A), (C,B), (D,C), (E,D) } or
E = { (B,A), (A,D), (E,A), (B,C), (D,C), (D,E) }
2. In a directed graph (digraph), the edges are ORDERED pairs. This
can be symbolized by drawing the edges with arrow heads, and by
enclosing the pairs in angle brackets rather than parentheses:
ex: The following is a digraph having the same general shape as
the graph we have been discussing:
    A ----> B ------> C
    ^ \______> D <___/
     \        /
      \_ E <_/
V = { A, B, C, D, E }
E = { <A,B>, <A,D>, <B,C>, <C,D>, <D,E>, <E,A> }
3. In an edge of a digraph <V1,V2>, V1 is called the TAIL and V2 is
called the HEAD (cf. the way we draw the edge).
4. In either case, we say that an edge e is INCIDENT ON a vertex v if
v is either the tail or the head of the edge.
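The definitions above can be sketched in Python. This is only an illustration and not part of the original lecture: the names V, E_undirected, E_directed and incident_on are assumptions, with a frozenset standing for an unordered pair and a tuple for an ordered pair.

```python
# Vertices of the sample graph.
V = {"A", "B", "C", "D", "E"}

# Undirected edges: frozensets, so (A,B) and (B,A) are the same edge.
E_undirected = {frozenset(pair) for pair in
                [("A", "B"), ("A", "D"), ("A", "E"),
                 ("B", "C"), ("C", "D"), ("D", "E")]}

# Directed edges: tuples (tail, head), so the order matters.
E_directed = {("A", "B"), ("A", "D"), ("B", "C"),
              ("C", "D"), ("D", "E"), ("E", "A")}

def incident_on(edge, v):
    """True if v is one of the two vertices (tail or head) of the edge."""
    return v in edge

print(frozenset(("B", "A")) in E_undirected)  # True
print(("B", "A") in E_directed)               # False
print(incident_on(("E", "A"), "A"))           # True
```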
C. Other terminology
1. In an undirected graph, we say that vertices V1, V2 are ADJACENT if
(V1,V2) or (V2,V1) is in E. In a digraph, we say that V1 is ADJACENT
TO V2 (note implicit direction) if <V1,V2> is in E, and we likewise
say that V2 is ADJACENT FROM V1.
2. In an undirected graph, the DEGREE of a vertex is the number of
vertices it is adjacent to. In a digraph, the OUTDEGREE of a vertex
is the number of vertices it is adjacent to, and the INDEGREE of a
vertex is the number of vertices adjacent to it.
ex: in the undirected graph, A and B are adjacent, A and D are
adjacent, and A and E are adjacent, so the degree of A is 3.
in the digraph, A is adjacent to B and A is adjacent to D, so its
outdegree is 2. E is adjacent to A, so A's indegree is 1.
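As a rough check of these counts, the degree definitions can be written as short Python functions. The function names and the edge-list encoding (pairs of labels, tail first for the digraph) are assumptions made for this sketch, and self-loops are assumed not to occur.

```python
def degree(edges, v):
    """Number of undirected edges that have v as an endpoint."""
    return sum(1 for a, b in edges if v in (a, b))

def outdegree(edges, v):
    """Number of directed edges whose tail is v."""
    return sum(1 for tail, head in edges if tail == v)

def indegree(edges, v):
    """Number of directed edges whose head is v."""
    return sum(1 for tail, head in edges if head == v)

undirected = [("A", "B"), ("A", "D"), ("A", "E"),
              ("B", "C"), ("C", "D"), ("D", "E")]
directed = [("A", "B"), ("A", "D"), ("B", "C"),
            ("C", "D"), ("D", "E"), ("E", "A")]

print(degree(undirected, "A"))   # 3
print(outdegree(directed, "A"))  # 2
print(indegree(directed, "A"))   # 1
```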
3. In a graph, a PATH from vertex Vs to vertex Vf is a sequence of vertices
Vs, V1, V2 .. Vn, Vf s.t. (Vs,V1), (V1,V2) .. (Vn,Vf) are in E.
In a digraph, a DIRECTED PATH from vertex Vs to vertex Vf is a sequence
of vertices Vs, V1, V2 .. Vn, Vf s.t. <Vs,V1>, <V1,V2> .. <Vn,Vf> are
in E. (Note - if Vs is adjacent to Vf, then Vs,Vf is a path from
Vs to Vf).
ex: in the undirected graph, A,E,D forms a path from A to D, as do
A,B,C,D and A,D, as well as paths like A,D,A,D,A,D.
in the digraph, A,B,C,D and A,D are paths from A to D - but A,E,D
is not a path since neither the pair <A,E> nor the pair <E,D> is in
E. (<E,A> and <D,E> are.)
4. A SIMPLE PATH is one in which all of the vertices (save possibly the
first and last) are unique. (Some writers call such a path ELEMENTARY,
and use the term simple for a path in which all the edges, but not
necessarily the nodes, are unique.)
ex: A,B,C,D and A,D,E,A are simple paths - but A,D,E,A,D is not
5. A CYCLE is a simple path from some vertex to itself. In an undirected
graph, we add the requirement that the path have at least three edges,
to rule out considering something like ABA as a cycle in the graph A-B.
ex: in either of our sample graphs, A,D,E,A is a cycle
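The definitions of path, simple path and cycle can be tested mechanically. The following Python sketch is an illustration only: the function names and the encoding of E as a set of pairs are assumptions, and "simple" is used in the sense given above (vertices unique, save possibly the first and last).

```python
def is_path(edges, seq, directed=False):
    """True if each consecutive pair of vertices in seq is joined by an edge."""
    if len(seq) < 2:
        return False
    for a, b in zip(seq, seq[1:]):
        if (a, b) in edges:
            continue
        if not directed and (b, a) in edges:
            continue
        return False
    return True

def is_simple_path(edges, seq, directed=False):
    """A path whose vertices are all distinct, save possibly first and last."""
    if not is_path(edges, seq, directed):
        return False
    inner = seq[:-1] if seq[0] == seq[-1] else seq
    return len(set(inner)) == len(inner)

def is_cycle(edges, seq, directed=False):
    """A simple path from a vertex to itself (3 or more edges if undirected)."""
    if not is_simple_path(edges, seq, directed) or seq[0] != seq[-1]:
        return False
    return directed or len(seq) - 1 >= 3

undirected = {("A", "B"), ("A", "D"), ("A", "E"),
              ("B", "C"), ("C", "D"), ("D", "E")}
directed = {("A", "B"), ("A", "D"), ("B", "C"),
            ("C", "D"), ("D", "E"), ("E", "A")}

print(is_path(undirected, list("AED")))               # True
print(is_path(directed, list("AED"), directed=True))  # False
print(is_simple_path(undirected, list("ADEAD")))      # False
print(is_cycle(directed, list("ADEA"), directed=True))  # True
print(is_cycle({("A", "B")}, list("ABA")))            # False
```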
6. A graph that contains no cycles is ACYCLIC.
7. A subgraph of a graph G is a graph G' such that V' is a subset of V
and E' is a subset of E. (Of course, only vertices in V' may appear
in the pairs in E' if G' is to be a graph).
8. A graph that contains a path connecting any pair of vertices V1,V2
(where V1 <> V2) is CONNECTED. A digraph that contains a directed
path from each vertex to each other vertex is STRONGLY CONNECTED.
ex: our graph is connected and our digraph is strongly connected.
a. If a digraph is not strongly connected, we sometimes say it is
connected (a much weaker condition) if the corresponding undirected
graph is connected. This corresponding undirected graph is one
that contains (V1,V2) in its set of edges iff <V1,V2> and/or
<V2,V1> is in the set of edges of the digraph.
b. If a digraph is not strongly connected, we sometimes say it is
ROOTED if there exists at least one vertex R such that there is
a directed path from R to each other vertex in the graph. Note
that a strongly connected digraph is always rooted, but the reverse
is not necessarily so. However, if a digraph is rooted then the
corresponding undirected graph is always connected.
9. In an unconnected graph, a CONNECTED COMPONENT is a connected subgraph
of maximal size. In an unconnected digraph, a STRONGLY CONNECTED
COMPONENT is a strongly connected subgraph of maximal size.
ex: The graph A---B----C----D E----F----G
is not connected. The connected components are
A---B----C----D and E----F----G
A----B----C is not a connected component because it is not of
maximal size.
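One way to find the connected components of an undirected graph is to start a search from each vertex not yet reached and collect everything that search can reach. The Python sketch below is a minimal illustration under that assumption; the function name and the dictionary of neighbor sets are choices made here, not part of the lecture.

```python
def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as sets."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in sorted(vertices):
        if start in seen:
            continue
        # Collect every vertex reachable from start.
        component = set()
        stack = [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbors[v] - component)
        seen |= component
        components.append(component)
    return components

comps = connected_components(
    "ABCDEFG",
    [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G")])
print([sorted(c) for c in comps])
# [['A', 'B', 'C', 'D'], ['E', 'F', 'G']]
```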
D. Recall that we defined a graph in terms of a SET of edges, E. This
implies that there cannot be more than one edge connecting any pair
of vertices in a graph, or more than one edge connecting any pair of
vertices in the same direction in a digraph. A graph-like structure in
which this restriction is not met is called a MULTIGRAPH.
E. A graph/digraph in which each edge has a numerical value (weight or
cost) associated with it is called a WEIGHTED GRAPH or a NETWORK.
ex: If a transportation network is modelled by a graph, with cities
as vertices and roads as edges, then the weight of an edge might
be the length of the road in miles.
     _____________ WENHAM
    |               / 5
    | 4      BEVERLY
    |       / 3   |
    DANVERS       | 2
           \ 3    |
              SALEM
Note: sometimes a multigraph can be represented by a network in which
the weight assigned to each edge is the number of occurrences of
the corresponding edge in the multigraph.
F. Note that some familiar structures are in fact special kinds of graphs:
1. A list is an acyclic rooted digraph in which every vertex save the
root has indegree one and every vertex save one has outdegree one.
2. A tree is an acyclic rooted digraph in which every vertex save the
root has indegree one.
II. External and Internal representations of graphs
A. Because of the many applications of graphs, it turns out to be
advantageous to consider several different ways of representing a graph
in memory. Often, it will turn out that one of these representations
will be vastly superior to others for a given application.
B. For representing a graph in an external file (e.g. as input to a
program), a simple representation is as follows:
1. First line of the file: two integers - number of vertices (n), number
of edges (e).
2. Next n lines - information on each of the vertices. (Can be omitted
if vertices are simply labeled by some scheme such as 1, 2, 3 .. or
A, B, C ..)
3. Next e lines - information on each of the edges:
a. Tail vertex
b. Head vertex
c. Weight and/or other information as needed.
ex: our sample undirected graph:
5 6
-- no information needed on vertices
A B
B C
C D
D A
D E
E A
(of course - order of vertices in the pairs does not matter in this
case)
ex: our four-town network:
4 5
BEVERLY
DANVERS
SALEM
WENHAM
BEVERLY DANVERS 3
BEVERLY SALEM 2
BEVERLY WENHAM 5
DANVERS SALEM 3
DANVERS WENHAM 4
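A reader for this file layout might look like the following Python sketch. It is an illustration only: the function name parse_graph and the has_vertex_lines flag are assumptions, each vertex line is taken to hold just a label, and an optional third field on an edge line is read as an integer weight. When the vertex lines are omitted, the labels are inferred from the edges, so isolated vertices would be missed.

```python
def parse_graph(text, has_vertex_lines=True):
    """Parse 'n e', then n vertex lines (optional), then e edge lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n, e = map(int, lines[0].split())
    if has_vertex_lines:
        vertices = lines[1:1 + n]
        edge_lines = lines[1 + n:1 + n + e]
    else:
        vertices = None
        edge_lines = lines[1:1 + e]

    edges = []
    for ln in edge_lines:
        parts = ln.split()
        weight = int(parts[2]) if len(parts) > 2 else None
        edges.append((parts[0], parts[1], weight))

    if vertices is None:
        # Labels inferred from the edges (an assumption of this sketch).
        vertices = sorted({v for t, h, _ in edges for v in (t, h)})
    return vertices, edges

SAMPLE = """\
4 5
BEVERLY
DANVERS
SALEM
WENHAM
BEVERLY DANVERS 3
BEVERLY SALEM 2
BEVERLY WENHAM 5
DANVERS SALEM 3
DANVERS WENHAM 4
"""

vertices, edges = parse_graph(SAMPLE)
print(vertices)   # ['BEVERLY', 'DANVERS', 'SALEM', 'WENHAM']
print(edges[0])   # ('BEVERLY', 'DANVERS', 3)
```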
C. One simple internal representation is an ADJACENCY MATRIX. If there are
n vertices, then the matrix will have n rows and n columns. The elements
of the matrix may be of type boolean, or may be 0's and 1's.
1. For a graph, matrix element [i,j] will be 1 iff (Vi,Vj) or (Vj,Vi) is
in E.
ex:    A B C D E
    A  0 1 0 1 1
    B  1 0 1 0 0
    C  0 1 0 1 0
    D  1 0 1 0 1
    E  1 0 0 1 0
2. For a digraph, matrix element [i,j] will be 1 iff <Vi,Vj> is in E.
ex:    A B C D E
    A  0 1 0 1 0
    B  0 0 1 0 0
    C  0 0 0 1 0
    D  0 0 0 0 1
    E  1 0 0 0 0
3. Note that for a graph, the adjacency matrix will be symmetrical
around the diagonal. Wasted space can be avoided by storing only
half the matrix.
4. For a network, we can use a matrix in which the elements are the
weights associated with the edges. If no edge exists connecting a
given pair of vertices, it will often be expedient to store maxint -
i.e. the cost of going from one point to another along a nonexistent
path is infinite.
ex:
         BEVERLY DANVERS SALEM   WENHAM
BEVERLY  maxint  3       2       5
DANVERS  3       maxint  3       4
SALEM    2       3       maxint  maxint
WENHAM   5       4       maxint  maxint
note: in the above, it may seem reasonable to use a value of 0 for
distance from a town to itself. However, the model is one of paths,
and we do not wish our algorithms to explore the possibility of driving
around in circles!
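The matrices above can be built with a few lines of Python. This is a sketch under stated assumptions: the function names are invented here, the matrix is a list of lists, and math.inf stands in for maxint as the "no edge" value.

```python
import math

def adjacency_matrix(vertices, edges, directed=False):
    """0/1 matrix: entry [i][j] is 1 iff there is an edge from Vi to Vj."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    m = [[0] * n for _ in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = 1
        if not directed:
            m[index[b]][index[a]] = 1   # symmetric for an undirected graph
    return m

def weight_matrix(vertices, weighted_edges):
    """Undirected network: entry is the edge weight, or infinity if no edge."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    m = [[math.inf] * n for _ in range(n)]
    for a, b, w in weighted_edges:
        m[index[a]][index[b]] = w
        m[index[b]][index[a]] = w
    return m

undirected = [("A", "B"), ("A", "D"), ("A", "E"),
              ("B", "C"), ("C", "D"), ("D", "E")]
towns = ["BEVERLY", "DANVERS", "SALEM", "WENHAM"]
roads = [("BEVERLY", "DANVERS", 3), ("BEVERLY", "SALEM", 2),
         ("BEVERLY", "WENHAM", 5), ("DANVERS", "SALEM", 3),
         ("DANVERS", "WENHAM", 4)]

print(adjacency_matrix("ABCDE", undirected)[0])  # [0, 1, 0, 1, 1]
print(weight_matrix(towns, roads)[2])            # [2, 3, inf, inf]
```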
D. Adjacency list: A more flexible (and often more efficient) implementation
results if we associate with each vertex a linked list of edges incident
to that vertex. The benefit of this is that we can quickly find all the
edges associated with a given vertex by traversing the list, instead of
having to look through possibly hundreds of zero values to find a few
ones in a row of an adjacency matrix.
1. Normally what we do is use an array to represent the vertices. Each
array element contains the label on the vertex and possibly other
related information, plus a pointer to a linked list of nodes
describing edges of which the given vertex is the tail.
2. Each edge node contains the label on the tail and the head of the
edge, plus the weight if the graph is a network.
ex: BEVERLY    DANVERS    SALEM      WENHAM
       |          |          |          |
    Danvers    Beverly    Beverly    Beverly
       3          3          2          5
       |          |          |          |
     Salem      Salem     Danvers    Danvers
       2          3          3          4
       |          |
    Wenham     Wenham
       5          4
3. Note that for a graph, each edge will appear in the adjacency list
twice - once under each of the vertices it is incident on. (cf the
symmetry of the adjacency matrix). This will not ordinarily happen
with a digraph, of course.
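The adjacency list idea can be sketched in Python as follows. Note the substitutions: a dictionary keyed by vertex label takes the place of the array of vertices, and a Python list takes the place of each linked list. The function name is an assumption of this sketch.

```python
def adjacency_lists(vertices, weighted_edges, directed=False):
    """Map each vertex to a list of (neighbor, weight) entries."""
    adj = {v: [] for v in vertices}
    for tail, head, w in weighted_edges:
        adj[tail].append((head, w))
        if not directed:
            # An undirected edge appears under both of its vertices.
            adj[head].append((tail, w))
    return adj

towns = ["BEVERLY", "DANVERS", "SALEM", "WENHAM"]
roads = [("BEVERLY", "DANVERS", 3), ("BEVERLY", "SALEM", 2),
         ("BEVERLY", "WENHAM", 5), ("DANVERS", "SALEM", 3),
         ("DANVERS", "WENHAM", 4)]

adj = adjacency_lists(towns, roads)
print(adj["SALEM"])   # [('BEVERLY', 2), ('DANVERS', 3)]
print(sum(len(entries) for entries in adj.values()))  # 10
```

The total of 10 entries for 5 roads reflects the point made above: each undirected edge is stored twice.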
E. Adjacency multilists
1. With adjacency lists, each edge in an undirected graph appears twice
in the lists. Also, there is an obvious asymmetry for digraphs - it
is easy to find the vertices a given vertex is adjacent to (simply
follow its adjacency list), but hard to find the vertices adjacent to
a given vertex (we must scan the adjacency lists of all vertices).
These can be rectified by a structure called an adjacency multilist.
2. An adjacency multilist is similar to adjacency lists, except that
each edge node appears on two linked lists - one for each of the
vertices it is incident on. In addition, in a digraph each vertex
has two lists associated with it - one of edges of which it is the
tail, and one of edges of which it is the head.
3. The following shows the adjacency multilist for our example
DIRECTED graph:
/--A B C D <--- E
/ | /| /| /| / |
/ ---+-------/ | / | / | / |
/ / | | / | / | / |
/ /-> A,B ---+-----/ | / | / |
/ | / | | / | / |
/ | /-->B,C | / + |
/ ----+-------------------+--/ /| |
/ / | | / | |
/ /---> A,D -------------> C,D / | |
/ / | |
/ /-> D,E |
/ |
/------------------------------------------------------> E,A
III. Operations on graphs
A. Searches
1. When we discussed trees, we saw that one class of operations that
was very important was traversal - the systematic visiting of every
node in the tree. For graphs, the corresponding operations are
called searches. In a search, we systematically visit as many
vertices as possible and as many edges as possible, starting from a
given starting vertex.
2. There are two basic search orders: depth-first search (DFS) and
breadth-first search (BFS).
a. In DFS, we start at a vertex and move as far as we can down one
path from the vertex before exploring the other paths.
ex: on our sample undirected graph, starting at A, we would visit
vertices in the order A,B,C,D,E
b. In BFS, we explore all of the paths emanating from our starting
vertex before progressing further.
ex: on our sample undirected graph, starting at A, we would visit
vertices in the order A,B,D,E,C.
c. Note that either search order requires some way of marking vertices
so that we do not visit them more than once. (This can be done
by including a mark field in the node for each vertex, initialized
to false before the search and set to true when the node is
visited. Or, if the order of visitation is important, we can use
a field that records when the node was visited, initially set to
0.)
d. Note that pre-order traversal on a tree is a DFS, and level-order
traversal on a tree is a BFS. We will see that DFS algorithms
make use of a stack or recursion, and BFS algorithms use a queue.
e. Note that if a graph is not connected (or a digraph is not strongly
connected), then a search may visit only some of the vertices.
3. Ex: DFS on a digraph represented as an adjacency matrix.
TRANSPARENCY
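The transparency itself is not reproduced in these notes. As a stand-in, here is a minimal recursive DFS over an adjacency matrix, written in Python. It is a sketch, not the code from the transparency: the function name, the use of vertex indices 0..n-1, and the boolean mark array are assumptions.

```python
def dfs(matrix, start):
    """Return vertex indices in depth-first order, starting from start."""
    n = len(matrix)
    visited = [False] * n   # the "mark" for each vertex
    order = []

    def visit(v):
        visited[v] = True
        order.append(v)
        # Scan row v of the matrix for unvisited vertices v is adjacent to.
        for w in range(n):
            if matrix[v][w] and not visited[w]:
                visit(w)

    visit(start)
    return order

# Sample digraph, rows and columns in the order A B C D E.
digraph = [[0, 1, 0, 1, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 0, 1, 0],
           [0, 0, 0, 0, 1],
           [1, 0, 0, 0, 0]]

print(["ABCDE"[i] for i in dfs(digraph, 0)])
# ['A', 'B', 'C', 'D', 'E']
```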
4. Example: BFS in a digraph represented by adjacency lists
TRANSPARENCY
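The transparency itself is not reproduced in these notes. As a stand-in, here is a minimal queue-based BFS over adjacency lists, written in Python. It is a sketch, not the code from the transparency: the function name and the dictionary-of-lists encoding are assumptions, and a set plays the role of the mark field.

```python
from collections import deque

def bfs(adj, start):
    """Return vertices in breadth-first order, starting from start."""
    visited = {start}        # marked when placed on the queue
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order

# Sample digraph: each vertex maps to the vertices it is adjacent to.
digraph = {"A": ["B", "D"], "B": ["C"], "C": ["D"],
           "D": ["E"], "E": ["A"]}

print(bfs(digraph, "A"))   # ['A', 'B', 'D', 'C', 'E']
```

Marking a vertex when it is enqueued, rather than when it is dequeued, keeps any vertex from being placed on the queue twice.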
B. Other operations will be considered in CS321
Copyright ©1999 - Russell C. Bjork