Graphs

CS122 Lecture: Graphs                                   Last Revised 4/7/98

Materials: DFS and BFS Transparencies

I. Introduction

   A. The general trend in our discussion has been to move from the simplest,
      most specific data structures to increasingly flexible and general kinds
      of structures.  Thus, we have moved from primitive structures through
      sequential structures to a particular form of branching structure, the
      tree.  We now focus on the most general kind of branching structure,
      the graph.  So general is this structure that all of the others we have
      studied turn out to be just special kinds of graphs.  Even apart from
      this consideration, graphs are probably the most widely used of all
      mathematical structures.  We introduce them briefly now, focusing on
      terminology and a few key operations.  They are covered in much greater
      depth in CS321.

   B. Formally, a graph consists of a set of VERTICES (often denoted V) and a
      set of EDGES (often denoted E) which connect the vertices.  Each edge 
      is, in fact, a (possibly ordered) pair of vertices.

      ex:              A ----- B  ---- C
                        \ \______ D _/
                         \       /
                          \_ E _/

        V = { A, B, C, D, E }
        E = { (A,B), (A,D), (A,E), (B,C), (C,D), (D,E) }

      1. In an undirected graph, the order of the vertices in the pairs does
         not matter.  The above example has been drawn as an undirected
         graph - hence the edges could just as well be listed as:

        E = { (B,A), (D,A), (E,A), (C,B), (D,C), (E,D) } or
        E = { (B,A), (A,D), (E,A), (B,C), (D,C), (D,E) }

      2. In a directed graph (digraph), the edges are ORDERED pairs.  This
         can be symbolized by drawing the edges with arrow heads, and by
         enclosing the pairs in angle brackets rather than parentheses:

        ex: The following is a digraph having the same general shape as
            the graph we have been discussing:

      ex:              A ----> B ------> C
                        ^ \______> D <_/
                         \        /
                          \_ E <_/

        V = { A, B, C, D, E }
        E = { <A,B>, <A,D>, <B,C>, <C,D>, <D,E>, <E,A> }

      3. In an edge of a digraph <V1,V2>, V1 is called the TAIL and V2 is
         called the HEAD (cf. the way we draw the edge).

      4. In either case, we say that an edge e is INCIDENT ON a vertex v if
         v is either the tail or the head of the edge.
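
        The definitions above can be sketched in code.  This is only an
        illustration, assuming Python (the notes do not prescribe a
        language); the names E_graph, E_digraph and incident_edges are
        hypothetical.  An undirected edge is modelled as an unordered pair
        and a directed edge as an ordered (tail, head) pair.

```python
# The sample graph from the notes, written as Python sets.
# All variable and function names here are illustrative assumptions.
V = {"A", "B", "C", "D", "E"}

# Undirected: each edge is a frozenset, so (A,B) and (B,A) are the same edge.
E_graph = {frozenset(pair) for pair in
           [("A", "B"), ("A", "D"), ("A", "E"), ("B", "C"), ("C", "D"), ("D", "E")]}
E_graph_reordered = {frozenset(pair) for pair in
                     [("B", "A"), ("D", "A"), ("E", "A"), ("C", "B"), ("D", "C"), ("E", "D")]}

# Directed: each edge is an ordered (tail, head) tuple.
E_digraph = {("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")}


def incident_edges(edges, v):
    """Return the edges incident on v, i.e. those having v as an endpoint."""
    return {e for e in edges if v in e}


print(E_graph == E_graph_reordered)                      # order within a pair is ignored
print(("A", "B") in E_digraph, ("B", "A") in E_digraph)  # order matters in a digraph
print(len(incident_edges(E_digraph, "A")))               # counts <A,B>, <A,D> and <E,A>
```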

   C. Other terminology

      1. In an undirected graph, we say that vertices V1, V2 are ADJACENT if
         (V1,V2) or (V2,V1) is in E.  In a digraph, we say that V1 is ADJACENT
         TO V2 (note implicit direction) if <V1,V2> is in E, and we likewise
         say that V2 is ADJACENT FROM V1.

      2. In an undirected graph, the DEGREE of a vertex is the number of
         vertices it is adjacent with.  In a digraph, the OUTDEGREE of a vertex
         is the number of vertices it is adjacent to, and the INDEGREE of a
         vertex is the number of vertices adjacent to it.

        ex: in the undirected graph, A and B are adjacent, A and D are 
            adjacent, and A and E are adjacent, so the degree of A is 3.

            in the digraph, A is adjacent to B and A is adjacent to D, so its
            outdegree is 2.  E is adjacent to A, so A's indegree is 1.
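
        The degree counts in this example can be checked with a short
        sketch.  It assumes Python, edges stored as pairs, and no
        self-loops; the function names are hypothetical.

```python
# Edge lists for the sample graph and digraph in the notes.
GRAPH_EDGES = [("A", "B"), ("A", "D"), ("A", "E"), ("B", "C"), ("C", "D"), ("D", "E")]
DIGRAPH_EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]


def degree(edges, v):
    """Undirected degree: number of edges with v as an endpoint (no self-loops assumed)."""
    return sum(1 for a, b in edges if v in (a, b))


def outdegree(edges, v):
    """Number of directed edges whose tail is v."""
    return sum(1 for tail, _head in edges if tail == v)


def indegree(edges, v):
    """Number of directed edges whose head is v."""
    return sum(1 for _tail, head in edges if head == v)


print(degree(GRAPH_EDGES, "A"))
print(outdegree(DIGRAPH_EDGES, "A"), indegree(DIGRAPH_EDGES, "A"))
```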

      3. In a graph, a PATH from vertex Vs to vertex Vf is a sequence of
         vertices Vs, V1, V2 .. Vn, Vf s.t. (Vs,V1), (V1,V2) .. (Vn,Vf) are in
         E.  In a digraph, a DIRECTED PATH from vertex Vs to vertex Vf is a
         sequence of vertices Vs, V1, V2 .. Vn, Vf s.t. <Vs,V1>, <V1,V2> ..
         <Vn,Vf> are in E.  (Note - if Vs is adjacent to Vf, then Vs,Vf is a
         path from Vs to Vf).

        ex: in the undirected graph, A,E,D is a path from A to D, as are
            A,B,C,D and A,D, and even paths that revisit vertices, like
            A,D,A,D,A,D.

            in the digraph, A,B,C,D and A,D are paths from A to D - but A,E,D
            is not a path since neither the pair <A,E> nor the pair <E,D> is
            in E.  (<E,A> and <D,E> are.)
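
        The path definition can be turned into a small checker.  This is a
        sketch, assuming Python and edges stored as a set of pairs; is_path
        is a hypothetical name.  For an undirected graph a stored pair may
        be followed in either direction.

```python
GRAPH_EDGES = {("A", "B"), ("A", "D"), ("A", "E"), ("B", "C"), ("C", "D"), ("D", "E")}
DIGRAPH_EDGES = {("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")}


def is_path(seq, edges, directed):
    """True if every consecutive pair of vertices in seq is joined by an edge."""
    if len(seq) < 2:
        return False
    for a, b in zip(seq, seq[1:]):
        if (a, b) in edges:
            continue
        if not directed and (b, a) in edges:
            continue          # undirected: the pair may be stored either way round
        return False
    return True


print(is_path(list("AED"), GRAPH_EDGES, directed=False))
print(is_path(list("AED"), DIGRAPH_EDGES, directed=True))
```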

      4. A SIMPLE PATH is one in which all of the vertices (save possibly the
         first and last) are unique.  (Some writers call such a path ELEMENTARY,
         and use the term simple for a path in which all the edges, but not 
         necessarily the nodes, are unique.)

        ex: A,B,C,D and A,D,E,A are simple paths - but A,D,E,A,D is not.

      5. A CYCLE is a simple path from some vertex to itself.  In an undirected
         graph, we add the requirement that the path have at least three edges, 
         to rule out considering something like ABA as a cycle in the graph A-B.

        ex: in either the graph or the digraph, A,D,E,A is a cycle.
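
        The simple-path and cycle definitions can be sketched the same
        way.  This assumes Python and the definitions used in these notes
        (unique vertices save possibly the first and last; at least three
        edges for an undirected cycle).  All function names are
        hypothetical.

```python
GRAPH_EDGES = {("A", "B"), ("A", "D"), ("A", "E"), ("B", "C"), ("C", "D"), ("D", "E")}
DIGRAPH_EDGES = {("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")}


def is_path(seq, edges, directed):
    """True if every consecutive pair of vertices in seq is joined by an edge."""
    if len(seq) < 2:
        return False
    return all((a, b) in edges or (not directed and (b, a) in edges)
               for a, b in zip(seq, seq[1:]))


def is_simple_path(seq, edges, directed):
    """A path whose vertices are all distinct, save possibly the first and last."""
    if not is_path(seq, edges, directed):
        return False
    body = seq[:-1] if seq[0] == seq[-1] else seq
    return len(set(body)) == len(body)


def is_cycle(seq, edges, directed):
    """A simple path from a vertex back to itself.

    In an undirected graph at least three edges are required, which rules
    out A,B,A in the one-edge graph A-B.
    """
    if not is_simple_path(seq, edges, directed) or seq[0] != seq[-1]:
        return False
    edge_count = len(seq) - 1
    return directed or edge_count >= 3


print(is_cycle(list("ADEA"), GRAPH_EDGES, directed=False))
print(is_cycle(list("ABA"), {("A", "B")}, directed=False))
```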

      6. A graph that contains no cycles is ACYCLIC.

      7. A subgraph of a graph G is a graph G' such that V' is a subset of V
         and E' is a subset of E.  (Of course, only vertices in V' may appear
         in the pairs in E' if G' is to be a graph).

      8. A graph that contains a path connecting any pair of vertices V1,V2
         (where V1 <> V2) is CONNECTED.  A digraph that contains a directed 
         path from each vertex to each other vertex is STRONGLY CONNECTED.

        ex: our graph is connected and our digraph is strongly connected.

         a. If a digraph is not strongly connected, we sometimes say it is
            connected (a much weaker condition) if the corresponding undirected
            graph is connected.  This corresponding undirected graph is one
            that contains (V1,V2) in its set of edges iff <V1,V2> and/or
            <V2,V1> is in the set of edges of the digraph.

         b. If a digraph is not strongly connected, we sometimes say it is
            ROOTED if there exists at least one vertex R such that there is
            a directed path from R to each other vertex in the graph.  Note
            that a strongly connected digraph is always rooted, but the reverse
            is not necessarily so.  However, if a digraph is rooted then the
            corresponding undirected graph is always connected.
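
        The three notions of connectedness for a digraph can be compared
        with a brute-force sketch.  It assumes Python, a non-empty vertex
        list, and edges stored as pairs; the function names and the small
        ROOTED_ONLY example are hypothetical, and no attempt is made at
        efficiency.

```python
def reachable(start, edges, directed):
    """Return the set of vertices reachable from start (including start)."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v:
                nxt = b
            elif b == v and not directed:
                nxt = a          # an undirected edge can be followed either way
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_connected(vertices, edges):
    """Connected in the undirected sense (edge direction is ignored)."""
    first = next(iter(vertices))
    return reachable(first, edges, directed=False) == set(vertices)


def is_strongly_connected(vertices, edges):
    """Every vertex has a directed path to every other vertex."""
    return all(reachable(v, edges, directed=True) == set(vertices) for v in vertices)


def is_rooted(vertices, edges):
    """Some vertex R has a directed path to every other vertex."""
    return any(reachable(v, edges, directed=True) == set(vertices) for v in vertices)


V = ["A", "B", "C", "D", "E"]
DIGRAPH_EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]

# Hypothetical digraph that is rooted (at R) but not strongly connected.
W = ["R", "X", "Y"]
ROOTED_ONLY = [("R", "X"), ("R", "Y")]

print(is_strongly_connected(V, DIGRAPH_EDGES))
print(is_strongly_connected(W, ROOTED_ONLY), is_rooted(W, ROOTED_ONLY), is_connected(W, ROOTED_ONLY))
```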

      9. In an unconnected graph, a CONNECTED COMPONENT is a connected subgraph
         of maximal size.  In an unconnected digraph, a STRONGLY CONNECTED
         COMPONENT is a strongly connected subgraph of maximal size.

        ex: The graph           A---B----C----D         E----F----G 

         is not connected.  The connected components are 

                A---B----C----D         and             E----F----G 

         A----B----C is not a connected component because it is not of
         maximal size.
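
        One way to find the connected components of an undirected graph is
        to grow a component outward from each vertex not yet assigned to
        one.  The following is a sketch under that approach, assuming
        Python; connected_components is a hypothetical name.

```python
def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    components = []
    assigned = set()
    for v in vertices:
        if v in assigned:
            continue
        # Grow a maximal connected subgraph starting from v.
        component = {v}
        stack = [v]
        while stack:
            u = stack.pop()
            for w in neighbors[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        assigned |= component
        components.append(component)
    return components


# The two-piece graph from the example above.
VERTICES = ["A", "B", "C", "D", "E", "F", "G"]
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G")]
print(connected_components(VERTICES, EDGES))
```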

   D. Recall that we defined a graph in terms of a SET of edges, E.  This
      implies that there cannot be more than one edge connecting any pair
      of vertices in a graph, or more than one edge connecting any pair of
      vertices in the same direction in a digraph.  A graph-like structure in 
      which this restriction is not met is called a MULTIGRAPH.

   E. A graph/digraph in which each edge has a numerical value (weight or 
      cost) associated with it is called a WEIGHTED GRAPH or a NETWORK.

        ex: If a transportation network is modelled by a graph, with cities
            as vertices and roads as edges, then the weight of an edge might
            be the length of the road in miles.

                 _____________  WENHAM
                 |           / 5 
                 | 4    BEVERLY
                 |   / 3   |
                DANVERS    | 2
                     \ 3   |
                        SALEM

        Note: sometimes a multigraph can be represented by a network in which
              the weight assigned to each edge is the number of occurrences of
              the corresponding edge in the multigraph.
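
        The note above can be sketched as follows, assuming Python.  The
        function name and the three-edge multigraph are hypothetical; the
        weight of each edge in the resulting network is simply a count of
        how often that edge occurs.

```python
from collections import Counter


def multigraph_to_network(edge_list, directed):
    """Collapse repeated edges into one weighted edge per pair of vertices.

    The weight is the number of occurrences of the edge in the multigraph.
    """
    if directed:
        return Counter(edge_list)                      # (tail, head) order matters
    return Counter(frozenset(edge) for edge in edge_list)  # order is ignored


# Hypothetical multigraph: A and B are joined twice, B and C once.
parallel_edges = [("A", "B"), ("B", "A"), ("B", "C")]

undirected_net = multigraph_to_network(parallel_edges, directed=False)
directed_net = multigraph_to_network(parallel_edges, directed=True)
print(undirected_net[frozenset(("A", "B"))])
print(directed_net[("A", "B")], directed_net[("B", "A")])
```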

   F. Note that some familiar structures are in fact special kinds of graphs:

      1. A list is an acyclic rooted digraph in which every vertex save the
         root has indegree one and every vertex save one has outdegree one.

      2. A tree is an acyclic rooted digraph in which every vertex save the
         root has indegree one.

II. External and Internal representations of graphs

   A. Because of the many applications of graphs, it turns out to be
      advantageous to consider several different ways of representing a graph
      in memory.  Often, it will turn out that one of these representations
      will be vastly superior to others for a given application.

   B. For representing a graph in an external file (e.g. as input to a
      program), a simple representation is as follows:

      1. First line of the file: two integers - number of vertices (n), number
         of edges (e).

      2. Next n lines - information on each of the vertices.  (Can be omitted
         if vertices are simply labeled by some scheme such as 1, 2, 3 .. or
         A, B, C ..)

      3. Next e lines - information on each of the edges:

         a. Tail vertex
         b. Head vertex
         c. Weight and/or other information as needed.

        ex: our sample undirected graph:

        5 6
        -- no information needed on vertices
        A B
        B C
        C D
        D A
        D E
        E A

        (of course - order of vertices in the pairs does not matter in this
         case)

        ex: our four-town network:

        4 5
        BEVERLY
        DANVERS
        SALEM
        WENHAM
        BEVERLY DANVERS 3
        BEVERLY SALEM 2
        BEVERLY WENHAM 5
        DANVERS SALEM 3
        DANVERS WENHAM 4
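
        A reader for this file format might look like the sketch below.
        It assumes Python, labels without embedded blanks, and integer
        weights; parse_graph and its has_vertex_lines flag are hypothetical
        names, and the text is read from a string rather than a file to
        keep the example self-contained.

```python
def parse_graph(text, has_vertex_lines):
    """Parse the external format: 'n e', optional n vertex lines, then e edge lines.

    Returns (vertices, edges); each edge is (tail, head, weight), with weight
    None when the edge line carries no third field.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n, e = map(int, lines[0].split())
    pos = 1

    vertices = []
    if has_vertex_lines:
        vertices = lines[pos:pos + n]
        pos += n

    edges = []
    for ln in lines[pos:pos + e]:
        fields = ln.split()
        weight = int(fields[2]) if len(fields) > 2 else None
        edges.append((fields[0], fields[1], weight))

    if not has_vertex_lines:
        # Vertex lines were omitted: collect labels in order of first use.
        # (A vertex with no edges would be missed by this simplification.)
        for tail, head, _weight in edges:
            for v in (tail, head):
                if v not in vertices:
                    vertices.append(v)
    return vertices, edges


TOWN_FILE = """4 5
BEVERLY
DANVERS
SALEM
WENHAM
BEVERLY DANVERS 3
BEVERLY SALEM 2
BEVERLY WENHAM 5
DANVERS SALEM 3
DANVERS WENHAM 4
"""

towns, roads = parse_graph(TOWN_FILE, has_vertex_lines=True)
print(towns)
print(roads)
```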

   C. One simple internal representation is an ADJACENCY MATRIX. If there are
      n vertices, then the matrix will have n rows and n columns.  The elements
      of the matrix may be of type boolean, or may be 0's and 1's.

      1. For a graph, matrix element [i,j] will be 1 iff (Vi,Vj) or (Vj,Vi) is
         in E.  

        ex:               A  B  C  D  E
                        A 0  1  0  1  1
                        B 1  0  1  0  0
                        C 0  1  0  1  0
                        D 1  0  1  0  1
                        E 1  0  0  1  0

      2. For a digraph, matrix element [i,j] will be 1 iff <Vi,Vj> is in E.

        ex:               A  B  C  D  E
                        A 0  1  0  1  0
                        B 0  0  1  0  0
                        C 0  0  0  1  0
                        D 0  0  0  0  1
                        E 1  0  0  0  0

      3. Note that for a graph, the adjacency matrix will be symmetrical
         around the diagonal.  Wasted space can be avoided by storing only
         half the matrix.
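
        Building the 0/1 matrices shown above can be sketched as follows,
        assuming Python and a list of lists for the matrix;
        adjacency_matrix is a hypothetical name.  For an undirected graph
        both [i,j] and [j,i] are set, which produces the symmetry just
        noted.

```python
def adjacency_matrix(vertices, edges, directed):
    """Return an n x n matrix of 0s and 1s; row i, column j is 1 iff Vi -> Vj is an edge."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        if not directed:
            matrix[index[b]][index[a]] = 1   # mirror entry for an undirected edge
    return matrix


V = ["A", "B", "C", "D", "E"]
GRAPH_EDGES = [("A", "B"), ("A", "D"), ("A", "E"), ("B", "C"), ("C", "D"), ("D", "E")]
DIGRAPH_EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]

graph_matrix = adjacency_matrix(V, GRAPH_EDGES, directed=False)
digraph_matrix = adjacency_matrix(V, DIGRAPH_EDGES, directed=True)
for row in graph_matrix:
    print(row)
```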

      4. For a network, we can use a matrix in which the elements are the
         weights associated with the edges.  If no edge exists connecting a
         given pair of vertices, it will often be expedient to store maxint -
         i.e. the cost of going from one point to another along a nonexistent
         path is infinite.

        ex:

                BEVERLY DANVERS SALEM   WENHAM
        BEVERLY maxint  3       2       5       
        DANVERS 3       maxint  3       4
        SALEM   2       3       maxint  maxint
        WENHAM  5       4       maxint  maxint

        note: in the above, it may seem reasonable to use a value of 0 for
        distance from a town to itself.  However, the model is one of paths,
        and we do not wish our algorithms to explore the possibility of driving
        around in circles!
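
        The weighted matrix for the four-town network can be sketched in
        the same way.  This assumes Python, where math.inf is used as a
        stand-in for the maxint of the notes (Python integers have no
        fixed maximum); weight_matrix is a hypothetical name.

```python
import math

TOWNS = ["BEVERLY", "DANVERS", "SALEM", "WENHAM"]
ROADS = [("BEVERLY", "DANVERS", 3), ("BEVERLY", "SALEM", 2), ("BEVERLY", "WENHAM", 5),
         ("DANVERS", "SALEM", 3), ("DANVERS", "WENHAM", 4)]


def weight_matrix(vertices, weighted_edges, no_edge=math.inf):
    """Return an n x n matrix of edge weights for an undirected network.

    Entries with no edge (including the diagonal) hold no_edge, playing the
    role of maxint: the cost of a nonexistent road is infinite.
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[no_edge] * n for _ in range(n)]
    for a, b, w in weighted_edges:
        matrix[index[a]][index[b]] = w
        matrix[index[b]][index[a]] = w
    return matrix


m = weight_matrix(TOWNS, ROADS)
for town, row in zip(TOWNS, m):
    print(town, row)
```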

   D. Adjacency list: A more flexible (and often more efficient) implementation
      results if we associate with each vertex a linked list of edges incident
      to that vertex.  The benefit of this is that we can quickly find all the
      edges associated with a given vertex by traversing the list, instead of
      having to look through possibly hundreds of zero values to find a few
      ones in a row of an adjacency matrix.

      1. Normally what we do is use an array to represent the vertices.  Each
         array element contains the label on the vertex and possibly other
         related information, plus a pointer to a linked list of nodes 
         describing edges of which the given vertex is the tail.

      2. Each edge node contains the label on the tail and the head of the
         edge, plus the weight if the graph is a network.

        ex:     BEVERLY         DANVERS          SALEM          WENHAM
                   |               |               |               |
                Danvers         Beverly         Beverly         Beverly
                   3               3               2               5
                   |               |               |               |
                 Salem           Salem          Danvers         Danvers
                   2               3               3               4
                   |               |
                Wenham          Wenham
                   5               4

      3. Note that for a graph, each edge will appear in the adjacency list
         twice - once under each of the vertices it is incident on.  (cf the
         symmetry of the adjacency matrix).  This will not ordinarily happen 
         with a digraph, of course.
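
        The adjacency lists drawn above can be sketched as follows.  This
        assumes Python and uses a dictionary of built-in lists in place of
        an array of linked lists, which is a simplification of the
        structure described; adjacency_lists is a hypothetical name.

```python
def adjacency_lists(vertices, weighted_edges, directed):
    """Map each vertex to a list of (other endpoint, weight) entries."""
    adj = {v: [] for v in vertices}
    for tail, head, weight in weighted_edges:
        adj[tail].append((head, weight))
        if not directed:
            # An undirected edge is recorded under both of its endpoints.
            adj[head].append((tail, weight))
    return adj


TOWNS = ["BEVERLY", "DANVERS", "SALEM", "WENHAM"]
ROADS = [("BEVERLY", "DANVERS", 3), ("BEVERLY", "SALEM", 2), ("BEVERLY", "WENHAM", 5),
         ("DANVERS", "SALEM", 3), ("DANVERS", "WENHAM", 4)]

adj = adjacency_lists(TOWNS, ROADS, directed=False)
for town in TOWNS:
    print(town, adj[town])
```

        Finding the roads out of one town now costs time proportional to
        the length of that town's list rather than to the number of towns.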

   E. Adjacency multilists

      1. With adjacency lists, each edge in an undirected graph appears twice
         in the lists.  Also, there is an obvious asymmetry for digraphs - it
         is easy to find the vertices a given vertex is adjacent to (simply 
         follow its adjacency list), but hard to find the vertices adjacent to 
         a given vertex (we must scan the adjacency lists of all vertices).
         Both problems can be rectified by a structure called an adjacency
         multilist.

      2. An adjacency multilist is similar to adjacency lists, except that
         each edge node appears on two linked lists - one for each of the
         vertices it is incident on.  In addition, in a digraph each vertex
         has two lists associated with it - one of edges of which it is the
         tail, and one of edges of which it is the head.

      3. The following shows the adjacency multilist for our example
         DIRECTED graph:
                       
             /--A         B         C         D      <--- E
            /   |        /|        /|        /|     /     |
           / ---+-------/ |       / |       / |    /      |  
          / /   |         |      /  |      /  |   /       |
         / /-> A,B     ---+-----/   |     /   |  /        |    
        /       |     /   |         |    /    | /         |
       /        |    /-->B,C        |   /     +           |
      /     ----+-------------------+--/     /|           |
     /     /    |                   |       / |           |
    /     /---> A,D -------------> C,D     /  |           |
   /                                      /   |           |
  /                                      /-> D,E          |
 /                                                        | 
/------------------------------------------------------> E,A

III. Operations on graphs

   A. Searches

      1. When we discussed trees, we saw that one class of operations that
         was very important was traversal - the systematic visiting of every
         node in the tree.  For graphs, the corresponding operations are
         called searches.  In a search, we systematically visit as many 
         vertices as possible and as many edges as possible, starting from a
         given starting vertex.

      2. There are two basic search orders: depth-first search (DFS) and
         breadth-first search (BFS).

         a. In DFS, we start at a vertex and move as far as we can down one
            path from the vertex before exploring the other paths.

        ex: on our sample undirected graph, starting at A, we would visit 
            vertices in the order A,B,C,D,E

         b. In BFS, we explore all of the paths emanating from our starting
            vertex before progressing further.

        ex: on our sample undirected graph, starting at A, we would visit 
            vertices in the order A,B,D,E,C.

         c. Note that either method requires some method of marking vertices
            so that we do not visit them more than once.  (This can be done
            by including a mark field in the node for each vertex, initialized
            to false before the search and set to true when the node is
            visited.  Or, if the order of visitation is important, we can use
            a field that records when the node was visited, initially set to
            0.)

         d. Note that pre-order traversal on a tree is a DFS, and level-order
            traversal on a tree is a BFS.  We will see that DFS algorithms
            make use of a stack or recursion, and BFS algorithms use a queue.

         e. Note that if a graph is not connected (or a digraph is not
            strongly connected), then a search may visit only some of the
            vertices.

      3. Ex: DFS on a digraph represented as an adjacency matrix.  

         TRANSPARENCY
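
         The transparency is not reproduced in these notes.  As a stand-in,
         the following is a minimal sketch of a recursive DFS over an
         adjacency matrix, assuming Python; it is not necessarily the
         algorithm shown on the transparency, and the names dfs and visit
         are hypothetical.  A list of booleans serves as the mark field
         described above.

```python
def dfs(matrix, start):
    """Return vertex indices in the order a depth-first search visits them."""
    visited = [False] * len(matrix)   # the "mark" field for each vertex
    order = []

    def visit(v):
        visited[v] = True
        order.append(v)
        # Scan row v of the matrix for unvisited vertices v is adjacent to.
        for w, has_edge in enumerate(matrix[v]):
            if has_edge and not visited[w]:
                visit(w)              # go as deep as possible before continuing

    visit(start)
    return order


LABELS = ["A", "B", "C", "D", "E"]
DIGRAPH_MATRIX = [
    [0, 1, 0, 1, 0],   # A -> B, A -> D
    [0, 0, 1, 0, 0],   # B -> C
    [0, 0, 0, 1, 0],   # C -> D
    [0, 0, 0, 0, 1],   # D -> E
    [1, 0, 0, 0, 0],   # E -> A
]

print([LABELS[i] for i in dfs(DIGRAPH_MATRIX, 0)])
```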

      4. Ex: BFS on a digraph represented as adjacency lists.

         TRANSPARENCY
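
         Again the transparency is not reproduced here.  The following is
         a minimal sketch of a queue-based BFS over adjacency lists,
         assuming Python and a dictionary of lists for the adjacency
         structure; it is not necessarily the algorithm shown on the
         transparency, and the name bfs is hypothetical.

```python
from collections import deque


def bfs(adj, start):
    """Return vertices in the order a breadth-first search visits them."""
    visited = {start}          # vertices already marked
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:       # every vertex v is adjacent to
            if w not in visited:
                visited.add(w) # mark when enqueued so nothing is queued twice
                queue.append(w)
    return order


DIGRAPH_ADJ = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": ["D"],
    "D": ["E"],
    "E": ["A"],
}

print(bfs(DIGRAPH_ADJ, "A"))
```

         Replacing the queue with a stack (popping from the same end that
         is pushed) turns this into a depth-first style search.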

   B. Other operations will be considered in CS321.

Copyright ©1999 - Russell C. Bjork