Relative and Indexed Files

CS122 Lecture: Introduction to Relative and 
               Indexed File Organizations               last revised 1/20/97

Need: Handout #6; demonstration programs RELDEM.PAS/EXE, INDEXDEM.PAS/EXE

INTRO

   A. Thus far in our study of Pascal, we have been working with a particular
      type of file: the sequential file.  Sequential files are characterized
      by the requirement that their records be processed in order from first
      to last, without deviation.

      1. Such files were first developed in the days when primary input-output
         media were devices whose physical construction made this kind of
         access necessary: punched cards, magnetic tape, etc.

      2. Though the physical characteristics of disk storage would allow for
         other types of files, sequential files are still the most efficient 
         type of file in terms of space utilization and access overhead.

   B. The use of sequential files typically means that we will have to process
      transactions in a batch mode, since the order of the transactions to be
      processed must correspond to the order of records in the file.  
      Sequential files are therefore poorly suited to interactive applications,
      where transactions arrive in an unpredictable order and must be
      processed as they arrive.

   C. One way to develop an interactive application is to make use of internal
      storage, by copying the file into an array.  This approach, however,
      is of limited utility, since many interactive applications require files
      that are much too big to fit in main memory.  (Note that the capacity of
      disk storage on a typical computer system may be hundreds or even
      thousands of times as large as that of main memory.)

   D. Therefore, most interactive computer applications rely on some form of
      direct access file organization, which allows the program to access any
      record in the file at any time.  Three major direct access file
      organizations are possible:

      1. Direct access based on a record's physical disk address.

      2. Relative file organization.

      3. Indexed file organization.

I. Direct access based on a record's physical disk address

   A. Physical organization of a disk

      1. One or more platters, each with 1-2 surfaces.

      2. Each surface is divided into concentric tracks (typically several
         hundred).

      3. Each track is divided into fixed-size sectors or variable-size
         records, depending on the way the disk is organized.

      4. Any location on the disk can be specified by giving a triple:

         (surface, track, sector/record)

   B. One approach to achieving direct file access is to require the program
      to specify a record's physical location on the disk each time it wants
      to access it.

      1. This is the most time-efficient approach.

      2. But it places the greatest burden on the programmer.

      3. It is only feasible in environments where an entire disk, or a region
         of it, is dedicated to a single large file - thus giving the 
         programmer control of exactly where his data is placed.

   C. This approach to direct access is typically only used for programs
      that access very large databases (perhaps extending over multiple disks),
      and for the internal implementation of the next two organizations to
      be discussed.  We will not discuss it further in this course.

II. Relative File Organization

   A. Consider our friend the sequential file again.  Since the records are
      in a fixed order, it is possible to number them 1, 2, 3 ...

   B. With a little extra overhead on the part of the underlying system, it
      would be possible to access a record by specifying its relative number
      within the file.  Such a file is called a RELATIVE FILE.  In a relative
      file, read and write operations specify the particular record (numbered
      1..size of file) to be processed.  Thus, one could:

        read record 3
        write record 17
        read record 12
        ...

   C. Notice that a relative file is very much the disk equivalent of an array.

      1. With an array, one can say

        somevariable := entry[i]

      or

        entry[i] := somevariable

      2. With a relative file, one can say (in essence)

        read (datafile, i, somevariable)

      or

        write (datafile, i, somevariable)

        (This is not the actual VAX Pascal syntax.)

      3. There are two important differences, though.  

         a. When an array is declared, its maximum size is specified.  That
            amount of space is permanently allocated for the array (whether or
            not it is needed), and the space cannot grow.  However, a relative
            file can be made to grow at any time by writing data to a
            previously unused slot.

            Example: suppose a relative file contains 4 records numbered 1..4

            - if we now say 

                  write something to record #5

              the underlying system will extend the file by one slot.

            - if we were to say instead

                  write something to record #10

              the underlying system would extend the file by 6 slots
              (5 through 10) - slots 5, 6, 7, 8, and 9 must be created in
              order to be able to create slot 10.

            The only limit on this ability to grow dynamically is the total
            space available on the disk (or the user's quota!)
           
         b. With an array, all the slots 1 .. size are presumed to contain
            actual entries.  With a relative file, it is possible for any slot 
            within the file to be flagged as not containing any data.  A read 
            specifying such a slot would fail to get any data.

            Example: suppose in the above case we wrote a record to slot 10,
                     extending the file by 6 slots.  As a result, slots 5..9
                     would be created, but would not contain any data.  An
                     attempt to read data from them would fail.

            Notes:

            - The space for the slot actually exists in the file, but there is
              no data there.  Thus, this is not a space-saving device.

            - There is a difference between an empty slot and a slot that
              contains, say, all spaces.  The system can detect an empty
              slot by keeping one or more extra bits, not accessible to the
              user, that flag whether the slot currently contains data.
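      Illustration: the two differences above can be mimicked with a small
      in-memory Python model.  This is only a sketch (not VAX Pascal, and not
      how the real file system stores slots); the class RelativeFile and its
      methods are hypothetical, and None stands in for the hidden "no data"
      flag.

```python
from typing import Any, List, Optional


class RelativeFile:
    """Toy in-memory model of a relative file: slots numbered 1, 2, 3, ...
    A slot holds a record, or None to mean "no data" (standing in for the
    hidden flag the real system keeps for each slot)."""

    def __init__(self) -> None:
        self._slots: List[Optional[Any]] = []  # _slots[0] is slot 1

    def size(self) -> int:
        """Number of slots that exist, whether or not they hold data."""
        return len(self._slots)

    def write(self, slot: int, record: Any) -> None:
        """Store a record in a slot, growing the file if needed."""
        if slot < 1:
            raise ValueError("slot numbers start at 1")
        if slot > len(self._slots):
            # Create every missing slot up to and including `slot`; the new
            # slots in between exist but contain no data.
            self._slots.extend([None] * (slot - len(self._slots)))
        self._slots[slot - 1] = record

    def read(self, slot: int) -> Optional[Any]:
        """Return the record in a slot, or None if the read fails because
        the slot is empty or lies beyond the end of the file."""
        if 1 <= slot <= len(self._slots):
            return self._slots[slot - 1]
        return None


f = RelativeFile()
for n in range(1, 5):
    f.write(n, f"record {n}")    # slots 1..4
f.write(10, "record 10")         # file grows to 10 slots; 5..9 exist but are empty
f.write(5, "    ")               # all spaces is still data, not an empty slot
```

      Writing slot 10 into a 4-slot file leaves size() at 10, with the
      intervening slots present but empty; writing all spaces to slot 5 turns
      it into an occupied slot.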

   D. Actually, the VAX Pascal syntax for these operations is a bit more complex
      than the above.  We will discuss VAX Pascal features for relative files 
      later.  They will allow the programmer to do the following:

      1. Specify that he wishes to GET a specific record in the file.  (The
         underlying system will place the content of the corresponding
         record in the file variable, or set a flag indicating that the slot
         is vacant if that is the case.)

      2. Specify that he wishes to UPDATE (modify) a specific record in the
         file.

      3. Specify that he wishes to PUT a record to the file.  PUT in this
         context means "add a record that was not there before", whereas UPDATE
         means "modify an existing record". 

      4. Specify that he wishes to DELETE a specified record from the file.  
         The effect is to restore the slot to its original, no-data condition.

      Note: All of these are VAX Pascal extensions.  Standard Pascal has no
            provision for relative files or the next kind of file we will
            discuss: the indexed file.
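      Illustration: a Python sketch of the GET / PUT / UPDATE / DELETE rules
      above, assuming a dictionary keyed by slot number stands in for the
      slots (an absent key means an empty slot).  RelativeFileOps and
      SlotError are hypothetical names; the real VAX Pascal procedures work
      through the file window and are discussed later.

```python
from typing import Any, Dict, Optional


class SlotError(Exception):
    """Raised when an operation does not fit the slot's current state."""


class RelativeFileOps:
    """Sketch of the four slot operations on a relative file.
    Occupied slots are keys of a dict; any other slot number is empty."""

    def __init__(self) -> None:
        self._data: Dict[int, Any] = {}

    def get(self, slot: int) -> Optional[Any]:
        # Return the record, or None to signal that the slot is vacant.
        return self._data.get(slot)

    def put(self, slot: int, record: Any) -> None:
        # PUT adds a record that was not there before.
        if slot in self._data:
            raise SlotError(f"slot {slot} is occupied; use update")
        self._data[slot] = record

    def update(self, slot: int, record: Any) -> None:
        # UPDATE modifies an existing record.
        if slot not in self._data:
            raise SlotError(f"slot {slot} is empty; use put")
        self._data[slot] = record

    def delete(self, slot: int) -> None:
        # DELETE restores the slot to its original no-data condition.
        if slot not in self._data:
            raise SlotError(f"slot {slot} is already empty")
        del self._data[slot]
```

      The point is the asymmetry: PUT refuses an occupied slot, while UPDATE
      and DELETE refuse an empty one.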

   E. One important issue we will also need to discuss is how the program
      determines which record number to access.  For example, if a relative
      file contains information about books, how do we know whether a certain
      book (given its title or call number or whatever) is found in - say -
      slot 42 as opposed to some other slot?

      The solution to this problem - which we will discuss in detail later -
      relies on some sort of KEY TO ADDRESS TRANSFORMATION (MAPPING) FUNCTION:

                Logical         Mapping         Physical
                Key      --->   Function  --->  Slot Number

III. Indexed Files

   A. One problem with the relative file organization is that it requires the
      programmer to know the relative position in the file of the record he
      wants.  For example, if the file is a customer master file and the
      programmer knows the customer name, he must somehow translate this into
      a record number.  One technique for doing this, called hashing, will
      be considered later; for now we note that this task is burdensome.

   B. One approach to solving this problem is to construct an index for the
      file, which might record key values and corresponding record numbers:

        Key                             Record #

        AMALGAMATED WIDGET              37
        BILL'S BAKERY                   22
        ...

   C. This is what is done in an INDEXED FILE organization.  The burden of
      maintaining the index, however, is handled by the underlying system,
      not by the programmer.  The program can now say (in essence)

        read(datafile, record whose customer field = 'AMALGAMATED WIDGET', 
             somevariable)

     or

        write(datafile, record whose customer field = 'BILL''S BAKERY',
              somevariable)

   D. Again, the actual VAX Pascal syntax is a bit more complex than this.  (We
      will look at this shortly.)  Note though, that what the underlying 
      system will allow the programmer to do is this:

      1. The programmer may specify the value in a certain key field for the
         record he wants to process.

      2. When the programmer issues a GET command, the underlying system will
         find the correct record and read its value into the file variable
         for him, or set a flag indicating that no such record exists.

      3. The programmer may also use an UPDATE command to modify the contents of
         a record containing the specified key, or a PUT command to add a
         brand new record, or a DELETE command to eliminate a record.
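      Illustration: the division of labor between program and system can be
      sketched in Python.  Here the hypothetical class IndexedFile keeps a
      dictionary from key value to record number and updates it on every PUT
      and DELETE, so the caller only ever supplies keys.  The dictionary is a
      stand-in; real implementations use other index structures (see the
      discussion of B-Trees later).

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class IndexedFile:
    """Toy indexed file: records sit in numbered slots and an index maps each
    primary-key value to its slot number.  The class, not the caller,
    keeps the index consistent with the data."""

    def __init__(self, key_field: str) -> None:
        self.key_field = key_field
        self._slots: List[Optional[Record]] = []   # _slots[0] is record 1
        self._index: Dict[Any, int] = {}           # key value -> record number

    def get(self, key: Any) -> Optional[Record]:
        """Return the record with this key, or None if there is none."""
        slot = self._index.get(key)
        return None if slot is None else self._slots[slot - 1]

    def put(self, record: Record) -> None:
        """Add a brand-new record; primary keys must be unique."""
        key = record[self.key_field]
        if key in self._index:
            raise KeyError(f"duplicate primary key {key!r}")
        self._slots.append(dict(record))
        self._index[key] = len(self._slots)

    def update(self, record: Record) -> None:
        """Replace the existing record that has the same key."""
        slot = self._index[record[self.key_field]]   # KeyError if absent
        self._slots[slot - 1] = dict(record)

    def delete(self, key: Any) -> None:
        """Remove the record with this key and drop its index entry."""
        slot = self._index.pop(key)                  # KeyError if absent
        self._slots[slot - 1] = None
```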

   E. Further, with an indexed file it is possible to have indices for more 
      than one key field.  Thus, a programmer might be able to look up a 
      customer record by customer name or zip code, say.  (Caution must be
      exercised about indiscriminately adding indices, since each requires
      considerable space and processing overhead to maintain.)

   F. Often, the records in an indexed file will be physically ordered on the
      basis of some one key (the primary key).  In this case, we have what
      is known as an indexed-sequential file.  (Often, when a person refers
      to an indexed file, this is what he means.)

   G. We will now proceed to discuss relative and indexed files, in that order.

IV. Relative Files in VAX Pascal

   A. Conceptually, a relative file is a series of slots numbered 1, 2, 3, ...,
      where the maximum record number is not fixed, but can be increased at
      any time, subject to storage constraints.

   B. VAX Pascal extends the basic Pascal language by providing a number of
      facilities for working with relative files.  These are illustrated by
      a demonstration relative file program (handout).

      1. A relative file is declared exactly like a sequential file, as
         FILE OF sometype.

         Example: declarations at start of RELDEM.PAS

      2. What distinguishes a relative file (or an indexed file for that matter)
         from a sequential file is how it is OPENed.  The relative file open
         statement has the following form:

         OPEN(filevariable, filename, history (old or new), 
              ORGANIZATION := RELATIVE, ACCESS_METHOD := sequential or direct)

         a. The ORGANIZATION clause is what declares the file to be relative.
            Note that a relative file is stored on disk in a somewhat different
            way than a sequential file is - thus, if the file is relative it
            must be declared as such every time it is opened.

         b. Note that a relative file can be opened with one of two access
            methods.  If sequential access is specified, then the file is
            treated as if it were a sequential file.  When direct access is
            specified, both direct and sequential access are possible.

         Example: open in main program of RELDEM.PAS

      3. Certain sequential file concepts also apply to relative files:

         a. At any time, the file window is accessible through filevariable^.
            (This may or may not contain valid data, of course; but one can
            always put new data into the window.)

         b. RESET on a relative file will position the file window at the
            first non-empty slot (if there is one.)

         c. GET will advance the file window to the next non-empty slot
            (skipping intervening empty slots) - if there is one.

         d. EOF is true if the window is positioned past the last non-empty
            slot.  In this case, there is no valid data in the window.

         Example: procedure PrintAll in RELDEM.PAS (demonstrate)

         e. REWRITE can be used with a relative file - but since the effect
            is to delete all existing records and truncate the file to zero
            length, this is seldom useful.

      4. Certain other procedures are unique to direct access files.

         a. FIND(filevariable, slotnumber) positions the file window at a
            designated slot in the file.  Whether the result is valid
            data in the file window depends, of course, on whether or not
            that particular slot contains valid data.

         b. UFB(filevariable) is true if the window does NOT contain valid
            data.  In this sense, it is like EOF - except:

            i. EOF is true only when the window is positioned beyond the LAST
               valid slot in the file.

           ii. UFB can become true if the window is positioned on an empty
               slot in the middle of the file.

          iii. Whenever EOF is true, UFB is also true; but the reverse does not
               hold.

               Example: procedure PrintInfoOnStudentBySlot - note use of FIND
                        and UFB.

         c. UPDATE can be used to change the content of an existing slot
            found by FIND or GET or RESET - provided the operation was
            successful (UFB was false after the find.)

            Example: procedure ChangeGPA in RELDEM.PAS (demonstrate)

         d. LOCATE and PUT can be used together to write new data into a
            formerly empty slot.

            i. LOCATE positions the file window on the desired slot.
               (LOCATE must be used instead of FIND, because there is
               nothing there to find yet!)

           ii. PUT transfers data into the slot.

               Example: procedure AddStudent in RELDEM.PAS (demonstrate)

          iii. Note well the distinction between FIND/UPDATE and LOCATE/PUT:

               - FIND/UPDATE are used to change part or all of the data in
                 an already active slot.

               - LOCATE/PUT are used to put new data into a formerly empty
                 slot.

         e. DELETE removes the data from a slot, restoring it to its original,
            empty condition.  (The file must previously have been positioned
            on a non-empty slot by FIND or GET or RESET.)

            Example: procedure DeleteStudent in RELDEM.PAS (demonstrate)

      5. As an aside, we note that certain of these procedures can also be
         used with certain kinds of sequential files on the VAX - i.e. in
         some cases a sequential file can be treated in part as if it were
         a relative file by specifying

                ORGANIZATION := SEQUENTIAL, ACCESS_METHOD := DIRECT,
                                RECORD_TYPE := FIXED
        
         in the OPEN.  However, the capabilities are limited - certain
         procedures (e.g. DELETE) cannot be used.   Direct access can NEVER
         be used with TEXT files.
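      Illustration: the window-based procedures of items 3 and 4 can be
      modeled in Python to check how RESET, GET, FIND, UFB and EOF interact.
      This is a sketch under stated assumptions, not VAX Pascal: RelFileWindow
      is a hypothetical class, None marks an empty slot, and the attribute
      window plays the role of filevariable^.

```python
from typing import Any, List, Optional


class RelFileWindow:
    """Toy model of a relative file opened for direct access, with a file
    window.  Method names echo RESET, GET, FIND, UPDATE, LOCATE, PUT and
    DELETE, but this only sketches the behavior described in these notes."""

    def __init__(self, slots: List[Optional[Any]]) -> None:
        self.slots = list(slots)            # slots[0] is slot 1; None = empty
        self.pos = 1                        # slot the window is positioned on
        self.window: Optional[Any] = None   # plays the role of filevariable^
        self.ufb = True                     # True: window holds no valid data
        self.eof = True                     # True: past the last non-empty slot

    def _load(self) -> None:
        """Copy the current slot into the window and recompute UFB and EOF."""
        data = self.slots[self.pos - 1] if self.pos <= len(self.slots) else None
        self.window = data
        self.ufb = data is None
        # No non-empty slot at or after the window means we are past the last one.
        self.eof = all(s is None for s in self.slots[self.pos - 1:])

    def _skip_to_data(self, start: int) -> None:
        """Position on the first non-empty slot numbered >= start, if any."""
        self.pos = start
        while self.pos <= len(self.slots) and self.slots[self.pos - 1] is None:
            self.pos += 1
        self._load()

    def reset(self) -> None:
        self._skip_to_data(1)

    def get(self) -> None:
        self._skip_to_data(self.pos + 1)

    def find(self, slot: int) -> None:
        if slot < 1:
            raise ValueError("slot numbers start at 1")
        self.pos = slot
        self._load()

    def update(self) -> None:
        """Rewrite the current slot from the window (needs UFB false)."""
        if self.ufb:
            raise RuntimeError("UPDATE needs a successful FIND, GET or RESET")
        self.slots[self.pos - 1] = self.window

    def locate(self, slot: int) -> None:
        """Position on a slot without reading it; it may well be empty."""
        if slot < 1:
            raise ValueError("slot numbers start at 1")
        self.pos = slot

    def put(self) -> None:
        """Write the window into the located (empty) slot, growing the file."""
        if self.pos > len(self.slots):
            self.slots.extend([None] * (self.pos - len(self.slots)))
        if self.slots[self.pos - 1] is not None:
            raise RuntimeError("slot already holds data; use FIND/UPDATE")
        self.slots[self.pos - 1] = self.window

    def delete(self) -> None:
        """Empty the current slot (needs UFB false)."""
        if self.ufb:
            raise RuntimeError("DELETE needs a successful FIND, GET or RESET")
        self.slots[self.pos - 1] = None
        self.ufb = True                     # this sketch's choice: window now invalid
```

      A PrintAll-style loop is then f.reset() followed by
      "while not f.eof: print(f.window); f.get()".  As in the notes, EOF being
      true implies UFB is true, but FIND on an empty slot in the middle of the
      file sets only UFB.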

*** OMIT THE NEXT SECTION IF TIME IS SHORT:

V. The Key Problem for Relative Files and One Solution: Hashing

   A. In our discussion of relative files, we have seen that each time we
      access the file we must specify the slot number of the record we wish
      to work with.  The problem, in practice, is how to determine the slot
      number from the data that logically identifies the record (its key).

      Example: Suppose we wish to maintain a file of students enrolled at
               Gordon College.  An obvious key for this file is the student
               ID - a 7 digit number.  We could, theoretically, use
               the ID as a slot number for the relative file; but we cannot
               realistically do so.  

               Why? A seven-digit ID can assume 10,000,000 different values;
               thus, we would need a file with 10 million slots.  Since the
               enrollment of Gordon hovers around the 1200 mark, only about
               0.012% of the slots would ever be used - we would be wasting
               about 99.99% of the file space!

   B. As we said in connection with indexed files, one solution is to keep
      a table, or index, that maps keys to locations in the file.  This, as
      we shall see, incurs considerable processing time and space overhead.

   C. Another solution that is often used is some form of HASHING.  We will
      discuss one illustration of a method that has many forms.  (Further
      discussion of hashing and related issues is left for a later course.)

   D. The basic idea is this: we devise a key to address transformation
      algorithm that converts a LOGICAL key (such as a student ID) to a
      PHYSICAL "home" slot number for the data associated with that key.

                               ________________
                              | Key to address |
                Logical  ---->| transformation | ----> Physical slot
                key           | algorithm      |       number
                               ----------------

      1. The nature of this transformation is such that the number of
         POSSIBLE logical keys is MUCH greater than the number of possible
         resulting slot numbers.

         Example: Let's say we decide to use a relative file with 2000 slots
                  for a Gordon student file (to allow room for growth etc.)
                  The transformation from Gordon ID to slot number maps
                  10,000,000 possible logical keys into only 2000 slot numbers.

      2. As a result, any given algorithm has the possibility of mapping two
         different logical keys to the same physical slot.  Such keys are said
         to be synonyms, and the resultant condition is said to be a 
         collision.  Since only one record can be stored in any given slot in
         the table, we will have to devise some strategy for handling these
         collisions.

         a. One possible key-to-address transformation algorithm for
            our Gordon student ID is:

                  Physical slot := (ID mod 2000) + 1

            Note that ID mod 2000 will lie in the range 0..1999;  therefore 
            (ID mod 2000) + 1 will lie in the range 1..2000 as desired.

            Example: slot number for ID 8400011 is (8400011 mod 2000) + 1 =
                     11 + 1 = 12

            Problem: calculate the slot number for your ID

         b. This is an example of the DIVISION-REMAINDER METHOD OF HASHING, 
            and is often used in practice.  (We shall see shortly, though, 
            that the choice of 2000 as a divisor is not a good one - there are 
            better ways to choose a divisor.)

         c. The problem that arises, however, is that some 5000 different
            keys (10,000,000 / 2000) will map to any one slot.

            Example: IDs 8400011, 8402011, 8404011, 8500011 and 4996 others
                     will all map to slot 12!
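      Illustration: the division-remainder function is a one-liner.  A minimal
      Python sketch, assuming IDs are plain integers 0..9,999,999 (home_slot
      and synonyms_of_8400011 are hypothetical names):

```python
def home_slot(student_id: int, num_slots: int = 2000) -> int:
    """Division-remainder hashing: (ID mod num_slots) + 1, a slot in 1..num_slots."""
    return student_id % num_slots + 1


# The IDs sharing a home slot with 8400011 are exactly those whose remainder
# mod 2000 is 11: 11, 2011, 4011, ..., up to the largest below 10,000,000.
synonyms_of_8400011 = range(11, 10_000_000, 2000)
```

      home_slot(8400011) gives 12, and so does every ID in
      synonyms_of_8400011; there are 5000 of them.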

   E. How do we deal with the problem of synonym keys, leading to collisions?

      1. We use hashing to determine the HOME slot for a given record - where
         we would like to put it.

         Example: the home slot for 8400011 is slot 12

      2. If, when we go to insert a record in the file, we find that its home
         slot is already occupied by another record, we must establish an
         alternate slot.  Two common methods for doing so are:

         a. LINEAR OPEN ADDRESSING

         b. CHAINING

         Of these two, chaining involves some concepts we have not studied yet,
         so we consider only linear open addressing.  We may come back to
         chaining later in connection with our study of linked lists.

   F. Linear Open Addressing works like this:

      1. To insert a record:

         a. Compute the address of the home slot, using the hash function.

         b. If that slot is vacant, put the record there.

         c. Otherwise, begin looking at adjacent slots (in increasing slot
            number order) until a vacant slot is found.  Put the record in the
            first vacant slot.

            - If you reach the last slot in the file, then start searching with
              slot 1 (i.e. treat the slot numbers as if they wrap around modulo
              the number of slots).

            - If you come full circle back to the home slot, then give up; the
              file is full.  (In this case, one could extend the file; but a new
              hash function would also be needed to take advantage of the
              added slots; this would mean repositioning every record already
              in the file as well.)

            Example: File with 5 slots (initially empty); hash function =
                     key mod 5 + 1

                     Insert 17 AARDVARK: goes into slot 3
                     Insert 23 BUFFALO: goes into slot 4
                     Insert 12 CAT: should go into slot 3, but ends up in slot 5
                     Insert 44 DOG: should go into slot 5, but ends up in slot 1

      2. To locate a record:

         a. Compute the address of the home slot, using the hash function.

         b. If that slot contains the record, we have succeeded (one must
            actually check the data stored to be sure the key matches.)  If
            the home slot is vacant, then the record is not in the file.

         c. If the home slot contains a record - but not the right one - then
            begin searching successive slots (as on insert), until either

            - The desired record is found.
            - A vacant slot is found (in which case, we conclude the record is
              not in the file, since otherwise insert would have found this
              slot and put the record there.)
            - You come full circle to the home slot (in which case conclude
              the record is not there because you have tried every one!)

            Example: Trace lookup of each of 17, 23, 12, 44 in turn
                     Trace lookup of records with key = 31, 30

      3. To delete a record:

         a. First locate the record as above.

         b. Now, can we just simply use the DELETE operation to vacate the
            slot?  No.  Why?  Because then a later lookup on another record
            may fail.

            Example: suppose we deleted 17 AARDVARK by vacating slot 3.
                     What would happen when we try to lookup 12?

         c. Therefore, we instead must replace the record with a dummy record
            that fills the slot, but will never match any key we are looking
            for.  (E.g. if our key is numeric, we might store the letter D
            in the key field of the record.)

            - On insert, we treat such a slot as if it were, in fact, vacant,
              and put a new record there if we need to.

            - On lookup, we treat such a slot as occupied, since it once was.
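      Illustration: insert, lookup and delete with linear open addressing, as
      a Python sketch.  Assumptions: keys are integers, the hash is
      key mod n + 1 as in the 5-slot example, and a deleted slot is marked
      with the string "D", playing the part of the dummy record.  The class
      and method names are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple, Union

EMPTY = None      # slot has never held a record
DELETED = "D"     # dummy record left by a delete; never matches a real key

Slot = Union[None, str, Tuple[int, str]]


class LinearProbeFile:
    """Hash file of num_slots slots (numbered 1..num_slots) using
    division-remainder hashing and linear open addressing."""

    def __init__(self, num_slots: int) -> None:
        self.n = num_slots
        self.slots: List[Slot] = [EMPTY] * num_slots   # slots[0] is slot 1

    def home(self, key: int) -> int:
        return key % self.n + 1

    def probe(self, key: int) -> Iterator[int]:
        """Home slot, then successive slots, wrapping from n back to 1;
        stops after visiting every slot once (full circle)."""
        start = self.home(key)
        for i in range(self.n):
            yield (start - 1 + i) % self.n + 1

    def insert(self, key: int, data: str) -> int:
        """Store a record and return its slot.  Assumes the key is not
        already in the file (duplicates are not handled)."""
        for s in self.probe(key):
            if self.slots[s - 1] is EMPTY or self.slots[s - 1] == DELETED:
                self.slots[s - 1] = (key, data)   # dummy slots count as vacant
                return s
        raise RuntimeError("file is full")

    def lookup(self, key: int) -> Optional[int]:
        """Return the slot holding key, or None if it is not in the file."""
        for s in self.probe(key):
            entry = self.slots[s - 1]
            if entry is EMPTY:
                return None              # insert would have stopped here
            if entry != DELETED and entry[0] == key:
                return s
        return None                      # came full circle: tried every slot

    def delete(self, key: int) -> bool:
        s = self.lookup(key)
        if s is None:
            return False
        self.slots[s - 1] = DELETED      # a dummy, so later lookups keep going
        return True
```

      Replaying the 5-slot example (17, 23, 12, 44) puts the records in slots
      3, 4, 5 and 1; after delete(17), lookup(12) still reaches slot 5 because
      slot 3 holds the dummy rather than a hole.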

   G. Comments on efficiency of linear open addressing

      1. At first glance, it appears that hashing with linear open addressing
         could be terribly inefficient: it could degenerate to searching the
         entire file.

      2. On the other hand, if the record we want is, in fact, in its home
         slot or very near to it, then this method works quite well.

      3. The success of this method depends on two things:

         a. Allocating enough space in the file so that there are sufficient
            vacant slots to break up long searches.  (A good rule of thumb
            is to never allow more than 80% of the slots to actually be used -
            e.g. if we wish to store records on 1200 students, then use a file
            with at least 1500 slots, plus an appropriate hash function.)

         b. Choosing a hash function that disperses the keys uniformly over the
            slots.

            i. An example of a very bad hash function: suppose we chose to
               hash student IDs over 2000 slots by using the first two
               digits of the ID concatenated with the last digit, and prepending
               a 1 if the ID is odd (or a 0 if it is even).

               - e.g. 8400001 hashes to 1841, 8400002 to 0842, 8500123 to 1853

               Observe that the only home slots this would ever generate for
               currently enrolled students are slots like:

                830, 832, 834, 836, 838, 840, 842, 844, 846, 848,
                850, 852, 854, 856, 858, 860, 862, 864, 866, 868,
                1831, 1833, 1835, 1837, 1839, 1841, 1843, 1845, 1847, 1849,
                1851, 1853, 1855, 1857, 1859, 1861, 1863, 1865, 1867, 1869

           ii. Our example function (using a divisor of 2000) is not good.
               Because 2000 divides 10,000 evenly, ID mod 2000 depends only on
               the last four digits of the ID: it ignores the first three
               digits entirely and uses only whether the middle (fourth)
               digit is odd or even.

          iii. There is much to be said about this (in a later course).  For
               now, we note that one method which often works well is to use
               a prime divisor.  This, in turn, implies that the actual file
               size should be a prime number close to the desired size.

               Example: Since 1999 is prime, we could use a file size of
                        1999 and division by 1999 for hashing.
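      Illustration: a Python sketch comparing the bad hash function above with
      division by the prime 1999.  The 1200 consecutive IDs used here are a
      hypothetical sample, not real enrollment data.  next_prime is a
      hypothetical helper that rounds a desired file size up to a prime, so
      the file is never smaller than planned.

```python
def bad_hash(student_id: int) -> int:
    """The poor hash described above: first two digits of the 7-digit ID
    followed by its last digit, with a leading 1 (i.e. +1000) if the ID is odd."""
    digits = f"{student_id:07d}"
    value = int(digits[:2] + digits[-1])
    return value + 1000 if student_id % 2 == 1 else value


def prime_hash(student_id: int, size: int = 1999) -> int:
    """Division-remainder hashing with a prime divisor, giving slots 1..size."""
    return student_id % size + 1


def is_prime(n: int) -> bool:
    """Trial division; fine for file-size-sized numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


# Hypothetical sample: 1200 consecutive IDs, 8400000..8401199.
sample = range(8400000, 8401200)
bad_slots = {bad_hash(i) for i in sample}      # only 10 distinct home slots
prime_slots = {prime_hash(i) for i in sample}  # 1200 distinct home slots
```

      On this sample the bad function crowds 1200 students into 10 home slots,
      while division by 1999 gives every student a different home slot.
      next_prime(1500) is 1511, which keeps 1200 records under the 80% rule of
      thumb.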
                                 
VI. More About Indexed Files

   A. As we have noted, the indexed file is, like the relative file, a
      direct-access file organization, whereby the user can directly specify
      which record he wishes to read or modify.  

   B. An indexed file is divided into two major regions: a data area, 
      containing records each of which contains one or more key fields, and an 
      index area containing one index for each key field.  In an indexed file, 
      one of the key fields is designated as the primary key; and most systems 
      require that the primary key be unique - that is, no two records in an 
      indexed file may contain the same value in the primary key field.  
      (Secondary keys, on the other hand, may or may not be unique at the file 
      designer's option.)

   C. Further, in most indexed file implementations, within the data area the 
      records are stored in ascending order of primary key.  Thus, the file is 
      physically a sequential file onto which one or more index structures have 
      been superimposed.  Hence, indexed files are often called indexed 
      sequential files.

   D. Let us consider the basic structure of the indexes:

      1. The primary index.

         a. The data area for the file is broken down into divisions which we 
            shall call "buckets" - each of which can contain some number of 
            records.  (Actually, if the record length is variable then the 
            number of records in a bucket may also vary depending on the size 
            of each.) A bucket is usually the unit of physical transfer of data 
            between disk and main memory.  
            
         b. Because the file is ordered on the primary key, all the records 
            within a given bucket lie within some range of primary key values, 
            and all keys in any bucket are greater than those in logically 
            preceding buckets and less than those in succeeding buckets.
            
            Example: suppose the primary key of an indexed file were an
                     animal's name.  If all the data would fit into 4 buckets,
                     the file structure might look like this:

            Bucket 1: Aardvark .. Fox
            Bucket 2: Gopher .. Llama
            Bucket 3: Mouse .. Raccoon
            Bucket 4: Snake .. Zebra

         c. Within the primary index it is only necessary to store ONE KEY PER 
            BUCKET. 

            Example: for the above, it would be sufficient for indexing 
                     purposes to have only four entries in our primary index - 
                     namely, the value of the LAST KEY in each bucket - e.g.

            Fox         1
            Llama       2
            Raccoon     3
            Zebra       4

         d. To find any record, we would go to the index to determine which 
            bucket (if any) it must lie in; then we would go to the appropriate 
            bucket (which would be read into memory in one disk access) to
            see if we can find the record we want.   (This could be done using
            a technique like binary search.)

      2. The secondary indexes (if any)

         a. In the case of the secondary indexes, life is more complicated.  
            Normally, the order of secondary keys does not correspond in any 
            way to the distribution of records among buckets.  Thus, the 
            secondary index must contain ONE ENTRY PER KEY VALUE - i.e. roughly
            ONE ENTRY PER RECORD.  (If duplicate values are allowed in a 
            secondary key field, then one index entry can be used to point to 
            more than one record containing that value.)  
            
         b. Thus, secondary indexes are normally much bigger than primary
            indexes, and should be established only when really needed.
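      Illustration: a Python sketch of the two kinds of index, using the
      animal example above.  Assumptions: each bucket is an in-memory sorted
      list of (name, habitat) records (the habitat field and its values are
      invented for illustration), the primary index holds the last key of
      each bucket, and the secondary index on habitat maps each value to
      every (bucket, position) holding it.  Python's bisect module supplies
      the binary search.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str]          # (primary key: animal name, habitat)


def build_primary_index(buckets: List[List[Record]]) -> List[str]:
    """Sparse index: ONE entry per bucket, the last (highest) key in it."""
    return [bucket[-1][0] for bucket in buckets]


def find_by_primary(buckets: List[List[Record]], index: List[str],
                    key: str) -> Optional[Record]:
    # Step 1: the first bucket whose last key is >= key is the only
    # bucket that could hold it.
    b = bisect_left(index, key)
    if b == len(index):
        return None                      # larger than every key in the file
    # Step 2: read that one bucket and binary-search within it.
    keys = [rec[0] for rec in buckets[b]]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return buckets[b][i]
    return None


def build_secondary_index(buckets: List[List[Record]],
                          field: int) -> Dict[str, List[Tuple[int, int]]]:
    """Dense index: one entry per distinct value of `field`, listing the
    (bucket, position) of EVERY record with that value (0-based numbers)."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for b, bucket in enumerate(buckets):
        for i, rec in enumerate(bucket):
            index[rec[field]].append((b, i))
    return dict(index)


buckets: List[List[Record]] = [
    [("Aardvark", "Africa"), ("Bear", "Asia"), ("Fox", "Europe")],
    [("Gopher", "America"), ("Llama", "America")],
    [("Mouse", "Europe"), ("Raccoon", "America")],
    [("Snake", "Asia"), ("Zebra", "Africa")],
]
primary = build_primary_index(buckets)   # ["Fox", "Llama", "Raccoon", "Zebra"]
by_habitat = build_secondary_index(buckets, 1)
```

      The primary index has one entry per bucket; the secondary index needs a
      pointer for every one of the 9 records, even though records with the
      same habitat share an entry.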

   E. The information we have discussed is basically what one needs to know
      in order to USE indexed files.  The implementation of the indices is not, 
      however, a trivial matter.

      1. Fortunately, in most cases these mechanisms are provided by the
         operating system or programming language.

         Example: The VAX/VMS operating system includes a component called
                  Record-Management Services (RMS) that provides support for
                  sequential, relative, and indexed files.  VAX Pascal (and
                  other languages) make use of this to support these file
                  organizations.

      2. Today, most indexed-file implementations make use of a data structure
         known as a B-Tree.  We cover this in CS321 - it's a bit much to talk
         about now!  (However, one can USE indexed files without understanding
         B-Trees in much the same way one can drive a car without knowing
         automotive mechanics!)

   F. Indexed files in VAX Pascal

      1. As was the case with relative files, VAX Pascal incorporates language
         extensions that allow one to process indexed files.  Many of these
         are similar to the extensions for relative files.

      2. To work with an indexed file, certain declarations are needed:

         a. The key fields must be specified as part of the record declaration 
            for a file.
           
            i. This is done using the attribute KEY as part of the type 
               declaration for the field.
            
            ii. Keys are numbered 0..254, with key 0 always being the primary
                key field.
            
          iii. All programs that access the file must have the same number of 
               keys and the keys must be positioned in the same place in the 
               record - i.e. if key 1 occupies bytes 10..15 of the record 
               according to the declarations in one program then key 1 must be 
               declared to occupy bytes 10..15 in all other record declarations 
               for that file in any program accessing it.

               Example: Declaration of type StudentRec in INDEXDEM.PAS

               - This establishes that records in studentfile contain three 
                 keys: a primary key in bytes 1..7; a secondary key (key 1) in 
                 bytes 8..27; and another secondary key (key 2) in bytes 44..47.
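                 The point of this rule is that each index entry is built from
                 whatever bytes sit at the key's position.  The Python sketch
                 below (a model, not VAX Pascal) packs a 47-byte record with
                 the struct module using the byte ranges given above.  The
                 field types (text for keys 0 and 1, a 4-byte float for key 2)
                 and the 16 bytes of other data in bytes 28..43 are
                 assumptions, not the actual StudentRec declaration.

```python
import struct

# Hypothetical layout matching the 1-based byte ranges above:
# key 0 = bytes 1..7, key 1 = bytes 8..27, filler = bytes 28..43,
# key 2 = bytes 44..47.  "<" means standard sizes with no padding.
STUDENT_LAYOUT = struct.Struct("<7s20s16sf")   # 47 bytes in total

# 0-based slices for each key, derived from the 1-based byte ranges.
KEY_SLICES = {
    0: slice(0, 7),    # primary key, bytes 1..7
    1: slice(7, 27),   # key 1, bytes 8..27
    2: slice(43, 47),  # key 2, bytes 44..47
}

def pack_student(student_id, name, other, gpa):
    """Pack one record; struct pads short strings with NUL bytes."""
    return STUDENT_LAYOUT.pack(student_id.encode(), name.encode(),
                               other.encode(), gpa)

def key_bytes(record, key_number):
    """Return a key's raw bytes, located purely by position."""
    return record[KEY_SLICES[key_number]]

rec = pack_student("1234567", "Aardvark, Anthony", "", 3.5)
print(STUDENT_LAYOUT.size)                       # 47
print(key_bytes(rec, 1).rstrip(b"\0").decode())  # Aardvark, Anthony
print(struct.unpack("<f", key_bytes(rec, 2))[0]) # 3.5
```

                 A program that declared, say, a longer name field would place
                 key 2 at a different offset and so read it from the wrong
                 bytes of records written by this layout.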

         b. The file's ORGANIZATION must be declared to be INDEXED and its
            ACCESS_METHOD to be KEYED when it is OPENed:

            OPEN(filevariable, filename, history (old or new),
                 ORGANIZATION := INDEXED, ACCESS_METHOD := KEYED);

            Example: open in INDEXDEM.PAS
            
      3. As with sequential and relative files, access to a file component
         is through the file window.  The function UFB may be used to test
         whether the window contains valid data.
            
      4. The following procedures are available for positioning the window:

         a. The procedure FINDK positions the file window at a record matching
            some value for a specified key.  FINDK takes three or four parameters:
            - The file variable.
            - An integer designating which key is to be used.
            - The key value. 
            - (Optional) one of the following to denote the criterion to be 
              used in selecting a record:

              EQL - an exact match must be found. (Default if this parameter is
              not specified.)

              NXT - FINDK will find the first record whose value in the 
              specified key field exceeds the value specified in the call to 
              FINDK.  (The manual calls this GTR, but the name has been changed in the
              most recent release.)

              NXTEQL - FINDK will attempt to find an exact match; but if there
              is none it will find the first record whose value in the specified
              key field exceeds the value specified in the call to FINDK.
              (The manual calls this GEQ, but the name has been changed in the most 
              recent release.)

            If FINDK succeeds in finding a matching record, that record is made
            available in the file's buffer variable and UFB(filevar) becomes
            false.  Whichever criterion is used, if FINDK cannot find a record
            as specified, it sets UFB(filevar) true.

            Examples: findk calls in PrintStudentInfoByID, 
                      PrintStudentInfoByName, PrintByGPA in INDEXDEM.PAS
                      (demonstrate)
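            The three match criteria can be modeled with a sorted list standing
            in for an index.  The Python sketch below is an illustration, not
            how RMS implements FINDK: the function name findk, the use of None
            to mean "UFB would be true", and the sample ID values are all
            assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

def findk(sorted_keys, key_value, match="EQL"):
    """Return the position of the selected index entry, or None.

    None stands for UFB(filevar) being true (no record found).
      EQL    - exact match only (the default)
      NXT    - first key strictly greater than key_value
      NXTEQL - exact match if present, else the first greater key
    """
    if match == "EQL":
        i = bisect_left(sorted_keys, key_value)
        found = i < len(sorted_keys) and sorted_keys[i] == key_value
        return i if found else None
    if match == "NXT":
        i = bisect_right(sorted_keys, key_value)   # skip all equal keys
    elif match == "NXTEQL":
        i = bisect_left(sorted_keys, key_value)    # first key >= value
    else:
        raise ValueError(f"unknown match type {match!r}")
    return i if i < len(sorted_keys) else None

ids = ["1000001", "1000005", "1000009"]
print(findk(ids, "1000005"))            # 1
print(findk(ids, "1000004"))            # None (UFB true)
print(findk(ids, "1000005", "NXT"))     # 2
print(findk(ids, "1000004", "NXTEQL"))  # 1
print(findk(ids, "1000009", "NXT"))     # None (UFB true)
```

            With duplicate secondary-key values, this model's EQL lands on the
            first of the duplicates and NXT skips past all of them.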

         b. The procedure GET for sequential processing of the file based on 
            the order of a particular key:
            - FINDK positions the file and establishes the key specified as the
              key-of-reference.
            - Subsequent calls to GET access the records in order on that key.

            Example: procedure PrintByGPA in INDEXDEM.PAS (demonstrate)
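            The key-of-reference idea can be sketched in Python as a cursor over
            one index: a FINDK-style call sets the position, and each GET-style
            call advances it in key order until the window runs off the end.
            Everything here (the KeyedCursor class, the sample records, and the
            GPA threshold query) is hypothetical and only mimics the pattern;
            it is not the code of PrintByGPA.

```python
from bisect import bisect_left

# Hypothetical sample data: records by primary key (student ID) and a
# secondary index on GPA kept as sorted (gpa, id) pairs.
records = {
    "1000001": {"name": "Baker", "gpa": 3.2},
    "1000005": {"name": "Adams", "gpa": 2.7},
    "1000009": {"name": "Cole", "gpa": 3.9},
}
gpa_index = sorted((r["gpa"], sid) for sid, r in records.items())

class KeyedCursor:
    """Models the file window moving along one key of reference."""

    def __init__(self, index, records):
        self.index = index
        self.records = records
        self.pos = len(index)            # window starts out undefined

    def ufb(self):
        """True when the window holds no valid record."""
        return self.pos >= len(self.index)

    def findk_nxteql(self, value):
        # (value,) sorts before every (value, id) pair, so this lands on
        # the first entry whose key is >= value.
        self.pos = bisect_left(self.index, (value,))

    def get(self):
        self.pos += 1                    # next record in key order

    def current(self):
        return self.records[self.index[self.pos][1]]

def students_with_gpa_at_least(threshold):
    cur = KeyedCursor(gpa_index, records)
    cur.findk_nxteql(threshold)          # position and set key of reference
    names = []
    while not cur.ufb():
        names.append(cur.current()["name"])
        cur.get()
    return names

print(students_with_gpa_at_least(3.0))   # ['Baker', 'Cole']
```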
            
         c. The procedure RESETK may be used to position the file window at the 
            first record in the file based on the sequence of one of the keys.  
            Subsequent calls to get will follow that sequence.  It takes two 
            parameters: a file variable and a key number.

            Example: procedure PrintAlphabetically in INDEXDEM.PAS (demonstrate)

            Note: RESETK behaves like FINDK called with a key value equal to the
            minimum possible value for the key field and the NXTEQL criterion
            (GEQ in the manual).
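            The equivalence in the note can be checked with a small Python model
            of a name index.  The helpers findk_nxteql, resetk, and read_all_from
            are hypothetical, and the model assumes the empty string plays the
            role of the minimum possible key value.

```python
from bisect import bisect_left

names = sorted(["Cole", "Adams", "Baker"])   # hypothetical name index

def findk_nxteql(index, value):
    """Position of the first key >= value, or len(index) if none (UFB)."""
    return bisect_left(index, value)

def resetk(index):
    # "" is the smallest possible str, so this is FINDK with the
    # minimum key value and the NXTEQL (GEQ) criterion.
    return findk_nxteql(index, "")

def read_all_from(index, pos):
    """Follow the key sequence with repeated GETs until UFB is true."""
    out = []
    while pos < len(index):
        out.append(index[pos])
        pos += 1
    return out

print(read_all_from(names, resetk(names)))   # ['Adams', 'Baker', 'Cole']
```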
            
      5. The following procedures may be used to store data in the file:

         a. The procedure UPDATE may be used to replace the component just 
            found by FINDK, GET, or RESETK.  The process is this:

            - Use one of the above to access a record (typically FINDK)
            - Change one or more fields of the file's buffer variable (filevar^).
            - Call UPDATE(filevar).

            Example: procedure ChangeGPA in INDEXDEM.PAS (demonstrate)

            Note: UPDATE modifies any secondary indexes as necessary; but it 
            may not be used to change a primary key.  To change a primary key, 
            one must delete the existing record and use put to add a new one.
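            A minimal Python model of the UPDATE rules, assuming a file with one
            secondary index on GPA: the edited copy of the record plays the role
            of the buffer variable, a changed secondary key moves its index
            entry, and an attempt to change the primary key is refused.  The
            class TinyIndexedFile and its methods are hypothetical, not part of
            VAX Pascal.

```python
from bisect import insort

class TinyIndexedFile:
    """Records by primary key plus one sorted secondary index on 'gpa',
    stored as (gpa, id) pairs."""

    def __init__(self):
        self.records = {}
        self.gpa_index = []

    def put(self, rec):
        self.records[rec["id"]] = dict(rec)
        insort(self.gpa_index, (rec["gpa"], rec["id"]))

    def update(self, found_id, buffer):
        """Replace the record found earlier with the edited buffer."""
        if buffer["id"] != found_id:
            raise ValueError("UPDATE may not change the primary key")
        old = self.records[found_id]
        if old["gpa"] != buffer["gpa"]:
            # Secondary key changed: move its index entry.
            self.gpa_index.remove((old["gpa"], found_id))
            insort(self.gpa_index, (buffer["gpa"], found_id))
        self.records[found_id] = dict(buffer)

f = TinyIndexedFile()
f.put({"id": "1000001", "name": "Baker", "gpa": 3.2})
f.put({"id": "1000005", "name": "Adams", "gpa": 2.7})

buf = dict(f.records["1000005"])  # like FINDK filling the buffer variable
buf["gpa"] = 3.6                  # change a field of the buffer
f.update("1000005", buf)          # like UPDATE(filevar)
print(f.gpa_index)   # [(3.2, '1000001'), (3.6, '1000005')]
```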

         b. The procedure PUT for inserting new records in the file:
            - If an indexed file is opened with ACCESS_METHOD := SEQUENTIAL,
              then PUT must be used to insert records in ascending order of the
              primary key value.  Each call to PUT inserts the record in the
              buffer variable into the file and makes an entry in each index
              associated with the file corresponding to the value(s) in the key
              field(s) of that record.
            - If an indexed file is opened with ACCESS_METHOD := KEYED, then PUT
              may be called to insert a new record at any time; the record is
              automatically put into the proper position in the file according
              to its primary key, and all indexes are updated.  (There is no
              analogue to the LOCATE procedure used with relative files.)
            - PUT may not be used to insert a record having the same primary
              key value as one already in the file; to change such a record,
              UPDATE must be used.

            Example: procedure AddStudent in INDEXDEM.PAS.  Note use of findk
                     first to avoid duplicating the primary key.  (demonstrate)
            Note: as with relative files, UPDATE is used to change an existing
                  record, PUT to create a brand new one.
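            The difference between the two access methods can be sketched in
            Python as follows.  PutModel is a hypothetical class: it keeps the
            primary keys and one secondary (name) index as sorted lists, rejects
            a duplicate primary key in either mode, and in SEQUENTIAL mode also
            rejects a key lower than the last one inserted.

```python
from bisect import insort

class PutModel:
    """Hypothetical model of PUT under the two access methods."""

    def __init__(self, access_method):
        self.access_method = access_method   # "SEQUENTIAL" or "KEYED"
        self.primary = []                    # sorted primary keys
        self.name_index = []                 # sorted (name, id) pairs

    def put(self, student_id, name):
        if student_id in self.primary:       # linear check; fine for a sketch
            raise KeyError(f"duplicate primary key {student_id}")
        if (self.access_method == "SEQUENTIAL" and self.primary
                and student_id < self.primary[-1]):
            raise ValueError("sequential PUT must be in ascending key order")
        insort(self.primary, student_id)             # placed by primary key
        insort(self.name_index, (name, student_id))  # every index updated

keyed = PutModel("KEYED")
keyed.put("1000009", "Cole")
keyed.put("1000001", "Baker")   # out of order is fine with KEYED access
print(keyed.primary)            # ['1000001', '1000009']

seq = PutModel("SEQUENTIAL")
seq.put("1000001", "Baker")
try:
    seq.put("1000000", "Adams")  # rejected: not ascending
except ValueError as e:
    print(e)
```

            Checking for the key before calling put mirrors the use of FINDK in
            AddStudent to avoid duplicating a primary key.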

         c. The procedure DELETE may be used to delete the component just found 
            with FINDK or GET or RESETK.

            Example: procedure DeleteStudent in INDEXDEM.PAS (demonstrate)
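            Deleting a record must remove its entry from every index as well as
            the record itself.  A minimal Python model, assuming hypothetical
            in-memory dictionaries and sorted lists in place of the real file
            and its indexes, and a hypothetical helper named delete_found:

```python
# Hypothetical sample file: records by primary key plus two secondary
# indexes stored as sorted (key value, id) pairs.
records = {
    "1000001": {"name": "Baker", "gpa": 3.2},
    "1000005": {"name": "Adams", "gpa": 2.7},
}
name_index = sorted((r["name"], sid) for sid, r in records.items())
gpa_index = sorted((r["gpa"], sid) for sid, r in records.items())

def delete_found(student_id):
    """Remove the record just found and its entry in every secondary index."""
    rec = records.pop(student_id)
    name_index.remove((rec["name"], student_id))
    gpa_index.remove((rec["gpa"], student_id))

delete_found("1000005")   # e.g. after a successful FINDK on key 0
print(sorted(records), name_index, gpa_index)
# ['1000001'] [('Baker', '1000001')] [(3.2, '1000001')]
```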

VII. Indexed files compared to relative files:

   A. An indexed file is generally bigger than a corresponding relative file
      holding the same data - though this may not be true if the relative file
      is quite sparse.

   B. Accessing a given data item in an indexed file generally involves two or
      more disk accesses - one or more to look up the record in the index, and
      one to get the record.  In a relative file with a good hashing scheme
      most records can be retrieved with one disk access.

   C. An indexed file has two major advantages:

      1. The ability to use more than one key to access records.
      2. The ability to access records sequentially based on key values.
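
   The access-count comparison in B can be made concrete with an idealized
   multi-level index in which each index block holds a fixed number of
   entries.  The Python sketch below assumes 100 entries per block and no
   caching of index blocks in memory; both figures are illustrative, the
   function names are hypothetical, and real implementations (such as
   B-Trees) differ in detail.

```python
def index_levels(n_entries, entries_per_block):
    """Levels in an idealized multi-level index: L levels can address
    entries_per_block ** L entries."""
    levels, reach = 1, entries_per_block
    while reach < n_entries:
        levels += 1
        reach *= entries_per_block
    return levels

def indexed_accesses(n_entries, entries_per_block):
    # One read per index level, plus one read for the data itself.
    return index_levels(n_entries, entries_per_block) + 1

print(indexed_accesses(1_000_000, 100))   # 4
print(indexed_accesses(5_000, 100))       # 3
```

   Under these assumptions a million-entry index needs three index reads plus
   one data read, compared with roughly one read for a well-hashed relative
   file.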
Copyright ©1999 - Russell C. Bjork