CS122 Lecture: Introduction to Relative and Indexed File Organizations
Last revised 1/20/97
Need: Handout #6; demonstration programs RELDEM.PAS/EXE, INDEXDEM.PAS/EXE
INTRO
A. Thus far in our study of Pascal, we have been working with a particular
type of file: the sequential file. Sequential files are characterized
by the requirement that their records be processed in order from first
to last, without deviation.
1. Such files were first developed in the days when primary input-output
media were devices whose physical construction made this kind of
access necessary: cards, magnetic tape etc.
2. Though the physical characteristics of disk storage would allow for
other types of files, sequential files are still the most efficient
type of file in terms of space utilization and access overhead.
B. The use of sequential files typically means that we will have to process
transactions in a batch mode, since the order of the transactions to be
processed must correspond to the order of records in the file.
Sequential files are poorly suited to interactive applications, because the
transactions arrive in an unpredictable order and must be processed as they
arrive.
C. One way to develop an interactive application is to make use of internal
storage, by copying the file into an array. This approach, however,
is of limited utility, since many interactive applications require files
that are much too big to fit in main memory. (Note that the capacity of
disk storage on a typical computer system may be several hundreds or
thousands of times as large as that of main memory.)
D. Therefore, most interactive computer applications rely on some form of
direct access file organization, which allows the program to access any
record in the file at any time. Three major direct access file
organizations are possible:
1. Direct access based on a record's physical disk address.
2. Relative file organization.
3. Indexed file organization.
I. Direct access based on a record's physical disk address
A. Physical organization of a disk
1. One or more platters, each with 1-2 surfaces.
2. Each surface is divided into concentric tracks (typically several
hundred).
3. Each track is divided into fixed-size sectors or variable-size
records, depending on the way the disk is organized.
4. Any location on the disk can be specified by giving a triple:
(surface, track, sector/record)
B. One approach to achieving direct file access is to require the program
to specify a record's physical location on the disk each time it wants
to access it.
1. This is the most time-efficient approach.
2. But it places the greatest burden on the programmer.
3. It is only feasible in environments where an entire disk, or a region
of it, is dedicated to a single large file - thus giving the
programmer control of exactly where his data is placed.
C. This approach to direct access is typically only used for programs
that access very large databases (perhaps extending over multiple disks),
and for the internal implementation of the next two organizations to
be discussed. We will not discuss it further in this course.
II. Relative File Organization
A. Consider our friend the sequential file again. Since the records are
in a fixed order, it is possible to number them 1, 2, 3 ...
B. With a little extra overhead on the part of the underlying system, it
would be possible to access a record by specifying its relative number
within the file. Such a file is called a RELATIVE FILE. In a relative
file, read and write operations specify the specific record (1..size of
file) to be processed. Thus, one could:
read record 3
write record 17
read record 12
...
C. Notice that a relative file is very much the disk equivalent of an array.
1. With an array, one can say
somevariable := entry[i]
or
entry[i] := somevariable
2. With a relative file, one can say (in essence)
read (datafile, i, somevariable)
or
write (datafile, i, somevariable)
(This is not the actual VAX Pascal syntax.)
3. There are two important differences, though.
a. When an array is declared, its maximum size is specified. That
amount of space is permanently allocated for the array (whether or
not it is needed), and the space cannot grow. However, a relative
file can be made to grow at any time by writing data to a
previously non-used slot.
Example: suppose a relative file contains 4 records numbered 1..4
- if we now say
write something to record #5
the underlying system will extend the file by one slot.
- if we were to say instead
write something to record #10
the underlying system would extend the file by 6 slots - i.e.
slots 5, 6, 7, 8, and 9 must be created in order to be able
to create slot 10.
The only limit on this ability to grow dynamically is the total
space available on the disk (or the user's quota!)
b. With an array, all the slots 1 .. size are presumed to contain
actual entries. With a relative file, it is possible for any slot
within the file to be flagged as not containing any data. A read
specifying such a slot would fail to get any data.
Example: suppose in the above case we wrote a record to slot 10,
extending the file by 6 slots. As a result, slots 5..9
would be created, but would not contain any data. An
attempt to read data from them would fail.
Notes:
- The space for the slot actually exists in the file, but there is
no data there. Thus, this is not a space-saving device.
- There is a difference between an empty slot and a slot that
contains, say, all spaces. The system can detect an empty
slot by using an extra bit or more of data not accessible to the
user which flags whether the slot has ever been written into.
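Illustration (a minimal sketch, not VAX Pascal and not how RMS actually stores
relative files): the behavior described above can be imitated in Python with
fixed-size slots, each preceded by a one-byte "occupied" flag. The slot layout
and the names SLOT, write_slot, read_slot and demo are assumptions made for
this sketch only.

```python
import os
import struct
import tempfile

# Hypothetical slot layout: 1 flag byte (0 = empty, 1 = occupied) + 20-byte payload.
SLOT = struct.Struct("<B20s")


def write_slot(f, number, text):
    """Write text into slot `number` (1-based), extending the file if needed."""
    f.seek(0, os.SEEK_END)
    slots_now = f.tell() // SLOT.size
    # Create any missing intermediate slots, flagged as empty.
    for _ in range(slots_now, number - 1):
        f.write(SLOT.pack(0, b""))
    f.seek((number - 1) * SLOT.size)
    f.write(SLOT.pack(1, text.encode("ascii")))


def read_slot(f, number):
    """Return the text in slot `number`, or None if the slot is empty or absent."""
    f.seek((number - 1) * SLOT.size)
    raw = f.read(SLOT.size)
    if len(raw) < SLOT.size:
        return None  # beyond the current end of the file
    flag, payload = SLOT.unpack(raw)
    return payload.rstrip(b"\0").decode("ascii") if flag else None


def demo():
    """Start with records 1..4, then write record 10 as in the example above."""
    with tempfile.TemporaryFile() as f:
        for n in range(1, 5):
            write_slot(f, n, f"record {n}")
        write_slot(f, 10, "record 10")
        f.seek(0, os.SEEK_END)
        total_slots = f.tell() // SLOT.size
        return total_slots, read_slot(f, 3), read_slot(f, 7), read_slot(f, 10)
```

Calling demo() writes slots 1..4 and then slot 10; slots 5..9 occupy space in
the file but read back as empty, which is why this is not a space-saving
device. A slot holding 20 spaces would still have its flag set, so it remains
distinguishable from an empty slot.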
D. Actually, the VAX Pascal syntax for these operations is a bit more complex
than the above. We will discuss VAX Pascal features for relative files
later. They will allow the programmer to do the following:
1. Specify that he wishes to GET a specific record in the file. (The
underlying system will place the content of the corresponding
record in the file variable, or set a flag indicating that the slot
is vacant if that is the case.)
2. Specify that he wishes to UPDATE (modify) a specific record in the
file.
3. Specify that he wishes to PUT a record to the file. PUT in this
context means "add a record that was not there before", whereas UPDATE
means "modify an existing record".
4. Specify that he wishes to DELETE a specified record from the file.
The effect is to restore the slot to its original, no-data condition.
Note: All of these are VAX Pascal extensions. Standard Pascal has no
provision for relative files or the next kind of file we will
discuss: the indexed file.
E. One important issue we will also need to discuss is how the program
determines which record number to access. For example, if a relative
file contains information about books, how do we know whether a certain
book (given its title or call number or whatever) is found in - say -
slot 42 as opposed to some other slot?
The solution to this problem - which we will discuss in detail later -
relies on some sort of KEY TO ADDRESS TRANSFORMATION (MAPPING) FUNCTION:
Logical Mapping Physical
Key ---> Function ---> Slot Number
III. Indexed Files
A. One problem with the relative file organization is that it requires the
programmer to know the relative position in the file of the record he
wants. For example, if the file is a customer master file and the
programmer knows the customer name, he must somehow translate this into
a record number. One technique for doing this, called hashing, will
be considered later; for now we note that this task is burdensome.
B. One approach to solving this problem is to construct an index for the
file, which might record key values and corresponding record numbers:
Key                   Record #
AMALGAMATED WIDGET    37
BILL'S BAKERY         22
...
C. This is what is done in an INDEXED FILE organization. The burden of
maintaining the index, however, is handled by the underlying system,
not by the programmer. The program can now say (in essence)
read(datafile, record whose customer field = 'AMALGAMATED WIDGET',
somevariable)
or
write(datafile, record whose customer field = 'BILL''S BAKERY',
somevariable)
D. Again, the actual VAX Pascal syntax is a bit more complex than this. (We
will look at this shortly.) Note though, that what the underlying
system will allow the programmer to do is this:
1. The programmer may specify the value in a certain key field for the
record he wants to process.
2. When the programmer issues a GET command, the underlying system will
find the correct record and read its value into the file variable
for him, or set a flag indicating that no such record exists.
3. The programmer may also use an UPDATE command to modify the contents of
a record containing the specified key, or a PUT command to add a
brand new record, or a DELETE command to eliminate a record.
E. Further, with an indexed file it is possible to have indices for more
than one key field. Thus, a programmer might be able to look up a
customer record by customer name or zip code, say. (Caution must be
exercised about indiscriminately adding indices, since each requires
considerable space and processing overhead to maintain.)
F. Often, the records in an indexed file will be physically ordered on the
basis of some one key (the primary key). In this case, we have what
is known as an indexed-sequential file. (Often, when a person refers
to an indexed file, this is what he means.)
G. We will now proceed to discuss relative and indexed files, in that order.
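Illustration (a minimal sketch in Python, not the mechanism an indexed file
system really uses): an index is just a mapping from key values to record
numbers, and one data file may carry several such mappings. The record numbers
37 and 22 come from the example above; the zip codes and the names
build_index and lookup are hypothetical.

```python
# Data "file": record number -> record (stands in for a relative file).
records = {
    22: {"customer": "BILL'S BAKERY", "zip": "01984"},       # zip is made up
    37: {"customer": "AMALGAMATED WIDGET", "zip": "01915"},  # zip is made up
}


def build_index(records, field):
    """Map each value of `field` to the record numbers that contain it."""
    index = {}
    for number, record in records.items():
        index.setdefault(record[field], []).append(number)
    return index


def lookup(records, index, key):
    """Return every record whose indexed field equals `key` (possibly none)."""
    return [records[n] for n in index.get(key, [])]


by_name = build_index(records, "customer")  # index on one key field
by_zip = build_index(records, "zip")        # a second index on another field
```

Each additional index is another structure that must be stored and kept up to
date whenever a record is added, changed or deleted, which is the overhead the
caution in E refers to.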
IV. Relative Files in VAX Pascal
A. Conceptually, a relative file is a series of slots numbered 1 .. n,
where the maximum record number n is not fixed, but can be increased at
any time, subject to storage constraints.
B. VAX Pascal extends the basic Pascal language by providing a number of
facilities for working with relative files. These are illustrated by
a demonstration relative file program (handout).
1. A relative file is declared exactly like a sequential file, as
FILE OF sometype.
Example: declarations at start of RELDEM.PAS
2. What distinguishes a relative file (or an indexed file for that matter)
from a sequential file is how it is OPENed. The relative file open
statement has the following form:
OPEN(filevariable, filename, history (old or new),
ORGANIZATION := RELATIVE, ACCESS_METHOD := sequential or direct)
a. The ORGANIZATION clause is what declares the file to be relative.
Note that a relative file is stored on disk in a somewhat different
way than a sequential file is - thus, if the file is relative it
must be declared as such every time it is opened.
b. Note that a relative file can be opened with one of two access
methods. If sequential access is specified, then the file is
treated as if it were a sequential file. When direct access is
specified, both direct and sequential access are possible.
Example: open in main program of RELDEM.PAS
3. Certain sequential file concepts also apply to relative files:
a. At any time, the file window is accessible through filevariable^.
(This may or may not contain valid data, of course; but one can
always put new data into the window.)
b. RESET on a relative file will position the file window at the
first non-empty slot (if there is one.)
c. GET will advance the file window to the next non-empty slot
(skipping intervening empty slots) - if there is one.
d. EOF is true if the window is positioned past the last non-empty
slot. In this case, there is no valid data in the window.
Example: procedure PrintAll in RELDEM.PAS (demonstrate)
e. REWRITE can be used with a relative file - but since the effect
is to delete all existing records and truncate the file to zero
length, this is seldom useful.
4. Certain other procedures are unique to direct access files.
a. FIND(filevariable, slotnumber) positions the file window at a
designated slot in the file. Whether the result is valid
data in the file window depends, of course, on whether or not
that particular slot contains valid data.
b. UFB(filevariable) is true if the window does NOT contain valid
data. In this sense, it is like EOF - except:
i. EOF is true only when the window is positioned beyond the LAST
valid slot in the file.
ii. UFB can become true if the window is positioned on an empty
slot in the middle of the file.
iii. Whenever EOF is true, UFB is also true; but the reverse does not
hold.
Example: procedure PrintInfoOnStudentBySlot - note use of FIND
and UFB.
c. UPDATE can be used to change the content of an existing slot
found by FIND or GET or RESET - provided the operation was
successful (UFB was false after the find.)
Example: procedure ChangeGPA in RELDEM.PAS (demonstrate)
d. LOCATE and PUT can be used together to write new data into a
formerly empty slot.
i. LOCATE positions the file window on the desired slot.
(LOCATE must be used instead of FIND, because there is nothing
there to find yet!)
ii. PUT transfers data into the slot.
Example: Procedure AddStudent in RELDEM.PAS (demonstrate)
iii. Note well the distinction between FIND/UPDATE and LOCATE/PUT:
- FIND/UPDATE are used to change part or all of the data in
an already active slot.
- LOCATE/PUT are used to put new data into a formerly empty
slot.
e. DELETE removes the data from a slot, restoring it to its original,
empty condition. (The file must previously have been positioned
on a non-empty slot by FIND or GET or RESET.)
Example: procedure DeleteStudent in RELDEM.PAS (demonstrate)
5. As an aside, we note that certain of these procedures can also be
used with certain kinds of sequential files on the VAX - i.e. in
some cases a sequential file can be treated in part as if it were
a relative file by specifying
ORGANIZATION := SEQUENTIAL, ACCESS_METHOD := DIRECT,
RECORD_TYPE := FIXED
in the OPEN. However, the capabilities are limited - certain
procedures (e.g. DELETE) cannot be used. Direct access can NEVER
be used with TEXT files.
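Illustration (a simplified in-memory model in Python, not VAX Pascal): the
sketch below mimics the window operations described in this section - FIND,
LOCATE, PUT, UPDATE, DELETE and UFB - so the FIND/UPDATE versus LOCATE/PUT
distinction can be traced by hand. The class name, the choice to pass the
record as an argument rather than through the file window, and the use of
exceptions for misuse are all assumptions of this sketch.

```python
class RelativeFile:
    """Toy model of a relative file and its file window."""

    def __init__(self):
        self.slots = {}       # slot number -> record; a missing key is an empty slot
        self.position = 0     # slot the window is positioned on (0 = nowhere yet)
        self.window = None    # contents of the file window
        self.ufb = True       # True when the window holds no valid data

    def find(self, n):
        """Position on slot n; afterwards ufb tells whether it held data."""
        self.position = n
        self.window = self.slots.get(n)
        self.ufb = n not in self.slots

    def locate(self, n):
        """Position on slot n in preparation for put; nothing is read."""
        self.position = n
        self.window = None
        self.ufb = True

    def put(self, record):
        """Store a new record in the located, formerly empty slot."""
        if self.position in self.slots:
            raise ValueError("slot already occupied; use find/update instead")
        self.slots[self.position] = record

    def update(self, record):
        """Replace the record in the slot most recently found."""
        if self.ufb:
            raise ValueError("no valid record in the window to update")
        self.slots[self.position] = record
        self.window = record

    def delete(self):
        """Return the current slot to its empty condition."""
        if self.ufb:
            raise ValueError("no valid record in the window to delete")
        del self.slots[self.position]
        self.window, self.ufb = None, True

    def scan(self):
        """Visit non-empty slots in order, as RESET followed by GETs would."""
        for n in sorted(self.slots):
            yield n, self.slots[n]
```

For example, locate(3) followed by put("Ann") fills slot 3; a later find(5)
leaves ufb true because slot 5 is empty, while find(3) leaves ufb false and
permits update or delete.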
*** OMIT THE NEXT SECTION IF TIME IS SHORT:
V. The Key Problem for Relative Files and one Solution: Hashing
A. In our discussion of relative files, we have seen that each time we
access the file we must specify the slot number of the record we wish
to work with. The problem, in practice, is how to determine the slot
number based on the logic of the information we are working with.
Example: Suppose we wish to maintain a file of students enrolled at
Gordon College. An obvious key for this file is the student
ID - a 7 digit number. We could, theoretically, use
the ID as a slot number for the relative file; but we cannot
realistically do so.
Why? A seven digit ID can assume 10,000,000 different values;
thus, we would need a file with 10 million slots. Since the
enrollment of Gordon hovers around the 1200 mark, this amounts
to wasting nearly 99.99% of the file space!
B. As we said in connection with indexed files, one solution is to keep
a table, or index, that maps keys to locations in the file. This, as
we shall see, incurs considerable processing time and space overhead.
C. Another solution that is often used is some form of HASHING. We will
discuss one illustration of a method that has many forms. (Further
discussion of hashing and related issues is left for a later course.)
D. The basic idea is this: we devise a key to address transformation
algorithm that converts a LOGICAL key (such as a student ID) to a
PHYSICAL "home" slot number for the data associated with that key.
________________
| Key to address |
Logical ---->| transformation | ----> Physical slot
key | algorithm | number
----------------
1. The nature of this transformation is such that the number of
POSSIBLE logical keys is MUCH greater than the number of possible
resulting slot numbers.
Example: Let's say we decide to use a relative file with 2000 slots
for a Gordon student file (to allow room for growth etc.)
The transformation Gordon ID to Slot number maps 10,000,000
possible logical keys into only 2000 possible physical slots.
2. As a result, any given algorithm has the possibility of mapping two
different logical keys to the same physical slot. Such keys are said
to be synonyms, and the resultant condition is said to be a
collision. Since only one record can be stored in any given slot in
the table, we will have to devise some strategy for handling these
collisions.
3. One possible key-to-address transformation algorithm for
our Gordon student ID is:
Physical slot := (ID mod 2000) + 1
Note that ID mod 2000 will lie in the range 0..1999; therefore
(ID mod 2000) + 1 will lie in the range 1..2000 as desired.
Example: slot number for ID 8400011 is (8400011 mod 2000) + 1 = 11 + 1 =
12
Problem: calculate the slot number for your ID
4. This is an example of the DIVISION-REMAINDER METHOD OF HASHING,
and is often used in practice. (We shall see shortly, though,
that the choice of 2000 as a divisor is not a good one - there are
better ways to choose a divisor.)
5. The problem that arises, however, is that there are some 5000 different
keys that will map to any one slot.
Example: ID's 8400011, 8402011, 8404011, 8500011, 8502011 and
4995 others will all map to slot 12!
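Illustration (a short Python sketch of the arithmetic above; the names
ID_SPACE, SLOTS and home_slot are chosen for this sketch): the
division-remainder transformation and the count of synonyms per slot can be
checked directly.

```python
ID_SPACE = 10_000_000   # number of distinct seven-digit IDs (0000000..9999999)
SLOTS = 2000            # size of the relative file in the example


def home_slot(student_id, slots=SLOTS):
    """Division-remainder hashing: map an ID to a slot in 1..slots."""
    return student_id % slots + 1


# Every slot receives the same share of the possible IDs.
synonyms_per_slot = ID_SPACE // SLOTS

# The IDs listed in the example all land in the same home slot.
example_ids = [8400011, 8402011, 8404011, 8500011, 8502011]
example_slots = {home_slot(i) for i in example_ids}
```

Here example_slots contains a single slot number, confirming that the listed
IDs are synonyms of one another.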
E. How do we deal with the problem of synonym keys, leading to collisions?
1. We use hashing to determine the HOME slot for a given record - where
we would like to put it.
Example: the home slot for 8400011 is slot 12
2. If, when we go to insert a record in the file, we find that its home
slot is already occupied by another record, we use one of two methods
to establish an alternate slot.
a. LINEAR OPEN ADDRESSING
b. CHAINING
Of these two, chaining involves some concepts we have not studied yet,
so we consider only linear open addressing. We may come back to
chaining later in connection with our study of linked lists.
F. Linear Open Addressing works like this:
1. To insert a record:
a. Compute the address of the home slot, using the hash function.
b. If that slot is vacant, put the record there.
c. Otherwise, begin looking at adjacent slots (in increasing slot
number order) until a vacant slot is found. Put the record in the
first vacant slot.
- If you reach the last slot in the file, then start searching with
slot 1 (i.e. treat the slot numbers as if they wrap around modulo
number of slots).
- If you come full circle back to the home slot, then give up; the
file is full. (In this case, one could extend the file; but a new
hash function would also be needed to take advantage of the
added slots; this would mean repositioning every record already
in the file as well.)
Example: File with 5 slots (initially empty); hash function =
key mod 5 + 1
Insert 17 AARDVARK: goes into slot 3
Insert 23 BUFFALO: goes into slot 4
Insert 12 CAT: should go into slot 3, but ends up in slot 5
Insert 44 DOG: should go into slot 5, but ends up in slot 1
2. To locate a record:
a. Compute the address of the home slot, using the hash function.
b. If that slot contains the record, we have succeeded (one must
actually check the data stored to be sure the key matches.) If
the home slot is vacant, then the record is not in the file.
c. If the home slot contains a record - but not the right one - then
begin searching successive slots (as on insert), until either
- The desired record is found.
- A vacant slot is found (in which case, we conclude the record is
not in the file, since otherwise insert would have found this
slot and put the record there.)
- You come full circle to the home slot (in which case conclude
the record is not there because you have tried every one!)
Example: Trace lookup of each of 17, 23, 12, 44 in turn
Trace lookup of records with key = 31, 30
3. To delete a record:
a. First locate the record as above.
b. Now, can we just simply use the DELETE operation to vacate the
slot? No. Why? Because then a later lookup on another record
may fail.
Example: suppose we deleted 17 AARDVARK by vacating slot 3.
What would happen when we try to lookup 12?
c. Therefore, we instead must replace the record with a dummy record
that fills the slot, but will never match any key we are looking
for. (e.g. if our key is numeric, we might store the letter D
in the key field of the record.)
- On insert, we treat such a slot as if it were, in fact, vacant,
and put a new record there if we need to.
- On lookup, we treat such a slot as occupied, since it once was.
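Illustration (a minimal in-memory sketch in Python, assuming integer keys; a
real version would read and write slots of a relative file): the insert,
lookup and delete rules for linear open addressing given above, using a
DELETED marker in place of the dummy record. The class and method names are
hypothetical.

```python
EMPTY = None
DELETED = object()   # stands in for the dummy record left behind by a delete


class HashedFile:
    def __init__(self, size):
        self.size = size
        self.slots = [EMPTY] * (size + 1)   # index 0 unused; slots are 1..size

    def home(self, key):
        return key % self.size + 1

    def _probe(self, key):
        """Yield each slot once, starting at the home slot and wrapping around."""
        start = self.home(key)
        for i in range(self.size):
            yield (start - 1 + i) % self.size + 1

    def insert(self, key, data):
        """Put the record in the first vacant (or deleted) slot; return its number."""
        for slot in self._probe(key):
            if self.slots[slot] is EMPTY or self.slots[slot] is DELETED:
                self.slots[slot] = (key, data)
                return slot
        raise RuntimeError("file is full")   # came full circle to the home slot

    def lookup(self, key):
        """Return the slot holding `key`, or None if it is not in the file."""
        for slot in self._probe(key):
            entry = self.slots[slot]
            if entry is EMPTY:
                return None                  # insert would have used this slot
            if entry is not DELETED and entry[0] == key:
                return slot
        return None                          # tried every slot

    def delete(self, key):
        """Replace the record with the DELETED marker rather than vacating the slot."""
        slot = self.lookup(key)
        if slot is not None:
            self.slots[slot] = DELETED
        return slot
```

With size 5, inserting keys 17, 23, 12 and 44 in that order reproduces the
placements in the example above, and after delete(17) a lookup of 12 still
succeeds because the marker in slot 3 keeps the search going.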
G. Comments on efficiency of linear open addressing
1. At first glance, it appears that hashing with linear open addressing
could be terribly inefficient: it could degenerate to searching the
entire file.
2. On the other hand, if the record we want is, in fact, in its home
slot or very near to it, then this method works quite well.
3. The success of this method depends on two things:
a. Allocating enough space in the file so that there are sufficient
vacant slots to break up long searches. (A good rule of thumb
is to never allow more than 80% of the slots to actually be used -
e.g. if we wish to store records on 1200 students, then use a file
with at least 1500 slots, plus an appropriate hash function.)
b. Choose a hash function that disperses the keys uniformly over the
slots.
i. An example of a very bad hash function: suppose we chose to
hash student ID's over 2000 slots by using the first two
digits of the ID concatenated with the last digit, and prepending
a 1 if the ID is odd.
- e.g. 8400001 hashes to 1841, 8400002 to 0842, 8500123 to 1853
Observe that the only home slots this would ever generate for
currently enrolled students are slots like:
830, 832, 834, 836, 838, 840, 842, 844, 846, 848,
850, 852, 854, 856, 858, 860, 862, 864, 866, 868,
1831, 1833, 1835, 1837, 1839, 1841, 1843, 1845, 1847, 1849,
1851, 1853, 1855, 1857, 1859, 1861, 1863, 1865, 1867, 1869
ii. Our example function (using a divisor of 2000) is not good,
because it does not depend at all on the first three digits of
the ID and does not disperse the middle digit well.
iii. There is much to be said about this (in a later course). For
now, we note that one method which often works well is to use
a prime divisor. This, in turn, implies that the actual file
size should be that prime number nearest the desired size.
Example: Since 1999 is prime, we could use a file size of
1999 and division by 1999 for hashing.
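Illustration (a rough Python experiment, not a proof): one way to compare
hash functions is to count how many distinct home slots each produces for the
same set of keys. The sample of IDs below is hypothetical - 300 consecutive
IDs for each of four entering classes - and the counts will differ for other
samples.

```python
def bad_hash(student_id):
    """First two digits followed by the last digit, with a 1 prepended if odd."""
    digits = f"{student_id:07d}"
    slot = int(digits[:2] + digits[-1])
    return slot + 1000 if student_id % 2 else slot


def mod_hash(student_id, divisor):
    """Division-remainder hashing with the given divisor."""
    return student_id % divisor + 1


# Hypothetical sample: IDs 8300001..8300300, 8400001..8400300, and so on.
sample_ids = [year * 100_000 + n for year in range(83, 87) for n in range(1, 301)]


def distinct_home_slots(hash_fn):
    """Number of different home slots the function assigns to the sample."""
    return len({hash_fn(i) for i in sample_ids})


spread_bad = distinct_home_slots(bad_hash)
spread_2000 = distinct_home_slots(lambda i: mod_hash(i, 2000))
spread_1999 = distinct_home_slots(lambda i: mod_hash(i, 1999))
```

The fewer distinct home slots a function produces for a given set of keys,
the more collisions linear open addressing has to resolve. On this sample the
prime divisor spreads the keys over more slots than the divisor 2000, and both
do far better than the digit-picking function, though none of them avoids
collisions entirely.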
VI. More About Indexed Files
A. As we have noted, the indexed file is, like the relative file, a
direct-access file organization, whereby the user can directly specify
which record he wishes to read or modify.
B. An indexed file is divided into two major regions: a data area,
containing records each of which contains one or more key fields, and an
index area containing one index for each key field. In an indexed file,
one of the key fields is designated as the primary key; and most systems
require that the primary key be unique - that is, no two records in an
indexed file may contain the same value in the primary key field.
(Secondary keys, on the other hand, may or may not be unique at the file
designer's option.)
C. Further, in most indexed file implementations, within the data area the
records are stored in ascending order of primary key. Thus, the file is
physically a sequential file onto which one or more index structures have
been superimposed. Hence, indexed files are often called indexed
sequential files.
D. Let us consider the basic structure of the indexes:
1. The primary index.
a. The data area for the file is broken down into divisions which we
shall call "buckets" - each of which can contain some number of
records. (Actually, if the record length is variable then the
number of records in a bucket may also vary depending on the size
of each.) A bucket is usually the unit of physical transfer of data
between disk and main memory.
b. Because the file is ordered on the primary key, all the records
within a given bucket lie within some range of primary key values,
and all keys in any bucket are greater than those in logically
preceding buckets and less than those in succeeding buckets.
Example: suppose the primary key of an indexed file were an
animal's name. If all the data would fit into 4 buckets,
the file structure might look like this:
Bucket 1: Aardvark .. Fox
Bucket 2: Gopher .. Llama
Bucket 3: Mouse .. Raccoon
Bucket 4: Snake .. Zebra
c. Within the primary index it is only necessary to store ONE KEY PER
BUCKET.
Example: for the above, it would be sufficient for indexing
purposes to have only four entries in our primary index -
namely, the value of the LAST KEY in each bucket - e.g.
Fox 1
Llama 2
Raccoon 3
Zebra 4
d. To find any record, we would go to the index to determine which
bucket (if any) it must lie in; then we would go to the appropriate
bucket (which would be fetched from disk in a single access) to
see if we can find the record we want. (This could be done using
a technique like binary search.)
2. The secondary indexes (if any)
a. In the case of the secondary indexes, life is more complicated.
Normally, the order of secondary keys does not correspond in any
way to the distribution of records among buckets. Thus, the
secondary index must contain ONE ENTRY PER KEY VALUE - i.e. roughly
ONE ENTRY PER RECORD. (If duplicate values are allowed in a
secondary key field, then one index entry can be used to point to
more than one record containing that value.)
b. Thus, secondary indexes are always bigger than primary indexes, and
should be established only when really needed.
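Illustration (a minimal Python sketch; real implementations use more
elaborate structures): a primary-index lookup needs only the last key of each
bucket, followed by a search inside the one bucket selected. The first and
last animal in each bucket come from the example above; the middle entries
(Bear, Horse, Otter, Tiger) and the function name find are filler invented for
this sketch.

```python
from bisect import bisect_left

# Data area: four buckets, each sorted on the primary key.
buckets = [
    ["Aardvark", "Bear", "Fox"],
    ["Gopher", "Horse", "Llama"],
    ["Mouse", "Otter", "Raccoon"],
    ["Snake", "Tiger", "Zebra"],
]

# Primary index: ONE KEY PER BUCKET, namely the last key in each.
primary_index = [bucket[-1] for bucket in buckets]


def find(key):
    """Return (bucket number, position within bucket) for key, or None."""
    b = bisect_left(primary_index, key)   # first bucket whose last key >= key
    if b == len(buckets):
        return None                       # key is larger than every key on file
    bucket = buckets[b]                   # stands for the single disk access
    i = bisect_left(bucket, key)          # binary search within the bucket
    if i < len(bucket) and bucket[i] == key:
        return b + 1, i
    return None
```

A secondary index could not be built this way: because the buckets are not
ordered on the secondary key, it would need an entry for every key value
rather than one per bucket.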
E. The information we have discussed is basically what one needs to know
in order to USE indexed files. The implementation of the indices is not,
however, a trivial matter.
1. Fortunately, in most cases these mechanisms are provided by the
operating system or programming language.
Example: The VAX/VMS operating system includes a component called
Record-Management Services (RMS) that provides support for
sequential, relative, and indexed files. VAX Pascal (and
other languages) make use of this to support these file
organizations.
2. Today, most indexed-file implementations make use of a data structure
known as a B-Tree. We cover this in CS321 - it's a bit much to talk
about now! (However, one can USE indexed files without understanding
B-Trees in much the same way one can drive a car without knowing
automotive mechanics!)
F. Indexed files in VAX Pascal
1. As was the case with relative files, VAX Pascal incorporates language
extensions that allow one to process indexed files. Many of these
are similar to the extensions for relative files.
2. To work with an indexed file, certain declarations are needed:
a. The key fields must be specified as part of the record declaration
for a file.
i. This is done using the attribute KEY as part of the type
declaration for the field.
ii. Keys are assigned numbers 0..254; with key 0 always being the
primary key field.
iii. All programs that access the file must have the same number of
keys and the keys must be positioned in the same place in the
record - i.e. if key 1 occupies bytes 10..15 of the record
according to the declarations in one program then key 1 must be
declared to occupy bytes 10..15 in all other record declarations
for that file in any program accessing it.
Example: Declaration of type StudentRec in INDEXDEM.PAS
- This establishes that records in studentfile contain three
keys: a primary key in bytes 1..7; a secondary key (key 1) in
bytes 8..27; and another secondary key (key 2) in bytes 44..47.
b. The file's ORGANIZATION must be declared to be INDEXED and its
ACCESS_METHOD to be KEYED when it is OPENed:
OPEN(filevariable, filename, history (old or new),
ORGANIZATION := INDEXED, ACCESS_METHOD := KEYED);
Example: open in INDEXDEM.PAS
3. As with sequential and relative files, access to a file component
is through the file window. The function UFB may be used to test
whether the window contains valid data.
4. The following procedures are available for positioning the window:
a. The procedure FINDK for positioning the file to a record matching
some value for a specified key. FINDK requires 3-4 parameters:
- The file variable.
- An integer designating which key is to be used.
- The key value.
- (Optional) one of the following to denote the criterion to be
used in selecting a record:
EQL - an exact match must be found. (Default if this parameter is
not specified.)
NXT - FINDK will find the first record whose value in the
specified key field exceeds the value specified in the call to
FINDK. (The manual says GTR: but this has been changed in the
most recent release.)
NXTEQL - FINDK will attempt to find an exact match; but if there
is none it will find the first record whose value in the specified
key field exceeds the value specified in the call to FINDK.
(The manual says GEQ: but this has been changed in the most
recent release.)
If FINDK succeeds in finding a matching record, then that record is
made available in the file's buffer variable and UFB(filevar)
becomes false. However, in any of the above cases, if FINDK cannot
find a record as specified it sets UFB true.
Examples: findk calls in PrintStudentInfoByID,
PrintStudentInfoByName, PrintByGPA in INDEXDEM.PAS
(demonstrate)
b. The procedure GET for sequential processing of the file based on
the order of a particular key:
- FINDK positions the file and establishes the key specified as the
key-of-reference.
- Subsequent calls to GET access the records in order on that key.
Example: procedure PrintByGPA in INDEXDEM.PAS (demonstrate)
c. The procedure RESETK may be used to position the file window at the
first record in the file based on the sequence of one of the keys.
Subsequent calls to GET will follow that sequence. It takes two
parameters: a file variable and a key number.
Example: procedure PrintAlphabetically in INDEXDEM.PAS (demonstrate)
Note: RESETK behaves like FINDK with a value equal to the minimum
possible value for the key field and a match criterion of NXTEQL (GEQ in
the manual).
5. The following procedures may be used to store data in the file:
a. The procedure UPDATE may be used to replace the component just
found by FINDK, GET, or RESETK. The process is this:
- Use one of the above to access a record (typically FINDK)
- Change one or more fields of the file variable.
- Call UPDATE(filevar).
Example: procedure ChangeGPA in INDEXDEM.PAS (demonstrate)
Note: UPDATE modifies any secondary indexes as necessary; but it
may not be used to change a primary key. To change a primary key,
one must delete the existing record and use PUT to add a new one.
b. The procedure PUT for inserting new records in the file:
- If an indexed file is opened for ACCESS_METHOD := SEQUENTIAL, then PUT
must be used to insert records in ascending order of the primary
key value. Each call to PUT will insert the record in the buffer
variable into the file, and will make entries in each index
associated with the file to correspond to the value(s) in the key
field(s) of that record.
- If an indexed file is opened for ACCESS_METHOD := KEYED, then PUT may be
called to insert a new record at any time; the record is
automatically put into the proper position in the file according
to its primary key and all indexes are updated. (There is no
analogue to LOCATE for relative files.)
- PUT may not be used to insert a record having the same primary
key value as one already in the file; for that, UPDATE must be
used.
Example: procedure AddStudent in INDEXDEM.PAS. Note use of findk
first to avoid duplicating the primary key. (demonstrate)
Note: as with relative files, UPDATE is used to change an existing
record, PUT to create a brand new one.
c. The procedure DELETE may be used to delete the component just found
with FINDK or GET or RESETK.
Example: procedure DeleteStudent in INDEXDEM.PAS (demonstrate)
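Illustration (a simplified in-memory model in Python, not VAX Pascal and not
the contents of INDEXDEM.PAS): the sketch below imitates FINDK with the EQL,
NXT and NXTEQL criteria, RESETK, and GET following the key of reference. The
class name, the field names id/name/gpa and the sample students are
hypothetical; re-sorting on every call is done only to keep the sketch short.

```python
from bisect import bisect_left, bisect_right

EQL, NXT, NXTEQL = "EQL", "NXT", "NXTEQL"


class IndexedFile:
    def __init__(self, records, key_fields):
        self.records = list(records)
        self.key_fields = key_fields   # key number -> field name (0 = primary)
        self.order = []                # records in order of the key of reference
        self.pos = 0
        self.window = None             # the file window
        self.ufb = True                # True when the window has no valid data

    def _select(self, key_number):
        """Make key_number the key of reference; return the sorted key values."""
        field = self.key_fields[key_number]
        self.order = sorted(self.records, key=lambda r: r[field])
        return [r[field] for r in self.order]

    def _set_window(self, pos):
        self.pos = pos
        self.ufb = not (0 <= pos < len(self.order))
        self.window = None if self.ufb else self.order[pos]

    def findk(self, key_number, value, match=EQL):
        keys = self._select(key_number)
        if match == NXT:
            pos = bisect_right(keys, value)   # first key greater than value
        else:
            pos = bisect_left(keys, value)    # first key >= value
            if match == EQL and (pos == len(keys) or keys[pos] != value):
                pos = len(keys)               # no exact match: ufb becomes true
        self._set_window(pos)

    def resetk(self, key_number):
        self._select(key_number)
        self._set_window(0)

    def get(self):
        """Advance to the next record in order of the key of reference."""
        self._set_window(self.pos + 1)


students = [
    {"id": 8400011, "name": "Baker", "gpa": 3.1},
    {"id": 8300007, "name": "Adams", "gpa": 3.7},
    {"id": 8500123, "name": "Clark", "gpa": 2.9},
]
studentfile = IndexedFile(students, {0: "id", 1: "name", 2: "gpa"})
```

A call such as studentfile.findk(2, 3.0, NXTEQL) followed by repeated get()
calls until ufb becomes true visits the students with a GPA of at least 3.0 in
ascending GPA order, which is the pattern a procedure like PrintByGPA would
follow.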
VII. Indexed files compared to relative files:
A. An indexed file is generally bigger than a corresponding relative file
holding the same data - though this may not be true if the relative file
is quite sparse.
B. Accessing a given data item in an indexed file generally involves two or
more disk accesses - one or more to look up the record in the index, and
one to get the record. In a relative file with a good hashing scheme
most records can be retrieved with one disk access.
C. The major advantages of an indexed file are two:
1. The ability to use more than one key to access records.
2. The ability to access records sequentially based on key values.
Copyright ©1999 - Russell C. Bjork