Linked List
LISTS
Let's consider two different data structures for storing a list of things: an array and a linked list.
An array is a pretty obvious way to store a list, with a big advantage: it enables very fast access of each item. However, it has two disadvantages.
First, if we want to insert an item at the beginning or middle of an array, we have to slide a lot of items over one place to make room. This takes time proportional to the length of the array.
Second, an array has a fixed length that can't be changed. If we want to add items to the list, but the array is full, we have to allocate a whole new array and move all the ints from the old array to the new one.
1 | public class AList { |
LINKED LISTS (a recursive data type)
We can avoid these problems by choosing a Scheme-like representation of lists. A linked list is made up of nodes. Each node has two components: an item, and a reference to the next node in the list. These components are analogous to car
and cdr
. However, our node is an explicitly defined object.
1 | public class ListNode { // ListNode is a recursive type |
Let's make some ListNodes
.
1 | ListNode l1 = new ListNode(), l2 = new ListNode(), l3 = new ListNode(); |
Now let's link them together.
1 | l1.next = l2; |
What about the last node? We need a reference that doesn't reference anything. In Java, this is called null
.
1 | l3.next = null; // Java keyword, reserved for null pointer; |
To simplify programming, let's add some constructors to the ListNode class.
1 | public ListNode(int i, ListNode n) { |
These constructors allow us to emulate Scheme's cons
operation.
1 | ListNode l1 = new ListNode(7, new ListNode(0, new ListNode(6))); |
Linked lists vs. array lists
Linked lists have several advantages over array-based lists. Inserting an item into the middle of a linked list takes just a small constant amount of time, if you already have a reference to the previous node (and don't have to walk through the whole list searching for it). The list can keep growing until memory runs out.
1 | public void insertAfter(int item) { |
However, linked lists have a big disadvantage compared to arrays. Finding the nth
item of an array takes a tiny, constant amount of time. Finding the nth item of a linked list takes time proportional to n
. You have to start at the head of the list and walk forward n - 1
nodes, one next
at a time.
Many of the data structures we will study in this class will be attempts to find a compromise between arrays and linked lists. We'll learn data structures that are fast for both arbitrary lookups (like arrays) and arbitrary insertions (like linked lists).
Lists of Objects
For greater generality, let's change ListNodes
so that each node contains not an int, but a reference to any Java object. In Java, we can accomplish this by declaring a reference of type Object
.
1 | public class SListNode { |
The S
in SListNode
stands for singly-linked. This will make sense when we contrast these lists with doubly-linked lists later. You'll see the SListNode class in next week's lab and homework.
A List Class
There are two problems with SListNodes.
- Suppose
x
andy
are pointers to the same shopping list. Suppose we insert a new item at the beginning of the list thusly:
1 | x = new SListNode("soap", x); |
y
doesn't point to the new item; y
still points to the second item in x'slist. If y goes shopping for x, he'll forget to buy soap.
- How do you represent an empty list? The obvious way is
x = null
. However, Java won't let you call a SListNode method--or any method--on a null object. If you writex.insertAfter(item)
when x is null, you'll get a run-time error, even though x is declared to be a SListNode. (There are good reasons for this, which you'll learn later in the course.)
The solution is a separate SList
class, whose job is to maintain the head (first node) of the list. We will put many of the methods that operate on lists in the SList
class, rather than the SListNode
class.
1 | public class SList { |
Now, when you call x.insertFront("fish")
, every reference to that SList
can see the change.
Another advantage of the SList
class is that it can keep a record of the SList's
size (number of SListNodes
). Hence, the size can be determined more quickly than if the SListNodes
had to be counted.
THE public
AND private
KEYWORDS
Thus far, we've usually declared fields and methods using the public
keyword. However, we can also declare a field or method private
. A private
method or field is invisible and inaccessible to other classes, and can be used only within the class in which the field or method is declared.
Why would we want to make a field or method private?
- To prevent data within an object from being corrupted by other classes.
- To ensure that you can improve the implementation of a class without causing other classes that depend on it to fail.
In the following example, EvilTamperer
tries to get around the error checking code of the Date
class by fiddling with the internals of a Date
object.
1 | public class Date { |
1 | public class EvilTamperer { |
However, javac won't compile EvilTamperer
, because the Date
class has declared its vulnerable parts private
. setMonth
is an internal helper method used within the Date class, whereas the Date constructor
is a public part of the interface of the Date class. Error-checking code in the constructor ensures that invalid Dates are not constructed.
Here are some important definitions.
The interface of a class is a set of prototypes for public methods (and sometimes public fields) (Java code), plus descriptions of the methods' behaviors (plain English).
An Abstract Data Type (ADT) is a class that has a well-defined interface, but its implementation details are firmly hidden from other classes. That way, you can change the implementation of a class without jeopardizing the programs that depend on it. The Date class is an ADT. We'll implement lots of ADTs this semester.
An invariant is a fact about a data structure that is always true (assuming the code is bug-free), no matter what methods are called by external classes. For example, the Date ADT enforces the invariant that a Date object always represents a valid date. An invariant is enforced by allowing access to certain fields only through method calls.
An ADT is often a good thing to aspire to. In most of your classes, you should declare all fields private, as well as helper functions meant only for internal use, so that you can maintain sensible invariants on your data structures.
However, not all classes are ADTs! Some classes are nothing more than data storage units, and do not need to enforce any invariants. In such classes, all fields may be declared public.
The SList ADT
Last lecture, I created an SList
class to solve the problems of representing empty lists and inserting items at the beginning of a list. Today, I want to introduce another advantage of the SList
class.
We want the SList ADT to enforce two invariants:
- An SList's
size
variable is always correct. - A list is never circularly linked; there is always a tail node whose
next
reference is null.
Both these goals are accomplished by making sure that only the methods of the SList
class can change the lists' internal data structures. SList
ensures this by two means:
- The fields of the
SList
class (head and size) are declaredprivate
. - No method of
SList
returns an SListNode.
The first rule is necessary so that the evil tamperer can't change the fields and corrupt the SList
or violate invariant (1). The second rule prevents the evil tamperer from changing list items, truncating a list, or creating a cycle in a list, thereby violating invariant (2).
DOUBLY-LINKED LISTS
As we saw last class, inserting an item at the front of a linked list is easy. Deleting from the front of a list is also easy. However, inserting or deleting an item at the end of a list entails a search through the entire list, which might take a long time. (Inserting at the end is easy if you have a 'tail' pointer, as you will learn in Lab 3, but deleting is still hard.)
A doubly-linked list is a list in which each node has a reference to the previous node, as well as the next node.
1 | class DListNode { |
1 | class DList { |
DLists
make it possible to insert and delete items at both ends of the list, taking constant running time per insertion and deletion. The following code removes the tail node (in constant time) if there are at least two items in the DList.
1 | tail.prev.next = null; |
You'll need a special case for a DList
with no items. You'll also need a special case for a DList
with one item, because tail.prev.next
does not exist. (Instead, head needs to be changed.)
Let's look at a clever trick for reducing the number of special cases, thereby simplifying our DList
code. We designate one DListNode
as a sentinel, a special node that does not represent an item. Our list representation will be circularly linked, and the sentinel
will represent both the head and the tail of the list. Our DList
class no longer needs a tail
pointer, and the head
pointer points to the sentinel
.
1 | class DList { |
The invariants of the DList ADT are more complicated than the SList invariants. The following invariants apply to the DList
with a sentinel
.
- For any DList d, d.head != null. (There's always a sentinel.)
- For any DListNode x, x.next != null.
- For any DListNode x, x.prev != null.
- For any DListNode x, if x.next == y, then y.prev == x.
- For any DListNode x, if x.prev == y, then y.next == x.
- A
DList's
"size" variable is the number ofDListNodes
, NOT COUNTING the sentinel (denoted by "head"), that can be accessed from the sentinel by a sequence of "next" references.
An empty DList
is represented by having the sentinel's prev
and next
fields point to itself.
Here's an example of a method that removes the last item from a DList
.
1 | public void removeBack() { |