6.19 Smalltalk‑like Collection Classes
The Smalltalk-like collection classes are based on the language Smalltalk-80 and require that their collected objects singly or multiply inherit from the abstract base class
RWCollectable.
These collection classes have a few disadvantages: they are slightly slower and not completely typesafe, and their objects are slightly larger. These disadvantages are easily outweighed by the power of these classes, and their clean programming interface. Most importantly, the Smalltalk-like collection classes are well-suited for heterogeneous collections and polymorphic persistence.
Many of the Essential Tools Module Smalltalk-like classes have a typedef to either the corresponding Smalltalk names, or to a generic name. This typedef is activated by defining the preprocessor macro RW_STD_TYPEDEFS. Although you are free to use typedefs, we do encourage you to use the actual class names to make your code more maintainable.
6.19.1 Tables of the Smalltalk-like Classes
The following table and figure summarize the Essential Tools Module Smalltalk-like classes.
Table 14 lists all 17 classes, along with their typedefs, iterators, and implementations. Following is the class hierarchy.
Table 14 – Smalltalk-like collection classes and their iterators, typedefs, and implementations
Class | Iterator | Smalltalk typedef (deprecated) | Implemented as |
| | Bag | Dictionary of occurrences |
| | SortedCollection | Binary tree |
| | | B-tree in memory |
| | | B-tree of associations |
| | Collection | Abstract base class |
| | | Doubly-linked list |
| | | Hash table |
| | Dictionary | Hash table of associations |
| | IdentityDictionary | Hash table of associations |
| | IdentitySet | Hash table |
| | OrderedCollection | Vector of pointers |
| | Sequenceable | Abstract base class |
| | Set | Hash table |
| | LinkedList | Singly-linked list |
| (n/a) | Queue | Singly-linked list |
| (n/a) | Stack | Singly-linked list |
| RWSortedVectorIterator | | Vector of pointers, using insertion sort |
The class hierarchy of the Smalltalk-like collection classes
Some of these classes use multiple-inheritance: this hierarchy is shown relative to the RWCollectable base class.
6.19.2 Example
To orient ourselves, we always like to start with an example. The following example uses a
SortedCollection to store and order a set of
RWCollectableStrings.
SortedCollection is actually a typedef for the Smalltalk-like collection class
RWBinaryTree. Objects inserted into it are stored in order according to their relative values as returned by the virtual function
compareTo(). (See
Section 6.17.4.)
Here is the code:
#define RW_STD_TYPEDEFS 1 //1
#include <rw/bintree.h>
#include <rw/collstr.h> //2
#include <rw/rstream.h>
using std::cout;
using std::endl;
int main(){
// Construct an empty SortedCollection
SortedCollection sc; //3
// Insert some RWCollectableStrings:
sc.insert(new RWCollectableString("George")); //4
sc.insert(new RWCollectableString("Mary"));
sc.insert(new RWCollectableString("Bill"));
sc.insert(new RWCollectableString("Throkmorton"));
// Now iterate through the collection printing all members:
RWCollectableString* str; //5
SortedCollectionIterator sci(sc); //6
while( str = (RWCollectableString*)sci() ) //7
cout << *str << endl; //8
sc.clearAndDestroy();
return 0;
}
Program Output:
Bill
George
Mary
Throkmorton
The following explains the code line-by-line:
When run, the program prints out the four collected strings in order; for class
RWCollectableString, this means lexicographical order.
6.19.3 Choosing a Smalltalk-like Collection Class
Now that you have reviewed the list of Smalltalk-like collection classes, their class hierarchy, and an example of how they work, you may be wondering how you can use them. This section gives an overview of the various Smalltalk-like collection classes to help you choose an appropriate one for your problem. You can also see Appendix A, on choosing.
6.19.3.1 Bags Versus Sets Versus Hash Tables
Class
RWHashTable is the easiest Smalltalk-like collection class to understand. It uses a simple hashed lookup to find the
bucket where a particular object occurs, then does a linear search of the bucket to find the object. A key concept is that multiple objects that test
isEqual to each other can be inserted into a hash table.
Class
RWBag is similar to
RWHashTable, except that it
counts occurrences of multiple objects with the same value; that is, it retains only the first occurrence and merely increments an occurrence count for subsequent ones.
RWBag is implemented as a dictionary, where the key is the inserted object and the value is the occurrence count. This is the same way the Smalltalk
Bag object is implemented. Note that this implementation differs significantly from many other C++
Bag classes which are closer to the
RWHashTable class and not true
Bags.
Class
RWSet is similar to its base class
RWHashTable, except that it doesn't allow duplicates. If you try to insert an object that
isEqual to an object already in
RWSet, the object will be rejected.
Class
RWIdentitySet, which inherits from
RWSet, retrieves objects on the basis of
identity instead of value. Because
RWIdentitySet is a
Set, it can take only one
instance of a given object.
Note that the ordering of objects in any of these classes based on hash tables is not meaningful. If ordering is important, you should choose a sequenceable class.
6.19.3.2 Sequenceable Classes
Classes inheriting from
RWSequenceable have an innate ordering. You can speak meaningfully of their first or last object, or of their 6th or
ith object.
These classes are implemented generally as either a vector, or a singly-linked or doubly-linked list. Vector-based classes make good stacks and queues, but poor insertions in the middle. If you exceed the capacity of a vector-based collection class, it will automatically resize, but it may exact a significant performance penalty to do so.
Note that the binary and B-tree classes can be considered sequenceable in the sense that they are sorted, and therefore have an innate ordering. However, their ordering is determined internally by the relative value of the collected objects, rather than by an insertion order. In other words, you cannot arbitrarily insert an object into a sorted collection in any position you want because it might not remain sorted. Hence, these classes are subclassed separately.
6.19.3.3 Dictionaries
Sometimes referred to as maps, dictionaries use an external key to find a value. The key and value may be of different types, and in fact usually are. You can think of dictionaries as associating a given key with a given value. For example, if you were building a symbol table in a compiler, you might use the symbol name as the key, and its relocation address as the value. This approach would contrast with your approach for a Set, where the name and address would have to be encapsulated into one object.
The Essential Tools Module provides two dictionary classes:
RWHashDictionary, implemented as a hash table, and
RWBTreeDictionary, implemented as a B-tree. For these classes, both keys and values must inherit from the abstract base class
RWCollectable.
6.19.4 Virtual Functions Inherited From RWCollection
The Smalltalk-like collection classes inherit from the abstract base class
RWCollection, which in turn inherits from the abstract base class
RWCollectable, described in
Section 6.16,
Section 6.17,
Section 6.18, and
Section 6.19.1. (Thus do we produce collections of collections, but that is another story.)
An
abstract base class is a class intended to be inherited by some other class, not used as itself
per se. If you think of it as a kind of virtual class, you can easily project the meaning of virtual functions. These virtual functions provide a blueprint of functionality for the derived class. As an abstract base class,
RWCollection provides a blueprint for collection classes by declaring various virtual functions, such as
insert(),
remove(),
entries(), and so on.
This section describes the virtual functions inherited by the Smalltalk-like collections. Any of these collections can be expected to understand them.
6.19.4.1 insert()
You can put a pointer to an object into a collection by using the virtual function insert():
virtual RWCollectable* insert(RWCollectable*);
This function inserts in the way most natural for the collection. Faced with a stack, it pushes an item onto the stack. Faced with a queue, it appends the item to the queue. In a sorted collection, it inserts the new item so that items before it compare less than itself, items after it compare greater than itself, and items equal compare equal, if duplicates are allowed. See the example in
Section 6.19.2 for an example using
insert().
You must always insert pointers to real objects. Since all
RWCollection classes need to dereference their contents for some methods such as
find(), inserting a zero will cause such methods to crash. If you must store an empty object, we suggest you create and insert a default constructed object of the appropriate type, such as
RWCollectable*.
6.19.4.2 find() and Friends
You can use the following virtual functions to test how many objects a collection contains, and whether it contains a particular object:
virtual bool contains(const RWCollectable*) const;
virtual unsigned entries() const;
virtual RWCollectable* find(const RWCollectable*) const;
virtual bool isEmpty() const;
virtual unsigned occurrencesOf(const RWCollectable*) const;
The function isEmpty() returns true if the collection contains no objects. The function entries() returns the total number of objects that the collection contains.
The function contains() returns true if the argument is equal to an item within the collection. The meaning of is equal to depends on the collection and the type of object being tested. Hashing collections use the virtual function isEqual() to test for equality, after first hashing the argument to reduce the number of possible candidates to those in one hash bucket. (Here it is important that all items which are isEqual with each other hash to the same value!). Sorted collections search for an item that compares equal to the argument; in other words, an item for which compareTo() returns zero.
The virtual function occurrencesOf() is similar to contains(), but returns the number of items that are equal to the argument.
The virtual function find() returns a pointer to an item that is equal to its argument.
The following example, which builds on the example in
Section 6.19.2, uses
find() to find occurrences of
Mary in the collection, and
occurrencesOf() to find the number of times
Mary occurs:
#define RW_STD_TYPEDEFS 1
#include <rw/bintree.h> //1
#include <rw/collstr.h>
#include <rw/rstream.h>
using std::cout;
using std::endl;
int main(){
// Construct an empty SortedCollection
SortedCollection sc;
// Insert some RWCollectableStrings:
sc.insert(new RWCollectableString("George"));
sc.insert(new RWCollectableString("Mary"));
sc.insert(new RWCollectableString("Bill"));
sc.insert(new RWCollectableString("Throkmorton"));
sc.insert(new RWCollectableString("Mary")); //2
cout << sc.entries() << endl; //3
RWCollectableString dummy("Mary"); //4
RWCollectable* t = sc.find( &dummy ); //5
if(t){ //6
if(t->isA() == dummy.isA()) //7
cout << *(RWCollectableString*)t << endl; //8
}
else
cout << "Object not found.\n"; //9
cout << sc.occurrencesOf(&dummy) << endl; //10
sc.clearAndDestroy();
return 0;
}
Program Output:
5
Mary
2
6.19.4.3 remove() Functions
To search for and remove particular items, you can use the functions remove() and removeAndDestroy():
virtual RWCollectable* remove(const RWCollectable*);
virtual void removeAndDestroy(const RWCollectable*);
The function remove() looks for an item that is equal to its argument and removes it from the collection, returning a pointer to it. It returns nil if no item is found.
The function
removeAndDestroy() is similar except it deletes the item instead of returning it, using the virtual destructor inherited by all
RWCollectable items. You must be careful when using this function that the item was actually allocated off the heap,
not the stack, and that it is
not shared with another collection.
The following example, which expands on the previous one, demonstrates the use of the virtual function removeAndDestroy():
RWCollectable* oust = sc.remove(&dummy); //11
delete oust; //12
sc.removeAndDestroy(&dummy); //13
6.19.4.4 apply() Functions
To efficiently examine the members of a Smalltalk-like collection, use the member function apply():
virtual void apply(RWapplyCollectable ap, void* x);
The first argument, RWapplyCollectable, is a typedef:
typedef void (*RWapplyCollectable)(RWCollectable*, void*);
In other words, RWapplyCollectable is a pointer to a function with prototype:
void yourApplyFunction(RWCollectable* item, void* x)
where yourApplyFunction is the name of the function. You must supply this function. It will be called for each item in the collection, in whatever order is appropriate for the collection, and passed as a pointer to the item as its first argument. You must be careful that you cast the pointer item to the proper derived class. The second argument x is passed through from the call to apply(), and is available for your use. For example, you could use it to hold a handle to a window on which the object is to be drawn.
The apply-functions generally employ the most efficient method for examining all members of the collection. This is their great advantage. Their disadvantage is that they are slightly clumsy to use, requiring you to supply a separate function. The functional equivalent to apply() in the Smalltalk world is do. It takes just one argument: a piece of code to be evaluated for each item in the collection.
6.19.4.5 Functions clear() and clearAndDestroy()
To remove all items from the collection, you can use the functions clear() and clearAndDestroy():
virtual voidclear();
virtual voidclearAndDestroy();
The function
clearAndDestroy() not only removes the items, but also calls the virtual destructor for each item. You must use this function with care. The function
does check to see if the same item occurs more than once in a collection (by building an
RWIdentitySet internally), and thereby deletes each item only once. However, it cannot check whether an item is shared between two
different collections. You must also be certain that every member of the collection was allocated off the heap.
6.19.5 Other Functions Shared by All RWCollections
There are several other functions that are shared by all classes that inherit from
RWCollection. Note that these are
not virtual functions.
6.19.5.1 Class Conversions
The following functions allow any collection class to be converted into an
RWBag,
RWSet,
RWOrdered, or a
SortedCollection (that is, an
RWBinaryTree):
RWBag asBag() const;
RWSet asSet() const;
RWOrdered asOrderedCollection() const;
RWBinaryTree asSortedCollection() const
Note that since these functions mimic similar Smalltalk methods, they return a copy of the new collection by value. For large collections, this can be very expensive. Consider using operator+=() instead.
6.19.5.2 Inserting and Removing Other Collections
You can use these functions to respectively insert or remove the contents of their argument.
void operator+=(const RWCollection&);
void operator-=(const RWCollection&);
6.19.5.3 Selection
The function select():
typedef bool (*RWtestCollectable)(const RWCollectable*,
const void*);
RWCollection* select(RWtestCollectable tst, void*);
evaluates the function pointed to by tst for each item in the collection. It inserts those items for which the function returns true into a new collection of the same type as self and returns a pointer to it. This new collection is allocated off the heap, so you are responsible for deleting it when done.
6.19.6 Virtual Functions Inherited from RWSequenceable
Collections that inherit from the abstract base class
RWSequenceable, which inherits from
RWCollectable, have an innate, meaningful ordering. This section describes the virtual functions inherited from
RWSequenceable which make use of that ordering. For example, the following virtual functions allow access to the
ith item in the collection:
virtual RWCollectable*& at(size_t i);
virtual const RWCollectable* at(size_t i) const;
Remember that the first item in any collection is at position i=0. The compiler chooses which function to use on the basis of whether or not your collection has been declared const: the second variant of the function is for const collections, the first for all others. The first variant can also be used as an lvalue, as in the following example:
RWOrdered od;
od.insert(new RWCollectableInt(0)); // 0
od.insert(new RWCollectableInt(1)); // 0 1
od.insert(new RWCollectableInt(2)); // 0 1 2
delete od(1); // Use variant available for RWOrdered
od.at(1) = new RWCollectableInt(3); // 0 3 2
As you might expect, the operations above are efficient for the class
RWOrdered, which is implemented as a vector, but relatively inefficient for a class implemented as a linked-list, because the entire list must be traversed to find a particular index.
The following virtual functions return the first or last item in the collection, respectively, or nil if the collection is empty:
virtual RWCollectable* first() const;
virtual RWCollectable* last() const;
The next virtual function returns the index of the first object that is equal to the argument, or the special value RW_NPOS if there is no such object:
virtual size_t index(const RWCollectable*) const;
Here is an example of the index function in use. The output shows that the index of the variable we were searching for was found at position 1.
RWOrdered od;
od.insert(new RWCollectableInt(6)); // 6
od.insert(new RWCollectableInt(2)); // 6 2
od.insert(new RWCollectableInt(4)); // 6 2 4
RWCollectableInt dummy(2);
size_t inx = od.index(&dummy);
if (inx == RW_NPOS)
std::cout << "Not found." << std::endl;
else
std::cout << "Found at index " << inx << std::endl;
Program Output:
Found at index 1
Finally, you can use the following function to insert an item at a particular index:
virtual RWCollectable* insertAt(size_t i, RWCollectable* c);
In the example below, the code uses the function insertAt to insert 4 at position 1.
RWOrdered od;
od.insert(new RWCollectableInt(6)); // 6
od.insert(new RWCollectableInt(2)); // 6 2
od.insertAt(1, new RWCollectableInt(4)); // 6 4 2
6.19.7 A Note on How Objects are Found
You may save yourself some difficulty by remembering the following point: the virtual functions of the object within the collection, not those of the target, are called when comparing or testing a target for equality.
The following code fragment illustrates the point:
SortedCollection sc;
RWCollectableString member;
sc.insert(&member);
RWCollectableString target;
RWCollectableString* p = (RWCollectableString*)sc.find(&target);
In this example, the virtual functions of
member within the collection
RWCollectableString are called, not the virtual functions of
target. In other words:
member.compareTo(&target); //This will get called.
target.compareTo(&member); //Not this.
6.19.7.1 Hashing
Hashing is an efficient method for finding an object within a collection. All the collection classes that use hashing follow the same general strategy. First, member function hash() of the target is called to find the proper bucket within the hash table. A bucket is implemented as a singly-linked list. Because all the members of a bucket have the same hash value, the bucket is linearly searched to find the exact match. This is done by calling member function isEqual() of the candidate (see above) with each member of the bucket as the argument. The first argument that returns true is the chosen object. Be careful not to design your class so that two objects that test true for isEqual() can have different hash values, since this algorithm will fail for such objects.
In general, because of this combination of hashing and linear searching, as well as the complexity of most hashing algorithms, the ordering of the objects within a hash collection will not make a lot of sense. Hence, when the apply() function or an iterator scans through the hashing table, the objects will be visited in what appears to be random order.