

This section discusses which files should be cached, and when.

Which files should be cached:

The SEER [26] predictive caching system tries to cache the critical set as well as the average set by observing user behaviour (e.g., logging file references) and computing a pair-wise reference distance between files as a measure of how closely related they are. It then combines this information with least-recently-used (LRU) information and user-specified hints.
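The details of SEER's distance metric are beyond the scope of this section, but the idea can be illustrated with a toy sketch: scan an access trace and, for each pair of files referenced within a small window of each other, record the smallest gap observed. The function name, the `window` parameter, and the trace format are all assumptions for illustration, not SEER's actual algorithm.

```python
def reference_distances(trace, window=4):
    """Toy sketch of pairwise reference distances (not SEER's real metric).

    `trace` is a list of file names in access order. For each pair of
    distinct files referenced within `window` positions of each other,
    record the smallest gap seen; a small distance suggests the files
    are closely related.
    """
    best = {}
    for i, a in enumerate(trace):
        for j in range(i + 1, min(i + 1 + window, len(trace))):
            b = trace[j]
            if a == b:
                continue                      # self-references carry no pair information
            pair = tuple(sorted((a, b)))
            gap = j - i
            if gap < best.get(pair, window + 1):
                best[pair] = gap              # keep the closest co-occurrence
    return best
```

On the trace `["a", "b", "a", "c"]` with `window=2`, the files `a` and `c` come out as closely related (distance 1) because they are referenced back to back.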

If the cache size is big enough, the full set might fit. If the cache size is limited, a choice between the other three sets must be made. I choose to cache the current working set, since that solution is the easiest to implement (in a simple LRU fashion). Choosing the critical set would have resulted in some loss of transparency, because it requires some sort of user-assisted cache management [45], such as the hoarding facility in Coda [22], [42] or the user-specified hints in SEER [26]. Caching the average set would have required some mechanism for collecting usage statistics, e.g., a spying agent as in D-NFS [7].
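The working-set-via-LRU choice amounts to the classic policy below: on a miss, fetch the file; on overflow, evict the least-recently-used entry. This is a minimal sketch (the class name, the `loader` callback, and capacity measured in whole files rather than bytes are assumptions for illustration), not the actual implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU file cache: on overflow, evict the least-recently-used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # file name -> cached contents, LRU first

    def access(self, name, loader):
        if name in self.entries:
            self.entries.move_to_end(name)    # hit: mark as most recently used
            return self.entries[name]
        data = loader(name)                   # miss: fetch the file
        self.entries[name] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        return data
```

With capacity 2, accessing `a`, `b`, `a`, `c` in turn evicts `b`, leaving exactly the current working set `{a, c}` cached.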

The drawback of choosing the simple solution is that the other solutions are very likely to result in fewer cache misses during disconnection; Coda in particular has achieved high availability in disconnected mode thanks to its hoarding mechanism [42]. On the other hand, my choice of a simple LRU solution makes it possible for me to change my mind at a later stage. Programs could be written to compute a priority list of the files in a critical set (e.g., using an algorithm similar to Coda's hoard walking) or an average set. Once a priority list had been computed, the files would simply need to be cached in reverse priority order (lowest priority first), e.g., by opening and closing them for reading one at a time, making the LRU method keep the files with the highest priority in the cache. This is probably not a very efficient (nor especially transparent) way of heating up the cache [16] or doing demand hoard walking [42] (there is plenty of room for improvement here), but it is possible. If the full set is desired, the priority list should simply contain all files (and the cache should be large enough).
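The warming trick described above reduces to a few lines: walk the priority list from lowest to highest priority and touch each file once, so the highest-priority files end up most recently used and are the last candidates for LRU eviction. Both names here (`warm_cache` and the `touch` callback standing in for an open-read-close of one file) are hypothetical.

```python
def warm_cache(priority_list, touch):
    """Sketch of heating an LRU cache from a hoard-style priority list.

    `priority_list` is ordered highest priority first; `touch(name)`
    stands in for opening, reading, and closing one file. Touching files
    in reverse priority order leaves the highest-priority files most
    recently used, so the LRU policy evicts low-priority files first.
    """
    for name in reversed(priority_list):
        touch(name)
```

If the underlying cache is smaller than the priority list, the files touched first (the lowest-priority ones) are precisely those evicted, which is the intended behaviour.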


The Bayou developers are planning to use partial replication [54], in which case--I think--the replicated set will be the critical set.
