FoxBrain Online

where imagination envelops technology...

Monday, July 04, 2005

Deep Mirage

Re-read the "The Anatomy of a Large-Scale Hypertextual Web Search Engine" paper and, again, got some ideas. Filesystem-wise... why can't the storage for such a thing be arranged as a linked queue? Here's what I mean:

When you add a record (a downloaded page or file), you add it at the head of the queue. Whenever you re-download a page, you mark the old version already in the queue as "deleted". A second process then works on the tail of the queue: it removes a record and, if that record is marked "deleted", just gets rid of it; if it isn't, it moves the record back to the head of the queue (or possibly re-downloads it?). The insertion process also adds the page to the index, and the removal process removes it from the index, so a record's relative location within the file doesn't matter much.
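Here's a rough in-memory sketch of the scheme above, just to make the moving parts concrete. Everything in it is made up for illustration: the `Record` and `QueueStore` names, the dict used as the "index", and the choice to simply move live records back to the head rather than re-download them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Record:
    url: str
    content: str
    deleted: bool = False  # set when a newer version of the page is added


class QueueStore:
    """Hypothetical sketch: left end of the deque is the tail, right end is the head."""

    def __init__(self):
        self.queue = deque()
        self.index = {}  # url -> current live Record (stand-in for a real index)

    def _insert(self, rec):
        # Insertion always goes to the head and updates the index.
        self.queue.append(rec)
        self.index[rec.url] = rec

    def add(self, url, content):
        # A re-download marks the old version as deleted instead of removing it.
        old = self.index.get(url)
        if old is not None:
            old.deleted = True
        self._insert(Record(url, content))

    def sweep_one(self):
        # The second process: take one record off the tail.
        if not self.queue:
            return None
        rec = self.queue.popleft()
        if rec.deleted:
            return "dropped"  # stale version, just get rid of it
        # Live record: remove it from the index, then re-insert at the head.
        del self.index[rec.url]
        self._insert(rec)
        return "moved"


store = QueueStore()
store.add("a", "v1")
store.add("b", "v1")
store.add("a", "v2")           # marks the first "a" record as deleted
first = store.sweep_one()      # tail is the stale "a" record
second = store.sweep_one()     # tail is now the live "b" record
```

After the two sweeps the stale copy of "a" is gone and "b" has been rotated to the head, so the queue holds only live records.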

One problem is that the individual chunks (sectors?) of the underlying storage will have to be managed manually. Usually you can't grow or shrink a file from both ends, but I imagine it's not that difficult if you only ever erase whole sectors at the beginning or the end.
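One way the chunk management might look, again only as an in-memory sketch: keep the queue as a chain of fixed-capacity chunks, append to the newest one, and free the oldest one whole once every record in it has been read. The `ChunkedQueue` name, the `chunk_capacity` parameter, and the `freed_chunks` counter are all assumptions for illustration; a real version would map chunks to disk sectors or separate files.

```python
from collections import deque


class ChunkedQueue:
    """Hypothetical sketch: a queue built from fixed-capacity chunks."""

    def __init__(self, chunk_capacity=4):
        self.chunk_capacity = chunk_capacity
        self.chunks = deque()   # leftmost = oldest chunk (tail), rightmost = newest (head)
        self.read_offset = 0    # next unread slot in the oldest chunk
        self.freed_chunks = 0   # how many whole chunks have been released

    def push(self, record):
        # Start a new chunk at the head when there is none or the newest is full.
        if not self.chunks or len(self.chunks[-1]) >= self.chunk_capacity:
            self.chunks.append([])
        self.chunks[-1].append(record)

    def pop(self):
        if not self.chunks:
            raise IndexError("pop from empty queue")
        oldest = self.chunks[0]
        record = oldest[self.read_offset]
        self.read_offset += 1
        if self.read_offset == len(oldest):
            # Every record in the oldest chunk has been read: erase it as a unit.
            self.chunks.popleft()
            self.read_offset = 0
            self.freed_chunks += 1
        return record


q = ChunkedQueue(chunk_capacity=2)
for n in range(1, 6):
    q.push(n)                      # fills chunks [1, 2], [3, 4], [5]
popped = [q.pop(), q.pop()]        # consumes and frees the first chunk
```

The point is that nothing is ever removed from the middle of a chunk, so the "file" only ever grows at one end and gets erased in whole units at the other.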