Three times now I have set out to document the federation search data flows, and each time I make a new site. This is usually motivated by some bug. It seems I never get the documentation finished once the bug has been found and fixed.
Our new work mimics the freeform data entry used in the example from SigMod Example Unbound.
# Sitemap
The scrape runs every six hours on a schedule that shifts with daylight saving time. It is built from scripts that manipulate files in directories. Some files are rolled up from similarly named files in subdirectories.
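As a rough sketch of that roll-up step, the snippet below gathers same-named files from subdirectories into a single file at the parent level. The directory layout and the file name are placeholders, not the actual scrape tree.

```ruby
# Roll up similarly named files from subdirectories into one file
# at the parent level. 'data' and 'sites.txt' are placeholder names.
def rollup(parent, name)
  lines = Dir.glob(File.join(parent, '*', name))
             .flat_map { |path| File.readlines(path, chomp: true) }
             .uniq
             .sort
  File.write(File.join(parent, name), lines.join("\n") + "\n")
end

rollup('data', 'sites.txt')
```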
Our federation-wide search runs a scrape four times a day to update flat-file indices that are searched on demand with a plugin and several related tools.
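A minimal sketch of the on-demand side might look like this, assuming a flat index with one line per page; the path and line format here are illustrative, not the real index.

```ruby
# Scan a flat-file index line by line and return the lines that
# mention the query term. 'index/words.txt' is a placeholder path.
def search(index_path, query)
  term = query.downcase
  File.foreach(index_path, chomp: true)
      .select { |line| line.downcase.include?(term) }
end

search('index/words.txt', 'federation').each { |hit| puts hit }
```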
We return again to the collection of mostly Ruby scripts that implement federation search, motivated by the need to fend off the slow decay that comes with growth and evolution in the federation itself.
# Applications
A good way to understand the federation is to write a sitemap scraper.
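A first step could be as small as fetching one site's sitemap and listing its pages. The sketch below assumes the conventional /system/sitemap.json location and that each entry carries slug and title fields; the site name is a placeholder and error handling is left out.

```ruby
require 'net/http'
require 'json'

# Fetch a site's sitemap and print slug and title for each page.
# 'example.wiki.org' is a placeholder site name.
def sitemap(site)
  JSON.parse(Net::HTTP.get(URI("http://#{site}/system/sitemap.json")))
end

sitemap('example.wiki.org').each do |page|
  puts "#{page['slug']}  #{page['title']}"
end
```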
We add restrictions to Scrape Pages so that it finds more relevant content.
This page displays the titles reachable by following links forward or backward up to two hops.
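A sketch of that traversal, assuming a link graph already produced by the scrape as a Hash of title to linked titles; the sample graph here is made up.

```ruby
# Collect the titles reachable within a fixed number of hops,
# following links in either direction. The link graph is a
# hypothetical Hash of title => [linked titles].
def reachable(links, start, hops = 2)
  undirected = Hash.new { |h, k| h[k] = [] }
  links.each do |from, tos|
    tos.each do |to|
      undirected[from] << to
      undirected[to] << from
    end
  end

  seen = [start]
  frontier = [start]
  hops.times do
    frontier = frontier.flat_map { |t| undirected[t] }.uniq - seen
    seen.concat(frontier)
  end
  seen
end

graph = { 'Search Index' => ['Scrape Pages'], 'Scrape Pages' => ['Sitemap'] }
p reachable(graph, 'Sitemap')
```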
# Resources
We collect various counts while scraping and report them as a text file.
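One hypothetical way to produce that report: tally counters in a Hash while the scrape runs, then write one line per counter. The counter names and output file are illustrative only.

```ruby
# Tally counters during a scrape, then report them as plain text.
# The counter names and file name are made up for this sketch.
counts = Hash.new(0)

counts['sites visited'] += 1
counts['pages indexed'] += 12
counts['fetch errors']  += 1

File.open('scrape-counts.txt', 'w') do |file|
  counts.each { |name, value| file.puts "#{value}\t#{name}" }
end
```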
We'll mine the search index logs for insight into what is happening in the federation.
All sites found, organized by domain name, excluding sites with fewer than ten pages.
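A sketch of how that roster could be assembled, assuming a Hash of site names to page counts produced by the scrape; the sample data is made up, and using the last two name labels as the "domain" is a simplification.

```ruby
# Group sites by domain and drop any with fewer than ten pages.
# The page counts below are sample data, not scrape output.
page_counts = {
  'fed.wiki.org'            => 120,
  'tiny.example.org'        => 4,
  'found.ward.bay.wiki.org' => 33
}

roster = page_counts
  .select { |_site, pages| pages >= 10 }
  .group_by { |site, _pages| site.split('.').last(2).join('.') }

roster.sort.each do |domain, sites|
  puts domain
  sites.each { |site, pages| puts "  #{site} (#{pages} pages)" }
end
```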