Command Line Queries

Handy commands with notes on how to use them when logged into the search server's terminal.

The scrape logs now report more actionable failures. This has been in place for 3 days, so we look at the last 12 logs and tally failures by site.

cd logs
cat `ls -t | head -12` |\
  grep sitemap: |\
  sort | uniq -c | sort -n
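The same tally could be wrapped in a small function so the number of logs is a parameter. This is only a sketch: the name `tally_fails` and its arguments are made up here, and it counts identical `sitemap:` lines exactly as the pipeline above does. It reads the file names in a loop rather than through backticks, which keeps it working if a log name ever contains a space.

```shell
# tally_fails DIR [N]: count identical "sitemap:" lines across the N newest
# files in DIR (N defaults to 12). Hypothetical helper, not part of the server.
tally_fails() {
  dir=$1
  n=${2:-12}
  (
    cd "$dir" || exit 1
    # Newest first, keep N names, then concatenate those files.
    ls -t | head -n "$n" | while IFS= read -r f; do
      cat -- "$f"
    done | grep 'sitemap:' | sort | uniq -c | sort -n
  )
}
```

Run as `tally_fails logs` for the last 12 logs, or `tally_fails logs 4` for the last 4.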

Look for retired sites that somehow got indexed again. Maybe we can catch this when it happens?

ls retired |\
  while read r; do ls -ld sites/$r; done 2>/dev/null
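One way to catch this when it happens might be a check that prints only the offending names, so a scheduled job can alert when the output is non-empty. The sketch below assumes the layout implied above (one entry per site under `retired` and under `sites`); the function name `list_reindexed` is hypothetical.

```shell
# list_reindexed RETIRED SITES: print each name found in RETIRED that also
# exists in SITES. Hypothetical helper; prints nothing when all is well.
list_reindexed() {
  retired=$1
  sites=$2
  ls "$retired" | while IFS= read -r r; do
    if [ -e "$sites/$r" ]; then
      printf '%s\n' "$r"
    fi
  done
}
```

For example, `[ -n "$(list_reindexed retired sites)" ] && echo "retired site indexed again"` could run after each scrape.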

Leading spaces in a title show up as file names that begin with a hyphen, which look like a command-line option to the shell. The problem is prevalent: 157 such pages.

cd sites
ls -d */pages/-* | wc -l

Tallying by site, we find the top ten of the 35 sites that exhibit the problem.

ls -d */pages/-* |\
  cut -d '/' -f1 |\
  sort | uniq -c | sort -nr

69 hexa.viki.wiki
16 wiki.ralfbarkow.ch
14 lua.dojo.fed.wiki
12 dreyeck.ch
9 found.ward.bay.wiki.org
3 don.noyes.asia.wiki.org
2 uvp.viki.wiki
2 roots.ward.bay.wiki.org
2 marc.tries.fed.wiki
2 lfi.wiki.dbbs.co
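When working inside a single site's pages directory, these file names need care because a bare `-name` argument is read as an option. A minimal sketch of one safe idiom, prefixing the glob with `./` so no path ever starts with a hyphen; the function name `dash_pages` is made up for illustration.

```shell
# dash_pages SITE_DIR: print the names of pages in SITE_DIR/pages that begin
# with a hyphen. Hypothetical helper.
dash_pages() {
  (
    cd "$1/pages" || exit 1
    for f in ./-*; do              # ./ prefix: the path never starts with "-"
      [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
      printf '%s\n' "${f#./}"      # strip the ./ again for display
    done
  )
}
```

Run as `dash_pages sites/hexa.viki.wiki`. The other common idiom is to end option parsing with `--`, as in `ls -ld -- "$name"` where `$name` holds one of the names printed above.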