Sunday, October 4, 2015

What IS the Internet Archive and how NOT to search it?

From the previous post, you now know that the Internet Archive (home of THE WAYBACK MACHINE) is the place to go for a LOT of raw data, digital archives, digitized films, etc. But how do you search it? Here's an article on how NOT to search it, with specific instructions on what to do instead. It's lengthy but worth it.

Courtesy of Ancestry Insider at http://www.ancestryinsider.org/2015/09/how-to-navigate-around-internet-archive.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AncestryInsider+%28The+Ancestry+Insider%29
 

Tuesday, September 29, 2015

How to Navigate Around the Internet Archive Search Bug

There is a bug in Internet Archive’s “Search Inside” feature for books. Don’t trust it. Let me tell you what to do instead.
Let’s say you found your way to a book on Internet Archive (IA). It is A Complete History of Fairfield County, Ohio (at https://archive.org/details/cu31924028848483) by Hervey Scott. You want to see if Jonas Messerly is mentioned in it. You select the search magnifying glass in the upper-right corner.
Internet Archive's title search icon
You search for “Messerly” and, oops, you just searched IA for titles rather than searching inside that single title.
Internet Archive's title search results
Wait, don’t cuss me out yet; that’s not the bug. That’s just user error and a user interface annoyance.
You find another search magnifying glass icon on the right-hand side about halfway down the page. The context help popup says “search inside.” You select the icon.
Internet Archive's search inside icon
The page changes a bit and the search icon disappears.
The search inside icon is in a different place in the Internet Archive's full screen view.
Instead of instigating a search, what you’ve just done is switch from one book viewer to another. People in the know tell me that this failure to search is not a bug. Because the design is supposed to do this, it is a WAD, “working as designed.” Fine. Let’s compromise and call it a user interface flaw. But this is still not the bug of which I speak.
The search inside icon has disappeared. The search-all-of-IA box is still up in the upper-right corner of the screen. You fell for that one once before. “Fool me once…” After looking in vain for another search icon, you notice that the search box you previously dismissed, the one that searched for book titles, is now labeled “Search inside”.
The search inside box is at the top in the Internet Archive's full screen view.
Also not the bug of which I speak. It’s another user error and user interface annoyance.
Now comes the bug. You search for “Messerly” and IA erroneously states “No matches were found.”
The Internet Archive's full screen view with no matches found message
Rather than depend on just the “Search Inside” results, check the raw text. To do this, select the italic I (the “About this book” icon). In the popup, select Plain Text. That brings you to a page containing the raw text from the book. Now use your browser's find-in-page search (Ctrl+F, or Cmd+F on a Mac) to search for Messerly.
Some raw text from an Internet Archive book
There he is on page 73. Now back up to the book viewer and advance to page 73.
Mention of Jonas Messerly in a history of Fairfield County, Ohio
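If you would rather not click through the viewer, the raw-text check can be scripted. What follows is only a rough sketch, not anything IA documents in this article. It assumes the item's OCR text is published at a URL of the form https://archive.org/download/IDENTIFIER/IDENTIFIER_djvu.txt, which is a common naming pattern but not guaranteed for every item. The function names (full_text_url, fetch_full_text, find_term) are made up for illustration.

```python
import re
import urllib.request


def full_text_url(identifier: str) -> str:
    # ASSUMPTION: the OCR text file is named <identifier>_djvu.txt.
    # Some items may name it differently or have no text file at all.
    return f"https://archive.org/download/{identifier}/{identifier}_djvu.txt"


def fetch_full_text(identifier: str) -> str:
    """Download the plain text of an item (requires network access)."""
    with urllib.request.urlopen(full_text_url(identifier)) as response:
        return response.read().decode("utf-8", errors="replace")


def find_term(text: str, term: str, context: int = 40):
    """Return (line_number, snippet) for each case-insensitive match.

    Line numbers refer to lines of the text file, not to book pages.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(line))
            hits.append((line_number, line[start:end].strip()))
    return hits
```

To try it on the book above, call fetch_full_text("cu31924028848483") and pass the result to find_term along with "Messerly". The hits tell you where the name sits in the text file; you still need the book viewer to find the matching page.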
One of the distinct advantages of Internet Archive over Google Books is that downloaded PDF files are searchable. I tested the above book and found that Adobe Reader is not affected by the search bug. You can download from IA with the confidence that your offline study will not be affected.
Be aware that none of these workarounds helps with OCR errors. If a word was not recognized correctly when the page was scanned, then all of these methods will fail to find it.
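One partial remedy, which the original article does not describe, is to look for near-misses in the raw text instead of exact matches. Here is a minimal sketch using Python's standard difflib module. The function name fuzzy_find and the 0.8 similarity cutoff are arbitrary choices for illustration, and a badly mangled name will still slip through.

```python
import difflib
import re


def fuzzy_find(text: str, term: str, cutoff: float = 0.8):
    """Return distinct words in text that closely resemble term."""
    # Map each lowercased word to the first spelling seen in the text.
    spellings = {}
    for word in sorted(set(re.findall(r"\w+", text))):
        spellings.setdefault(word.lower(), word)
    # get_close_matches ranks candidates by similarity ratio and
    # drops anything scoring below the cutoff.
    close = difflib.get_close_matches(
        term.lower(), list(spellings), n=10, cutoff=cutoff
    )
    return [spellings[key] for key in close]
```

Run against the plain text of a book, fuzzy_find(text, "Messerly") would surface OCR variants such as "Messerlv" that an exact search misses. You can then search for each variant with your browser's find command.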
Finally, the Internet Archive is a non-profit organization that accomplishes amazing things with very little money. No one should be surprised that there are flaws in their software. We are all in their debt. They accept contributions at https://archive.org/donate.
