1. Origins of Ord

  2. This is a story about the making of ord.zeke.sikelianos.com, a tool for linguistic inquiry.
  3. Everyone knows Wikipedia is a fount of knowledge, but there's more magic than meets the eye.
  4. On every Wikipedia article page, the sidebar contains links to the same article in other languages.
  5. Let's open the browser console and play with this data.
  6. Cool, Wikipedia uses jQuery, so we get it for free in the console.
  7. Let's get that data!
  8. Time to switch from the browser to the editor and build a scraper: wikipedia-translator
  9. We'll use cheerio, a tiny implementation of core jQuery designed specifically for node.
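
Here is a rough sketch of what the extraction step might look like. It is written against any jQuery-compatible `$`, so the same function could run in the Wikipedia console or against a cheerio-loaded page in node. The selector `li.interlanguage-link a`, the attribute names, and the function name `extractTranslations` are assumptions for illustration; Wikipedia's markup may differ or change, and this is not necessarily how wikipedia-translator does it.

```javascript
// Collect the "other languages" links from a Wikipedia article page.
// `$` is any jQuery-compatible function: the global one in the browser
// console, or the function returned by cheerio.load(html) in node.
function extractTranslations($) {
  // ASSUMPTION: interlanguage links are <a> elements inside
  // <li class="interlanguage-link"> and carry lang/href/title attributes.
  return $('li.interlanguage-link a')
    .map(function (i, el) {
      const link = $(el);
      return {
        lang: link.attr('lang'),   // language code, e.g. "af"
        href: link.attr('href'),   // URL of the article in that language
        title: link.attr('title'), // title text attached to the link
      };
    })
    .get(); // unwrap the jQuery/cheerio collection into a plain array
}
```

In the console this would be called as `extractTranslations($)`. In node, something like `extractTranslations(cheerio.load(html))` should work once the page HTML has been fetched.
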
  10. Yay! Now we have a standalone scraping module. But something's missing...
  11. Each translation result has a language code, but that's a bit opaque. What the heck language is af?
  12. Wikipedia has its own system for codifying languages, based loosely on other ISO and IETF standards.
  13. So let's scrape that HTML table and stick the data in a new node module: wikipedias
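
Once the table is scraped, the lookup itself can be tiny. The sketch below uses a hand-typed subset of codes rather than scraped data. The `{ code, name }` record shape and the `languageName` helper are hypothetical, not the actual API of the wikipedias module.

```javascript
// Illustrative subset only; the real table has many more rows.
// ASSUMPTION: each row is stored as a { code, name } record.
const WIKIPEDIAS = [
  { code: 'af', name: 'Afrikaans' },
  { code: 'de', name: 'German' },
  { code: 'fr', name: 'French' },
  { code: 'simple', name: 'Simple English' }, // Wikipedia-specific, not an ISO 639 code
];

// Index the rows by code so each lookup is a single Map access.
const byCode = new Map(WIKIPEDIAS.map((w) => [w.code, w]));

// Return the human-readable language name, or undefined for unknown codes.
function languageName(code) {
  const entry = byCode.get(code);
  return entry ? entry.name : undefined;
}
```

With this in place, `languageName('af')` answers the question from the earlier slide: Afrikaans.
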
  14. Now we have two node modules that provide all the Wikipedia data we need. The next step is to build a small web app that wraps these modules.
  15. ord.zeke.sikelianos.com is tiny. It has but one route.
  16. And thanks to Express, turning Ord into a JSON webservice is a one-liner.
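
The one-liner in question is presumably Express's `res.json()`. Below is a minimal sketch of a single route handler built around it. The `/:word` route pattern, the `makeWordHandler` name, and the synchronous `lookup` function are all assumptions for illustration; the real app's route and its translation call (likely asynchronous) may look different.

```javascript
// Build a route handler around a lookup function.
// ASSUMPTION: `lookup(word)` synchronously returns an array of translations;
// it is a hypothetical stand-in for the scraping module.
function makeWordHandler(lookup) {
  return function wordHandler(req, res) {
    // req.params.word is filled in by a route pattern such as '/:word'
    const translations = lookup(req.params.word);
    // The one-liner: res.json() serializes the value and sends it as JSON
    res.json({ word: req.params.word, translations });
  };
}

// Mounting it in an Express app (requires `npm install express`):
//   const express = require('express');
//   const app = express();
//   app.get('/:word', makeWordHandler(lookup));
//   app.listen(3000);
```

Keeping the handler as a plain function means it can be exercised with stub `req` and `res` objects, without starting a server.
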
  17. Works on mobile too, with a little media-query love.
  18. One last thing: Let's try to sort the results by etymological similarity, using spelling similarity as a rough proxy.
  19. There's an npm module for that.
  20. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
  21. We can use it to determine how similar (or different) two words are.
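
To make the definition concrete, here is a hand-rolled sketch of the standard dynamic-programming approach, plus a helper that sorts translations by distance from the original word. This is not the npm module mentioned above. The `sortBySimilarity` helper and the `{ lang, title }` record shape are assumptions for illustration.

```javascript
// Levenshtein distance via dynamic programming, keeping only two rows.
// Note: strings are compared by UTF-16 code unit, which is fine for a sketch.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars of a into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (free when the chars match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return a copy of `translations` with a `distance` field, closest first.
// ASSUMPTION: each translation looks like { lang, title }.
function sortBySimilarity(word, translations) {
  return translations
    .map((t) => ({
      ...t,
      distance: levenshtein(word.toLowerCase(), t.title.toLowerCase()),
    }))
    .sort((x, y) => x.distance - y.distance);
}
```

For example, `levenshtein('kitten', 'sitting')` is 3: substitute k with s, substitute e with i, and insert g at the end.
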
  22. Takeaways

  23. Wikipedia is amazing.
  24. Node makes it easy to build complex projects composed of smaller modules, each with its own set of responsibilities.
  25. Smaller modules are easier to test.
  26. The web browser is more than a means for consuming information: It's also a tool for collecting and manipulating data.