I'm in this class about databases. We had to get together in a group and build a database, developing ways to describe things one might not usually think to spend time "classifying." My group decided our database would be about produce, and we spent weeks deciding things like whether "potatoes" and "sweet potatoes" were different enough to merit separate records and, if so, how we could quantify what made them different so that we could apply the same standards to other produce. When we had created our produce-data-storing masterpiece, we traded empty databases with another team, the idea being that we'd each try to sort some data into the other's structure to see how well it held up.
When it came time to trade, I was really looking forward to seeing what the other group had decided to classify. The rules of the assignment said we could do anything that wasn't "overly bibliographic in nature." In other words: you silly librarians-to-be, don't reinvent the wheel by organizing your book collections. They'd made their database about--squeal!--T-SHIRTS!! So Friday night, after a week of working, I "treated" myself to (yes, you are understanding me correctly) **entering data about my T-shirts into a database!** I had fun doing it!! Organizing and T-shirts combined? What could be better? Except maybe getting graded for thinking of ways the experience of organizing T-shirts into a database could be improved!
I'm spending the day working on a midterm for my database class. (I do have another class, but it's not really expanding the horizons of my nerdiness in the same fashion.) Here is an *actual answer* I'm going to turn in on my midterm. (It's worth noting that my teacher a) seems to have a good sense of humor and b) seems to appreciate a "real world analogy.")
I'm writing about three ways one can evaluate the "closeness" between a search query and some search results. There is a fiendish amount of math involved in this process, but I treat the math the way I sometimes (oh, my nerd street cred is going to drop if I write this) treat the songs in LOTR: I see them and I skip right over them. Here's my answer on the second of the three:
The second model is the Probabilistic Model. In this model, each term in the search query is weighted. The IRS (information retrieval system) returns the documents with the highest score, given the prevalence of each of the weighted terms in the document. (It's not an exact metaphor, but I think of this like the points for each letter on Scrabble tiles. Words with lots of vowels, like low-weighted search terms, will clear your tray but not get you a big score. Pull off "quixotic," however, and you'll get a higher "retrieval status value." The equations in the book make my head spin and my eyes water, but Scrabble I can wrap my head around.)

That's right. I busted out a Scrabble metaphor on the LAST question of my midterm. Who else has been this nerdy today? Come on, fess up. Let's hear the story.
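For anyone who, unlike me, actually reads the math: here's a tiny Python sketch of the idea behind that Scrabble metaphor. It's my own toy version, not the formula from our textbook. I'm assuming the classic Robertson/Sparck Jones term weight with no relevance information, log((N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is how many of them contain the term. A document's score (its retrieval status value) is just the sum of the weights of the query terms it contains. The function names and the little "documents" are made up for illustration.

```python
import math


def term_weight(num_docs, docs_with_term):
    """Assumed Robertson/Sparck Jones weight (no relevance info).

    Rare terms (the "quixotic" tiles) get big weights; common terms
    (the vowels) get small ones.
    """
    return math.log((num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def retrieval_status_value(query_terms, doc_terms, doc_freq, num_docs):
    """Sum the weights of the query terms that appear in the document."""
    return sum(
        term_weight(num_docs, doc_freq[t]) for t in query_terms if t in doc_terms
    )


def rank(query, docs):
    """Return (doc_id, score) pairs sorted from highest to lowest score."""
    # Treat each document as a set of lowercase words (presence/absence only).
    doc_terms = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    num_docs = len(doc_terms)
    query_terms = set(query.lower().split())
    # How many documents contain each query term.
    doc_freq = {
        t: sum(t in terms for terms in doc_terms.values()) for t in query_terms
    }
    scored = [
        (doc_id, retrieval_status_value(query_terms, terms, doc_freq, num_docs))
        for doc_id, terms in doc_terms.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-collection, just for illustration.
docs = {
    "a": "a quixotic quest for sweet potatoes",
    "b": "a pile of potatoes",
    "c": "a pile of t-shirts",
    "d": "a drawer of t-shirts",
    "e": "a drawer of socks",
}

ranking = rank("quixotic potatoes", docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Here "quixotic" shows up in only one of the five documents, so it's worth far more than "potatoes," which shows up in two, and document "a" wins. One quirk of this particular weight: a term that appears in more than half the documents actually gets a *negative* weight.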