Learning about… algorithms and machine learning with ONADC

Algorithms. It’s a fancy word for something I do every day, and you should, too: getting the computer to do your bidding.

Okay, it can get a little more complex than that. But not much.

I’ll back up. About a week ago, I went to the DC chapter’s Online News Association meetup, a once-a-month event I try to make as often as I can. The topics of the talks vary. That night, the rain was pouring so hard that I had to stop in a drugstore for some respite, and my denim jacket turned into a sopping rag.

But! This month’s topic really spoke to me: “Algorithms and Machine Learning,” from NICARian friends Justin Myers of the Chronicle of Higher Education and Derek Willis of the New York Times (longtime readers will also recognize him as the grad school professor who really cemented the beginning of my career path).

There’s a Storify of the event here, but I also wanted to lay out some of my thoughts around what I learned.

Justin had one of the best examples of an algorithm I’ve ever seen, demonstrating how to sort the bars in a bar chart. Is bar one bigger than bar two? If yes, move it to the right of bar two. Then compare the next pair, and cycle through all the bars, again and again, until no bars in the entire sequence change position. Then you have confirmed your order is accurate.
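For the curious, the procedure Justin walked through sounds like what is usually called bubble sort. Here is a minimal Python sketch of it, assuming the bars are just a list of numbers; the function name `bubble_sort` and the sample heights are mine, not from the talk.

```python
def bubble_sort(values):
    """Sort bar heights from smallest to largest by swapping neighbors."""
    bars = list(values)  # work on a copy so the original list is untouched
    swapped = True
    while swapped:  # keep cycling until a full pass changes nothing
        swapped = False
        for i in range(len(bars) - 1):
            # Is this bar bigger than the one to its right? Then swap them.
            if bars[i] > bars[i + 1]:
                bars[i], bars[i + 1] = bars[i + 1], bars[i]
                swapped = True
    return bars


print(bubble_sort([5, 2, 9, 1]))  # prints [1, 2, 5, 9]
```

It is not the fastest way to sort anything, but it is about the easiest to picture.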

Machine learning, which is a term I approach with trepidation, is really just a more complex algorithm. It’s just that the rules you give the computer are more specific, and ultimately flexible, so the machine can seem to “learn” what you would like it to do. It becomes easier to work out which specific instruction to give the computer once it tries a task and you see where it makes a mistake. Then, you clarify that point. This technique seems quite similar to the education of a human.

A very common application of these strategies is verifying names. How do we know if John H. Smith is the same as John Smith? What repeatable strategies can we give the computer to help?
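To make that question concrete, here is a rough Python sketch of one repeatable strategy: clean up both names, then measure how similar they are. It uses only the standard library (`re` and `difflib`), not any of the tools discussed at the meetup. The helper names `normalize` and `probably_same` and the 0.9 threshold are my own assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name, strip punctuation, and drop middle initials."""
    name = re.sub(r"[^a-z\s]", "", name.lower())
    parts = name.split()
    if len(parts) > 2:
        # Keep first and last tokens; drop single-letter tokens in between.
        middle = [p for p in parts[1:-1] if len(p) > 1]
        parts = [parts[0]] + middle + [parts[-1]]
    return " ".join(parts)


def probably_same(a, b, threshold=0.9):
    """Guess whether two names refer to the same person (assumed threshold)."""
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return score >= threshold


print(probably_same("John H. Smith", "John Smith"))  # prints True
print(probably_same("John Smith", "Jane Doe"))       # prints False
```

A real project needs far more care than this (nicknames, suffixes, two different John Smiths), which is exactly where the more flexible, machine-learning flavored approaches come in.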

This was the second time I’ve heard the Python “dedupe” library brought up to help with this kind of work. The FEC-standardize library from friend Chase Davis was also mentioned. For more info, here’s Chase talking about the project on Source.

Place names, particularly addresses, use similar strategies. I hadn’t thought of this before, but Derek brought up machine learning in the context of how EveryBlock works (oh, I guess it’s “worked” now; that’s sad). I think this has a lot of geographic implications that I’ll be thinking more about.

Be careful about which steps or instructions you give the computer, as it will take you very literally.

We discussed the importance of not getting overwhelmed by having all of these tools at your disposal. My contribution was to write out the steps you want the computer to take. (I’m actually not sure why I was talking at this point, but I felt the need to answer a question, and I suppose Derek and Justin are just sort of used to my need to be vocal. Maybe I just need more sleep or less caffeinated tea or something.) Sometimes I do that, and I always find that the process goes better when I do. It’s sort of like sketching a visualization, but sketching out the logic. There will always be more tools, you know.

Lots of jokes about college interns and making them do work. Or rather, have the computer do it instead.

Or, or, be the college intern that writes the code to teach the computer to do this stuff. (Worked for me.) Learn to code and get your journalism job using tech for informing the public. Ending soapbox now…

Regardless, I think automated instructions, and machine learning at large, really speak to some of my broader desires to use tech for data analysis in addition to presentation. I’ve been saying that for a while, but I think I’m only now starting to scratch the surface on this. Hope something big is coming… soon.
