I’ve gotten into the habit of posting daily learnings on Twitter, but some things require a more in-depth reminder. I also haven’t done as much paying it forward as I’d like (but I’m having a TON of fun! and dealing with health problems! but mostly fun!)
I’d like to try to start posting more helpful tips here, partially as a notebook for myself, and partially to help others with similar issues.
Today’s problem: I needed to search for a few lines of text, which could be contained in any one of nine files with 100,000 lines each. Opening all of the files took a very long time on my computer, not to mention executing a search.
Enter the “grep” command in Terminal, which lets you quickly search the contents of files without opening them one by one.
Type:
grep -r -P '^sometext\t' /path/to/a/folder/with/files/to/search
What does that mean?
The -r tells the command to search recursively. Don’t just search the folder I specify in my path; search all files and folders contained within that top folder.
-P tells grep to use Perl-type regular expressions, allowing us to use \t (a backslash, not a forward slash) to signify a tab in the actual string we are searching for. This is useful for data files that use tabs to delimit, or separate, different columns. This is often used in very heavy data files, too heavy to save in an Excel format. One caveat: -P is a GNU grep option, so it may not be available in every version of grep.
After a space, I type the string of text I want to search for, wrapped in single quotes. Using Perl-type regular expressions, ^ signifies the start of a line, sometext is literally…some text, and \t, as mentioned earlier, represents a tab.
After another space, I specify the path to the directory, or folder, that contains the files I wish to search.
And the computer prints out the lines I’m looking for, telling me which specific file in the folder they are in.
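To see the whole thing in action without touching real data, here is a small sketch that builds a throwaway folder and searches it. The file names and contents are made up for illustration, and it assumes a grep that supports -P (GNU grep does).

```shell
# Build a throwaway folder with two tab-delimited files (made-up sample data).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/nested"
printf 'sometext\tfirst row\nother\tsecond row\n' > "$demo_dir/a.tsv"
printf 'sometext with no tab\nsometext\tthird row\n' > "$demo_dir/nested/b.tsv"

# -r also searches the nested folder; -P lets \t stand for a tab.
# Only lines that START with "sometext" followed by a tab should match.
matches=$(grep -r -P '^sometext\t' "$demo_dir")
printf '%s\n' "$matches"
```

Each printed line starts with the path of the file it came from, followed by a colon and the matching line. The line “sometext with no tab” is skipped, because a space follows sometext there instead of a tab.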
BONUS: I neglected to do this, but smart people tell me I can save these lines of text to my own data file by appending > path/to/some/file.txt at the end of the command. The > sends the output of the command somewhere other than the screen; the path says where to send it.
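Here is a minimal sketch of that redirect, again on made-up sample data in a temporary folder and again assuming a grep that supports -P. The name results.txt is just a placeholder.

```shell
# Made-up sample data in a temporary working folder.
work_dir=$(mktemp -d)
mkdir "$work_dir/data"
printf 'sometext\tkeep me\nskip\tme\n' > "$work_dir/data/sample.tsv"

# > writes grep's output to results.txt, replacing the file if it already exists.
# The results file lives OUTSIDE the searched folder so grep never reads its own output.
grep -r -P '^sometext\t' "$work_dir/data" > "$work_dir/results.txt"

# Look at what was saved.
cat "$work_dir/results.txt"
```

Using >> instead of > adds to the end of an existing file rather than overwriting it.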
That’s nice, but why do I care?
This gets back to my interests in text analysis. Give me a speech, and I can bring up all the times a politician said a certain word, without doing it by hand. Valuable skill to learn, especially for the type of work we do in DC. And helps me make better use of my time. Isn’t having more time, and outsourcing repetitive tasks to the computer, something we can all benefit from?
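For the speech example, a rough sketch might look like this. The speech text and the word being searched are invented for illustration; -n, -i, -w, and -o are standard grep options.

```shell
# A made-up three-line "speech" saved to a temporary file.
speech_file=$(mktemp)
printf 'Freedom is not free.\nWe choose freedom, and freedom chooses us.\nFreedoms vary.\n' > "$speech_file"

# Show every line that uses the word:
# -n adds line numbers, -i ignores upper/lower case, -w matches whole words only.
grep -n -i -w 'freedom' "$speech_file"

# Count every use of the word: -o prints each match on its own line,
# and wc -l counts those lines.
count=$(grep -o -i -w 'freedom' "$speech_file" | wc -l)
echo "$count"
```

Because of -w, “Freedoms” and “free” are not counted as matches for “freedom”.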