« Back to Processing

Least-used letter pairs

| 1 Comment | Published on July 11, 2008
The content that follows was originally published on the Don Havey website at http://donhavey.com/blog/tutorials/processing/least-used-letter-pairs/

Least-used letter pairsCuriobot is in the middle of a structural makeover and this morning I decided that the code needed a little condensing. Most popular Javascript applications simplify function and variable names to just a few letters, and some CSS-heavy pages use a similar unique identifier to cut down on bandwidth. This technique also obfuscates your code, if you’re worried about people stealing it and using it for their own (evil) intentions.

In a big old “Web 2.0″ (*gag*) application, condensing your code can reduce your page load times by a serious percentage. I think it’s especially important if you’re offering content for mobile users, where every kilobyte counts. Replacing 300 occurrences of 12-character variable names with 2-character names can save you 10*300 bytes = 3kb per download. That’s probably 5%, maybe 10% of your total script/CSS/HTML weight… so it adds up on high-traffic applications. Of course, as most web developers are quick to point out, I pity the poor bastard who has to try to understand and/or modify a function called “zx()”. Ever taken a look at Google’s scripts? Totally incomprehensible for that reason.

Technically, you could name up to 676 (26×26) functions, variables, or classes using a two-letter pair, but the problem you run into is that when your content is not perfectly isolated from your page’s structure, you can’t simply search-and-replace the page to find every occurrence of the classname “ea” without running into it in a sentence somewhere (“I went to the beach” or “Welcome to my neat-o website”).

For the minimum amount of content-structure collisions, I want to favor the two-letter combinations that are used least frequently in my content. So I wrote a quick Processing app to examine any given text and return an ordered set of letter pairs, sorted according to their frequency. Here are the results for a few sample texts:

The pairs that are not in bold were not found at all in the sample text (use them for function/class/variable name replacements). Those that were found are displayed by frequency.

Here’s the code. Try inputting text that corresponds to your particular usage for better results.

So the next time you’re thinking of renaming “load_excellent_content()” to “le()”… don’t. Try “vq()”.

Categories: Processing

Tags: / / / / / / / /

1 Comment

Matt says:

Thanks for posting this Don, it is nice to have as a reference to look at and choose reference combinations from!

Leave a Response

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>