Search This Blog

Friday, October 28, 2011

Don Count Spaces When Counting Words


Don't count spaces when counting words.

Over the last couple of days I've seen numerous examples of people posting about how to count words in a sentence. Disturbingly, these postings recommend suggest counting the number of spaces in the sentence and use that as the basis of a word count.You may be asking why this is a problem. Well,...




Over the last couple of days I've seen numerous examples of people posting about how to count words in a sentence. Disturbingly, these postings recommend suggest counting the number of spaces in the sentence and use that as the basis of a word count.

You may be asking why this is a problem. Well, consider the following sentence:
The total number of words    \t     in this sentence,is 10.


As you can see, simply counting spaces isn't going to work. There's the special characters (the \t) to take care of, the multiple spaces, and the words separated by a comma without a space. So, if counting spaces doesn't work, what does? The answer is to use a regular expression, and you are going to love how simple it is. There's a simple regular expression that matches words, and takes care of all the guff demonstrated above; all you need to match a word is use \w+. Here's a quick sample:


Regex regex = new Regex("\\w+",  RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.CultureInvariant); string input = "The total number of words       \t        this sentence is 10."; MatchCollection match = regex.Matches(input);