Using Java’s String.split() to divide a string delimited by whitespace AND other things.


I have a file containing lines that look like:

#### #.##% #.##% ## ###### someLongStringAtTheEnd
(each line starts with leading blanks), where #### is an integer and #.## is a floating point number. The two floats are labeled as percentages by the literal percent signs.

As far as single spaces go, .split( " " ) works just fine. If you want to split on more than one kind of whitespace, you have two choices. The first is to write your own regular expression naming each whitespace character, with doubled backslashes so that one backslash survives the Java compiler and reaches the regex engine:

…split( "[\\ \\t]+" );

catches both spaces and tabs…
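To see the difference, here is a minimal sketch. The class name, method names and sample string are made up for illustration and are not from my file:

```java
import java.util.Arrays;

class SpaceVsTabSplit {
    // Split on exactly one space character per delimiter.
    static String[] splitOnSingleSpace(String s) {
        return s.split(" ");
    }

    // Split on runs of one or more spaces or tabs.
    static String[] splitOnSpacesAndTabs(String s) {
        return s.split("[ \\t]+");
    }

    public static void main(String[] args) {
        // Hypothetical input: a tab, then two consecutive spaces.
        String sample = "12\t34  56";
        System.out.println(Arrays.toString(splitOnSingleSpace(sample)));
        System.out.println(Arrays.toString(splitOnSpacesAndTabs(sample)));
    }
}
```

The single-space version leaves the tab inside the first token and produces an empty string between the two consecutive spaces. The character-class version returns the three numbers cleanly. Inside square brackets a space does not need a backslash, so "[ \\t]+" and "[\\ \\t]+" behave the same.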

However, the world of Regular Expressions offers something even neater: \s, which matches any ASCII whitespace character (space, tab, newline and so on).

Of course, you still have to escape the backslash:
…split( "[\\s]" )

then add the percent sign, and a "+" outside the brackets so that it matches one or more of the specified characters:
…split( "[\\s%]+" )
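Putting it together, a rough sketch of how a line like mine might be taken apart. The class name, method name and sample values are invented for illustration. The trim() call is my assumption about how to handle the leading blanks: without it, the delimiter at the very start of the string gives you an empty first element.

```java
import java.util.Arrays;

class PercentLineSplit {
    // Trim first, then split on runs of whitespace and/or percent signs.
    static String[] fields(String line) {
        return line.trim().split("[\\s%]+");
    }

    public static void main(String[] args) {
        // Hypothetical line in the layout described above, with leading blanks.
        String line = "  1234 1.25% 3.50% 12 123456 someLongStringAtTheEnd";
        String[] f = fields(line);
        System.out.println(Arrays.toString(f));

        // The percent signs are gone, so the numbers parse directly.
        int count = Integer.parseInt(f[0]);
        double firstPct = Double.parseDouble(f[1]);
        double secondPct = Double.parseDouble(f[2]);
        System.out.println(count + " " + firstPct + " " + secondPct);
    }
}
```

This yields six fields for the sample line. One caveat: if the long string at the end could itself contain a percent sign or a blank, this pattern would split it too, so treat this as a starting point rather than a finished parser.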

Here is a nice discussion of whitespace delimiters, larger-than-ASCII character sets, etc.:
http://stackoverflow.com/questions/1822772/java-regular-expression-to-match-all-whitespace-characters

Bill
