Is there a Python function that will trim whitespace (spaces and tabs) from a string?
For example, the input " \t example string\t " should become "example string".
Answers:
Method 1
For whitespace on both sides, use str.strip:
s = " \t a string example\t "
s = s.strip()
For whitespace on the right side, use str.rstrip:
s = s.rstrip()
For whitespace on the left side, use str.lstrip:
s = s.lstrip()
You can pass any of these functions an argument listing the characters to strip, like this:
s = s.strip(' \t\n\r')
This will strip any space, \t, \n, or \r characters from both sides of the string.
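As a quick illustration of the three methods and the optional character argument, here is a minimal sketch. The sample strings and variable names are made up for the example:

```python
# Sample string with spaces and tabs on both ends (made up for illustration).
s = " \t a string example\t "

both = s.strip()    # trims both ends
left = s.lstrip()   # trims the left end only
right = s.rstrip()  # trims the right end only

# With an argument, strip() removes any of the listed characters from the ends;
# the argument is a set of characters, not a substring.
custom = "xxhello worldxyx".strip("xy")

print(repr(both))    # 'a string example'
print(repr(left))    # 'a string example\t '
print(repr(right))   # ' \t a string example'
print(repr(custom))  # 'hello world'
```

Note that none of these calls modify s in place; each returns a new string.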
The examples above only remove characters from the left-hand and right-hand sides of strings. If you also want to remove characters from the middle of a string, try re.sub:
import re
print(re.sub(r'\s+', '', s))
That should print out:
astringexample
Method 2
In Python trim methods are named strip:
str.strip()   # trim
str.lstrip()  # left trim
str.rstrip()  # right trim
Method 3
For leading and trailing whitespace:
s = ' foo \t '
print(s.strip())  # prints "foo"
Otherwise, a regular expression works:
import re
pat = re.compile(r'\s+')
s = ' \t foo \t bar \t '
print(pat.sub('', s))  # prints "foobar"
Method 4
You can also use a very simple and basic function, str.replace(), which works with both spaces and tabs:
>>> whitespaces = " abcd ef gh ijkl "
>>> tabs = "\tabcde\tfgh\tijkl"
>>> print(whitespaces.replace(" ", ""))
abcdefghijkl
>>> print(tabs.replace("\t", ""))
abcdefghijkl
Simple and easy.
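One thing worth keeping in mind is that each replace() call targets a single substring and also removes matches inside the string, not just at the ends. A small sketch that chains two calls to drop both spaces and tabs (the helper name is hypothetical):

```python
def remove_spaces_and_tabs(text: str) -> str:
    """Remove every space and tab, including those inside the string."""
    return text.replace(" ", "").replace("\t", "")

print(remove_spaces_and_tabs(" \tabcd ef\tgh ijkl\t "))  # abcdefghijkl
```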
Method 5
# How to trim a multi-line string or a file
s = """ line one
\tline two\t
line three """
# Line 1 starts with a space, line 2 starts and ends with a tab, line 3 ends with a space.
s1 = s.splitlines()
print(s1)
[' line one', '\tline two\t', 'line three ']
print([i.strip() for i in s1])
['line one', 'line two', 'line three']
# More details:
# We could also have used a for loop from the beginning:
for line in s.splitlines():
    line = line.strip()
    process(line)
# We could also be reading a file line by line, e.g. my_file = open(filename), or with open(filename) as my_file:
for line in my_file:
    line = line.strip()
    process(line)
# Moot point: note that splitlines() removed the newline characters; we can keep them by passing True,
# although strip() will then remove them anyway.
s2 = s.splitlines(True)
print(s2)
[' line one\n', '\tline two\t\n', 'line three ']
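The same idea can be put into a runnable form. This is only a sketch: the generator name is made up, and io.StringIO stands in for a real file opened with open().

```python
import io

def stripped_lines(lines):
    """Yield each line with leading and trailing whitespace removed."""
    for line in lines:
        yield line.strip()

text = " line one\n\tline two\t\nline three "

# From a multi-line string:
from_string = list(stripped_lines(text.splitlines()))

# From a file-like object; io.StringIO stands in for a real open() call here.
from_file = list(stripped_lines(io.StringIO(text)))

print(from_string)  # ['line one', 'line two', 'line three']
print(from_file)    # ['line one', 'line two', 'line three']
```

Iterating over a file yields lines that still end in a newline, and strip() removes it along with the other whitespace.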
Method 6
No one has posted these regex solutions yet.
Matching:
>>> import re
>>> p = re.compile(r'\s*(.*\S)?\s*')
>>> m = p.match(' \t blah ')
>>> m.group(1)
'blah'
>>> m = p.match(' \tbl ah \t ')
>>> m.group(1)
'bl ah'
>>> m = p.match(' \t ')
>>> print(m.group(1))
None
Searching (you have to handle the “only spaces” input case differently):
>>> p1 = re.compile(r'\S.*\S')
>>> m = p1.search(' \tblah \t ')
>>> m.group()
'blah'
>>> m = p1.search(' \tbl ah \t ')
>>> m.group()
'bl ah'
>>> m = p1.search(' \t ')
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
If you use re.sub, you may remove inner whitespace, which could be undesirable.
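A possible way to wrap the matching approach so that whitespace-only input yields an empty string instead of None. This is a sketch, not the answer's own code: the function name is hypothetical, and the re.DOTALL flag is an added assumption so that text containing inner newlines is kept whole.

```python
import re

# Same pattern as the "Matching" example. re.DOTALL is an addition (assumption)
# so that '.' also spans newlines inside the text.
_TRIM = re.compile(r'\s*(.*\S)?\s*', re.DOTALL)

def regex_trim(text: str) -> str:
    """Return text without leading/trailing whitespace; inner whitespace is kept."""
    group = _TRIM.match(text).group(1)
    # Group 1 is None when the input is empty or whitespace-only.
    return group if group is not None else ""

print(repr(regex_trim(' \t blah ')))      # 'blah'
print(repr(regex_trim(' \tbl ah \t ')))   # 'bl ah'
print(repr(regex_trim(' \t ')))           # ''
```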
Method 7
Whitespace includes spaces, tabs, and CR/LF. An elegant one-liner string method we can use is translate (Python 3 form shown):
' hello apple'.translate(str.maketrans('', '', ' \n\t\r'))
Or, if you want to be thorough:
import string
' hello apple'.translate(str.maketrans('', '', string.whitespace))
Method 8
(re.sub(' +', ' ', (my_str.replace('\n', ' ')))).strip()
This will remove all the unwanted spaces and newline characters. Hope this helps.
import re
my_str = ' a b \n c '
formatted_str = (re.sub(' +', ' ', (my_str.replace('\n', ' ')))).strip()
This will result in:
' a b \n c ' will be changed to 'a b c'
Method 9
something = "\t please_ \t remove_ all_ \n\n\n\nwhitespaces\n\t "
something = "".join(something.split())
output:
please_remove_all_whitespaces
Adding Le Droid’s comment to the answer.
To separate with a space:
something = "\t please \t remove all extra \n\n\n\nwhitespaces\n\t "
something = " ".join(something.split())
output:
please remove all extra whitespaces
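The two variants can be wrapped as small helpers. The function names below are hypothetical; the technique is the same split() and join() combination used above.

```python
def remove_all_whitespace(text: str) -> str:
    """Drop every run of whitespace, wherever it occurs."""
    return "".join(text.split())

def collapse_whitespace(text: str) -> str:
    """Trim the ends and squeeze each inner run of whitespace to one space."""
    return " ".join(text.split())

print(remove_all_whitespace("\t please_ \t remove_ all_ \n\n\n\nwhitespaces\n\t "))
# please_remove_all_whitespaces
print(collapse_whitespace("\t please \t remove all extra \n\n\n\nwhitespaces\n\t "))
# please remove all extra whitespaces
```

Calling split() with no argument splits on any run of whitespace and discards empty strings at the ends, which is why no separate strip() is needed.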
Method 10
Having looked at quite a few solutions here with various degrees of understanding, I wondered what to do if the string was comma separated…
the problem
While trying to process a CSV of contact information, I needed a solution to this problem: trim extraneous whitespace and some junk, but preserve trailing commas and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage, leaving the good stuff. Trimming out all the punctuation and chaff, I didn’t want to lose the whitespace between compound tokens as I didn’t want to rebuild later.
regex and patterns: [\s_]+?\W+
The pattern lazily matches one or more whitespace characters or underscores ('_') with [\s_]+? (as few characters as possible), provided they come before one or more non-word characters, matched by \W+ (equivalent to [^a-zA-Z0-9_] for ASCII text). Specifically, this finds swaths of whitespace: spaces, tabs (\t), newlines (\n), form feeds (\f), and carriage returns (\r).
I see the advantage to this as two-fold:
- it doesn’t remove whitespace between the complete words/tokens that you might want to keep together;
- Python’s built-in string method strip() doesn’t deal with the inside of the string, just the left and right ends, and its default argument is whitespace (see the example below: several newlines are in the text, and text.strip(' \n\t\r') does not remove them all, while the regex pattern does).
This goes beyond the OP’s question, but I think there are plenty of cases where we might have odd, pathological instances within the text data, as I did (somehow the escape characters ended up in some of the text). Moreover, in list-like strings, we don’t want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word character, like ‘-,’ or ‘-, ,,,’.
NB: I am not talking about the delimiter of the CSV itself, only about instances within the CSV where the data is list-like, i.e. a comma-separated string of substrings.
Full disclosure: I’ve only been manipulating text for about a month, and regex only the last two weeks, so I’m sure there are some nuances I’m missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40 odd columns), as a final step after a pass for removal of extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character, but don’t want to add whitespace where there was none before.
An example:
import re
text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, [email protected], ,dd invites,subscribed,, master, , , , dd invites,subscribed, , , , \r, , , ff dd \n invites, subscribed, , , , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109 \n\n\n\nklkjsdf\""
print(f"Here is the text as formatted:\n{text}\n")
print()
print("using regex to trim both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("Very nice. What about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip out whitespace?
print()
print(f"Here is the text, after stripping with 'strip':\n{clean_text}\n")
print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)
This outputs:
Here is the text as formatted:
"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, [email protected], ,dd invites,subscribed,, master, , , , dd invites,subscribed, ,, , , ff dd invites, subscribed, , , , , alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition - 20june .2134.pdf 2109 klkjsdf"
using regex to trim both the whitespaces and the non-word characters that follow them.
"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, [email protected], dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"
Very nice. What about 'strip()'?
Here is the text, formatted as is:
"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, [email protected], ,dd invites,subscribed,, master, , , , dd invites,subscribed, ,, , , ff dd invites, subscribed, , , , , alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition - 20june .2134.pdf 2109 klkjsdf"
Here is the text, after stripping with 'strip':
"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, [email protected], ,dd invites,subscribed,, master, , , , dd invites,subscribed, ,, , , ff dd invites, subscribed, , , , , alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition - 20june .2134.pdf 2109 klkjsdf"
Are 'text' and 'clean_text' unchanged?
True
So strip() only removes whitespace from the two ends of the string and leaves everything in between untouched. In the OP’s case, strip() is fine, but if things get any more complex, a regex with a similar pattern may be of some value in more general settings.
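To see the pattern at work on something shorter, here is a minimal sketch using a made-up sample string (not taken from the data above):

```python
import re

# Pattern from the answer: whitespace/underscores (lazy) followed by non-word characters.
trim_ws_punctn = re.compile(r'[\s_]+?\W+')

sample = "hello-, -world, founders, :, ?, ffib"  # made-up sample
cleaned = trim_ws_punctn.sub(' ', sample)
print(cleaned)  # hello-, world, founders, ffib

# strip() leaves this sample unchanged because its ends carry no whitespace.
print(sample.strip() == sample)  # True
```

The space before "-world" and the run " :, ?, " are each replaced by a single space, while the space between "world," and "founders" survives because it is followed by a word character.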
Method 11
If using Python 3: in your print call, finish with sep="". That removes the spaces print() would otherwise insert between its arguments.
EXAMPLE:
txt = "potatoes"
print("I love ", txt, ".", sep="")
This will print:
I love potatoes.
Instead of:
I love  potatoes .
In your case, since you would be trying to get rid of the \t, do sep="\t"
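A small sketch of how sep behaves, for reference. The render helper is hypothetical and only captures what print() would write, so the results can be compared as strings:

```python
import io

def render(*args, sep=" "):
    """Return what print() would write for the given arguments (hypothetical helper)."""
    buffer = io.StringIO()
    print(*args, sep=sep, end="", file=buffer)
    return buffer.getvalue()

txt = "potatoes"
print(repr(render("I love ", txt, ".", sep="")))  # 'I love potatoes.'
print(repr(render("I love ", txt, ".")))          # 'I love  potatoes .'

# sep only controls what goes between arguments; a tab inside a string is untouched.
print(repr(render("a\tb", "c", sep="")))          # 'a\tbc'
```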
Method 12
Try translate (Python 3 form shown):
>>> import string
>>> print('\t\r\n hello \r\n world \t\r\n')
hello
world
>>> tr = str.maketrans(string.whitespace, ' ' * len(string.whitespace))
>>> '\t\r\n hello \r\n world \t\r\n'.translate(tr)
'    hello    world    '
>>> '\t\r\n hello \r\n world \t\r\n'.translate(tr).replace(' ', '')
'helloworld'
Method 13
If you want to trim the whitespace off just the beginning and end of the string, you can do something like this:
some_string = " Hello, world!\n "
new_string = some_string.strip()
# new_string is now "Hello, world!"
This works a lot like Qt’s QString::trimmed() method, in that it removes leading and trailing whitespace, while leaving internal whitespace alone.
But if you’d like something like Qt’s QString::simplified() method which not only removes leading and trailing whitespace, but also “squishes” all consecutive internal whitespace to one space character, you can use a combination of .split() and " ".join, like this:
some_string = "\t Hello, \n\t world!\n "
new_string = " ".join(some_string.split())
# new_string is now "Hello, world!"
In this last example, each sequence of internal whitespace is replaced with a single space, while the whitespace at the start and end of the string is still trimmed.
Method 14
Generally, I use the following method:
>>> import re
>>> myStr = "Hi\n Stack Over \r flow!"
>>> charList = [r"\n", r"\r", r"\t"]
>>> for i in charList:
...     myStr = re.sub(i, "", myStr)
...
>>> myStr
'Hi Stack Over  flow!'
Note: This only removes "\n", "\r" and "\t". It does not remove extra spaces.
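The same result can be had without a regex. The sketch below swaps in str.translate with a deletion table, which is a different technique from the loop above; the names are made up for illustration.

```python
# Deletion table: maps newline, carriage return and tab to None.
_CONTROL_WS = str.maketrans("", "", "\n\r\t")

def remove_newlines_and_tabs(text: str) -> str:
    """Delete \\n, \\r and \\t everywhere in text; ordinary spaces are kept."""
    return text.translate(_CONTROL_WS)

print(repr(remove_newlines_and_tabs("Hi\n Stack Over \r flow!")))
# 'Hi Stack Over  flow!'
```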
Method 15
This will remove all whitespace and newlines from both the beginning and end of a string:
>>> import re
>>> s = " \n\t \n some \n text \n "
>>> re.sub(r"^\s+|\s+$", "", s)
'some \n text'
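For the samples below, this anchored substitution gives the same result as str.strip(). A quick check, with a hypothetical helper name and made-up samples:

```python
import re

# Leading whitespace anchored at the start, or trailing whitespace anchored at the end.
_EDGE_WS = re.compile(r"^\s+|\s+$")

def regex_strip(text: str) -> str:
    """Remove whitespace from both ends using the anchored pattern."""
    return _EDGE_WS.sub("", text)

samples = [" \n\t \n some \n text \n ", "no-edges", " \t ", ""]
for sample in samples:
    # Each sample should come out identical to the built-in strip().
    print(repr(regex_strip(sample)), regex_strip(sample) == sample.strip())
```

Since strip() does the same job here, the regex form is mainly useful when the pattern needs to be extended beyond plain whitespace.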
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.