How can I extract whatever follows the last slash in a URL in Python? For example, these URLs should return the following:
URL: http://www.test.com/TEST1 returns: TEST1
URL: http://www.test.com/page/TEST2 returns: TEST2
URL: http://www.test.com/page/page/12345 returns: 12345
I’ve tried urlparse, but that gives me the full path, such as /page/page/12345.
Answers:
Method 1
You don’t need anything fancy: the string methods in the standard library let you easily split your URL into the ‘filename’ part and the rest:
url.rsplit('/', 1)
So you can get the part you’re interested in simply with:
url.rsplit('/', 1)[-1]
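A minimal sketch applying rsplit to the question's URLs. The fourth URL, with a query string, is an assumed extra case added to show that rsplit does not strip query strings or fragments; it simply cuts at the last slash.

```python
urls = [
    "http://www.test.com/TEST1",
    "http://www.test.com/page/TEST2",
    "http://www.test.com/page/page/12345",
    "http://www.test.com/page/page/12345?abc=123",  # assumed extra case with a query string
]

for url in urls:
    # maxsplit=1 splits only at the last "/", giving [everything_before, last_part]
    rest, last = url.rsplit("/", 1)
    print(last)  # TEST1, TEST2, 12345, then 12345?abc=123
```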
Method 2
One more (idio(ma)tic) way:
URL.split("/")[-1]
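For reference, a short sketch of what split("/") produces on one of the question's URLs, including the assumed case of a trailing slash, where the last element is the empty string.

```python
url = "http://www.test.com/page/TEST2"

# split("/") breaks the string at every slash; "//" yields an empty element
parts = url.split("/")
print(parts)      # ['http:', '', 'www.test.com', 'page', 'TEST2']
print(parts[-1])  # TEST2

# With a trailing slash, the final element is the empty string
trailing = "http://www.test.com/page/".split("/")[-1]
print(repr(trailing))  # ''
```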
Method 3
rsplit should be up to the task:
In [1]: 'http://www.test.com/page/TEST2'.rsplit('/', 1)[1]
Out[1]: 'TEST2'
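One caveat with indexing the result with [1]: if the string contains no slash, rsplit returns a one-element list and [1] raises IndexError, while [-1] still works. A small sketch; `last_segment` is a hypothetical helper name.

```python
def last_segment(s):
    # hypothetical helper: rsplit returns a one-element list when s has no "/",
    # so [-1] is safe where [1] would raise IndexError
    return s.rsplit("/", 1)[-1]

print(last_segment("http://www.test.com/page/TEST2"))  # TEST2
print(last_segment("TEST2"))                           # TEST2

try:
    "TEST2".rsplit("/", 1)[1]
except IndexError:
    print("[1] fails when there is no slash")
```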
Method 4
You can do it like this:
import os
head, tail = os.path.split(url)
Here, tail will be the part after the last slash (the file name).
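os.path is an alias for the host platform's path module (posixpath on Linux and macOS, ntpath on Windows), and ntpath also treats backslashes as separators. A sketch that pins the forward-slash rules by calling posixpath directly, assuming the input is a URL rather than a local file path:

```python
import posixpath

url = "http://www.test.com/page/page/12345"

# posixpath.split always splits on "/", regardless of the OS running the code
head, tail = posixpath.split(url)
print(head)  # http://www.test.com/page/page
print(tail)  # 12345

# A trailing slash leaves tail empty
print(repr(posixpath.split("http://www.test.com/page/")[1]))  # ''
```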
Method 5
urlparse is fine to use if you want to (say, to get rid of any query string parameters).
import urllib.parse
urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
    'http://www.test.com/page/page/12345?abc=123'
]
for i in urls:
    url_parts = urllib.parse.urlparse(i)
    path_parts = url_parts[2].rpartition('/')
    print('URL: {}\nreturns: {}\n'.format(i, path_parts[2]))
Output:
URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

URL: http://www.test.com/page/page/12345?abc=123
returns: 12345
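The same idea wrapped in a function, as a sketch; `last_path_part` is a hypothetical name. It relies on rpartition never raising: when there is no separator it returns ('', '', original), so element [2] is always defined.

```python
from urllib.parse import urlparse

def last_path_part(url):
    # hypothetical helper: parse first so the query string and fragment are dropped
    path = urlparse(url).path
    # rpartition returns a 3-tuple (before, sep, after); after is the last segment
    return path.rpartition("/")[2]

print(last_path_part("http://www.test.com/page/page/12345?abc=123"))  # 12345
print(repr(last_path_part("http://www.test.com")))                    # '' (the path is empty)
print("12345".rpartition("/"))                                        # ('', '', '12345')
```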
Method 6
normpath strips the trailing slash, so basename then returns the last folder:
>>> import os
>>> os.path.basename(os.path.normpath('/folderA/folderB/folderC/folderD/'))
'folderD'
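Applied to a URL, a sketch under the assumption that only the path component should be normalized: parsing first keeps normpath away from the scheme's "//", and posixpath keeps "/" as the separator on every OS. Note that normpath also collapses ".." segments (for example /a/b/../c becomes /a/c), which may not match how a server interprets the URL.

```python
import posixpath
from urllib.parse import urlparse

url = "http://www.test.com/page/page/12345/"  # assumed example with a trailing slash

# Work on the path only, so the scheme's "//" is not touched by normpath
path = urlparse(url).path  # '/page/page/12345/'

# normpath removes the trailing slash; basename then returns the last component
last = posixpath.basename(posixpath.normpath(path))
print(last)  # 12345
```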
Method 7
Here’s a more general, regex way of doing this:
import re
re.sub(r'^.+/([^/]+)$', r'\1', url)
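One behavior worth knowing with the re.sub expression above: when the pattern does not match (for example, a URL ending in a slash), re.sub silently returns the input unchanged. A sketch contrasting it with re.search, which reports the no-match case as None; the sample URLs are assumptions.

```python
import re

PATTERN = r"^.+/([^/]+)$"
url = "http://www.test.com/page/page/12345"

# re.sub replaces the whole match with group 1, the text after the last "/"
last = re.sub(PATTERN, r"\1", url)
print(last)  # 12345

# With a trailing slash the pattern does not match, so re.sub returns its input
unchanged = re.sub(PATTERN, r"\1", "http://www.test.com/page/")
print(unchanged)  # http://www.test.com/page/

# re.search makes the no-match case explicit
m = re.search(r"/([^/]+)$", "http://www.test.com/page/")
print(m)  # None
```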
Method 8
First extract the path element from the URL:
from urllib.parse import urlparse
parsed = urlparse('https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx')
and then you can extract the last segment with string functions:
parsed.path.rpartition('/')[2]
(in this example, the result is 'PATH')
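Why parsing first matters for this URL: a query string may itself contain slashes, so cutting the raw string at its last "/" lands inside the query. A sketch comparing the two on the same URL:

```python
from urllib.parse import urlparse

url = "https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx"

# Naive: the last "/" sits inside the query string
naive = url.rsplit("/", 1)[-1]
print(naive)  # b&r=5#asx

# Parsed: only the path component ('/this/is/PATH') is searched
parsed_last = urlparse(url).path.rpartition("/")[2]
print(parsed_last)  # PATH
```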
Method 9
Use urlparse to get just the path and then split the path you get from it on / characters:
from urllib.parse import urlparse
my_url = "http://example.com/some/path/last?somequery=param"
last_path_fragment = urlparse(my_url).path.split('/')[-1] # returns 'last'
Note: if your URL ends with a / character, the above will return '' (i.e. the empty string). If you want to handle that case differently, strip the trailing / characters before you split the path:
my_url = "http://example.com/last/"
# handle URL ending in `/` by removing it.
last_path_fragment = urlparse(my_url).path.rstrip('/').split('/')[-1] # returns 'last'
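Both cases from this method folded into one function, as a sketch; the function name is hypothetical. Note that str.rstrip takes a single argument (the set of characters to remove) and removes every trailing slash, and that a bare domain still yields the empty string.

```python
from urllib.parse import urlparse

def last_path_fragment(url):
    # hypothetical helper: drop query/fragment, strip all trailing "/", take the last segment
    return urlparse(url).path.rstrip("/").split("/")[-1]

print(last_path_fragment("http://example.com/some/path/last?somequery=param"))  # last
print(last_path_fragment("http://example.com/last/"))                            # last
print(repr(last_path_fragment("http://example.com/")))                           # ''
```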
Method 10
The following solution, which uses pathlib to parse the path obtained from urllib.parse, lets you get the last part even when a terminal slash is present:
import urllib.parse
from pathlib import Path
urls = [
"http://www.test.invalid/demo",
"http://www.test.invalid/parent/child",
"http://www.test.invalid/terminal-slash/",
"http://www.test.invalid/query-params?abc=123&works=yes",
"http://www.test.invalid/fragment#70446893",
"http://www.test.invalid/has/all/?abc=123&works=yes#70446893",
]
for url in urls:
    url_path = Path(urllib.parse.urlparse(url).path)
    last_part = url_path.name  # use .stem to cut file extensions
    print(f"{last_part=}")
yields:
last_part='demo'
last_part='child'
last_part='terminal-slash'
last_part='query-params'
last_part='fragment'
last_part='all'
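A variant of the same approach, as a sketch: PurePosixPath always uses "/" as the separator regardless of the host OS and never touches the filesystem, which fits URL paths. The .stem and .suffix attributes split off a file extension; the URL below is an assumed example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = "http://www.test.invalid/files/report.pdf?download=1"  # assumed example URL

# Pure paths do no I/O; PurePosixPath splits on "/" on every platform
path = PurePosixPath(urlparse(url).path)
print(path.name)    # report.pdf
print(path.stem)    # report
print(path.suffix)  # .pdf
```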
Method 11
extracted_url = url[url.rfind("/")+1:]
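This works even without a slash: str.rfind returns -1 when the character is absent, so the slice starts at index 0 and the whole string comes back. A sketch with a hypothetical helper name:

```python
def after_last_slash(s):
    # hypothetical helper: rfind gives -1 if "/" is absent, so the slice becomes s[0:]
    return s[s.rfind("/") + 1:]

print(after_last_slash("http://www.test.com/page/TEST2"))  # TEST2
print(after_last_slash("TEST2"))                           # TEST2
```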
Method 12
Split the URL and pop the last element:
url.split('/').pop()
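list.pop() removes and returns the last element. Because the list from split() is a temporary, the mutation is harmless and the result is the same as indexing with [-1]. A short sketch:

```python
url = "http://www.test.com/page/page/12345"

parts = url.split("/")
last = parts.pop()  # removes and returns the final element

print(last)   # 12345
print(parts)  # ['http:', '', 'www.test.com', 'page', 'page']
```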
Method 13
The same idea in JavaScript, where Array.prototype.pop() removes and returns the last element of an array:
const plants = ['broccoli', 'cauliflower', 'cabbage', 'kale', 'tomato'];
console.log(plants.pop());
// expected output: "tomato"
console.log(plants);
// expected output: Array ["broccoli", "cauliflower", "cabbage", "kale"]
Method 14
segment = 'http://www.test.com/page/TEST2'.split('/')[4]
print(segment)
Output: TEST2
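A fixed index like [4] depends on how many path segments the URL has. A sketch running the question's three URLs through both a fixed index and [-1]:

```python
urls = [
    "http://www.test.com/TEST1",
    "http://www.test.com/page/TEST2",
    "http://www.test.com/page/page/12345",
]

split_urls = [u.split("/") for u in urls]

# index 4 exists only when the URL has at least two path segments
fixed_index = [p[4] if len(p) > 4 else None for p in split_urls]
last = [p[-1] for p in split_urls]

print(fixed_index)  # [None, 'TEST2', 'page']
print(last)         # ['TEST1', 'TEST2', '12345']
```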
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.