How to get everything after last slash in a URL?

How can I extract whatever follows the last slash in a URL in Python? For example, these URLs should return the following:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

I’ve tried urlparse, but that gives me the full path filename, such as page/page/12345.

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Method 5

Method 6

Method 7

Method 8

Method 9

Method 10

Method 11

Method 12

Method 13

Method 14

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You don’t need fancy things, just see the string methods in the standard library and you can easily split your url between ‘filename’ part and the rest:

url.rsplit('/', 1)

So you can get the part you’re interested in simply with:

url.rsplit('/', 1)[-1]

Method 2

One more (idio(ma)tic) way:

URL.split("/")[-1]

Method 3

rsplit should be up to the task:

In [1]: 'http://www.test.com/page/TEST2'.rsplit('/', 1)[1]
Out[1]: 'TEST2'

Method 4

You can do like this:

head, tail = os.path.split(url)

Where tail will be your file name.

Method 5

urlparse is fine to use if you want to (say, to get rid of any query string parameters).

import urllib.parse

urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
    'http://www.test.com/page/page/12345?abc=123'
]

for i in urls:
    url_parts = urllib.parse.urlparse(i)
    path_parts = url_parts[2].rpartition('/')
    print('URL: {}nreturns: {}n'.format(i, path_parts[2]))

Output:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

URL: http://www.test.com/page/page/12345?abc=123
returns: 12345

Method 6

os.path.basename(os.path.normpath('/folderA/folderB/folderC/folderD/'))

>>> folderD

Method 7

Here’s a more general, regex way of doing this:

    re.sub(r'^.+/([^/]+)$', r'1', url)

Method 8

First extract the path element from the URL:

from urllib.parse import urlparse
parsed= urlparse('https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx')

and then you can extract the last segment with string functions:

parsed.path.rpartition('/')[2]

(example resulting to 'PATH')

Method 9

Use urlparse to get just the path and then split the path you get from it on / characters:

from urllib.parse import urlparse

my_url = "http://example.com/some/path/last?somequery=param"
last_path_fragment = urlparse(my_url).path.split('/')[-1]  # returns 'last'

Note: if your url ends with a / character, the above will return '' (i.e. the empty string). If you want to handle that case differently, you need to strip the last trailing / character before you split the path:

my_url = "http://example.com/last/"
# handle URL ending in `/` by removing it.
last_path_fragment = urlparse(my_url).path.rstrip('/', 1).split('/')[-1]  # returns 'last'

Method 10

The following solution, which uses pathlib to parse the path obtained from urllib.parse allows to get the last part even when a terminal slash is present:

import urllib.parse
from pathlib import Path

urls = [
    "http://www.test.invalid/demo",
    "http://www.test.invalid/parent/child",
    "http://www.test.invalid/terminal-slash/",
    "http://www.test.invalid/query-params?abc=123&works=yes",
    "http://www.test.invalid/fragment#70446893",
    "http://www.test.invalid/has/all/?abc=123&works=yes#70446893",
]

for url in urls:
    url_path = Path(urllib.parse.urlparse(url).path)
    last_part = url_path.name  # use .stem to cut file extensions
    print(f"{last_part=}")

yields:

last_part='demo'
last_part='child'
last_part='terminal-slash'
last_part='query-params'
last_part='fragment'
last_part='all'

Method 11

extracted_url = url[url.rfind("/")+1:];

Method 12

Split the url and pop the last element
url.split('/').pop()

Method 13

Split the URL and pop the last element

const plants = ['broccoli', 'cauliflower', 'cabbage', 'kale', 'tomato'];

console.log(plants.pop());
// expected output: "tomato"

console.log(plants);
// expected output: Array ["broccoli", "cauliflower", "cabbage", "kale"]

Method 14

url ='http://www.test.com/page/TEST2'.split('/')[4]
print url

Output: TEST2.

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating