I am using BeautifulSoup to parse some HTML pages.
I’m getting a certain piece of data from each page (using a for loop) and adding it to a list.
The problem is that some of the pages have a different format (and they don’t contain the data I want).
So I was trying to use exception handling to add the value null to the list (I need to do this because the order of the data matters).
For instance, I have code like:
soup = BeautifulSoup(links)
dlist = soup.findAll('dd', 'title')
# I'm trying to find content between <dd class='title'> and </dd>
gotdata = dlist[1]
# and what I want is the second of those
newlist.append(gotdata)
# and I add that to a newlist
Some of the links don’t have any <dd class='title'>, so what I want to do is add the string null to the list instead.
The error appears:
list index out of range.
What I have tried is adding some lines like this:
if not dlist[1]:
    newlist.append('null')
    continue
But it doesn’t work. It still shows the error:
list index out of range.
What should I do about this? Should I use exception handling, or is there an easier way?
Any suggestions? Any help would be really great!
Answers:
Method 1
Handling the exception is the way to go:
try:
    gotdata = dlist[1]
except IndexError:
    gotdata = 'null'
Of course, you could also check the len() of dlist, but handling the exception is more intuitive.
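As a rough sketch of how this fits into the loop from the question: the check `if not dlist[1]` fails because the index lookup itself raises IndexError before `not` is ever evaluated, so the lookup has to be guarded. The example below uses plain Python lists as stand-ins for the results of soup.findAll('dd', 'title'); the `pages` data is made up for illustration and no real HTML is parsed.

```python
# Hypothetical stand-ins for the per-page results of soup.findAll('dd', 'title').
pages = [
    ["first title", "second title"],  # page with the expected format
    [],                               # page with no <dd class='title'> at all
    ["only one title"],               # page with a single match
]

newlist = []
for dlist in pages:
    try:
        gotdata = dlist[1]
    except IndexError:
        # Fewer than two matches: append a placeholder so positions stay aligned.
        gotdata = 'null'
    newlist.append(gotdata)

print(newlist)  # ['second title', 'null', 'null']
```

Because a value is appended on every iteration, newlist keeps one entry per page, in order.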
Method 2
You have two options; either handle the exception or test the length:
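if len(dlist) > 1:
    newlist.append(dlist[1])
    continue
or
try:
    newlist.append(dlist[1])
except IndexError:
    pass
continue
Use the first if the second item is often missing, and the second if it is only sometimes missing: the length check costs the same small amount every time, whereas try/except costs almost nothing unless the exception is actually raised.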
Method 3
A ternary will suffice. Change:
gotdata = dlist[1]
to
gotdata = dlist[1] if len(dlist) > 1 else 'null'
This is a shorter way of expressing:
if len(dlist) > 1:
    gotdata = dlist[1]
else:
    gotdata = 'null'
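If the same lookup is needed in several places, the ternary can be wrapped in a small function. This is only a sketch: `item_or_default` is a hypothetical helper name, not something provided by BeautifulSoup or the standard library, and it only considers non-negative indexes.

```python
def item_or_default(seq, index, default='null'):
    """Return seq[index] if that position exists, otherwise default.

    Hypothetical helper for illustration; only non-negative indexes are handled.
    """
    return seq[index] if 0 <= index < len(seq) else default


print(item_or_default(['a', 'b'], 1))  # b
print(item_or_default(['a'], 1))       # null
print(item_or_default([], 1, None))    # None
```

With this, the line in the question would become gotdata = item_or_default(dlist, 1).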
Method 4
Building on ThiefMaster♦’s answer: sometimes the value is given as ‘n’ or null and processing it raises an error, so ValueError needs to be handled as well.
Handling the exception is the way to go:
try:
    gotdata = dlist[1]
except (IndexError, ValueError):
    gotdata = 'null'
Method 5
For anyone interested in a shorter way:
gotdata = len(dlist) > 1 and dlist[1] or 'null'
But for best performance, I suggest using False instead of 'null'; then a one-line test will suffice (gotdata is False when there is no second item):
gotdata = len(dlist) > 1 and dlist[1]
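One caveat with the and/or form is worth illustrating: it falls back to 'null' whenever dlist[1] is falsy (for example an empty string), not only when the item is missing. The sketch below uses plain strings rather than BeautifulSoup results, and both function names are made up for the comparison.

```python
def second_or_null_andor(dlist):
    # The and/or idiom: also yields 'null' when dlist[1] exists but is falsy.
    return len(dlist) > 1 and dlist[1] or 'null'


def second_or_null_ternary(dlist):
    # The conditional expression only checks whether the position exists.
    return dlist[1] if len(dlist) > 1 else 'null'


print(second_or_null_andor(['a', 'b']))          # b
print(second_or_null_andor(['a']))               # null
print(second_or_null_andor(['a', '']))           # null (the empty string is lost)
print(repr(second_or_null_ternary(['a', ''])))   # ''
```

If the second item can legitimately be empty, zero, or otherwise falsy, the conditional expression is the safer choice.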
Method 6
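# dlist is used instead of list so the built-in list type is not shadowed
for i in range(1, len(dlist)):
    try:
        print(dlist[i])
    except ValueError:
        print("Value error.")
    except IndexError:
        print("Index error.")
    except Exception:
        # catch anything else without swallowing KeyboardInterrupt or SystemExit
        print("Error.")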
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.