BeautifulSoup webscrape .asp only searches last in list

import pickle

import bs4 as bs
import requests


def get_NYSE_tickers():
    an = ['A', 'B', 'C', 'D', 'E', 'F', 'H', 'I', 'J', 'K', 'L',
          'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
          'X', 'Y', 'Z', '0']

    for value in an:
        resp = requests.get(
            'https://www.advfn.com/nyse/newyorkstockexchange.asp?companies={}'.format(value))
        soup = bs.BeautifulSoup(resp.text, 'lxml')
        table = soup.find('table', class_='market tab1')
        tickers = []
        for row in table.findAll('tr', class_='ts1'):
            ticker = row.findAll('td')[1].text
            tickers.append(ticker)
        for row in table.findAll('tr', class_='ts0'):
            ticker = row.findAll('td')[1].text
            tickers.append(ticker)
        with open("NYSE.pickle", "wb") as f:
            while "" in tickers:
                tickers.remove("")
            pickle.dump(tickers, f)

    print(tickers)


get_NYSE_tickers()

My problem is that when I run this script, my output is only the data contained in the '0' page. It's always the last value in the list.

Also, I would like to know if there is a way to combine

    for row in table.findAll('tr', class_='ts1'):
        ticker = row.findAll('td')[1].text
        tickers.append(ticker)
    for row in table.findAll('tr', class_='ts0'):
        ticker = row.findAll('td')[1].text
        tickers.append(ticker)

into one block of code, as class_='ts0','ts1' doesn't seem to quite get it.
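BeautifulSoup's `class_` filter accepts a list of class names and matches a tag whose class is any value in the list, so both row types can be collected in one `find_all` call while preserving document order. A minimal sketch against a stand-in HTML snippet (the table markup here is illustrative, not the live ADVFN page):

```python
from bs4 import BeautifulSoup

html = """
<table class="market tab1">
  <tr class="ts0"><td>Company A</td><td>AAA</td></tr>
  <tr class="ts1"><td>Company B</td><td>BBB</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="market tab1")

# class_ accepts a list: a row matches if its class is any of the given values
rows = table.find_all("tr", class_=["ts0", "ts1"])
tickers = [row.find_all("td")[1].text for row in rows]
print(tickers)
```

A CSS selector does the same job: `table.select("tr.ts0, tr.ts1")`.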

I would like to see all ticker symbols from
https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A, https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=B,
https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=C, and so on,

in a single pickle or csv file:

"['AVX', 'AHC', 'RNT', 'AAN', 'AXF', 'DVK', 'RCW', 'SAD', 'ABB', 'ANF', 'ABM', 'IMW', 'SZM', 'SZI', 'ICT', 'ACN', 'ABD', 'ATN', 'AYI', 'AEA', 'ASX', 'ACM', 'AEG', 'AEB', 'AEH', 'AET', 'AMG', 'AG', 'A', 'ADC', 'AGU', 'APD', 'ARG', 'AQD', 'ALZ', 'ALF', 'ALK', 'ALB', 'ALU', 'ACL', 'AXB', 'ARE', 'AYE', 'AGN', 'AMO', 'ADS', 'AOI', 'AZM', 'AIB', 'ALY', 'ALM', 'ALJ', 'AWP', 'AMB', 'AKT', 'ACO', 'HES', 'AMX', 'ACC', 'AEP', 'AXP', 'AFE', 'AIG', 'AVF', 'AOB', 'ARP', 'AWR', 'AVD', 'ACF', 'AGP', 'AMN', 'AHS', 'AP', 'AXR', 'AU', 'NLY', 'ATV', 'ANH', 'APA', 'AIT', 'WTR', 'ARB', 'ARJ', 'ADM', 'AWI', 'ARM', 'AHT', 'AHL', 'AGO', 'AZN', 'ATO', 'ATT', 'AUO', 'AN', 'NEH', 'AVR', 'AXA', 'AZZ', 'AKS', 'AAR', 'AIR', 'RNT.A', 'AAN.A', 'CBJ', 'SQT', 'IWK', 'EOA', 'REU', 'MHG', 'ABT', 'AKR', 'BJV', 'ODY', 'RDF', 'MKY', 'BFN', 'ACE', 'ATU', 'ADX', 'ASF', 'AAP', 'AEO', 'AEV', 'AED', 'AER', 'AES', 'ACS', 'AFL', 'AGCO', 'AEM', 'GRO', 'NOW', 'AYR', 'AAI', 'ALQ', 'ABA', 'ALG', 'AIN', 'ACV', 'AA', 'AFN', 'ALX', 'Y', 'ATI', 'ALE', 'AB', 'AZ', 'AFC', 'ALL', 'ANR', 'MO', 'ACH', 'ABK', 'AKF', 'AEE', 'AXL', 'ADY', 'AEL', 'AFG', 'AM', 'AFF', 'ANL', 'ARL', 'ASI', 'AMT', 'AWK', 'APU', 'ABC', 'AME', 'AMP', 'APH', 'APC', 'AGL', 'AXE', 'AHR', 'AOC', 'AIV', 'ATR', 'ARA', 'ABR', 'ACI', 'ARD', 'ARW', 'ABG', 'ASH', 'ALC', 'AIZ', 'AF', 'ATG', 'AHD', 'T', 'ATW', 'ALV', 'AZO', 'AVB', 'AVY', 'AVA', 'AVP', 'AXS' …]"

Answers:


Method 1

I simply changed resp.text to resp.content and it printed all of them.
Also, you're emptying the list on every iteration. Move tickers = [] outside the loop, or print it on every pass through the loop.

from bs4 import BeautifulSoup
import requests


def get_NYSE_tickers():
    an = ['A', 'B', 'C', 'D', 'E', 'F', 'H', 'I', 'J', 'K', 'L',
          'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
          'X', 'Y', 'Z', '0']
    tickers = []  # built once, outside the loop, so pages accumulate
    for value in an:
        resp = requests.get(
            'https://www.advfn.com/nyse/newyorkstockexchange.asp?companies={}'.format(value))
        soup = BeautifulSoup(resp.content, 'lxml')
        table = soup.find('table', class_='market tab1')
        for row in table.findAll('tr', class_='ts1'):
            tickers.append(row.findAll('td')[1].text)
        for row in table.findAll('tr', class_='ts0'):
            tickers.append(row.findAll('td')[1].text)
    print(tickers)


get_NYSE_tickers()

Method 2

import requests
from bs4 import BeautifulSoup
from string import ascii_uppercase
import pandas as pd


goals = list(ascii_uppercase)  # A-Z only; the '0' page from the question is not covered


def main(url):
    with requests.Session() as req:
        allin = []
        for goal in goals:
            r = req.get(url.format(goal))
            df = pd.read_html(r.content, header=1)[-1]
            target = df['Symbol'].tolist()
            allin.extend(target)
    print(allin)


main("https://www.advfn.com/nyse/newyorkstockexchange.asp?companies={}")
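Both methods leave you with one combined list in memory; writing it out once, after all pages have been scraped, avoids the overwrite-per-iteration problem from the question and produces the single CSV file you asked for. A minimal sketch (the function name `save_tickers` and the file path are illustrative, not part of either answer):

```python
import csv


def save_tickers(tickers, path="NYSE.csv"):
    # Drop empty strings once, then write one symbol per row.
    cleaned = [t for t in tickers if t]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Symbol"])
        writer.writerows([t] for t in cleaned)
    return cleaned
```

Swapping `csv` for `pickle.dump(cleaned, f)` on a file opened in `"wb"` mode gives the single-pickle variant instead.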


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
