What is the correct approach to writing a for loop over a Cheerio object?

Simply put, I’m scraping data from a website and storing it in a database.

The relevant fields are links, names, prices and item condition.

Right now I iterate through each element, push its value into the respective list, and then add the lists to the database with a for loop. For example:

var names = [];
$(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .valtitle.lovewrap.padr4 .underlinedlinks").each(function(){
    names.push($(this).text());
});
...
for (let x = 0; x < names.length; x++){
    var sql = "REPLACE INTO `item` (`link`, `title`, `price`, `date`, `item_condition`, `country`) VALUES (?)";
    var values = [links[x], names[x], prices[x], '', states[x], cc];

    con.query(sql, [values], function(err, result){
        if (err) throw err;
    });
}

This is very naive: it assumes every element exists and that the lists line up perfectly. That has worked well so far, but I've noticed that some listings on the website I'm scraping don't have an item-condition element. Those listings get skipped, the lists fall out of sync, and the wrong values end up paired together.
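A minimal illustration of the problem, using plain arrays with made-up values in place of the scraped text: the second listing has no condition element, so when the lists are paired by index every later condition shifts up one slot.

```javascript
// Made-up scraped values: three listings, but the second one had no
// item-condition element, so only two conditions were pushed.
const scrapedNames = ['Lamp', 'Chair', 'Desk'];
const scrapedStates = ['Used', 'New']; // 'New' really belongs to 'Desk'

// Pair the lists by index, the way the for loop above does.
function pairByIndex(names, states) {
  return names.map((name, i) => [name, states[i]]);
}

const pairs = pairByIndex(scrapedNames, scrapedStates);
// 'Chair' is now paired with 'New', and 'Desk' gets undefined.
console.log(pairs);
```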

I understand the answer I'm looking for has to do with the .each function, but I'm not exactly sure how to go about it. I suppose I have to start at the highest common element, .midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2, and work down from there, adding a NULL value when an element isn't found.
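One way to produce that NULL, sketched under the assumption that each field is read with Cheerio's `.text()` or `.attr()` inside the listing's container: `.text()` on an empty selection returns an empty string and `.attr()` returns `undefined`, so a small helper (the name `orNull` is hypothetical) can turn either into `null`. Assuming `con` comes from the `mysql` npm package, its escaping writes `null` as SQL `NULL`.

```javascript
// Hypothetical helper: normalise a value read from a Cheerio selection.
// .text() gives '' and .attr() gives undefined when nothing matched,
// so both (and whitespace-only text) become null.
function orNull(value) {
  if (value === undefined || value === null) {
    return null;
  }
  const trimmed = String(value).trim();
  return trimmed === '' ? null : trimmed;
}

const condition = orNull('  Used  '); // 'Used'
const missing = orNull('');           // null
```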

Below is the full (relevant) code:

const $ = c.load(response.data);

$(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .splittable .splittablecell1 .padr2.bhserp-txt1.bhserp-new1").each(function(){
    // keep only digits and the decimal point ("£1,234.50" -> "1234.50")
    var fixedStr = $(this).text().replace(/[^\d.]/g, '');
    prices.push(Number(fixedStr));
});

$(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .valtitle.lovewrap.padr4 .underlinedlinks").each(function(){
    names.push($(this).text());
});

$(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .splittable .splittablecell1.bhserp-txt1 .padl1.labinfo").each(function(){
    if ($(this)){
        states.push($(this).text());
    }
    else{
        console.log("Mistake here, pick me up!"); // As I've said, this check is wrong ($(this) is always truthy), but it's what made me realize what I needed to do, so I'm leaving it.
        states.push("None");
    }
});

$(".midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2 .valtitle.lovewrap.padr4 .underlinedlinks").each(function(){
    var tempLink = $(this).attr('href');
    // item id: the text after the first "=" up to the next "&"
    var itemId = tempLink.split("=")[1].split("&")[0];
    links.push("https://www.ebay.co.uk/itm/" + itemId);
});
...
con.connect(function(err){
    if (err) throw err;
    console.log("Connected!");
    for (let x = 0; x < names.length; x++){
        var sql = "REPLACE INTO `item` (`link`, `title`, `price`, `date`, `item_condition`, `country`) VALUES (?)";
        var values = [links[x], names[x], prices[x], '', states[x], cc];

        con.query(sql, [values], function(err, result){
            if (err) throw err;
        });
    }
});

Answers:


Method 1

You should iterate over the element that wraps each listing and look up every field inside it. If you collect prices separately from links, the lists can easily fall out of sync. Something like:

for (let div of $('.product').get()){
  let item = {
    link: $(div).find('a').attr('href'),
    price: $(div).find('.price').text(),
  };
  // insert item into the db
}
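Once each listing is its own object, the database step no longer depends on several lists lining up. A minimal sketch, assuming items shaped like the question's fields (`link`, `name`, `price`, `condition`) and the column order of its `REPLACE INTO` statement; `toRow`, the sample items and the `'uk'` country value are made up for illustration. With the `mysql` npm package, nested arrays passed to a single `VALUES ?` placeholder are expanded into grouped lists, so all rows can go in one query; that call is only shown as a comment here.

```javascript
// Column order of the question's REPLACE INTO statement:
// link, title, price, date, item_condition, country
const SQL = 'REPLACE INTO `item` (`link`, `title`, `price`, `date`, `item_condition`, `country`) VALUES ?';

// Hypothetical helper: turn one scraped item into a row in that order.
// Missing fields become null so the driver stores SQL NULL.
function toRow(item, country) {
  return [
    item.link ?? null,
    item.name ?? null,
    Number.isFinite(item.price) ? item.price : null, // NaN -> null
    '', // the question always leaves `date` empty
    item.condition || null, // '' (no condition element) -> null
    country,
  ];
}

// Made-up sample items; the second listing had no condition element.
const items = [
  { link: 'https://www.ebay.co.uk/itm/1', name: 'Lamp', price: 12.5, condition: 'Used' },
  { link: 'https://www.ebay.co.uk/itm/2', name: 'Chair', price: 30, condition: '' },
];
const rows = items.map((item) => toRow(item, 'uk'));

// With the mysql package, one statement inserts every row:
// con.query(SQL, [rows], (err) => { if (err) throw err; });
```

Because each row is built from a single item, a missing field only affects that row instead of shifting every row after it.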

Method 2

pguardiario's answer worked perfectly; here is the code I ended up with, for future reference:

for (let div of $('.midbox .framebox .frameboxcells .displaybox .displayboxbottom .dt.bg0 .serptablecell2-adv .serptablebasestyle2').get()){

    var tempLink = $(div).find('.underlinedlinks').attr('href');
    // item id: the text after the first "=" up to the next "&"
    var itemId = tempLink.split("=")[1].split("&")[0];

    // keep only digits and the decimal point ("£1,234.50" -> "1234.50")
    var fixedStr = $(div).find('.padr2.bhserp-txt1.bhserp-new1').text().replace(/[^\d.]/g, '');

    let item = {
        link: "https://www.ebay.co.uk/itm/" + itemId,
        name: $(div).find('.valtitle.lovewrap.padr4 .underlinedlinks').text(),
        price: Number(fixedStr),
        // .text() returns '' when this listing has no condition element
        condition: $(div).find('.padl1.labinfo').text()
    };
    // insert item into the db here
}
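The loop above still assumes every listing has a link containing `=`: if `.attr('href')` returns `undefined`, or the href has no `=`, the `.split` call throws a TypeError. A hedged sketch of two hypothetical helpers, `itemUrlFromHref` and `parsePrice`, that keep the same parsing rules (the text after the first `=` up to the next `&`, and digits plus the decimal point) but return `null` instead of throwing or producing `NaN`. The example href is made up.

```javascript
// Hypothetical helper: build the item URL the same way the loop does
// (text after the first "=" up to the next "&"), or null if it can't.
function itemUrlFromHref(href) {
  if (typeof href !== 'string') {
    return null; // .attr('href') gave undefined: no link in this listing
  }
  const afterEquals = href.split('=')[1];
  if (!afterEquals) {
    return null; // no "=" in the href
  }
  const itemId = afterEquals.split('&')[0];
  return itemId ? 'https://www.ebay.co.uk/itm/' + itemId : null;
}

// Hypothetical helper: keep only digits and the decimal point, then
// convert; returns null for empty or unparseable text instead of 0/NaN.
function parsePrice(text) {
  const cleaned = String(text ?? '').replace(/[^\d.]/g, '');
  if (cleaned === '') {
    return null;
  }
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

const exampleUrl = itemUrlFromHref('https://example.com/r?item=123456&hash=abc'); // 'https://www.ebay.co.uk/itm/123456'
const examplePrice = parsePrice('£1,234.50'); // 1234.5
```

Inside the loop, `link: itemUrlFromHref(tempLink)` and `price: parsePrice(...)` would then replace the inline parsing.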


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
