I'm trying to make a simple scraping loop to pick up titles from dynamic pages. I've made a small script that works the way I expected. Here is the working script:
from selenium import webdriver
driver = webdriver.Chrome('C:/Users/user/Downloads/chromedriver_win32/chromedriver.exe')
url = "https://www.youtube.com/user/LinusTechTips/videos"
driver.get(url)
videos = driver.find_elements_by_xpath('.//*[@id="dismissable"]')
for video in videos:
    title = video.find_element_by_xpath('.//*[@id="video-title"]').text
    print(title)
It correctly crawls through the divs containing titles and other details and scrapes the titles. But this script only seems to work on YouTube. I've tried it on Craigslist, Amazon, bookstoscrape, Rightmove and Hostelworld, but it doesn't work on any of those sites. Here is the script for Hostelworld:
from selenium import webdriver
driver = webdriver.Chrome('C:/Users/user/Downloads/chromedriver_win32/chromedriver.exe')
url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"
driver.get(url)
cards = driver.find_elements_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]')
for card in cards:
    title = card.find_element_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]/div[2]/div[1]/h2/a').text
    print(title)
I'm pretty sure the cards class name is correct, since I found it with a search in Chrome dev tools. I think the title xpath is correct because it prints correctly if I use it outside the loop. I think the loop is correct too, because if I change the cards variable to:
cards = driver.find_elements_by_class_name('property-card')
it prints title once for every card on the page.
But when I add "." to the start of the title xpath, it returns an error saying "Message: no such element: Unable to locate element: ...". I'm prepending "." to the expression so that it searches only within the parent element currently being iterated over, not the whole page. But for some reason adding "." throws the error on all the websites I tried except YouTube.
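To illustrate how I understand the leading "." to scope a search, here is a minimal sketch that needs no browser. It uses the standard library's xml.etree.ElementTree rather than Selenium, and its XPath subset is not identical to the browser's, so treat it as an analogy only. The markup, the "layout" id and the hostel names are made up for illustration and are not Hostelworld's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for a listing page (not the real site's HTML).
PAGE = """
<html><body>
  <div id="layout">
    <div class="property-card"><h2><a>Hostel A</a></h2></div>
    <div class="property-card"><h2><a>Hostel B</a></h2></div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='property-card']")

# Short path relative to each card: only that card's subtree is searched.
titles = [card.find("./h2/a").text for card in cards]

# Path that starts again at an ancestor's id. Evaluated from the card it
# cannot match, because the element with id="layout" sits above the card,
# not inside it.
misses = [card.find(".//*[@id='layout']/div/h2/a") for card in cards]

print(titles)
print(misses)
```

In this sketch the short relative path finds one title per card, while the path that restarts at the ancestor's id finds nothing from inside a card. ElementTree reports a miss by returning None, whereas Selenium's find_element raises a "no such element" error.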
I'm trying to stick to xpaths as much as possible because not all websites have good class and id conventions.