I spent a few days trying to solve the “Error parsing input URL, no data was scraped” Facebook scraping error, and finally found a solution, which I’ll share in this post.
I wanted to make this blog more social, so I thought I could use some of WordPress’s excellent social plugins. I installed and set up the necessary plugins, one of which was supposed to post automatically to my chosen social networks whenever a post was published.
Upon publishing my first post, it appeared on Twitter, but not on Facebook. Instead, when trying to share a post from this blog, the Facebook share box would display “404 Page not found” (in other cases it would say “Welcome to nginx!”) as the post preview.
Googling around, I came across Facebook’s Open Graph Object Debugger, or “Facebook debugger” for short. This is a tool where you can enter a URL and have the Facebook bot scrape it and report back on any errors it finds.
I entered my blog’s URL, and the response was “Error parsing input URL, no data was scraped.”
Further searching suggested that this error can be triggered by a few things:
– site is on a spam list;
– invalid HTML markup;
– server refusing connections for non-browsers.
Searching for my domain on spam lists turned up no results.
Next was invalid HTML markup. Someone said that if there are any markup errors, the Facebook bot will stop and not scrape your site. I checked with the W3C Validator and found some errors.
I spent some time investigating my current theme, trying to fix the markup, but without success.
I also checked whether the server was refusing connections from non-browser clients, but it responded as expected.
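The non-browser check can be sketched in Python roughly as follows. This is only an illustration, not the check I actually ran: the helper names are made up, and the User-Agent value is the string commonly reported for Facebook’s crawler, which is an assumption here.

```python
import urllib.error
import urllib.request

# User-Agent commonly reported for Facebook's crawler (an assumption, not from the post).
FACEBOOK_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"


def build_crawler_request(url, user_agent=FACEBOOK_UA):
    """Build a GET request that identifies itself as a non-browser client."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code the server sends back to the crawler-like request."""
    try:
        with urllib.request.urlopen(build_crawler_request(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers are raised as exceptions; the code is still what we want.
        return err.code
```

Calling fetch_status with the blog’s URL and comparing the result with what a browser gets shows whether the server treats non-browser clients differently. A connection that is refused outright raises urllib.error.URLError instead of returning a status.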
Just when I was about to give up, tired of investing so much time in something that should have worked from the start, I found this Stack Overflow thread.
It turns out that if your domain has an IPv6 address, the Facebook bot will first try to use that address to scrape your site. If the web server does not answer on IPv6, the scrape fails.
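To see whether a domain resolves to an IPv6 address at all, something like the sketch below could be used. The function names are hypothetical, and it only inspects what the standard socket.getaddrinfo call returns; it does not prove what the Facebook bot actually does.

```python
import socket


def ipv6_addresses(addrinfo):
    """Extract the unique IPv6 addresses from socket.getaddrinfo() results."""
    found = []
    for family, _socktype, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET6 and sockaddr[0] not in found:
            found.append(sockaddr[0])
    return found


def resolve_ipv6(host, port=80):
    """Return the IPv6 addresses that host resolves to, or [] if there are none."""
    try:
        info = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return ipv6_addresses(info)
```

If resolve_ipv6 returns a non-empty list for the blog’s hostname, the web server needs to accept connections on those addresses too, otherwise a client that prefers IPv6 will not get the page.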
I’m using nginx as a web server, so in order to enable IPv6 support I had to add an IPv6 listen directive to my nginx site’s configuration file.
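The exact line is not reproduced above. As a rough sketch, assuming the usual nginx form of an IPv6 listener (listen [::]:80;), a small Python check like the one below can tell whether a site configuration already contains one. The sample configuration, the server name, and the helper name are all hypothetical.

```python
import re

# Matches uncommented directives such as "listen [::]:80;" or "listen [::]:443 ssl;".
IPV6_LISTEN = re.compile(
    r"^\s*listen\s+\[[0-9a-fA-F:]+\](?::\d+)?[^;]*;",
    re.MULTILINE,
)


def has_ipv6_listen(config_text):
    """Return True if the config text contains an uncommented IPv6 listen directive."""
    return bool(IPV6_LISTEN.search(config_text))


# Hypothetical server block for illustration; not the configuration from this post.
SAMPLE_CONFIG = """
server {
    listen 80;
    listen [::]:80;
    server_name example.com;
}
"""
```

Here has_ipv6_listen(SAMPLE_CONFIG) is true, while a server block containing only listen 80; is not matched. The check is purely textual, so it ignores include files and assumes the directive starts its own line.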
Going back to the Facebook debugger and fetching the new scrape information returned the correct response.
I hope this will be of help to anyone struggling with this error.