Some time ago I wrote a post about web crawling using Google's API (see here). However, that approach does not recognize HTML tags, so finding key components on a web page becomes tedious.
In this post, I will show how to recognize a web page’s key HTML tags, such as title and div, using the Python library BeautifulSoup. To follow along, you need basic knowledge of HTML and Python. For the experiments, I will use the native Python installation on OS X 10.11.5 “El Capitan”.
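To give a rough idea of what this looks like, here is a minimal sketch. It assumes the `beautifulsoup4` package is already installed (for example via pip), and the sample HTML string and the helper name `extract_key_tags` are made up for illustration only. It parses an in-memory page instead of fetching a real one.

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

# Hypothetical page used only for illustration; a real crawler would
# download the HTML from a URL instead.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <div class="content">First block</div>
    <div class="footer">Second block</div>
  </body>
</html>
"""


def extract_key_tags(html):
    """Return the page title and the text of every div in the document."""
    # "html.parser" is the parser bundled with Python, so no extra
    # dependency is needed beyond BeautifulSoup itself.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else None
    divs = [div.get_text(strip=True) for div in soup.find_all("div")]
    return title, divs


title, divs = extract_key_tags(SAMPLE_HTML)
print(title)
print(divs)
```

Here `soup.title` gives direct access to the title tag, while `find_all("div")` collects every div on the page. The same pattern should work for other tags by changing the tag name.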