Want to learn more? Take the full course at https://campus.datacamp.com/courses/w... at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
--
These days, many of us are familiar with the letters "HTML", which stand for HyperText Markup Language, the language read by web browsers to render and display website content.
For us, this means that when we want to scrape the content from a particular website, we are often given the HTML code, and therefore, the focus of this lesson will be learning to navigate HTML to arrive at the content we may be interested in retrieving. In doing so, we will use a simple toy model of HTML as we become comfortable with how the code looks behind the scenes, and see that we can picture a tree-like structure of HTML to easily interpret the task of HTML navigation.
Here we are given a first look at a very simple example of HTML. Throughout this lesson, we will be using this same bit of HTML while we gain intuition towards HTML navigation, before playing with much longer and more involved examples.
To give you an idea of what a web browser would show, on the right side is how Firefox displays this HTML on my computer.
The elements contained within the angle brackets are called "HTML tags", which, in well-formatted HTML, usually come in pairs. Each pair consists of a starting tag without a forward slash and a closing tag with a forward slash.
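To make the idea of tag pairs concrete, here is a minimal sketch using Python's built-in html.parser module. The TagPairChecker class and the short HTML snippet are hypothetical, made up for illustration; they are not the example shown in the lesson.

```python
from html.parser import HTMLParser


class TagPairChecker(HTMLParser):
    """Hypothetical helper: match each closing tag to the most recent open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # starting tags still waiting for a closing tag
        self.pairs = []      # tags whose pair was completed, in closing order

    def handle_starttag(self, tag, attrs):
        # A starting tag such as <p> opens a new pair.
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # A closing tag such as </p> completes the most recently opened pair.
        if self.open_tags and self.open_tags[-1] == tag:
            self.pairs.append(self.open_tags.pop())


checker = TagPairChecker()
checker.feed("<div><p>Some text</p></div>")  # made-up snippet

pairs = checker.pairs         # the p pair closes first, then the div pair
unclosed = checker.open_tags  # empty when every starting tag was closed
print(pairs, unclosed)
```

If a closing tag were missing from the snippet, its starting tag would remain in open_tags.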
The root tags, which contain the main HTML content, are those with the text "html" within the angle brackets.
We also have a body tag, defining the body of the HTML; a div tag defining a section of the body; and several p tags defining paragraphs within the body.
We notice that the HTML tags are nested within others: the body tag is nested within the html root tag, a div tag is nested within the body tag, two paragraph tags are nested within the div tag, and so on.
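As a rough sketch of this nesting, we can write down a toy HTML string with the structure just described and print each tag indented by its depth. The paragraph text and the OutlinePrinter class are assumptions made for illustration; only the arrangement of the html, body, div, and p tags comes from the lesson.

```python
from html.parser import HTMLParser

# Hypothetical toy HTML with the structure described above;
# the paragraph text is invented for this example.
TOY_HTML = """
<html>
  <body>
    <div>
      <p>First paragraph</p>
      <p>Second paragraph</p>
    </div>
    <p>Third paragraph</p>
  </body>
</html>
"""


class OutlinePrinter(HTMLParser):
    """Hypothetical helper: record each starting tag with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1  # anything that follows is nested one level deeper

    def handle_endtag(self, tag):
        self.depth -= 1  # the closing tag ends this level of nesting


parser = OutlinePrinter()
parser.feed(TOY_HTML)
outline = parser.outline

# Print one tag per line, indented by how deeply it is nested.
for depth, tag in outline:
    print("  " * depth + tag)
```

The indentation of the printed outline mirrors the nesting: the deeper a tag sits inside other tags, the further it is indented.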
This nesting gives rise to a hierarchy in the HTML which can be visualized as a tree structure as displayed here. The vocabulary we use to describe moving around the tree comes from looking at a family tree: as we move from left to right, we are moving forward through the generations; as we move from top to bottom, we are moving within the same generation, and moving between siblings if the elements come from the same parent element.
For example, the children of the body element are the same as the second-generation descendants of the root html element; in order from top to bottom, div is the first child and p is the second. These two elements (the div and p elements) are siblings, as they are both children of the same parent, the body element.
The two third-generation descendants of the html element are circled here. Both of these elements are paragraph p elements, and both are children of the same div element parent, and hence siblings.
Note that these two circled paragraph elements are not descendants of the third, uncircled paragraph element. This is graphically represented by the fact that, although the two circled paragraph elements are further to the right on the tree, we cannot follow a path from the uncircled paragraph element to the two circled ones. This is analogous to saying that, even though your niece belongs to a later generation, she is not your descendant.
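The family-tree vocabulary can be tried out in code. The sketch below is one possible illustration, not the lesson's own method: it uses Python's built-in xml.etree.ElementTree, which only works here because our hypothetical toy HTML happens to be well-formed; real web pages generally need a dedicated HTML parser. The paragraph text is again made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy HTML with the same structure as the tree described above.
TOY_HTML = (
    "<html><body>"
    "<div><p>First paragraph</p><p>Second paragraph</p></div>"
    "<p>Third paragraph</p>"
    "</body></html>"
)

root = ET.fromstring(TOY_HTML)  # the root html element
body = root.find("body")        # child of html

# Children of body, top to bottom: the div and the third paragraph (siblings).
body_children = [child.tag for child in body]

# Children of the div: the two "circled" paragraphs (siblings of each other).
div = body.find("div")
circled_texts = [p.text for p in div]

# find("p") only looks at direct children, so this is the uncircled paragraph.
third_p = body.find("p")

# iter() walks an element and everything below it; drop the element itself
# to keep only its descendants.
third_p_descendants = [el for el in third_p.iter() if el is not third_p]
div_descendants = [el.text for el in div.iter() if el is not div]

print(body_children)
print(circled_texts)
print(third_p.text, third_p_descendants)
```

The uncircled paragraph has no descendants at all, so the two circled paragraphs cannot be reached from it, just as in the niece analogy.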
To sum up, we got our first glimpse of HTML, introduced ourselves to HTML tags, visualized the HTML code as a tree, and learned how to describe navigating that tree's structure!
Time to give it a try!