CSV File Handling With Pandas Part II

in StemSocial, 8 months ago

In the last post, we looked at reading a CSV file using the pandas library. In this post, we will talk about indexing the data (selecting rows and columns) in a dataframe using pandas. We will be using the same CSV file as in the previous post. If you need the link to the previous post, you can find it at the end of this post.


Say, for example, I want to see all the data in the fifth row, i.e. the row at index 4. We can use iloc to select it. The name stands for "integer location", and it selects specific rows or columns by their integer position. Note that iloc is an indexer, so it is used with square brackets rather than called like a function.

titanic_data.iloc[4]

Here you can see the data in the fifth row:

image.png
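If you don't have the CSV file at hand, here is a minimal sketch of the same idea. The dataframe df and its values are made up for illustration; it only stands in for titanic_data.

```python
import pandas as pd

# Hypothetical stand-in for titanic_data; the values are made up.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
    "Age": [22, 38, 26, 35, 35, 54],
})

fifth_row = df.iloc[4]            # position 4 is the fifth row
print(type(fifth_row).__name__)   # Series
print(fifth_row["Name"])          # Eve
```

Selecting a single row with iloc gives back a Series whose index is the column names, so you can pick a value from it by column label.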

If I want to print everything from the fifth row through the tenth row, I can write the following code:

titanic_data.iloc[4:10]

image.png

If you want to print everything starting from the fifth row, you can write titanic_data.iloc[4:]. We can also reverse the order of the rows with this code: titanic_data.iloc[::-1]. See the output below for your reference.

image.png
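The three slices above can be tried on a small made-up dataframe. This is only a sketch; df is a hypothetical 12-row frame, not the Titanic data.

```python
import pandas as pd

# Hypothetical 12-row frame standing in for titanic_data.
df = pd.DataFrame({"PassengerId": range(1, 13)})

middle = df.iloc[4:10]        # positions 4..9, i.e. the fifth through tenth rows
tail = df.iloc[4:]            # fifth row to the end
reversed_df = df.iloc[::-1]   # same rows, last row first

print(middle["PassengerId"].tolist())   # [5, 6, 7, 8, 9, 10]
print(len(tail))                        # 8
print(reversed_df.index.tolist()[:3])   # [11, 10, 9]
```

Notice that reversing keeps the original index labels attached to each row; only their order changes.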

Now let's say you want one specific value, such as the name of the passenger in the 15th row. Looking at the dataframe above, you can see that the "Name" column is at index position 3. So we can write the following code to extract our data.

titanic_data.iloc[14,3]

You will get the following output:

image.png

Suppose we only want to display the first five rows and the first five columns. We can do this by writing titanic_data.iloc[0:5, 0:5]. The output will be as below:

image.png

Please notice that in all of the slices I have used above, just as with Python lists and arrays, the stop position is exclusive.
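To check the exclusive stop for yourself, here is a small sketch on a made-up dataframe (the column names A to F and the numbers are only for illustration). Each cell holds row * 10 + column, so it is easy to see which positions were picked.

```python
import pandas as pd

# Hypothetical 8x6 frame; each cell is row_position * 10 + column_position.
df = pd.DataFrame(
    [[r * 10 + c for c in range(6)] for r in range(8)],
    columns=list("ABCDEF"),
)

cell = df.iloc[2, 3]         # row position 2, column position 3
block = df.iloc[0:5, 0:5]    # positions 0..4 on both axes; 5 is excluded

print(cell)                  # 23
print(block.shape)           # (5, 5)
print(list(block.columns))   # ['A', 'B', 'C', 'D', 'E']
```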

Next, similar to iloc, there is loc. Both are used for the same purpose, but iloc does position-based indexing using integers, whereas loc does label-based indexing. For example, in the code above we extracted the name of the passenger in the fifteenth row. To retrieve the same data with loc, we use the column label, as in the code below (the row label 14 matches the row position here because the dataframe uses the default integer index):

titanic_data.loc[14,"Name"]

image.png
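The difference between labels and positions is easier to see when the row labels are not integers. The sketch below uses a made-up dataframe with the hypothetical row labels "a", "b" and "c".

```python
import pandas as pd

# Hypothetical data with string row labels, so labels and positions differ.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid"], "Age": [22, 38, 26]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "Name"]   # row label "b", column label "Name"
by_position = df.iloc[1, 0]      # second row, first column

print(by_label, by_position)     # Bob Bob
```

Both lines pick the same cell; loc finds it by name, iloc finds it by counting.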

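You can do most of the things with loc that you did with iloc above. One difference is that loc slices include the stop label. So if you want to get the first five rows and columns using loc, you can write: titanic_data.loc[0:4, 'PassengerId':'Sex'] (assuming the default integer index, and that 'Sex' is the fifth column as in the standard Titanic dataset). Now you will get the following desired output: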

image.png
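One point worth checking when slicing with loc is that both the start and the stop labels are included, unlike iloc. Here is a minimal sketch on a made-up dataframe; the column names c0 to c5 are hypothetical.

```python
import pandas as pd

# Hypothetical frame with the default integer index and made-up column names.
df = pd.DataFrame(
    [[r * 10 + c for c in range(6)] for r in range(8)],
    columns=["c0", "c1", "c2", "c3", "c4", "c5"],
)

by_label = df.loc[0:4, "c0":"c4"]     # stop labels included: 5 rows, 5 columns
by_position = df.iloc[0:5, 0:5]       # stop positions excluded: 5 rows, 5 columns

print(by_label.shape)                 # (5, 5)
print(by_label.equals(by_position))   # True
print(df.loc[0:6, "c0":"c5"].shape)   # (7, 6)
```

So with the default integer index, a loc row slice 0:6 returns seven rows, not six or five.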


Previous Pandas Tutorial Posts
  1. Introduction to Pandas Library and Series
  2. Pandas DataFrame
  3. CSV File Handling With Pandas