In Chapter 3, I learned how to use Python to work with and clean data using two important libraries: NumPy and Pandas. NumPy excels at mathematical operations on large collections of numbers, whereas Pandas helps organize, correct, and clean up messy tabular data. Because Python can handle much larger files, automate repetitive tasks, and give you finer control over every step, it is far more capable than spreadsheets alone. It also connects easily to other tools if you want to take on more complex analyses later.
For the Chapter 3 exercise, I worked with a video game sales dataset in Google Colab. First, I uploaded the file and looked at the first few rows to get a sense of how the data was structured. Then I cleaned it by removing rows with missing values and deleting duplicate rows, so that every record was complete and counted only once. I also renamed some of the columns to make the data easier to read and work with.
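Here is a minimal sketch of those cleaning steps in Pandas. It is not the exact notebook code: the tiny CSV below is a made-up stand-in for the uploaded file, and the column names (`Name`, `Platform`, `Global_Sales`, and so on) and the new names they are renamed to are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the uploaded file (made-up titles and numbers).
# In Colab, the real file would be read with pd.read_csv() on its actual path.
raw_csv = """Name,Platform,Year,Genre,Publisher,Global_Sales
Game A,Wii,2006,Sports,Pub X,10.0
Game B,PS2,2004,Racing,Pub Y,6.0
Game B,PS2,2004,Racing,Pub Y,6.0
Game C,Wii,,Sports,Pub X,4.0
Game D,DS,2007,Puzzle,Pub Z,3.0
"""
df = pd.read_csv(io.StringIO(raw_csv))

# Look at the first few rows to understand the structure.
print(df.head())

# Drop rows that have any missing value, then drop exact duplicate rows.
df = df.dropna().drop_duplicates()

# Rename columns to short, lowercase names (chosen here for illustration).
df = df.rename(columns={
    "Name": "title",
    "Platform": "platform",
    "Year": "year",
    "Genre": "genre",
    "Publisher": "publisher",
    "Global_Sales": "global_sales",
})
print(df)
```

In this toy example, `dropna()` removes "Game C" (its year is missing) and `drop_duplicates()` removes the repeated "Game B" row, leaving three clean records.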
After cleaning the data, I grouped it in different ways to find some interesting insights, such as which gaming platforms sold the most, which genres were the most popular, and which individual games had the highest overall sales. Then I used NumPy to calculate the average sales for each genre and how much sales varied from game to game.
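The grouping and NumPy steps could look roughly like the sketch below. The small DataFrame is hypothetical (made-up titles and sales in millions), and treating "how much sales varied" as the standard deviation is my assumption about which spread measure fits best.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data: made-up titles, sales in millions of units.
df = pd.DataFrame({
    "title": ["Game A", "Game B", "Game C", "Game D", "Game E", "Game F"],
    "platform": ["Wii", "PS2", "Wii", "DS", "PS2", "DS"],
    "genre": ["Sports", "Racing", "Sports", "Puzzle", "Racing", "Puzzle"],
    "global_sales": [10.0, 6.0, 4.0, 3.0, 2.0, 1.0],
})

# Total sales per platform and per genre, highest first.
platform_totals = df.groupby("platform")["global_sales"].sum().sort_values(ascending=False)
genre_totals = df.groupby("genre")["global_sales"].sum().sort_values(ascending=False)

# The three best-selling individual games.
top_games = df.nlargest(3, "global_sales")[["title", "global_sales"]]

# NumPy: mean and standard deviation of sales within each genre.
# Note: np.std defaults to the population formula (ddof=0), while
# pandas' Series.std() defaults to the sample formula (ddof=1).
genre_stats = {
    genre: (np.mean(sales.to_numpy()), np.std(sales.to_numpy()))
    for genre, sales in df.groupby("genre")["global_sales"]
}

print(platform_totals)
print(genre_totals)
print(top_games)
print(genre_stats)
```

With this toy data, the Wii leads platforms with 14.0 million, and the Sports genre has a mean of 7.0 and a standard deviation of 3.0. Whichever standard deviation you report, it is worth stating which formula was used, since the two defaults differ.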
Overall, this exercise showed me how useful Pandas really is when working with data. Pandas makes it easy to clean up messy datasets, organize information, and find important insights without a lot of complicated steps.