Cleaning and Structuring the Data

Min Experience

0 years

Location

remote

JobType

internship

Overview

About the role

Your manager is impressed with your progress but points out that the data is messy. Before we can analyze it effectively, we need to clean and structure the data properly.

Your task is to:

- Handle missing values
- Remove duplicate or inconsistent data
- Standardize the data format

Let's get started!

Task 1: Identify Issues in the Data

Your manager provides you with an example dataset where some records are incomplete or incorrect. Here's an example:

{
  "users": [
    {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": [101]},
    {"id": 2, "name": "Priya", "friends": [1, 4], "liked_pages": [102]},
    {"id": 3, "name": "", "friends": [1], "liked_pages": [101, 103]},
    {"id": 4, "name": "Sara", "friends": [2, 2], "liked_pages": [104]},
    {"id": 5, "name": "Amit", "friends": [], "liked_pages": []}
  ],
  "pages": [
    {"id": 101, "name": "Python Developers"},
    {"id": 102, "name": "Data Science Enthusiasts"},
    {"id": 103, "name": "AI & ML Community"},
    {"id": 104, "name": "Web Dev Hub"},
    {"id": 104, "name": "Web Development"}
  ]
}

Problems:

- User ID 3 has an empty name.
- User ID 4 has a duplicate friend entry.
- User ID 5 has no connections or liked pages (inactive user).
- The pages list contains a duplicate page ID (104 appears twice).

Task 2: Clean the Data

We will:

- Remove users with missing names.
- Remove duplicate friend entries.
- Remove inactive users (users with no friends and no liked pages).
- Deduplicate pages based on IDs.
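
The four cleaning steps in Task 2 could be sketched in Python roughly as follows. This is a minimal illustration, not a reference solution: the function name clean_data and the variable names sample and cleaned are hypothetical, and two choices are assumptions the task does not specify. A name containing only whitespace is treated as missing, and when two pages share an ID the first one encountered is kept.

```python
import json

# Example dataset from Task 1.
sample = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3], "liked_pages": [101]},
        {"id": 2, "name": "Priya", "friends": [1, 4], "liked_pages": [102]},
        {"id": 3, "name": "", "friends": [1], "liked_pages": [101, 103]},
        {"id": 4, "name": "Sara", "friends": [2, 2], "liked_pages": [104]},
        {"id": 5, "name": "Amit", "friends": [], "liked_pages": []},
    ],
    "pages": [
        {"id": 101, "name": "Python Developers"},
        {"id": 102, "name": "Data Science Enthusiasts"},
        {"id": 103, "name": "AI & ML Community"},
        {"id": 104, "name": "Web Dev Hub"},
        {"id": 104, "name": "Web Development"},
    ],
}


def clean_data(data):
    """Return a cleaned copy of the dataset; the input is left unchanged."""
    cleaned_users = []
    for user in data["users"]:
        # Step 1: skip users with a missing name.
        # Assumption: a whitespace-only name also counts as missing.
        name = (user.get("name") or "").strip()
        if not name:
            continue

        # Step 2: remove duplicate friend IDs, keeping their original order.
        friends = list(dict.fromkeys(user.get("friends", [])))
        liked_pages = list(user.get("liked_pages", []))

        # Step 3: skip inactive users (no friends and no liked pages).
        if not friends and not liked_pages:
            continue

        cleaned_users.append(
            {"id": user["id"], "name": name,
             "friends": friends, "liked_pages": liked_pages}
        )

    # Step 4: deduplicate pages by ID.
    # Assumption: the first page seen for each ID is the one to keep.
    unique_pages = {}
    for page in data["pages"]:
        unique_pages.setdefault(page["id"], page)

    return {"users": cleaned_users, "pages": list(unique_pages.values())}


cleaned = clean_data(sample)
print(json.dumps(cleaned, indent=2))
```

On the example dataset this keeps users 1, 2, and 4, reduces user 4's friends list to [2], and keeps "Web Dev Hub" as the single page with ID 104. The sketch does not touch friend IDs that point at removed users (user 1 still lists user 3 as a friend), since the task list above does not ask for that.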

About the company

CodeWithHarry is an online platform that provides programming tutorials and courses.

Skills

python

data science

data cleaning

data structuring