Data Manipulation#
Rename Columns#
The columns can be renamed directly:
df3.rename(columns = {"Product": "Computer Part"})
df3
Computer Part Price
0 Tablet 250
1 Printer 100
2 Laptop 1200
3 Monitor 300
4 Mouse 10
Use the pandas loc function to locate the required rows:
df3.loc[df3['Product'] == 'Printer']
Product Price
1 Printer 100
The method used a comparison, and returns the row data, this can be extended to use other comparisons. If the data has been cleaned one can make the search more flexible:
df3[df3['Product'].str.contains('Printer')]
Product Price
1 Printer 100
The relevant row can be located using the index:
df3.loc[1, ['Price']] = [95]
df3
Product Price
0 Tablet 250
1 Printer 95
2 Laptop 1200
3 UHD Screen 400
One can separately specify the condition:
update = df3['Product'] == 'Printer'
update
0 False
1 True
2 False
3 False
df3.loc[update, ['Price']] = [105]
df3
Product Price
0 Tablet 250
1 Printer 105
2 Laptop 1200
3 UHD Screen 400
update is only a handle or alias and not an action.
Wide to Long Form#
Melt#
Pandas melt is a generic command to change the format:
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D'],
'points': [88, 91, 99, 94],
'assists': [12, 17, 24, 28],
'rebounds': [22, 28, 30, 31]})
#view DataFrame
df
team points assists rebounds
0 A 88 12 22
1 B 91 17 28
2 C 99 24 30
3 D 94 28 31
#reshape DataFrame from wide format to long format
df = pd.melt(df, id_vars='team', value_vars=['points', 'assists', 'rebounds'])
df
team variable value
0 A points 88
1 B points 91
2 C points 99
3 D points 94
4 A assists 12
5 B assists 17
6 C assists 24
7 D assists 28
8 A rebounds 22
9 B rebounds 28
10 C rebounds 30
11 D rebounds 31
The column team was the identifier, remaining a column in the long form, whilst the former columns points, assists and rebounds became components of the new columns variable and value. The column names variable and value can be customised:
pd.melt(df, id_vars='team', value_vars=['points', 'assists', 'rebounds'],
var_name='New_var', value_name='New_val')
team New_var New_val
0 A points 88
1 B points 91
2 C points 99
3 D points 94
4 A assists 12
5 B assists 17
6 C assists 24
7 D assists 28
8 A rebounds 22
9 B rebounds 28
10 C rebounds 30
11 D rebounds 31
wide_to_long#
There is an explicit pandas command to change the data form from wide to long
np.random.seed(123)
df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
"A1980" : {0 : "d", 1 : "e", 2 : "f"},
"B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
"B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
"X" : dict(zip(range(3), np.random.randn(3)))
})
df
A1970 A1980 B1970 B1980 X id
0 a d 2.5 3.2 -1.085631 0
1 b e 1.2 1.3 0.997345 1
2 c f 0.7 0.1 0.282978 2
pd.wide_to_long(df, ["A", "B"], i="id", j="year")
X A B
id year
0 1970 -1.085631 a 2.5
1 1970 0.997345 b 1.2
2 1970 0.282978 c 0.7
0 1980 -1.085631 d 3.2
1 1980 0.997345 e 1.3
2 1980 0.282978 f 0.1
All extra variables are left untouched. This simply uses pandas.melt under the hood, but is hard-coded to “do the right thing” in a typical case.
Long to Wide#
pivot#
The long format can be made into a wide format by pivot:
df
team New_var New_val
0 A points 88
1 B points 91
2 C points 99
3 D points 94
4 A assists 12
5 B assists 17
6 C assists 24
7 D assists 28
8 A rebounds 22
9 B rebounds 28
10 C rebounds 30
11 D rebounds 31
df=pd.pivot(df, index='team', columns='New_var', values='New_val')
df
New_var assists points rebounds
team
A 12 88 22
B 17 91 28
C 24 99 30
D 28 94 31
team became the index, the contents of New_var (points, assists and rebounds) became the new columns, whilst the contents of New_val, became the values inside the dataframe.