Python 3.x

Last Updated: 24-Nov, 2018

Why do people prefer Pandas over SQL

You can probably have many technical discussions around this, but I’m considering the user perspective below.

One simple reason you see far more questions about Pandas data manipulation than about SQL is that using SQL, by definition, means using a database, and many use cases these days only need small bits of data for ‘one-and-done’ tasks (from a .csv file, a web API, etc.). In these cases loading the data into a database, storing it, manipulating it and extracting it again is rarely worth the overhead.

However, where the use case could justify either Pandas or SQL, you’re certainly not wrong. If you want to run many repetitive data manipulation tasks and persist the outputs, I’d always recommend trying SQL first. From what I’ve seen, there are two reasons why many users don’t go via SQL even in these cases.

Firstly, the major advantage Pandas has over SQL is that it’s part of the wider Python universe, which means in one fell swoop I can load, clean, manipulate, and visualize my data (I can even execute SQL through Pandas…). The other is, quite simply, that all too many users don’t know the extent of SQL’s capabilities. Every beginner learns the ‘extraction syntax’ of SQL (SELECT, FROM, WHERE, etc.) as a means of getting data from a DB to the next place. Some may pick up some of the more advanced grouping and iteration syntax. But after that there tends to be a pretty significant gulf in knowledge until you get to the experts (DBAs, data engineers, etc.).

In short, it often comes down to the use case, convenience, or a gap in knowledge about the extent of SQL’s capabilities.
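To make the "SQL through Pandas" remark concrete, here is a minimal sketch that runs the same aggregation once as SQL and once as a Pandas groupby. The table name `orders`, the column names and the sample rows are all made up for illustration, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, invented for this example.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cy"],
    "amount": [10.0, 5.0, 7.5, 3.0],
})

# An in-memory SQLite database stands in for a real database here.
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)

# Run the aggregation as SQL and get the result back as a DataFrame.
totals_sql = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer",
    conn,
)
conn.close()

# The same aggregation expressed directly in Pandas.
totals_pd = (
    orders.groupby("customer", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total"})
)

print(totals_sql)
print(totals_pd)
```

Both frames should hold the same per-customer totals. For a one-off file the Pandas version skips the database entirely, which is the convenience argument made above.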

1. How to use value if not null else use value from next column in pandas?

2. How to rename columns in pandas?

3. How to sort a dictionary by value?

 

1. How to use value if NOT NULL else use value from Next Column

Given the following dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({'COL1': ['A', np.nan, 'A'],
                   'COL2': [np.nan, 'A', 'A']})
df
    COL1    COL2
0    A      NaN
1    NaN    A
2    A      A

How can I create a column (‘COL3’) that uses the value from COL1 in each row unless that value is null (or NaN), in which case it uses the value from COL2?

The desired result is:

    COL1    COL2   COL3
0    A      NaN    A
1    NaN    A      A
2    A      A      A

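SOLUTION:

Call fillna on COL1 and pass COL2 as the fill values. Note that the session below has 'B' rather than 'A' in COL2, which makes it easy to see which column each COL3 value came from.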

In [8]: df
Out[8]:
  COL1 COL2
0    A  NaN
1  NaN    B
2    A    B

In [9]: df["COL3"] = df["COL1"].fillna(df["COL2"])

In [10]: df
Out[10]:
  COL1 COL2 COL3
0    A  NaN    A
1  NaN    B    B
2    A    B    A
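As a self-contained sketch of the same idea, the snippet below rebuilds the frame from the session above and also shows `Series.combine_first`, which gives the same result for this case. The column name `COL3_alt` is only there for the comparison and is not part of the original question.

```python
import numpy as np
import pandas as pd

# Same data as the session above ('B' in COL2 to show where values come from).
df = pd.DataFrame({"COL1": ["A", np.nan, "A"],
                   "COL2": [np.nan, "B", "B"]})

# Take COL1, and fall back to COL2 wherever COL1 is missing.
df["COL3"] = df["COL1"].fillna(df["COL2"])

# combine_first expresses the same "first non-null wins" rule.
df["COL3_alt"] = df["COL1"].combine_first(df["COL2"])

print(df)
```

Both approaches align on the index, so they assume COL1 and COL2 come from the same frame or share the same row labels.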

2. How to rename columns in pandas

I have a DataFrame using pandas and column labels that I need to edit to replace the original column labels.

I’d like to change the column names in a DataFrame A where the original column names are:

['$a', '$b', '$c', '$d', '$e'] 

to

['a', 'b', 'c', 'd', 'e'].

I have the edited column names stored in a list, but I don’t know how to replace the column names.

SOLUTION:

Just assign the list of new names to the .columns attribute (the list needs one entry per column):

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20
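When the new names follow a pattern, as they do here (drop the leading `$`), it may be easier to derive them than to type out a list. The sketch below uses `DataFrame.rename` with a function; the single row of data is made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with the column names from the question.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=["$a", "$b", "$c", "$d", "$e"])

# rename returns a new frame; the function is applied to every column label.
renamed = df.rename(columns=lambda name: name.lstrip("$"))

# A dict works too, and only touches the labels it mentions.
partly_renamed = df.rename(columns={"$a": "a"})

print(list(renamed.columns))
print(list(partly_renamed.columns))
```

Unlike assigning to `.columns`, `rename` leaves the original frame unchanged unless the result is assigned back, and the dict form does not require listing every column.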

 

3. How to sort a dictionary by value

I have a dictionary of values read from two fields in a database: a string field and a numeric field. The string field is unique, so that is the key of the dictionary.

I can sort on the keys, but how can I sort based on the values?

SOLUTION:

Before Python 3.7, dictionaries had no guaranteed order, so it was not possible to sort a dictionary, only to get a sorted representation of it. Since Python 3.7 dictionaries preserve insertion order, but there is still no in-place sort: you sort the items and, if needed, build a new dictionary from them. Either way, you need an ordered data type to hold the sorted values, which will be a list, probably a list of tuples.

For instance,

import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(1))

sorted_x will be a list of (key, value) tuples sorted by the second element in each tuple. dict(sorted_x) == x still holds, because dictionary equality ignores order.

And for those wishing to sort on keys instead of values:

import operator
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_x = sorted(x.items(), key=operator.itemgetter(0))

In Python 3, tuple parameter unpacking in a lambda (for example lambda (k, v): v) is no longer allowed, so index into the (key, value) pair instead:

x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}
sorted_by_value = sorted(x.items(), key=lambda kv: kv[1])
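Building on the examples above, here is a short sketch for two common follow-ups: sorting in descending order, and turning the sorted pairs back into a dictionary. The second part assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}

# Largest value first: reverse=True flips the sort order.
pairs_desc = sorted(x.items(), key=lambda kv: kv[1], reverse=True)

# Assumes Python 3.7+: a dict built from sorted pairs keeps that order.
by_value_desc = dict(pairs_desc)

print(pairs_desc)
print(list(by_value_desc))
```

On older interpreters, collections.OrderedDict(pairs_desc) is the usual way to keep the sorted order in a mapping.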