Functions In Python

Santhosh Kumar BVSRK
4 min read · Jun 20, 2021

Every data scientist relies on a range of functions from Python and its libraries (pandas, NumPy, Matplotlib) for different purposes.

In this story I have collected the functions I find most useful, so that it can serve as a quick reference for learning or lookup.

unique() → Returns the unique values of a Series or array

nunique() → Returns the count of unique values in each column of a data frame

df.duplicated() → Returns a boolean Series that marks which rows of a data frame are duplicates of an earlier row

df.drop_duplicates() → Drops the duplicate rows of a data frame, keeping the first occurrence by default
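A minimal sketch of these four calls, assuming pandas is installed. The data frame and its column names are hypothetical, chosen only for illustration. Note that pandas spells the last method drop_duplicates(), with an underscore.

```python
import pandas as pd

# Hypothetical sample data: row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "sales": [10, 20, 10, 30],
})

unique_cities = df["city"].unique()   # distinct values, in order of appearance
unique_counts = df.nunique()          # number of distinct values per column
dup_mask = df.duplicated()            # True where a row repeats an earlier row
deduped = df.drop_duplicates()        # keeps the first occurrence of each row

print(list(unique_cities))
print(unique_counts.to_dict())
print(dup_mask.tolist())
print(len(deduped))
```

Because duplicated() returns a mask rather than the rows themselves, df[df.duplicated()] is the usual way to look at the duplicate rows.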

head() → By default gives the first 5 rows of a data frame

head(n) → Gives the first n rows of a data frame

tail() → By default gives the last 5 rows of a data frame

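type() → Returns the type of the container itself, such as list, array, tuple, Series or DataFrame

dtype → An attribute (not a function call) that gives the data type of the elements inside a container, for example df["col"].dtype for a column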
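The difference between the row-preview functions, type() and dtype can be seen in a short sketch. The data frame below is hypothetical, and dtype is read as an attribute without parentheses.

```python
import pandas as pd

# Hypothetical data frame with 10 rows.
df = pd.DataFrame({
    "id": range(1, 11),
    "score": [x * 1.5 for x in range(1, 11)],
})

first_five = df.head()     # rows with id 1 to 5
first_three = df.head(3)   # rows with id 1 to 3
last_five = df.tail()      # rows with id 6 to 10

container_type = type(df)          # the container: pandas DataFrame
column_dtype = df["score"].dtype   # the elements of one column: float64
all_dtypes = df.dtypes             # dtype of every column at once

print(container_type, column_dtype)
```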

describe() → Gives summary statistics (count, mean, standard deviation, minimum, quartiles including the median, and maximum) for the numerical columns of a data frame by default.

info() → Gives the number of rows and columns and the data type of each column. It also gives the non-null count for every column, which helps to identify the columns that contain null values.
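As a rough illustration, the sketch below runs both functions on a small hypothetical data frame that contains missing values. The column names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "name": ["A", "B", "C", None],
})

summary = df.describe()   # only the numeric column "age" is summarized
print(summary)

df.info()                 # prints per-column non-null counts and dtypes
```

In the describe() output the median appears as the row labelled 50%.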

mean() → Gives the mean of a set of data. It can be applied to a Series or to a whole data frame

merge() → Joins 2 data frames on common columns or indexes, similar to a SQL join

sample(n) → Returns n randomly selected rows of the data frame.

value_counts() → Similar to a group-by count: returns the number of occurrences of each category in a column
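These four can be combined in one small sketch. The two data frames, their column names and the random_state value are all hypothetical choices for the example.

```python
import pandas as pd

# Hypothetical orders and customers tables sharing a "customer_id" key.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [100.0, 50.0, 70.0, 20.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["retail", "wholesale", "retail"],
})

avg_amount = orders["amount"].mean()                  # 60.0
merged = orders.merge(customers, on="customer_id")    # inner join by default
picked = merged.sample(n=2, random_state=0)           # 2 random rows, reproducible
segment_counts = merged["segment"].value_counts()     # rows per segment

print(avg_amount)
print(segment_counts.to_dict())
```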

plot() → used for plotting a graph

xlabel() → sets the label of the x-axis in your chart

ylabel() → sets the label of the y-axis in your chart

title() → sets the title of your chart

savefig() → used for saving a chart figure to your local disk
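The plotting functions above are typically called in sequence through matplotlib.pyplot. The sketch below assumes Matplotlib is installed; the data, labels and output file name are hypothetical, and the Agg backend is chosen only so that the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# Hypothetical data points.
months = [1, 2, 3, 4]
revenue = [10, 14, 9, 18]

plt.plot(months, revenue)       # draw the line chart
plt.xlabel("Month")             # name of the x-axis
plt.ylabel("Revenue")           # name of the y-axis
plt.title("Monthly revenue")    # chart title

# Save into the system temp directory; the file name is arbitrary.
out_path = os.path.join(tempfile.gettempdir(), "monthly_revenue.png")
plt.savefig(out_path)
plt.close()
```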

np.empty((3,2)) → creates an uninitialized array of the desired shape, here 3*2. The shape is passed as a tuple, and the values are arbitrary until you assign them

arange(10,25,5) → creates an array of 3 elements with a step size of 5 i.e., 10, 15, 20. 25 is not part of this array because the stop value is excluded

linspace(10,25,5) → generates 5 evenly spaced numbers between 10 and 25. Unlike range or arange, the stop value 25 is included as the last element.
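A short sketch of the three array constructors, assuming NumPy is imported as np. Note that np.empty() takes the shape as a single tuple, and that np.linspace() returns evenly spaced values rather than random ones.

```python
import numpy as np

buf = np.empty((3, 2))            # 3x2 array; contents are uninitialized
steps = np.arange(10, 25, 5)      # [10 15 20], stop value 25 is excluded
points = np.linspace(10, 25, 5)   # [10.  13.75 17.5  21.25 25.], stop included

print(buf.shape)
print(steps)
print(points)
```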

astype() → converts data from one type to another e.g., astype(int) converts the elements to int type

corrcoef() → one of the aggregate functions; it returns the matrix of Pearson correlation coefficients

std() → Standard deviation

median() → Mathematical Median
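The sketch below applies these aggregate functions to two small hypothetical arrays. NumPy spells the correlation function corrcoef, and it returns a matrix rather than a single number.

```python
import numpy as np

# Hypothetical data: y is exactly 2 * x, so the correlation is 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

as_int = x.astype(int)       # float array converted to an integer array
corr = np.corrcoef(x, y)     # 2x2 matrix; corr[0, 1] is the coefficient of x vs y
spread = np.std(x)           # population standard deviation (ddof=0)
middle = np.median(x)        # 2.5

print(corr[0, 1], spread, middle)
```

pandas uses the sample standard deviation (ddof=1) in Series.std() by default, so the two libraries can give different numbers for the same data.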

view() → creates a new array object that shares the same data as the original array, so any change made through the view affects the original array

sort() → sorts an array

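copy() → creates an independent copy of the array, so changes made to the new array do not affect the original array

resize() → changes the shape and size of an array. ndarray.resize() modifies the array in place, while np.resize() returns a new array and repeats elements if the new size is larger

append() → appends items to an array. np.append() returns a new array rather than modifying the original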
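The difference between a view and a copy is easiest to see by modifying each one. This is a minimal sketch with a hypothetical array; it uses the function forms np.sort(), np.resize() and np.append(), which return new arrays.

```python
import numpy as np

original = np.array([3, 1, 2])

alias = original.view()       # shares memory with original
alias[0] = 99                 # original becomes [99, 1, 2]

independent = original.copy() # separate memory
independent[1] = -1           # original is not touched

sorted_vals = np.sort(original)          # [1, 2, 99], original order unchanged
resized = np.resize(original, (2, 3))    # repeats elements to fill 2x3
extended = np.append(original, [7, 8])   # new array [99, 1, 2, 7, 8]

print(original, independent, sorted_vals)
```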

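title()/capitalize() → On a Series these are used through the .str accessor: str.title() capitalizes the first letter of every word, while str.capitalize() capitalizes only the first letter of each element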
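The two methods behave differently on multi-word strings, as this small sketch with a hypothetical Series shows.

```python
import pandas as pd

# Hypothetical text data with inconsistent casing.
names = pd.Series(["data science", "mACHINE learning"])

titled = names.str.title()             # ["Data Science", "Machine Learning"]
capitalized = names.str.capitalize()   # ["Data science", "Machine learning"]

print(titled.tolist())
print(capitalized.tolist())
```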

np.diff() → calculates the n-th discrete difference along an axis. For the default n=1 it calculates arr[i+1]-arr[i]

np.sign() → returns the sign of each element: -1 for negative, 0 for zero and 1 for positive values

np.where() → returns the indices of elements in an input array where the given condition is satisfied
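These three functions are often chained, for example to find where a series of values went up. The array below is hypothetical.

```python
import numpy as np

# Hypothetical price series.
prices = np.array([10, 12, 9, 9, 15])

changes = np.diff(prices)                # [ 2 -3  0  6]
direction = np.sign(changes)             # [ 1 -1  0  1]
rising_idx = np.where(changes > 0)[0]    # indices where the change is positive: [0 3]

print(changes, direction, rising_idx)
```

With only a condition, np.where() returns a tuple of index arrays, one per dimension, which is why the sketch takes element [0].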

np.is_busday() → returns a boolean array indicating which of the given dates are business days. Eg: tmp[np.is_busday(tmp)] returns only the business days from the dates in tmp

np.busday_count() → returns the number of business days between a begin date and an end date, with the end date excluded
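A small sketch of both business-day functions with the default Monday to Friday week. The dates are hypothetical examples; 2021-06-19 and 2021-06-20 fall on a weekend.

```python
import numpy as np

# Friday, Saturday, Sunday, Monday.
tmp = np.array(
    ["2021-06-18", "2021-06-19", "2021-06-20", "2021-06-21"],
    dtype="datetime64[D]",
)

mask = np.is_busday(tmp)      # [ True False False  True]
business_days = tmp[mask]     # keep only the business days

# Business days from Monday 2021-06-14 up to, but not including, 2021-06-21.
n_days = np.busday_count("2021-06-14", "2021-06-21")

print(mask, business_days, n_days)
```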

np.pad() → does padding of rows and columns around an existing array

np.inner() → computes the inner product of 2 arrays

eg: for 1-D arrays, np.inner(a,b) = sum(a[:] * b[:])
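Padding and the inner product can be checked on small hypothetical arrays. The pad width and fill value below are arbitrary choices for the example.

```python
import numpy as np

grid = np.array([[1, 2], [3, 4]])

# Add a border of zeros, one element wide, around the 2x2 array.
padded = np.pad(grid, pad_width=1, mode="constant", constant_values=0)

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
dot = np.inner(a, b)   # 1*4 + 2*5 + 3*6 = 32

print(padded)
print(dot)
```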

series.str.contains() → used to check whether a particular string is present in each element of the series

df["columnname"].pct_change() → this calculates the fractional change between each element and the previous one (multiply by 100 for a percentage)

retail_df = retail_df.loc[retail_df["Invoice No"].str.startswith("C", na=False)] → the startswith function is used here to fetch the records whose value in a particular column starts with the given string. na=False treats missing values as non-matching.
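The sketch below tries these three calls on a hypothetical retail_df. Only the "Invoice No" column name comes from the text above; the "Amount" column and all values are made up for illustration. The pandas string method is spelled str.contains().

```python
import pandas as pd

# Hypothetical retail data; one invoice number is missing.
retail_df = pd.DataFrame({
    "Invoice No": ["C1001", "1002", None, "C1003"],
    "Amount": [100.0, 110.0, 99.0, 198.0],
})

# True where the invoice number contains "100"; missing values count as False.
has_100 = retail_df["Invoice No"].str.contains("100", na=False)

# Change relative to the previous row, as a fraction: NaN, 0.1, -0.1, 1.0.
change = retail_df["Amount"].pct_change()

# Keep only the rows whose invoice number starts with "C".
cancelled = retail_df.loc[retail_df["Invoice No"].str.startswith("C", na=False)]

print(has_100.tolist())
print(change.tolist())
print(cancelled)
```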

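df1 = df.copy(deep=True) → this copies the data and structure of dataframe df to a new dataframe df1

df.style.highlight_max() → this function highlights the maximum value of each column when the dataframe is rendered

Series.str.extract() → this function is used to extract the capture groups of a regular expression from a string column of a dataframe, returning them as new columns.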
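A quick way to confirm that a deep copy is independent is to modify the copy and inspect the original. The data frame below is hypothetical, and Python requires True with a capital T.

```python
import pandas as pd

# Hypothetical data frame.
df = pd.DataFrame({"qty": [1, 2, 3]})

df1 = df.copy(deep=True)   # independent copy of data and index
df1.loc[0, "qty"] = 100    # change the copy only

print(df.loc[0, "qty"], df1.loc[0, "qty"])
```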
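As an illustration, the sketch below pulls two parts out of a hypothetical column of product codes. The data, the regular expression and the group names are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical codes; the last one does not match the pattern.
codes = pd.Series(["AB-123", "CD-456", "no match"])

# Each named capture group becomes a column in the result.
parts = codes.str.extract(r"(?P<prefix>[A-Z]+)-(?P<number>\d+)")

print(parts)
```

Rows that do not match the pattern come back as missing values, and the extracted numbers are still strings until they are converted, for example with astype(int).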

melt() → this function is used to change the DataFrame format from wide to long. One or more columns work as identifiers, and all the remaining columns are unpivoted to the row axis, leaving only two non-identifier columns: variable and value.
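A minimal wide-to-long sketch with hypothetical exam data. The var_name and value_name arguments rename the default variable and value columns.

```python
import pandas as pd

# Hypothetical wide table: one column per subject.
wide = pd.DataFrame({
    "student": ["Asha", "Ravi"],
    "math": [90, 75],
    "physics": [85, 80],
})

# "student" stays as the identifier; the subject columns are unpivoted into rows.
long = wide.melt(id_vars="student", var_name="subject", value_name="score")

print(long)
```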

site.getsitepackages() → this function tells you the path where third-party packages are installed on your machine. For this you have to import site first

plt.legend(loc="upper right") → this function places the legend at the upper right corner of your plot.

np.random.rand(5,5) → Generates a 5*5 matrix of random numbers drawn uniformly from the interval [0, 1)

np.random.randn(5,5) → Generates a 5*5 matrix of random numbers drawn from the standard normal distribution (mean 0, standard deviation 1)
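The sketch below generates both matrices. The seed is an arbitrary choice that only makes the run reproducible.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, for reproducibility only

uniform = np.random.rand(5, 5)    # values in [0, 1), uniformly distributed
normal = np.random.randn(5, 5)    # values from the standard normal distribution

print(uniform.shape, uniform.min(), uniform.max())
print(normal.shape, normal.mean(), normal.std())
```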

NOTE: I will keep updating this story with a few more functions.
