Functions In Python

Every data scientist uses a range of Python functions for different purposes, some built into Python and many more from libraries such as pandas, NumPy and Matplotlib.

In this story I have tried to collect these functions in one place so they can serve as a quick reference for learning and lookup.

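unique() → returns the unique values of a pandas Series, in order of first appearance (np.unique() returns them sorted)

nunique() → returns the count of unique values in each column of a DataFrame (NaN is excluded by default)

df.duplicated() → returns a boolean Series that marks duplicate rows of a DataFrame; by default every repeat after the first occurrence is True

df.drop_duplicates() → drops duplicate rows from a DataFrame, keeping the first occurrence by default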
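A minimal sketch of the four functions above on a small made-up DataFrame (the column names and values are only for illustration). Note that the method is spelled drop_duplicates(), with an underscore.

```python
import pandas as pd

# Illustrative data: row 2 is an exact repeat of row 0
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Chennai", "Delhi"],
    "sales": [10, 20, 10, 30, 25],
})

print(df["city"].unique())   # Pune, Delhi, Chennai (order of first appearance)
print(df.nunique())          # distinct values per column: city 3, sales 4
print(df.duplicated())       # True only for row 2
print(df.drop_duplicates())  # 4 rows left; the first occurrence is kept
```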

head() → returns the first 5 rows of a DataFrame by default

head(n) → returns the first n rows of a DataFrame

tail() → returns the last 5 rows of a DataFrame by default (tail(n) returns the last n rows)

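type() → returns the type of an object, such as a list, tuple, NumPy array, pandas Series or DataFrame

dtype → the data type of the elements of an array or of a single DataFrame column. It is an attribute, so it is written without parentheses, e.g. df['col'].dtype; df.dtypes lists the type of every column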
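A short sketch of head(), tail(), type() and dtype on a hypothetical 10-row DataFrame. It also shows that dtype is read as an attribute rather than called.

```python
import pandas as pd

# Hypothetical 10-row DataFrame
df = pd.DataFrame({"x": range(10), "y": [v * 1.5 for v in range(10)]})

print(df.head())      # rows 0 to 4
print(df.head(3))     # rows 0 to 2
print(df.tail())      # rows 5 to 9
print(type(df))       # <class 'pandas.core.frame.DataFrame'>
print(df["y"].dtype)  # float64; dtype is an attribute, so no parentheses
print(df.dtypes)      # dtype of every column at once
```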

describe() → returns summary statistics (count, mean, standard deviation, min, the 25%, 50% and 75% quartiles, and max) for the numeric columns of a DataFrame by default; the 50% value is the median. Pass include='all' to summarize every column.

info() → prints the number of rows and columns, the data type of each column, and the non-null count per column, which helps identify the columns that have missing values.
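A minimal sketch, assuming a tiny DataFrame with one numeric column containing a missing value and one text column. describe() summarizes only the numeric column, and info() reveals the missing entries through the non-null counts.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each column
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "name": ["A", "B", "C", None],
})

stats = df.describe()  # only "age" appears: count, mean, std, min, 25%, 50%, 75%, max
print(stats)
df.info()              # prints dtypes and non-null counts (3 of 4 in each column)
```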

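mean() → returns the mean of a set of values; on a DataFrame it returns the mean of each column (pass numeric_only=True if the DataFrame also has non-numeric columns)

merge() → joins two DataFrames on common columns or indexes, similar to a SQL join (an inner join by default)

sample(n) → returns n randomly selected rows of a DataFrame

value_counts() → returns the count of each distinct value in a column, similar to GROUP BY with COUNT in SQL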
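A sketch of mean(), merge(), sample() and value_counts() working together. The orders and customers tables, and their column names, are hypothetical.

```python
import pandas as pd

# Hypothetical tables sharing the key column "cust_id"
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "cust_id": [101, 102, 101, 103],
    "amount": [250.0, 100.0, 150.0, 300.0],
})
customers = pd.DataFrame({
    "cust_id": [101, 102, 103],
    "segment": ["Retail", "Corporate", "Retail"],
})

print(orders["amount"].mean())  # 200.0

# SQL-style inner join on the shared key column
merged = orders.merge(customers, on="cust_id", how="inner")

print(merged.sample(2, random_state=0))  # 2 random rows; random_state makes it repeatable
print(merged["segment"].value_counts())  # Retail 3, Corporate 1
```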

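plot() → plots a graph (in Matplotlib this and the functions below are called as plt.plot(), plt.xlabel() and so on, with matplotlib.pyplot imported as plt)

xlabel() → sets the x-axis label of your chart

ylabel() → sets the y-axis label of your chart

title() → sets the title of your chart

savefig() → saves the current chart figure to a file on your local disk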
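A minimal Matplotlib sketch tying these calls together, assuming made-up monthly sales figures and an arbitrary output file name in the system temp folder. The non-interactive "Agg" backend is chosen only so the script runs without a display. plt.legend(), covered later in this list, is included to label the line.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]
sales = [120, 135, 128, 150]  # illustrative numbers

plt.plot(months, sales, label="Sales")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.title("Monthly sales")
plt.legend(loc="upper right")

# Hypothetical output path; savefig writes the current figure to disk
out_path = os.path.join(tempfile.gettempdir(), "monthly_sales.png")
plt.savefig(out_path)
plt.close()
```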

np.empty((3, 2)) → creates an array of the given shape without initializing its values, so it holds arbitrary data; here it creates a 3 x 2 array. The shape is passed as a tuple.

np.arange(10, 25, 5) → creates the array [10, 15, 20] with a step size of 5. The stop value 25 is excluded, as with range().

np.linspace(10, 25, 5) → generates 5 evenly spaced (not random) numbers from 10 to 25: 10, 13.75, 17.5, 21.25, 25. Unlike range() or arange(), the stop value is included by default (endpoint=True).
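A quick sketch contrasting the three array constructors above. The values printed by np.empty() depend on whatever was in memory, so only its shape is meaningful.

```python
import numpy as np

a = np.empty((3, 2))        # shape passed as a tuple; contents are uninitialized
r = np.arange(10, 25, 5)    # 10, 15, 20; the stop value 25 is excluded
l = np.linspace(10, 25, 5)  # 10, 13.75, 17.5, 21.25, 25; evenly spaced, stop included

print(a.shape, r, l)
```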

astype() → converts data from one type to another, e.g. astype(int) converts values to integers

np.corrcoef() → returns the matrix of Pearson correlation coefficients between the given variables

std() → standard deviation (NumPy's np.std() uses ddof=0 by default, while pandas' std() uses ddof=1)

median() → Mathematical Median
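A sketch using astype(), np.corrcoef(), np.std() and np.median() on invented height and weight values. Off the diagonal of the 2 x 2 matrix, np.corrcoef() gives the correlation between the two inputs.

```python
import numpy as np

# Illustrative data: heights arrive as strings and are converted to integers
heights = np.array(["150", "160", "170", "180"]).astype(int)
weights = np.array([50.0, 58.0, 69.0, 80.0])

corr = np.corrcoef(heights, weights)  # 2 x 2 matrix; corr[0, 1] is Pearson r
print(corr)
print(np.std(weights))     # population standard deviation (ddof=0)
print(np.median(heights))  # 165.0
```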

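view() → creates a new array object that shares the data of the original array, so changes made through the view affect the original array

sort() → sorts an array; arr.sort() sorts in place, while np.sort(arr) returns a sorted copy

copy() → creates a new array with its own copy of the data, so changes made to the new array do not affect the original

resize() → changes the shape of an array, and unlike reshape() the total number of elements can change: arr.resize() works in place and fills any new space with zeros, while np.resize() returns a new array that repeats the original data

np.append() → returns a new array with the given values appended; the original array is not modified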
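A sketch showing how view(), copy(), sort(), resize() and append() differ in whether they touch the original array. It uses the function form np.resize(), which returns a new array; the in-place arr.resize() would refuse to run here because a view still references arr.

```python
import numpy as np

arr = np.array([3, 1, 2])

v = arr.view()  # new array object sharing the same data
v[0] = 99
print(arr)      # 99, 1, 2: the change shows up in the original

c = arr.copy()  # independent copy of the data
c[0] = -1
print(arr)      # still 99, 1, 2

s = np.sort(arr)  # sorted copy: 1, 2, 99; arr is unchanged

grown = np.resize(arr, (2, 3))  # new array; the data is repeated to fill 6 slots

appended = np.append(arr, [4, 5])  # new array: 99, 1, 2, 4, 5; arr is unchanged
print(s, grown, appended)
```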

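str.title() / str.capitalize() → on a pandas Series of strings, str.title() capitalizes the first letter of every word, while str.capitalize() capitalizes only the first letter of each string and lowercases the rest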
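A small sketch with invented names that shows the difference between the two string methods.

```python
import pandas as pd

names = pd.Series(["john smith", "MARY jane", "alan"])

titled = names.str.title()            # John Smith, Mary Jane, Alan
capitalized = names.str.capitalize()  # John smith, Mary jane, Alan
print(titled)
print(capitalized)
```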

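np.diff() → calculates the n-th discrete difference along an axis; the first difference is out[i] = arr[i+1] - arr[i]

np.sign() → returns the sign of each element: -1 for negative, 0 for zero, 1 for positive

np.where(condition) → returns the indices of the elements where the condition is True; the three-argument form np.where(condition, x, y) instead picks elements from x where the condition holds and from y elsewhere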
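A sketch chaining np.diff(), np.sign() and np.where() on a made-up price series: compute day-to-day changes, take their direction, then locate or label the rising days.

```python
import numpy as np

prices = np.array([100, 103, 101, 101, 105])  # illustrative prices

change = np.diff(prices)       # prices[i+1] - prices[i]: 3, -2, 0, 4
direction = np.sign(change)    # 1, -1, 0, 1
up_days = np.where(change > 0)  # tuple holding one index array: 0, 3
labels = np.where(change > 0, "up", "not up")  # three-argument form picks values
print(change, direction, up_days, labels)
```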

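np.is_busday() → returns a boolean array that indicates which of the given dates are business days (Monday to Friday by default). E.g. np.is_busday(tmp) tells you, for each date in tmp, whether it is a business day, and tmp[np.is_busday(tmp)] keeps only the business days

np.busday_count() → returns the number of business days between a begin date and an end date; the end date itself is not counted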
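A sketch of both business-day functions with the default Monday to Friday week and no holidays. The dates are chosen so that 2024-01-01 is a Monday and 2024-01-06 is a Saturday.

```python
import numpy as np

dates = np.array(["2024-01-05", "2024-01-06", "2024-01-08"], dtype="datetime64[D]")

mask = np.is_busday(dates)  # Friday, Saturday, Monday -> True, False, True
print(dates[mask])          # keep only the business days

# Counts business days in [begin, end); the end date is excluded
n = np.busday_count("2024-01-01", "2024-01-08")
print(n)                    # 5 (Monday to Friday of the first week)
```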

np.pad() → pads an array with extra rows and columns around its edges

np.inner() → computes the inner product of two arrays; for 1-D arrays this is the ordinary dot product

E.g. np.inner(a, b) = sum(a[:] * b[:])
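A sketch of np.pad() adding one ring of zeros around a 2 x 2 array, and of np.inner() matching the element-wise product summed by hand.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
padded = np.pad(m, pad_width=1, mode="constant", constant_values=0)  # 4 x 4 array
print(padded)

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.inner(a, b))    # 1*4 + 2*5 + 3*6 = 32
print(sum(a[:] * b[:]))  # same value, computed by hand
```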

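series.str.contains() → checks whether a string (or regular-expression pattern) is present in each element of a Series and returns a boolean mask

df['columnname'].pct_change() → calculates the percentage change between each element and the previous one, expressed as a fraction (0.1 means 10%)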
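A sketch with hypothetical product names and prices. na=False makes missing strings count as non-matches so the mask can be used for filtering, and the first pct_change() value is NaN because it has no previous row.

```python
import pandas as pd

products = pd.Series(["Red Mug", "Blue Mug", "Red Plate", None])
has_red = products.str.contains("Red", na=False)  # boolean mask
print(products[has_red])                          # Red Mug, Red Plate

df = pd.DataFrame({"price": [100.0, 110.0, 99.0]})
df["pct"] = df["price"].pct_change()  # (current - previous) / previous
print(df)                             # pct: NaN, 0.1, -0.1
```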

retail_df = retail_df.loc[retail_df["Invoice No"].str.startswith('C', na=False)] → here str.startswith() keeps only the rows whose "Invoice No" value starts with 'C'; na=False treats missing values as non-matches.

df1 = df.copy(deep=True) → copies both the data and the structure of DataFrame df into a new DataFrame df1, so changes to df1 do not affect df
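A runnable version of the two lines above. The retail_df rows are invented for illustration; only the "Invoice No" column name comes from the example, and the variable names c_invoices and df1 are hypothetical.

```python
import pandas as pd

# Hypothetical rows; one invoice number is missing
retail_df = pd.DataFrame({
    "Invoice No": ["536365", "C536379", None, "C536383"],
    "Quantity": [6, -1, 2, -3],
})

# Keep invoices starting with "C"; the missing value is treated as a non-match
c_invoices = retail_df.loc[retail_df["Invoice No"].str.startswith("C", na=False)]
print(c_invoices)

# Deep copy: editing df1 leaves retail_df untouched
df1 = retail_df.copy(deep=True)
df1.loc[0, "Quantity"] = 100
print(retail_df.loc[0, "Quantity"])  # still 6
```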

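df.style.highlight_max() → highlights the maximum value in each column of a DataFrame when it is displayed, e.g. in a Jupyter notebook

Series.str.extract() → extracts the parts of each string that match the capture groups of a regular expression into new columns, e.g. df['col'].str.extract(pattern)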

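A minimal str.extract() sketch, assuming a hypothetical column that combines an item code and a size separated by a hyphen. Each named capture group in the pattern becomes a column of the result.

```python
import pandas as pd

# Hypothetical column: "<ITEM>-<SIZE>"
s = pd.Series(["TSHIRT-M", "JEANS-L", "CAP-S"])

# Named groups give the output columns their names
parts = s.str.extract(r"(?P<item>[A-Z]+)-(?P<size>[A-Z])")
print(parts)
```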

melt() → changes a DataFrame from wide to long format. One or more columns are kept as identifiers (id_vars), and all the remaining columns are unpivoted to the row axis, leaving the identifier columns plus two new columns: variable and value.
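A sketch of melt() on an invented table of exam scores, where "student" is the identifier and each subject column becomes a set of rows.

```python
import pandas as pd

wide = pd.DataFrame({
    "student": ["Asha", "Ravi"],
    "math": [90, 75],
    "science": [85, 80],
})

# "student" stays as the identifier; math and science are unpivoted into rows
long = wide.melt(id_vars=["student"])
print(long)  # columns: student, variable, value (4 rows)
```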

site.getsitepackages() → returns a list of the site-packages directories where third-party packages are installed on your machine. You need to import site first.
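A two-line sketch: import site, then print each directory returned for the interpreter that runs the script.

```python
import site

# Global site-packages directories for the current interpreter
for path in site.getsitepackages():
    print(path)
```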

plt.legend(loc="upper right") → places the legend at the upper right corner of your plot.

np.random.rand(5, 5) → generates a 5 x 5 matrix of random numbers drawn uniformly from [0, 1)

np.random.randn(5, 5) → generates a 5 x 5 matrix of random numbers drawn from the standard normal distribution (mean 0, standard deviation 1)
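A sketch of both generators. The seed is set only so that repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(42)         # reproducible results
u = np.random.rand(5, 5)   # uniform on [0, 1)
n = np.random.randn(5, 5)  # standard normal: mean 0, standard deviation 1

print(u.min() >= 0, u.max() < 1)  # True True
print(n.shape)                    # (5, 5)
```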

NOTE: I will keep updating this story with a few more functions.




14+ Years in IT/ITES| ML & AI Enthusiast| Integration Specialist — BPEL, OSB, OIC, IICS, Oracle Cloud Infrastructure| Database Scripting-SQL, PL/SQL, MySQL

Santhosh Kumar BVSRK
