Practical 01¶
Programs to understand the control structures of Python.
- Python program to print "Hello Python"
- Python program to find the area of a triangle
- Python Program to Check Leap Year
- Python Program to Find the Sum of Natural Numbers
- Python Program to Print all Prime Numbers between an Interval
1. Python program to print "Hello Python"¶
print("Hello Python")
Hello Python
2. Python program to find the area of a triangle¶
# import math module
import math
# take inputs
a = float(input('Enter first side: '))
b = float(input('Enter second side: '))
c = float(input('Enter third side: '))
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area using Heron's formula
area = math.sqrt(s * (s - a) * (s - b) * (s - c))
print('The area of the triangle is %0.2f' % area)
Enter first side: 15 Enter second side: 30 Enter third side: 45 The area of the triangle is 0.00
3. Python Program to Check Leap Year¶
year = int(input("Enter a year: "))
if year % 4 == 0:
    if year % 100 == 0:
        if year % 400 == 0:
            print(year, "is a leap year.")
        else:
            print(year, "is not a leap year.")
    else:
        print(year, "is a leap year.")
else:
    print(year, "is not a leap year.")
Enter a year: 2009 2009 is not a leap year.
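The nested checks above can be collapsed into one boolean expression; a minimal sketch (the helper name `is_leap` is ours, not from the manual):

```python
def is_leap(year):
    # Leap year: divisible by 4, and not a century year unless divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap(2009))  # False, matching the nested-if version above
print(is_leap(2000))  # True
```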
4. Python Program to Find the Sum of Natural Numbers¶
# Take input from the user
num = int(input("Enter a number: "))
sum = 0  # note: this shadows the built-in sum(), which is fine in a short script
# use a while loop to count down from num to zero
while num > 0:
    sum += num
    num -= 1
print("The sum is", sum)
Enter a number: 14 The sum is 105
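The same sum follows in constant time from the closed form n(n + 1)/2; a sketch (the name `sum_naturals` is ours):

```python
def sum_naturals(n):
    # Gauss' closed form: 1 + 2 + ... + n = n * (n + 1) / 2
    return n * (n + 1) // 2

print(sum_naturals(14))  # 105, matching the loop above
```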
5. Python Program to Print all Prime Numbers between an Interval¶
# define a function
def prime_numbers(start, end):
    for num in range(start, end + 1):
        if num > 1:
            for i in range(2, num):
                if (num % i) == 0:
                    break
            else:
                # this else belongs to the for loop: it runs only if no break occurred
                print(num)
# take input from the user
start = int(input("Enter the start of the interval: "))
end = int(input("Enter the end of the interval: "))
print("Prime numbers between", start, "and", end, "are:")
prime_numbers(start, end)
Enter the start of the interval: 5 Enter the end of the interval: 50 Prime numbers between 5 and 50 are: 5 7 11 13 17 19 23 29 31 37 41 43 47
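Trial division only needs to test divisors up to the square root of the candidate, which matters for large intervals; a sketch of that optimization (the helper `is_prime` is ours):

```python
import math

def is_prime(n):
    # Numbers below 2 are not prime.
    if n < 2:
        return False
    # A composite number must have a divisor no larger than its square root.
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

# primes between 5 and 50, same as the interval in the run above
print([n for n in range(5, 51) if is_prime(n)])
```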
Practical 02¶
Develop programs to learn different data types (string “ ”, set { }, list [ ], dictionary { : }, tuple ( )) in Python.
- Python Program to Remove Punctuation from a String
- Python Program to Illustrate Different Set Operations – Union, Intersection, Difference, Symmetric Difference
- Python Program to demonstrate list slicing
- Python Program to compare two lists
- Python Program to Check If a List is Empty
- Python Program Concatenate Two Lists
- Python Program to Merge Two Dictionaries
- Python Program to Iterate Over Dictionaries Using for Loop
- Python Program to Sort a Dictionary by Value
- Python Program to Find the size of a Tuple
- Python Program to find Sum of Tuple’s elements (numbers)
- Python Program to Count the Number of Each Vowel
1. Python Program to Remove Punctuation from a String¶
# Define a string of punctuation characters to remove from the input string
punctuations = "!()-[]{};:'\"\\,<>./?@#$%^&*_~"
# Prompt the user to enter a string
my_str = input("Enter a string:- ")
# Initialize an empty string to hold the input string with punctuation removed
no_punct = ""
# Iterate over each character in the input string
for char in my_str:
    # If the character is not in the punctuation string, add it to the no_punct string
    if char not in punctuations:
        no_punct = no_punct + char
# Print the input string with punctuation removed
print(no_punct)
Enter a string:- Hello ! My name is Parth. Hello My name is Parth
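The same removal can be done with the standard library's `string.punctuation` constant and `str.translate`; a minimal sketch on the sample sentence above:

```python
import string

my_str = "Hello ! My name is Parth."
# str.maketrans with a third argument maps every listed character to None,
# i.e. translate() deletes each punctuation character.
no_punct = my_str.translate(str.maketrans("", "", string.punctuation))
print(no_punct)
```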
2. Python Program to Illustrate Different Set Operations – Union, Intersection, Difference, Symmetric Difference¶
# define two sets
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
# union of two sets
print("Union of two sets:", set1 | set2)
# intersection of two sets
print("Intersection of two sets:", set1 & set2)
# difference of two sets
print("Difference of two sets:", set1 - set2)
# symmetric difference of two sets
print("Symmetric difference of two sets:", set1 ^ set2)
Union of two sets: {1, 2, 3, 4, 5, 6, 7, 8} Intersection of two sets: {4, 5} Difference of two sets: {1, 2, 3} Symmetric difference of two sets: {1, 2, 3, 6, 7, 8}
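Each operator used above has a named-method equivalent; unlike the operators, the methods also accept any iterable, not just sets. A sketch:

```python
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# Method forms of |, &, -, ^; each returns a new set.
print(set1.union(set2))
print(set1.intersection(set2))
print(set1.difference(set2))
print(set1.symmetric_difference(set2))
```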
3. Python Program to demonstrate list slicing¶
# Define a list of numbers
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Print the entire list
print("Original list:", numbers)
# Print the first three elements of the list
print("First three elements:", numbers[:3])
# Print the last three elements of the list
print("Last three elements:", numbers[-3:])
# Print every other element of the list
print("Every other element:", numbers[::2])
# Print the elements of the list in reverse order
print("Reversed list:", numbers[::-1])
Original list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] First three elements: [1, 2, 3] Last three elements: [8, 9, 10] Every other element: [1, 3, 5, 7, 9] Reversed list: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
4. Python Program to compare two lists¶
# Define two lists of numbers
list1 = [1, 2, 3, 4, 5]
list2 = [1, 2, 3, 4, 6]
# Check if the two lists are equal
if list1 == list2:
    print("The two lists are equal")
else:
    print("The two lists are not equal")
# Check if the two lists have the same elements (order doesn't matter)
if set(list1) == set(list2):
    print("The two lists have the same elements")
else:
    print("The two lists do not have the same elements")
The two lists are not equal The two lists do not have the same elements
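One caveat with the set() comparison above: sets discard duplicates, so [1, 2, 2, 3] and [1, 2, 3] compare as "the same elements". collections.Counter compares multisets (duplicates matter, order does not); a sketch:

```python
from collections import Counter

list1 = [1, 2, 2, 3]
list2 = [2, 1, 3, 2]
list3 = [1, 2, 3]

# set() discards duplicates, so it reports list1 and list3 as equal.
print(set(list1) == set(list3))          # True, but misleading
# Counter keeps multiplicities, giving a true multiset comparison.
print(Counter(list1) == Counter(list2))  # True: same elements, same counts
print(Counter(list1) == Counter(list3))  # False: counts differ
```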
5. Python Program to Check If a List is Empty¶
# Define a list of numbers
my_list = [1, 2, 3]
# Check if the list is empty using the len() function
if len(my_list) == 0:
    print("The list is empty")
else:
    print("The list is not empty")
# Note: an empty list is also falsy, so 'if not my_list:' is the idiomatic check
The list is not empty
6. Python Program Concatenate Two Lists¶
# Define two lists of numbers
list1 = [1, 2, 3]
list2 = [4, 5, 6]
# Concatenate the two lists using the + operator
concatenated_list = list1 + list2
# Print the concatenated list
print("Concatenated list:", concatenated_list)
Concatenated list: [1, 2, 3, 4, 5, 6]
7. Python Program to Merge Two Dictionaries¶
# Define two dictionaries
dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}
# Merge the two dictionaries using the update() method
merged_dict = dict1.copy()
merged_dict.update(dict2)
# Print the merged dictionary
print("Merged dictionary:", merged_dict)
Merged dictionary: {'a': 1, 'b': 2, 'c': 3, 'd': 4}
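Since Python 3.9 the | operator merges two dictionaries in a single expression, and ** unpacking works from 3.5 onward; a sketch:

```python
dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}

# Python 3.9+: the union operator; right-hand values win on duplicate keys.
merged = dict1 | dict2
# Python 3.5+ equivalent: unpack both dicts into a new literal.
merged_unpacked = {**dict1, **dict2}

print(merged)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```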
8. Python Program to Iterate Over Dictionaries Using for Loop¶
# Define a dictionary of key-value pairs
my_dict = {'a': 1, 'b': 2, 'c': 3}
# Iterate over the dictionary using a for loop
for key, value in my_dict.items():
    print(key, value)
a 1 b 2 c 3
9. Python Program to Sort a Dictionary by Value¶
# Define a dictionary of key-value pairs
my_dict = {'apple': 5, 'banana': 2, 'orange': 4, 'pear': 3}
# Sort the dictionary by value using the sorted() function and a lambda function
sorted_dict = dict(sorted(my_dict.items(), key=lambda item: item[1]))
# Print the sorted dictionary
print("Sorted dictionary:", sorted_dict)
Sorted dictionary: {'banana': 2, 'pear': 3, 'orange': 4, 'apple': 5}
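sorted() also supports descending order via reverse=True, and sorting the item pairs without a key function sorts by key; a sketch:

```python
my_dict = {'apple': 5, 'banana': 2, 'orange': 4, 'pear': 3}

# Descending by value.
desc = dict(sorted(my_dict.items(), key=lambda item: item[1], reverse=True))
# Ascending by key: the (key, value) pairs sort by key by default.
by_key = dict(sorted(my_dict.items()))

print(desc)
print(by_key)
```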
10. Python Program to Find the size of a Tuple¶
# Define a tuple of numbers
my_tuple = (1, 2, 3, 4, 5)
# Find the size of the tuple using the len() function
tuple_size = len(my_tuple)
# Print the size of the tuple
print("Size of the tuple:", tuple_size)
Size of the tuple: 5
11. Python Program to find Sum of Tuple’s elements (numbers)¶
# Define a tuple
my_tuple = (10, 20, 30, 40, 50)
# Initialize a variable to store the sum
sum_of_elements = 0
# Iterate through the tuple and add each element to the sum
for element in my_tuple:
    sum_of_elements += element
# Print the sum of elements
print("Sum of elements in the tuple:", sum_of_elements)
Sum of elements in the tuple: 150
12. Python Program to Count the Number of Each Vowel¶
# Define a string of vowels
vowels = 'aAeEiIoOuU'
# Prompt the user to enter a string
my_str = input("Enter a string: ")
# Convert the string to lowercase
my_str = my_str.lower()
# Initialize a dictionary to hold the vowel counts
vowel_counts = {}
# Iterate over each character in the string
for char in my_str:
    # If the character is a vowel, increment its count in the dictionary
    if char in vowels:
        if char in vowel_counts:
            vowel_counts[char] += 1
        else:
            vowel_counts[char] = 1
# Print the vowel counts
for vowel, count in vowel_counts.items():
    print(vowel, count)
Enter a string: Hello, this is an example text. e 4 o 1 i 2 a 2
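collections.Counter condenses the counting loop into a single expression; a sketch on the same sample sentence:

```python
from collections import Counter

my_str = "Hello, this is an example text.".lower()
# Counter tallies the vowels in one pass; non-vowels are filtered out first.
vowel_counts = Counter(ch for ch in my_str if ch in "aeiou")
print(dict(vowel_counts))
```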
Practical 03¶
Develop programs to learn the concepts of function scoping, recursion and list mutability
- Python Program to demonstrate use of local, nonlocal & global variables
- Python Program to Make a Simple Calculator
- Python Program to Find Factorial of Number Using Recursion
- Python Program to Display Fibonacci Sequence Using Recursion
1. Python Program to demonstrate use of local, nonlocal & global variables¶
# Global variable
global_variable = 10
def example_function():
    # Local variable
    local_variable = 5
    print("Inside the function:")
    print("Local variable:", local_variable)    # Access local variable
    print("Global variable:", global_variable)  # Access global variable

example_function()
print("\nOutside the function:")
# Trying to access local_variable outside the function would raise a NameError
print("Global variable:", global_variable)  # Access global variable

# Nested function with nonlocal
def outer_function():
    outer_variable = 20
    def inner_function():
        nonlocal outer_variable  # Access and modify outer_variable
        outer_variable = 30
        print("Inside inner_function - nonlocal variable:", outer_variable)
    inner_function()
    print("Inside outer_function - nonlocal variable:", outer_variable)

outer_function()
print("\nOutside the outer_function:")
# Trying to access outer_variable outside the function would raise a NameError
Inside the function: Local variable: 5 Global variable: 10 Outside the function: Global variable: 10 Inside inner_function - nonlocal variable: 30 Inside outer_function - nonlocal variable: 30 Outside the outer_function:
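The demo reads the global but never rebinds it; assigning to a module-level name from inside a function needs the `global` keyword. A minimal sketch (the name `counter` is ours):

```python
counter = 0

def increment():
    # Without this declaration, 'counter += 1' would raise UnboundLocalError,
    # because assignment inside a function makes the name local.
    global counter
    counter += 1

increment()
increment()
print(counter)  # 2
```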
2. Python Program to Make a Simple Calculator¶
# Define a function to add two numbers
def add(num1, num2):
    return num1 + num2

# Define a function to subtract two numbers
def subtract(num1, num2):
    return num1 - num2

# Define a function to multiply two numbers
def multiply(num1, num2):
    return num1 * num2

# Define a function to divide two numbers
def divide(num1, num2):
    return num1 / num2

# Prompt the user to enter two numbers and an operation
num1 = float(input("Enter the first number: "))
num2 = float(input("Enter the second number: "))
operation = input("Enter the operation (+, -, *, /): ")

# Perform the selected operation on the two numbers
if operation == '+':
    result = add(num1, num2)
elif operation == '-':
    result = subtract(num1, num2)
elif operation == '*':
    result = multiply(num1, num2)
elif operation == '/':
    result = divide(num1, num2)
else:
    print("Invalid operation selected.")
    result = None

# Print the result of the operation
if result is not None:
    print("Result:", result)
Enter the first number: 15 Enter the second number: 34 Enter the operation (+, -, *, /): + Result: 49.0
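An alternative to the if/elif chain is a dispatch dictionary mapping each symbol to a function from the standard operator module; a sketch (the names `operations` and `calculate` are ours):

```python
import operator

# Map each symbol to the corresponding binary function.
operations = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def calculate(num1, num2, op):
    # Unknown symbols are rejected up front; '/' can still raise
    # ZeroDivisionError, just like the plain function version above.
    if op not in operations:
        raise ValueError(f"Invalid operation: {op}")
    return operations[op](num1, num2)

print(calculate(15.0, 34.0, '+'))  # 49.0
```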
3. Python Program to Find Factorial of Number Using Recursion¶
# Define a function to calculate the factorial of a number
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
# Prompt the user to enter a number
num = int(input("Enter a number: "))
# Calculate the factorial of the number using the factorial() function
result = factorial(num)
# Print the result
print("Factorial of", num, "is", result)
Enter a number: 14 Factorial of 14 is 87178291200
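An iterative version avoids Python's recursion limit for large n, and math.factorial provides a built-in cross-check; a sketch (the name `factorial_iter` is ours):

```python
import math

def factorial_iter(n):
    # Multiply 1 * 2 * ... * n without recursion.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial_iter(14))  # 87178291200, as in the run above
print(factorial_iter(14) == math.factorial(14))  # True
```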
4. Python Program to Display Fibonacci Sequence Using Recursion¶
# Define a function to calculate the nth Fibonacci number
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
# Prompt the user to enter the number of terms to display
num_terms = int(input("Enter the number of terms to display: "))
# Display the Fibonacci sequence using the fibonacci() function
for i in range(num_terms):
    print(fibonacci(i))
Enter the number of terms to display: 10 0 1 1 2 3 5 8 13 21 34
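The recursive definition recomputes the same subproblems exponentially many times; functools.lru_cache memoizes each result so the recursion becomes linear. A sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Same recurrence as above; the cache computes each value only once.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```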
Practical 04¶
Develop programs to understand the working of exception handling and assertions.
- Python Program to depict else clause with try-except
- Python Program to demonstrate finally
- Python Program to depict Raising Exception
- Python Program using assertions
1. Python Program to depict else clause with try-except¶
def divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        print("Error: Cannot divide by zero!")
    else:
        print(f"The result of {a} divided by {b} is {result}")

divide(10, 2)  # Output: The result of 10 divided by 2 is 5.0
divide(5, 0)   # Output: Error: Cannot divide by zero!
The result of 10 divided by 2 is 5.0 Error: Cannot divide by zero!
2. Python Program to demonstrate finally¶
try:
    x = int(input("Enter a number: "))
    y = int(input("Enter another number: "))
    result = x / y
except ZeroDivisionError:
    print("Error: Cannot divide by zero!")
else:
    print(f"The result of {x} divided by {y} is {result}")
finally:
    print("This will always execute, regardless of whether an exception was raised or not.")
Enter a number: 24 Enter another number: 12 The result of 24 divided by 12 is 2.0 This will always execute, regardless of whether an exception was raised or not.
3. Python Program to depict Raising Exception¶
def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero!")
    else:
        return a / b

try:
    result = divide(10, 0)
except ZeroDivisionError as e:
    print(e)
else:
    print(f"The result is {result}")
Cannot divide by zero!
4. Python Program using assertions¶
def check_positive_number(number):
    assert number > 0, "Number must be positive"
    return number

try:
    num = int(input("Enter a positive number: "))
    result = check_positive_number(num)
    print(f"You entered a positive number: {result}")
except ValueError:
    print("Invalid input. Please enter a valid positive number.")
except AssertionError as e:
    print(f"Assertion error: {e}")
Enter a positive number: 0 Assertion error: Number must be positive
Practical 05¶
Develop programs to demonstrate use of NumPy
Refer : https://numpy.org/doc/stable/user/quickstart.html
- Print Numpy version and configuration information.
- Create a numpy array with numbers ranging from 50 to 100. Also print attributes of the created object.
- Create a null vector of size 10.
- Create a vector of size 10 initialized with random integers with a seed equal to last four digits of your enrolment number.
- Create a matrix of size 5x5 initialized with random integers with a seed equal to last four digits of your enrolment number.
- Create an identity matrix of size 5x5.
- Do the following
- Create a numpy array mat1 of size 5x2 with numbers ranging from 1 to 10.
- Create another numpy array mat2 of size 2x5 with floating point numbers between 10 to 20.
- Calculate the matrix product of mat1 and mat2.
- Print vector containing minimum and maximum in each row of mat1.
- Print vector containing mean and standard deviation in each column of mat2.
- Perform vstack and hstack on mat1 and mat2.
- Split mat1 horizontally into 2 parts and split mat2 vertically into 2 parts.
- Create a vector vec of size 20 initialized with double numbers ranging from 20 to 40 and do the following.
- Reshape it to a 5x4 matrix mat.
- Print every 4th element of vec.
- Print elements of vec less than or equal to 25.
- In vec assign every number at even location to 1.
- In vec assign from start position to 10th position every second element to 0.
- Reverse elements of vec.
- In mat print each row in the second column and print each column in the second row.
- Apply some universal functions such as sin,sqrt,cos,exp on vec.
- Print transpose and inverse of mat.
- Demonstrate shallow copy and deep copy.
1. Print Numpy version and configuration information.¶
import numpy as np
print("NumPy version:", np.__version__)
print("\nNumPy configuration:")
np.show_config()
NumPy version: 1.23.5 NumPy configuration: openblas64__info: libraries = ['openblas64_', 'openblas64_'] library_dirs = ['/usr/local/lib'] language = c define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)] runtime_library_dirs = ['/usr/local/lib'] blas_ilp64_opt_info: libraries = ['openblas64_', 'openblas64_'] library_dirs = ['/usr/local/lib'] language = c define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)] runtime_library_dirs = ['/usr/local/lib'] openblas64__lapack_info: libraries = ['openblas64_', 'openblas64_'] library_dirs = ['/usr/local/lib'] language = c define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)] runtime_library_dirs = ['/usr/local/lib'] lapack_ilp64_opt_info: libraries = ['openblas64_', 'openblas64_'] library_dirs = ['/usr/local/lib'] language = c define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)] runtime_library_dirs = ['/usr/local/lib'] Supported SIMD extensions in this NumPy install: baseline = SSE,SSE2,SSE3 found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2 not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
2. Create a numpy array with numbers ranging from 50 to 100. Also print attributes of the created object.¶
arr = np.arange(50, 101)
print(arr)
print("Shape of array:", arr.shape)
print("Data type of array elements:", arr.dtype)
print("Number of dimensions of array:", arr.ndim)
[ 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100] Shape of array: (51,) Data type of array elements: int64 Number of dimensions of array: 1
3. Create a null vector of size 10.¶
null_vector = np.zeros(10)
print(null_vector)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
4. Create a vector of size 10 initialized with random integers with a seed equal to last four digits of your enrolment number.¶
np.random.seed(7133)
random_vector = np.random.randint(0, 100, size=10)
print(random_vector)
[20 86 93 78 39 95 92 60 11 16]
5. Create a matrix of size 5x5 initialized with random integers with a seed equal to last four digits of your enrolment number.¶
np.random.seed(7133)
matrix = np.random.randint(0, 100, size=(5, 5))
print(matrix)
[[20 86 93 78 39] [95 92 60 11 16] [45 8 19 16 65] [93 95 31 26 83] [24 65 22 5 95]]
6. Create an identity matrix of size 5x5.¶
identity_matrix = np.identity(5)
print(identity_matrix)
[[1. 0. 0. 0. 0.] [0. 1. 0. 0. 0.] [0. 0. 1. 0. 0.] [0. 0. 0. 1. 0.] [0. 0. 0. 0. 1.]]
7. Do the following¶
Create a numpy array mat1 of size 5x2 with numbers ranging from 1 to 10.¶
mat1 = np.arange(1, 11).reshape(5, 2)
print(mat1)
[[ 1 2] [ 3 4] [ 5 6] [ 7 8] [ 9 10]]
Create another numpy array mat2 of size 2x5 with floating point numbers between 10 to 20.¶
mat2 = np.random.uniform(10, 20, size=(2, 5))
print(mat2)
[[13.14036652 16.57157439 13.4856098 19.84901782 15.93328273] [17.91219832 19.03925499 10.25847916 17.49775131 14.59391677]]
Calculate the matrix product of mat1 and mat2.¶
mat1 = np.arange(1, 11).reshape(5, 2)
mat2 = np.random.uniform(10, 20, size=(2, 5))
mat_product = np.dot(mat1, mat2)
print(mat_product)
[[ 49.38993502 48.41417736 49.04024609 35.40692619 41.32757539] [117.31621038 114.0877628 116.48572366 81.26452379 100.82181511] [185.24248575 179.76134824 183.93120123 127.12212138 160.31605483] [253.16876111 245.43493369 251.3766788 172.97971898 219.81029456] [321.09503648 311.10851913 318.82215637 218.83731658 279.30453428]]
Print vector containing minimum and maximum in each row of mat1.¶
mat1 = np.arange(1, 11).reshape(5, 2)
min_max_vec = np.column_stack((np.min(mat1, axis=1), np.max(mat1, axis=1)))
print(min_max_vec)
[[ 1 2] [ 3 4] [ 5 6] [ 7 8] [ 9 10]]
Print vector containing mean and standard deviation in each column of mat2.¶
mat2 = np.random.uniform(10, 20, size=(2, 5))
mean_std_vec = np.column_stack((np.mean(mat2, axis=0), np.std(mat2, axis=0)))
print(mean_std_vec)
[[15.3786871 1.5405699 ] [15.70136899 2.36388109] [14.84204549 1.71781312] [13.49937687 1.68200076] [12.94592071 0.28293471]]
Perform vstack and hstack on mat1 and mat2.¶
# Assuming you have two NumPy arrays mat1 and mat2
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
# Vertical Stack (vstack)
vertical_stack_result = np.vstack((mat1, mat2))
# Horizontal Stack (hstack)
horizontal_stack_result = np.hstack((mat1, mat2))
print("Original mat1:")
print(mat1)
print("Original mat2:")
print(mat2)
print("Vertical Stack Result:")
print(vertical_stack_result)
print("Horizontal Stack Result:")
print(horizontal_stack_result)
Original mat1: [[1 2] [3 4]] Original mat2: [[5 6] [7 8]] Vertical Stack Result: [[1 2] [3 4] [5 6] [7 8]] Horizontal Stack Result: [[1 2 5 6] [3 4 7 8]]
Split mat1 horizontally into 2 parts and split mat2 vertically into 2 parts.¶
mat1 = np.arange(1, 11).reshape(5, 2)
mat2 = np.random.uniform(10, 20, size=(2, 5))
# Split mat1 horizontally into 2
mat1_split_h = np.hsplit(mat1, 2)
print("mat1 split horizontally into 2:\n", mat1_split_h)
# Split mat2 vertically into 2
mat2_split_v = np.vsplit(mat2, 2)
print("\nmat2 split vertically into 2:\n", mat2_split_v)
mat1 split horizontally into 2: [array([[1], [3], [5], [7], [9]]), array([[ 2], [ 4], [ 6], [ 8], [10]])] mat2 split vertically into 2: [array([[15.78556787, 17.50629083, 14.88323419, 11.78221015, 14.87120469]]), array([[14.81754071, 16.19864425, 16.63558727, 17.08566406, 16.16187844]])]
8. Create a vector vec of size 20 initialized with double numbers ranging from 20 to 40 and do the following.¶
import numpy as np
# Create vector vec of size 20 initialized with double numbers ranging from 20 to 40
vec = np.linspace(20, 40, 20, dtype=np.float64)
Reshape it to a 5x4 matrix mat.¶
# Reshape vec to a 5x4 matrix mat
mat = vec.reshape(5, 4)
Print every 4th element of vec.¶
# Print every 4th element of vec
print(vec[3::4])
[23.15789474 27.36842105 31.57894737 35.78947368 40. ]
Print elements of vec less than or equal to 25.¶
# Print elements of vec less than or equal to 25
print(vec[vec <= 25])
[20. 21.05263158 22.10526316 23.15789474 24.21052632]
In vec assign every number at even location to 1.¶
# In vec assign every number at even location to 1
vec[1::2] = 1
In vec assign from start position to 10th position every second element to 0.¶
# In vec assign from start position to 10th position every second element to 0
vec[:10:2] = 0
Reverse elements of vec.¶
# Reverse elements of vec
vec = vec[::-1]
In mat print each row in the second column and print each column in the second row.¶
# In mat print each row in the second column and print each column in the second row
print(mat[:, 1])
print(mat[1, :])
[1. 1. 1. 1. 1.] [0. 1. 0. 1.]
Apply some universal functions such as sin, sqrt, cos, exp on vec.¶
# Apply some universal functions such as sin,sqrt,cos,exp on vec
print(np.sin(vec))
print(np.sqrt(vec))
print(np.cos(vec))
print(np.exp(vec))
[ 0.84147098 0.94843344 0.84147098 -0.75588615 0.84147098 -0.17836339 0.84147098 0.93759646 0.84147098 -0.77682669 0.84147098 0. 0.84147098 0. 0.84147098 0. 0.84147098 0. 0.84147098 0. ] [1. 6.24078268 1. 6.06976979 1. 5.89379692 1. 5.71240571 1. 5.52506251 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. ] [ 0.54030231 0.31697636 0.54030231 0.65470308 0.54030231 -0.98396469 0.54030231 0.3477253 0.54030231 0.62971446 0.54030231 1. 0.54030231 1. 0.54030231 1. 0.54030231 1. 0.54030231 1. ] [2.71828183e+00 8.21537118e+16 2.71828183e+00 1.00074405e+16 2.71828183e+00 1.21904249e+15 2.71828183e+00 1.48495972e+14 2.71828183e+00 1.80888310e+13 2.71828183e+00 1.00000000e+00 2.71828183e+00 1.00000000e+00 2.71828183e+00 1.00000000e+00 2.71828183e+00 1.00000000e+00 2.71828183e+00 1.00000000e+00]
Print transpose and inverse of mat.¶
mat = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Transpose of the matrix
transpose_mat = np.transpose(mat)
# Inverse of the matrix
try:
    inverse_mat = np.linalg.inv(mat)
except np.linalg.LinAlgError:
    inverse_mat = "Matrix is not invertible"
# Print the original matrix, its transpose, and its inverse
print("Original Matrix:")
print(mat)
print("\nTranspose of the Matrix:")
print(transpose_mat)
print("\nInverse of the Matrix:")
print(inverse_mat)
Original Matrix: [[1 2 3] [4 5 6] [7 8 9]] Transpose of the Matrix: [[1 4 7] [2 5 8] [3 6 9]] Inverse of the Matrix: Matrix is not invertible
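The matrix chosen above is singular (its third row equals twice the second minus the first), which is why inv() fails; when a true inverse does not exist, np.linalg.pinv gives the Moore-Penrose pseudoinverse, which exists for every matrix. A sketch:

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# The determinant is (numerically) zero, so no true inverse exists.
print(np.linalg.det(mat))

# The pseudoinverse always exists; for invertible matrices it equals inv().
pinv = np.linalg.pinv(mat)
print(pinv.shape)
```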
9. Demonstrate shallow copy and deep copy.¶
import copy
# Create a list of lists
original_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Shallow copy
shallow_copy = copy.copy(original_list)
# Deep copy
deep_copy = copy.deepcopy(original_list)
# Modify the original list
original_list[0][0] = 0
# Print the original list, shallow copy, and deep copy
print("Original list:", original_list)
print("Shallow copy:", shallow_copy)
print("Deep copy:", deep_copy)
Original list: [[0, 2, 3], [4, 5, 6], [7, 8, 9]] Shallow copy: [[0, 2, 3], [4, 5, 6], [7, 8, 9]] Deep copy: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
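The copy-module demo above uses nested lists; NumPy arrays have an analogous distinction between views, which share the underlying buffer, and copies, which do not. A sketch:

```python
import numpy as np

a = np.arange(5)
view = a[1:4]    # a slice is a view: it shares a's data buffer
deep = a.copy()  # copy() allocates an independent buffer

a[1] = 99
print(view)  # the view reflects the change to a
print(deep)  # the copy is unchanged
```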
Practical 06¶
Python File Handling to load text & image data
- Create a file with two columns X1 and Y1. Generate 20 rows with random data for values of X1 and Y1.
- Demonstrate concept of streaming and sampling using above generated file. Fetch every even record from file and Fetch any 5 random records from file to demonstrate sampling.
- Download and load image data from the following URL. https://unsplash.com/photos/pypeCEaJeZY
import pandas as pd
import numpy as np
# Generate 20 random numbers for X1 and Y1
X1 = np.random.rand(20)
Y1 = np.random.rand(20)
# Create a DataFrame
df = pd.DataFrame({'X1': X1, 'Y1': Y1})
# Save the DataFrame to a csv file
df.to_csv('data.csv', index=False)
1. Create a file with two columns X1 and Y1. Generate 20 rows with random data for values of X1 and Y1.¶
# Read the csv file
df = pd.read_csv('data.csv')
# Display the data
print(df)
X1 Y1 0 0.825299 0.580684 1 0.443933 0.539453 2 0.520570 0.920555 3 0.589448 0.593961 4 0.770221 0.073119 5 0.196157 0.889510 6 0.692413 0.109250 7 0.449811 0.395146 8 0.644630 0.755827 9 0.026888 0.861718 10 0.497937 0.891846 11 0.562219 0.354087 12 0.319644 0.610723 13 0.560438 0.129359 14 0.804205 0.090613 15 0.095709 0.943955 16 0.597264 0.396839 17 0.666371 0.968839 18 0.998911 0.449929 19 0.512409 0.218404
2. Demonstrate concept of streaming and sampling using above generated file. Fetch every even record from file and Fetch any 5 random records from file to demonstrate sampling.¶
# Read the csv file
df = pd.read_csv('data.csv')
# Fetch every even record from the file
even_records = df[df.index % 2 == 0]
print("Even records:\n", even_records)
# Fetch any 5 random records from the file
random_records = df.sample(n=5)
print("\nRandom records:\n", random_records)
Even records: X1 Y1 0 0.825299 0.580684 2 0.520570 0.920555 4 0.770221 0.073119 6 0.692413 0.109250 8 0.644630 0.755827 10 0.497937 0.891846 12 0.319644 0.610723 14 0.804205 0.090613 16 0.597264 0.396839 18 0.998911 0.449929 Random records: X1 Y1 6 0.692413 0.109250 9 0.026888 0.861718 10 0.497937 0.891846 2 0.520570 0.920555 15 0.095709 0.943955
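The "streaming" above still loads the whole file at once; pandas can read genuinely incrementally via the chunksize parameter, which turns read_csv into an iterator of DataFrames. A sketch using an in-memory csv (hypothetical values) so it stands alone:

```python
import io
import pandas as pd

# A small in-memory csv stands in for data.csv.
csv_text = "X1,Y1\n" + "\n".join(f"{i * 0.1:.1f},{i * 0.2:.1f}" for i in range(20))

# chunksize=5 yields the file 5 rows at a time; only one chunk is in
# memory at once, which is what streaming a large file means.
total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=5):
    total_rows += len(chunk)

print(total_rows)  # 20
```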
3. Download and load image data from the following URL. https://unsplash.com/photos/pypeCEaJeZY¶
import requests
from PIL import Image
from io import BytesIO
from IPython.display import display
# URL of the image
url = "https://images.unsplash.com/photo-1542744173-05336fcc7ad4"
# Send a HTTP request to the URL of the image
response = requests.get(url)
# Load the image
img = Image.open(BytesIO(response.content))
# Display the image
display(img)
Practical 07¶
Develop Programs to demonstrate use of Pandas for Conditioning Your Data
URL for test.xml file: https://drive.google.com/file/d/1FqOWhY2XNYkHwCBYOjhAILCzVUo9QEp6
URL for indian_food.csv file: https://drive.google.com/file/d/1CNAdqFZ-Amji8kOMd4GovivK8UKVLQ-p
- Read the xml file test.xml and create a dataframe from it and do the following.
  - Find and print duplicate records.
  - Remove duplicates and save the data in another dataframe.
- Read the csv file indian_food.csv. Consider value -1 for missing or NA values. (Replace -1 with NaN when reading the csv file.)
  - Print the first and last 10 records of the dataframe; also print column names and a summary of the data. Print information about the data, such as the data type of each column.
  - Convert the columns named course, diet, flavor_profile, state, region to the categorical data type & print data types for the dataframe using the info function.
  - Categories are defined as follows.
    Course ['dessert' 'main course' 'starter' 'snack']
    Flavor_profile ['sweet' 'spicy' 'bitter' 'sour']
    State ['West Bengal' 'Rajasthan' 'Punjab' 'Uttar Pradesh' 'Odisha' 'Maharashtra' 'Uttarakhand' 'Assam' 'Bihar' 'Andhra Pradesh' 'Karnataka' 'Telangana' 'Kerala' 'Tamil Nadu' 'Gujarat' 'Tripura' 'Manipur' 'Nagaland' 'NCT of Delhi' 'Jammu & Kashmir' 'Chhattisgarh' 'Haryana' 'MadhyaPradesh' 'Goa']
    Region ['East' 'West' 'North' nan 'North East' 'South' 'Central']
  - Print the names of items with course as dessert.
  - Print the count of items with a sweet flavor_profile.
  - Print the names of items with cooking_time < prep_time.
  - Print a summary of the data grouped by the diet column.
  - Print the average cooking_time & prep_time for the vegetarian diet type.
  - Insert a new column named total_time, containing the sum of cooking_time & prep_time, into the existing dataframe.
  - Print name, cooking_time, prep_time, total_time of items with total_time >= 500.
  - Print the count of items with each flavor_profile per region.
  - Find & print records with missing data in the state column.
  - Fill missing data in the state column with '-'.
- Write regular expressions,
  - To extract phone numbers (+dd-ddd-dddd) from the following text:
    "Hey my number is +01-555-1212 & his number is +01-770-1410"
  - To extract email addresses from the following text:
    "You can contact to abcd@gmail.co.in or to xyzw@yahoo.in"
- Demonstrate stemming & stop word removal using the nltk library for the content given below.
  "Most of the world will make decisions by either guessing or using their gut. They will be either lucky or wrong."
  "The goal is to turn data into information and information into insight."
- Using the 20 newsgroups dataset, create and demonstrate a bag-of-words model. Also convert the raw newsgroup documents into a matrix of TF-IDF features.
1. Read the xml file test.xml and create a dataframe from it and do the following.¶
Find and print duplicate records.¶
import pandas as pd # Importing pandas library for data manipulation and analysis
import xml.etree.ElementTree as ET # Importing ElementTree from xml.etree for parsing and creating XML data
tree = ET.parse('test.xml') # Parsing the XML file
root = tree.getroot() # Getting the root element of the XML document
data = [] # Initializing an empty list to store the data
for record in root:  # Iterate over each record in the root element
    row = {}  # Empty dictionary to store one row of data
    for item in record:  # Iterate over each item in the record
        row[item.tag] = item.text  # The item's tag becomes the key, its text the value
    data.append(row)  # Append the row dictionary to the data list
df = pd.DataFrame(data) # Creating a DataFrame from the data list
print(df) # Printing the DataFrame
Category Quantity Price 0 NaN NaN NaN 1 A 3 24.50 2 B 1 89.99 3 A 5 4.95 4 A 3 66.00 5 B 10 .99 6 A 3 24.50 7 A 15 29.00 8 B 8 6.99 9 A 15 29.00
Remove duplicates and save data in other dataframe.¶
duplicates = df[df.duplicated()] # Finding duplicate rows in the DataFrame
print("Duplicate Records:") # Printing a string for clarity
print(duplicates) # Printing the duplicate rows
df_no_duplicates = df.drop_duplicates() # Removing duplicate rows from the DataFrame
print("DataFrame without Duplicates:") # Printing a string for clarity
print(df_no_duplicates) # Printing the DataFrame without duplicates
Duplicate Records: Category Quantity Price 6 A 3 24.50 9 A 15 29.00 DataFrame without Duplicates: Category Quantity Price 0 NaN NaN NaN 1 A 3 24.50 2 B 1 89.99 3 A 5 4.95 4 A 3 66.00 5 B 10 .99 7 A 15 29.00 8 B 8 6.99
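drop_duplicates keeps the first occurrence by default; the keep parameter can keep the last copy instead, or drop every duplicated row. A sketch on a small hypothetical frame:

```python
import pandas as pd

# A small frame with one duplicated row (hypothetical data).
df_small = pd.DataFrame({'Category': ['A', 'B', 'A'], 'Quantity': [3, 1, 3]})

print(df_small.drop_duplicates())             # keeps the first copy of each row
print(df_small.drop_duplicates(keep='last'))  # keeps the last copy instead
print(df_small.drop_duplicates(keep=False))   # drops every duplicated row entirely
```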
2. Read the csv file indian_food.csv. Consider value -1 for missing or NA values. (Replace -1 with NaN when reading the csv file.)¶
import pandas as pd
# Read the CSV file
df_csv = pd.read_csv('indian_food.csv', na_values=-1)
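`na_values` accepts a single sentinel or a list of them; a self-contained sketch with an inline CSV (the two rows are invented, not taken from indian_food.csv):

```python
import io
import pandas as pd

# Inline CSV where -1 marks a missing preparation time
csv_text = "name,prep_time\nIdli,-1\nDosa,30\n"

# na_values=-1 makes the parser read every -1 as NaN
df = pd.read_csv(io.StringIO(csv_text), na_values=-1)
print(df["prep_time"].isna().tolist())  # [True, False]
```

Because the column now contains a NaN, pandas stores it as float64 rather than int64.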
Print the first and last 10 records of the dataframe, along with the column names and a summary of the data. Also print information about the data, such as the data type of each column.¶
# Print the first 10 records
print(df_csv.head(10))
# Print the last 10 records
print(df_csv.tail(10))
# Print column names
print(df_csv.columns)
# Print summary of data
print(df_csv.describe(include='all'))
# Print information about data
print(df_csv.info())
name ingredients \ 0 Balu shahi Maida flour, yogurt, oil, sugar 1 Boondi Gram flour, ghee, sugar 2 Gajar ka halwa Carrots, milk, sugar, ghee, cashews, raisins 3 Ghevar Flour, ghee, kewra, milk, clarified butter, su... 4 Gulab jamun Milk powder, plain flour, baking powder, ghee,... 5 Imarti Sugar syrup, lentil flour 6 Jalebi Maida, corn flour, baking soda, vinegar, curd,... 7 Kaju katli Cashews, ghee, cardamom, sugar 8 Kalakand Milk, cottage cheese, sugar 9 Kheer Milk, rice, sugar, dried fruits diet prep_time cook_time flavor_profile course state \ 0 vegetarian 45.0 25.0 sweet dessert West Bengal 1 vegetarian 80.0 30.0 sweet dessert Rajasthan 2 vegetarian 15.0 60.0 sweet dessert Punjab 3 vegetarian 15.0 30.0 sweet dessert Rajasthan 4 vegetarian 15.0 40.0 sweet dessert West Bengal 5 vegetarian 10.0 50.0 sweet dessert West Bengal 6 vegetarian 10.0 50.0 sweet dessert Uttar Pradesh 7 vegetarian 10.0 20.0 sweet dessert NaN 8 vegetarian 20.0 30.0 sweet dessert West Bengal 9 vegetarian 10.0 40.0 sweet dessert NaN region 0 East 1 West 2 North 3 West 4 East 5 East 6 North 7 NaN 8 East 9 NaN name ingredients \ 245 Pani Pitha Tea leaves, white sesame seeds, dry coconut, s... 246 Payokh Basmati rice, rose water, sugar, clarified but... 247 Prawn malai curry Coconut milk, prawns, garlic, turmeric, sugar 248 Red Rice Red pepper, red onion, butter, watercress, oli... 249 Shukto Green beans, bitter gourd, ridge gourd, banana... 250 Til Pitha Glutinous rice, black sesame seeds, gur 251 Bebinca Coconut milk, egg yolks, clarified butter, all... 252 Shufta Cottage cheese, dry dates, dried rose petals, ... 253 Mawa Bati Milk powder, dry fruits, arrowroot powder, all... 254 Pinaca Brown rice, fennel seeds, grated coconut, blac... 
diet prep_time cook_time flavor_profile course \ 245 vegetarian 10.0 20.0 NaN main course 246 vegetarian NaN NaN sweet dessert 247 non vegetarian 15.0 50.0 spicy main course 248 vegetarian NaN NaN NaN main course 249 vegetarian 10.0 20.0 spicy main course 250 vegetarian 5.0 30.0 sweet dessert 251 vegetarian 20.0 60.0 sweet dessert 252 vegetarian NaN NaN sweet dessert 253 vegetarian 20.0 45.0 sweet dessert 254 vegetarian NaN NaN sweet dessert state region 245 Assam North East 246 Assam North East 247 West Bengal East 248 NaN NaN 249 West Bengal East 250 Assam North East 251 Goa West 252 Jammu & Kashmir North 253 Madhya Pradesh Central 254 Goa West Index(['name', 'ingredients', 'diet', 'prep_time', 'cook_time', 'flavor_profile', 'course', 'state', 'region'], dtype='object') name ingredients diet prep_time \ count 255 255 255 225.000000 unique 255 252 2 NaN top Balu shahi Gram flour, ghee, sugar vegetarian NaN freq 1 2 226 NaN mean NaN NaN NaN 35.386667 std NaN NaN NaN 76.241081 min NaN NaN NaN 5.000000 25% NaN NaN NaN 10.000000 50% NaN NaN NaN 10.000000 75% NaN NaN NaN 20.000000 max NaN NaN NaN 500.000000 cook_time flavor_profile course state region count 227.000000 226 255 231 241 unique NaN 4 4 24 6 top NaN spicy main course Gujarat West freq NaN 133 129 35 74 mean 38.911894 NaN NaN NaN NaN std 49.421711 NaN NaN NaN NaN min 2.000000 NaN NaN NaN NaN 25% 20.000000 NaN NaN NaN NaN 50% 30.000000 NaN NaN NaN NaN 75% 45.000000 NaN NaN NaN NaN max 720.000000 NaN NaN NaN NaN <class 'pandas.core.frame.DataFrame'> RangeIndex: 255 entries, 0 to 254 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 name 255 non-null object 1 ingredients 255 non-null object 2 diet 255 non-null object 3 prep_time 225 non-null float64 4 cook_time 227 non-null float64 5 flavor_profile 226 non-null object 6 course 255 non-null object 7 state 231 non-null object 8 region 241 non-null object dtypes: float64(2), object(7) memory usage: 18.1+ KB None
Convert the columns course, diet, flavor_profile, state and region to the categorical data type & print the data types of the dataframe to confirm.¶
# Define a list of column names that we want to convert to 'category' data type.
columns_to_convert = ['course', 'diet', 'flavor_profile', 'state', 'region']
# Convert the data type of the specified columns to 'category'.
df_csv[columns_to_convert] = df_csv[columns_to_convert].astype('category')
# Print the string "Data Types (After Conversion):" to the console.
print("Data Types (After Conversion):")
# Print the data types of the columns in the DataFrame to the console, after the conversion.
print(df_csv.dtypes)
Data Types (After Conversion):
name                object
ingredients         object
diet              category
prep_time          float64
cook_time          float64
flavor_profile    category
course            category
state             category
region            category
dtype: object
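A `category` column stores each distinct value once plus small integer codes, which is why the conversion usually saves memory on repetitive string columns; a rough sketch (the exact byte counts vary by pandas version, only the direction matters):

```python
import pandas as pd

# 2000 strings but only two distinct values
s_obj = pd.Series(["vegetarian", "non vegetarian"] * 1000)
s_cat = s_obj.astype("category")  # two categories plus compact integer codes

print(s_cat.cat.categories.tolist())
print(s_obj.memory_usage(deep=True), "->", s_cat.memory_usage(deep=True))
```

Categories are also sorted, which is why `['non vegetarian', 'vegetarian']` comes out in alphabetical order.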
Categories are defined as follows.¶
Course ['dessert' 'main course' 'starter' 'snack']
Flavor_profile ['sweet' 'spicy' 'bitter' 'sour']
State ['West Bengal' 'Rajasthan' 'Punjab' 'Uttar Pradesh' 'Odisha' 'Maharashtra' 'Uttarakhand' 'Assam' 'Bihar' 'Andhra Pradesh' 'Karnataka' 'Telangana' 'Kerala' 'Tamil Nadu' 'Gujarat' 'Tripura' 'Manipur' 'Nagaland' 'NCT of Delhi' 'Jammu & Kashmir' 'Chhattisgarh' 'Haryana' 'Madhya Pradesh' 'Goa']
Region ['East' 'West' 'North' nan 'North East' 'South' 'Central']
# Define a dictionary where the keys are column names and the values are lists of categories for each column.
category_definitions = {
'course': ['dessert', 'main course', 'starter', 'snack'],
'flavor_profile': ['sweet', 'spicy', 'bitter', 'sour'],
'state': [
'West Bengal', 'Rajasthan', 'Punjab', 'Uttar Pradesh', 'Odisha',
'Maharashtra', 'Uttarakhand', 'Assam', 'Bihar', 'Andhra Pradesh',
'Karnataka', 'Telangana', 'Kerala', 'Tamil Nadu', 'Gujarat',
'Tripura', 'Manipur', 'Nagaland', 'NCT of Delhi',
'Jammu & Kashmir', 'Chhattisgarh', 'Haryana', 'Madhya Pradesh', 'Goa'
],
'region': ['East', 'West', 'North', 'North East', 'South', 'Central']
}
# Loop over the items in the dictionary.
for column, categories in category_definitions.items():
# Assert that all categories are in the categories of the column in the DataFrame.
assert all(cat in df_csv[column].cat.categories for cat in categories)
# Print the last column name to the console.
print(column)
# Print the last list of categories to the console.
print(categories)
region
['East', 'West', 'North', 'North East', 'South', 'Central']
Print name of items with course as dessert.¶
# Filter the DataFrame to only include rows where the 'course' column is 'dessert'.
desserts = df_csv[df_csv['course'] == 'dessert']
# Print the string "Dessert Items:" to the console.
print("Dessert Items:")
# Print the 'name' column of the filtered DataFrame to the console.
print(desserts['name'])
Dessert Items:
0        Balu shahi
1            Boondi
2    Gajar ka halwa
3            Ghevar
4       Gulab jamun
            ...    
250       Til Pitha
251         Bebinca
252          Shufta
253       Mawa Bati
254          Pinaca
Name: name, Length: 85, dtype: object
Print count of items with flavor_profile with sweet type.¶
# Filter the DataFrame to only include rows where the 'flavor_profile' column is 'sweet'.
sweet_items = df_csv[df_csv['flavor_profile'] == 'sweet']
# Print the string "Count of Sweet Items:" and the number of sweet items to the console.
print("Count of Sweet Items:", len(sweet_items))
Count of Sweet Items: 88
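Counts for every flavour at once come out of `value_counts()`, which also skips NaN entries by default; a sketch on a toy column (the values are made up):

```python
import pandas as pd

flavor = pd.Series(["sweet", "spicy", "sweet", "bitter", "sweet"])

counts = flavor.value_counts()  # counts per distinct value, most frequent first
print(counts)
print(counts["sweet"])  # 3
```
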
Print name of items with cooking_time < prep_time.¶
# Filter the DataFrame to only include rows where the 'cook_time' column is less than the 'prep_time' column.
fast_cooking_items = df_csv[df_csv['cook_time'] < df_csv['prep_time']]
# Print the string "Items with Cooking Time < Prep Time:" to the console.
print("Items with Cooking Time < Prep Time:")
# Print the 'name' column of the filtered DataFrame to the console.
print(fast_cooking_items['name'])
Items with Cooking Time < Prep Time:
0               Balu shahi
1                   Boondi
14                  Phirni
29               Misti doi
33               Ras malai
35                 Sandesh
46          Obbattu holige
48                Poornalu
54               Kajjikaya
66          Chak Hao Kheer
81           Chicken Tikka
94                 Khichdi
96           Kulfi falooda
104                   Naan
109              Pani puri
114            Pindi chana
122       Tandoori Chicken
123    Tandoori Fish Tikka
124                   Attu
128                   Dosa
129               Idiappam
130                   Idli
144            Masala Dosa
151              Pesarattu
155                  Puttu
157                Sandige
158                  Sevai
178          Kutchi dabeli
202      Sabudana Khichadi
207                Surnoli
212          Lilva Kachori
Name: name, dtype: object
Print summary of data grouped by diet column.¶
# Group the DataFrame by the 'diet' column and calculate summary statistics for each group.
diet_summary = df_csv.groupby('diet').describe()
# Print the string "Summary of Data Grouped by Diet:" to the console.
print("Summary of Data Grouped by Diet:")
# Print the summary statistics of the grouped DataFrame to the console.
print(diet_summary)
Summary of Data Grouped by Diet:
               prep_time                                                    \
                   count       mean        std  min   25%   50%   75%    max
diet
non vegetarian      19.0  41.842105  74.259502  5.0  10.0  10.0  17.5  240.0
vegetarian         206.0  34.791262  76.570389  5.0  10.0  11.0  20.0  500.0

               cook_time
                   count     mean        std   min   25%   50%   75%    max
diet
non vegetarian      19.0  40.0000  22.422707  15.0  30.0  35.0  42.5  120.0
vegetarian         208.0  38.8125  51.213850   2.0  20.0  30.0  45.0  720.0
Print average cooking_time & prep_time for vegetarian diet type.¶
# Filter the DataFrame to only include rows where the 'diet' column is 'vegetarian'.
vegetarian_diet = df_csv[df_csv['diet'] == 'vegetarian']
# Calculate the mean of the 'cooking_time' column for the filtered DataFrame.
average_cooking_time = vegetarian_diet['cook_time'].mean()
# Calculate the mean of the 'prep_time' column for the filtered DataFrame.
average_prep_time = vegetarian_diet['prep_time'].mean()
# Print the string "Average Cooking Time for Vegetarian Diet:" and the average cooking time to the console.
print("Average Cooking Time for Vegetarian Diet:", average_cooking_time)
# Print the string "Average Prep Time for Vegetarian Diet:" and the average preparation time to the console.
print("Average Prep Time for Vegetarian Diet:", average_prep_time)
Average Cooking Time for Vegetarian Diet: 38.8125
Average Prep Time for Vegetarian Diet: 34.79126213592233
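Rather than filtering one diet at a time, `groupby(...).mean()` produces the averages for every diet in one pass; a sketch on invented numbers:

```python
import pandas as pd

toy = pd.DataFrame({
    "diet": ["vegetarian", "vegetarian", "non vegetarian"],
    "cook_time": [30.0, 50.0, 40.0],
    "prep_time": [10.0, 20.0, 15.0],
})

# Mean cook_time and prep_time per diet group
means = toy.groupby("diet")[["cook_time", "prep_time"]].mean()
print(means)
```
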
Insert a new column with column name as total_time which contains sum of cooking_time & prep_time into existing dataframe.¶
# Add the 'cooking_time' and 'prep_time' columns to create a new 'total_time' column in the DataFrame.
df_csv['total_time'] = df_csv['cook_time'] + df_csv['prep_time']
Print name,cooking_time,prep_time,total_time of items with total_time >=500¶
# Filter the DataFrame to only include rows where the 'total_time' column is greater than or equal to 500.
total_time_gt_500 = df_csv[df_csv['total_time'] >= 500]
# Print the string "Items with Total Time >= 500:" to the console.
print("Items with Total Time >= 500:")
# Print the 'name', 'cooking_time', 'prep_time', and 'total_time' columns of the filtered DataFrame to the console.
print(total_time_gt_500[['name', 'cook_time', 'prep_time', 'total_time']])
Items with Total Time >= 500:
            name  cook_time  prep_time  total_time
29     Misti doi       30.0      480.0       510.0
62     Shrikhand      720.0       10.0       730.0
114  Pindi chana      120.0      500.0       620.0
155        Puttu       40.0      495.0       535.0
Print count of items with various flavour_profile per region.¶
# Group the DataFrame by the 'region' and 'flavor_profile' columns, calculate the size of each group, and unstack the resulting series into a DataFrame, filling missing values with 0.
flavor_profile_per_region = df_csv.groupby(['region', 'flavor_profile']).size().unstack(fill_value=0)
# Print the string "Count of Items with Flavor Profile per Region:" to the console.
print("Count of Items with Flavor Profile per Region:")
# Print the DataFrame of counts of items with each flavor profile per region to the console.
print(flavor_profile_per_region)
Count of Items with Flavor Profile per Region:
flavor_profile  bitter  sour  spicy  sweet
region
Central              0     0      2      1
East                 0     0      6     22
North                2     0     35     10
North East           0     0     13      7
South                0     0     30     19
West                 2     1     41     23
Find & print records with missing data in the state column.¶
# Filter the DataFrame to only include rows where the 'state' column is missing (NaN).
missing_state_data = df_csv[df_csv['state'].isna()]
# Print the string "Records with Missing Data in the State Column:" to the console.
print("Records with Missing Data in the State Column:")
# Print the DataFrame of records with missing state data to the console.
print(missing_state_data)
Records with Missing Data in the State Column: name ingredients \ 7 Kaju katli Cashews, ghee, cardamom, sugar 9 Kheer Milk, rice, sugar, dried fruits 10 Laddu Gram flour, ghee, sugar 12 Nankhatai Refined flour, besan, ghee, powdered sugar, yo... 94 Khichdi Moong dal, green peas, ginger, tomato, green c... 96 Kulfi falooda Rose syrup, falooda sev, mixed nuts, saffron, ... 98 Lauki ki subji Bottle gourd, coconut oil, garam masala, ginge... 109 Pani puri Kala chana, mashed potato, boondi, sev, lemon 111 Papad Urad dal, sev, lemon juice, chopped tomatoes 115 Rajma chaval Red kidney beans, garam masala powder, ginger,... 117 Samosa Potatoes, green peas, garam masala, ginger, dough 128 Dosa Chana dal, urad dal, whole urad dal, blend ric... 130 Idli Split urad dal, urad dal, idli rice, thick poh... 144 Masala Dosa Chana dal, urad dal, potatoes, idli rice, thic... 145 Pachadi Coconut oil, cucumber, curd, curry leaves, mus... 149 Payasam Rice, cashew nuts, milk, raisins, sugar 154 Rasam Tomato, curry leaves, garlic, mustard seeds, h... 156 Sambar Pigeon peas, eggplant, drumsticks, sambar powd... 158 Sevai Sevai, parboiled rice, steamer 161 Uttapam Chana dal, urad dal, thick poha, tomato, butter 162 Vada Urad dal, ginger, curry leaves, green chilies,... 164 Upma Chana dal, urad dal, ginger, curry leaves, sugar 231 Brown Rice Brown rice, soy sauce, olive oil 248 Red Rice Red pepper, red onion, butter, watercress, oli... 
diet prep_time cook_time flavor_profile course state \ 7 vegetarian 10.0 20.0 sweet dessert NaN 9 vegetarian 10.0 40.0 sweet dessert NaN 10 vegetarian 10.0 40.0 sweet dessert NaN 12 vegetarian 20.0 30.0 sweet dessert NaN 94 vegetarian 40.0 20.0 spicy main course NaN 96 vegetarian 45.0 25.0 sweet dessert NaN 98 vegetarian 10.0 20.0 spicy main course NaN 109 vegetarian 15.0 2.0 spicy snack NaN 111 vegetarian 5.0 5.0 spicy snack NaN 115 vegetarian 15.0 90.0 spicy main course NaN 117 vegetarian 30.0 30.0 spicy snack NaN 128 vegetarian 360.0 90.0 spicy snack NaN 130 vegetarian 360.0 90.0 spicy snack NaN 144 vegetarian 360.0 90.0 spicy snack NaN 145 vegetarian 10.0 25.0 NaN main course NaN 149 vegetarian 15.0 30.0 sweet dessert NaN 154 vegetarian 10.0 35.0 spicy main course NaN 156 vegetarian 20.0 45.0 spicy main course NaN 158 vegetarian 120.0 30.0 NaN main course NaN 161 vegetarian 10.0 20.0 spicy snack NaN 162 vegetarian 15.0 20.0 spicy snack NaN 164 vegetarian 10.0 20.0 spicy snack NaN 231 vegetarian 15.0 25.0 NaN main course NaN 248 vegetarian NaN NaN NaN main course NaN region total_time 7 NaN 30.0 9 NaN 50.0 10 NaN 50.0 12 NaN 50.0 94 NaN 60.0 96 NaN 70.0 98 NaN 30.0 109 NaN 17.0 111 NaN 10.0 115 North 105.0 117 NaN 60.0 128 South 450.0 130 South 450.0 144 South 450.0 145 South 35.0 149 South 45.0 154 South 45.0 156 South 65.0 158 South 150.0 161 South 30.0 162 South 35.0 164 NaN 30.0 231 NaN 40.0 248 NaN NaN
Fill missing data in the state column with -.¶
# Replace missing values in the 'state' column with '-'.
# The column is categorical, so convert it to plain object dtype first so that '-' is a valid value.
# (Converting to str before filling would turn NaN into the string 'nan', leaving fillna nothing to replace.)
df_csv['state'] = df_csv['state'].astype(object).fillna('-')
3. Write regular expression¶
To extract phone numbers (+dd-ddd-dddd) from the following text
“Hey my number is +01-555-1212 & his number is +01-770-1410”
To extract email addresses from the following text.
“You can contact to abcd@gmail.co.in or to xyzw@yahoo.in”
import re # Importing the regular expression module
# A string containing two phone numbers
text = "Hey my number is +01-555-1212 & his number is +01-770-1410"
# Using a regular expression to find all phone numbers in the string
phone_numbers = re.findall(r'\+\d{2}-\d{3}-\d{4}', text)
# Printing a message to the console
print("Extracted Phone Numbers:")
# Printing the extracted phone numbers to the console
print(phone_numbers)
# A string containing two email addresses
text2 = "You can contact abcd@gmail.co.in or xyzw@yahoo.in"
# Using a regular expression to find all email addresses in the string
email_addresses = re.findall(r'\S+@\S+', text2)
# Printing a message to the console
print("Extracted Email Addresses:")
# Printing the extracted email addresses to the console
print(email_addresses)
Extracted Phone Numbers:
['+01-555-1212', '+01-770-1410']
Extracted Email Addresses:
['abcd@gmail.co.in', 'xyzw@yahoo.in']
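The pattern `\S+@\S+` is deliberately permissive and will also swallow punctuation glued to an address; the sketch below contrasts it with a somewhat stricter pattern. The stricter one is still an approximation, not a full RFC 5322 validator:

```python
import re

text = "Mail a@b.io, or admin@example.co.in."

loose = re.findall(r'\S+@\S+', text)                         # keeps the trailing ',' and '.'
strict = re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', text)   # word chars/._+- then a dotted domain

print(loose)
print(strict)
```
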
4. Demonstrate stemming & stop word removal using nltk library for content given below¶
Most of the world will make decisions by either guessing or using their gut. They will be either lucky or wrong.
The goal is to turn data into information and information into insight.
import nltk # Importing the Natural Language Toolkit (NLTK)
nltk.data.path.append('/root/nltk_data') # Adding the path to NLTK data
nltk.download('punkt') # Downloading the Punkt Tokenizer Models
nltk.download('stopwords') # Downloading the Stopwords Corpus
from nltk.corpus import stopwords # Importing the stopwords corpus
from nltk.stem import PorterStemmer # Importing the Porter Stemmer
# A string containing a sentence
text = "Stemming and stop word removal are common text preprocessing techniques."
# Tokenizing the sentence into words
words = nltk.word_tokenize(text)
# Getting a set of English stop words
stop_words = set(stopwords.words('english'))
# Filtering out the stop words from the tokenized words
filtered_words = [word for word in words if word.lower() not in stop_words]
# Creating a Porter Stemmer object
stemmer = PorterStemmer()
# Stemming the filtered words
stemmed_words = [stemmer.stem(word) for word in filtered_words]
# Printing the original tokenized words
print("Original Words: ", words)
# Printing the words after stop word removal
print("Filtered Words (Stop Word Removal): ", filtered_words)
# Printing the words after stemming
print("Stemmed Words: ", stemmed_words)
[nltk_data] Downloading package punkt to /root/nltk_data... [nltk_data] Unzipping tokenizers/punkt.zip.
Original Words:  ['Stemming', 'and', 'stop', 'word', 'removal', 'are', 'common', 'text', 'preprocessing', 'techniques', '.']
Filtered Words (Stop Word Removal):  ['Stemming', 'stop', 'word', 'removal', 'common', 'text', 'preprocessing', 'techniques', '.']
Stemmed Words:  ['stem', 'stop', 'word', 'remov', 'common', 'text', 'preprocess', 'techniqu', '.']
[nltk_data] Downloading package stopwords to /root/nltk_data... [nltk_data] Unzipping corpora/stopwords.zip.
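The Porter stemmer itself ships inside the nltk package and needs no downloads, so the same pipeline can be tried offline by tokenizing with `str.split` and a tiny hand-written stop list (a toy stand-in for the full `stopwords` corpus, not equivalent to it):

```python
from nltk.stem import PorterStemmer

# Tiny hand-written stop list -- a stand-in for nltk's stopwords corpus
stop_words = {"the", "is", "and", "to", "into"}

text = "The goal is to turn data into information and information into insight"
words = text.lower().split()  # naive whitespace tokenization instead of word_tokenize

filtered = [w for w in words if w not in stop_words]
stemmed = [PorterStemmer().stem(w) for w in filtered]

print(filtered)
print(stemmed)  # 'information' stems to 'inform'
```
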
5. Using a 20 newsgroup dataset, create and demonstrate a bag of words model. Also convert the raw newsgroup documents into a matrix of TF-IDF features.¶
from sklearn.datasets import fetch_20newsgroups # Importing the 20 newsgroups dataset from sklearn
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer # Importing CountVectorizer and TfidfVectorizer
# Fetching the 20 newsgroups dataset, removing headers, footers, and quotes
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
# Creating a CountVectorizer object
count_vectorizer = CountVectorizer()
# Transforming the newsgroups data into a document-term matrix using CountVectorizer
X_count = count_vectorizer.fit_transform(newsgroups.data)
# Creating a TfidfVectorizer object
tfidf_vectorizer = TfidfVectorizer()
# Transforming the newsgroups data into a document-term matrix using TfidfVectorizer
X_tfidf = tfidf_vectorizer.fit_transform(newsgroups.data)
# Selecting a document index
document_index = 0
# Printing the original document
print("Original Document:")
print(newsgroups.data[document_index])
# Printing the Bag-of-Words (BoW) representation of the document
print("\nBoW Representation:")
print(X_count[document_index])
# Printing the TF-IDF representation of the document
print("\nTF-IDF Representation:")
print(X_tfidf[document_index])
Original Document: I am sure some bashers of Pens fans are pretty confused about the lack of any kind of posts about the recent Pens massacre of the Devils. Actually, I am bit puzzled too and a bit relieved. However, I am going to put an end to non-PIttsburghers' relief with a bit of praise for the Pens. Man, they are killing those Devils worse than I thought. Jagr just showed you why he is much better than his regular season stats. He is also a lot fo fun to watch in the playoffs. Bowman should let JAgr have a lot of fun in the next couple of games since the Pens are going to beat the pulp out of Jersey anyway. I was very disappointed not to see the Islanders lose the final regular season game. PENS RULE!!! BoW Representation: (0, 24810) 3 (0, 114159) 1 (0, 110739) 1 (0, 29449) 1 (0, 89588) 8 (0, 93532) 5 (0, 52374) 1 (0, 26465) 3 (0, 96770) 1 (0, 39025) 1 (0, 22362) 2 (0, 116790) 10 (0, 73447) 1 (0, 25769) 1 (0, 71770) 1 (0, 96012) 1 (0, 101388) 1 (0, 79249) 1 (0, 44272) 2 (0, 22945) 1 (0, 31039) 3 (0, 98219) 1 (0, 118091) 1 (0, 25260) 1 (0, 102231) 1 : : (0, 65467) 2 (0, 95129) 1 (0, 32176) 1 (0, 108815) 1 (0, 74531) 1 (0, 60784) 1 (0, 86996) 1 (0, 40171) 1 (0, 56396) 1 (0, 109383) 1 (0, 29827) 1 (0, 98002) 1 (0, 91152) 1 (0, 68989) 1 (0, 25803) 1 (0, 126472) 1 (0, 124216) 1 (0, 44988) 1 (0, 88021) 1 (0, 107582) 1 (0, 67699) 1 (0, 76006) 1 (0, 53470) 1 (0, 56382) 1 (0, 105100) 1 TF-IDF Representation: (0, 105100) 0.08272635762898674 (0, 56382) 0.06548149992571357 (0, 53470) 0.07970882460224814 (0, 76006) 0.08520403767167795 (0, 67699) 0.10133321981346254 (0, 107582) 0.047734013015898615 (0, 88021) 0.029186309768862773 (0, 44988) 0.11435087413757543 (0, 124216) 0.046797861842828435 (0, 126472) 0.03431094478562303 (0, 25803) 0.06762450917821239 (0, 68989) 0.09928167974488611 (0, 91152) 0.03854632657683808 (0, 98002) 0.13907784133493692 (0, 29827) 0.08892315942651909 (0, 109383) 0.05105878014696189 (0, 56396) 0.07131187545939574 (0, 40171) 0.06922150866167542 (0, 
86996) 0.06301015973631587 (0, 60784) 0.0271997995927387 (0, 74531) 0.05641076322334231 (0, 108815) 0.046785253119473694 (0, 32176) 0.12719876177804612 (0, 95129) 0.09279827218104486 (0, 65467) 0.045003949086708295 : : (0, 102231) 0.12606018701082405 (0, 25260) 0.021079342325985767 (0, 118091) 0.050494985289006714 (0, 98219) 0.13122965511734613 (0, 31039) 0.1787890311357878 (0, 22945) 0.0582631519062005 (0, 44272) 0.19818164703955615 (0, 79249) 0.09799294739501874 (0, 101388) 0.07788730502152183 (0, 96012) 0.08438145186248154 (0, 71770) 0.06356353510772614 (0, 25769) 0.03649294356115389 (0, 73447) 0.08082262581632702 (0, 116790) 0.1812172345919807 (0, 22362) 0.07108512930815473 (0, 39025) 0.09292472333557168 (0, 96770) 0.0648969397428524 (0, 26465) 0.08914949300938675 (0, 52374) 0.08665300135000918 (0, 93532) 0.49835327837061033 (0, 89588) 0.1703687096002322 (0, 29449) 0.13122965511734613 (0, 110739) 0.03797468797425647 (0, 114159) 0.05473671466917944 (0, 24810) 0.13884931659983238
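The weights in the sparse printout come from the TF-IDF formula; a pure-Python sketch of the idea on a three-document toy corpus (using the plain textbook idf = log(N/df), not scikit-learn's smoothed variant, so the exact numbers differ from `TfidfVectorizer`):

```python
import math

# Toy corpus of tokenized documents (invented words)
docs = [["pens", "fans", "pens"], ["pens", "rule"], ["fans", "cheer"]]
N = len(docs)

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)        # term frequency within this document
    df = sum(term in d for d in docs)      # number of documents containing the term
    return tf * math.log(N / df)           # textbook idf = log(N / df)

print(tf_idf("pens", docs[0]))   # common across documents -> dampened weight
print(tf_idf("rule", docs[1]))   # appears in one document -> larger idf
```

A term that occurs in every document gets idf = log(1) = 0, which is exactly the down-weighting of uninformative words that the vectorizer performs at scale.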
Practical 08¶
Develop Programs to visualize the data using matplotlib
- Line Plot
- Scatter Plot
- Bar Chart
- Histogram
- Pie Chart
import matplotlib.pyplot as plt
import numpy as np
1. Line Plot¶
# Line Plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(9, 3))
plt.plot(x, y)
plt.show()
2. Scatter Plot¶
# Scatter Plot
x = np.random.rand(50)
y = np.random.rand(50)
plt.figure(figsize=(9, 3))
plt.scatter(x, y)
plt.show()
3. Bar Chart¶
# Bar Chart
x = ['A', 'B', 'C', 'D', 'E']
y = [3, 7, 2, 5, 8]
plt.figure(figsize=(9, 3))
plt.bar(x, y)
plt.show()
4. Histogram¶
# Histogram
data = np.random.randn(1000)
plt.figure(figsize=(9, 3))
plt.hist(data, bins=30)
plt.show()
5. Pie Chart¶
# Pie Chart
sizes = [215, 130, 245, 210]
labels = ['A', 'B', 'C', 'D']
plt.figure(figsize=(9, 3))
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.show()
Practical 09¶
Demonstration of Scikit-learn and other machine learning libraries such as Keras, TensorFlow, etc.
- Scikit-Learn for Classification (e.g., Iris dataset)
- Using Keras for Neural Networks
1. Scikit-Learn for Classification (e.g., Iris dataset)¶
# Import the function to load iris dataset from sklearn.datasets
from sklearn.datasets import load_iris
# Load the iris dataset
iris = load_iris()
# Store the feature matrix (X) and response vector (y)
# The feature matrix 'X' is a multidimensional array containing the features that the model will learn from
# The response vector 'y' is an array containing the target variable that the model will predict
X = iris.data
y = iris.target
# Store the feature and target names
# 'feature_names' is a list of the names of each feature
# 'target_names' is a list of the names of each target class
feature_names = iris.feature_names
target_names = iris.target_names
# Print the feature and target names
print("Feature names:", feature_names)
print("Target names:", target_names)
# Print the type of the feature matrix 'X'
print("\nType of X is:", type(X))
# Print the first 5 rows of the feature matrix 'X'
print("\nFirst 5 rows of X:\n", X[:5])
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

Type of X is: <class 'numpy.ndarray'>

First 5 rows of X:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
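The listing above only loads and inspects the data; to complete the classification demo, a model can be fit on a train/test split. A minimal sketch using k-nearest neighbours (the choice of KNN and `n_neighbors=3` is arbitrary here, any scikit-learn classifier would slot in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# Hold out 30% of the rows for testing; fixed random_state for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1)

# Classify each test flower by majority vote of its 3 nearest training points
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print("Test accuracy:", knn.score(X_test, y_test))
```
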
2. Using Keras for Neural Networks¶
# Import necessary modules from Keras
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
# Set the batch size, number of classes, and number of epochs
batch_size = 128
num_classes = 10
epochs = 5
# Load the MNIST dataset and split it into training and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Reshape the data to fit the model
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
# Convert the data to float32 type
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Normalize the pixel values (between 0 and 1)
x_train /= 255
x_test /= 255
# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# Create a Sequential model
model = Sequential()
# Add a dense layer with 512 units, ReLU activation and input shape of 784
model.add(Dense(512, activation='relu', input_shape=(784,)))
# Add a dropout layer to prevent overfitting
model.add(Dropout(0.2))
# Add another dense layer with 512 units and ReLU activation
model.add(Dense(512, activation='relu'))
# Add another dropout layer
model.add(Dropout(0.2))
# Add a dense output layer with 10 units (one for each class) and softmax activation
model.add(Dense(num_classes, activation='softmax'))
# Compile the model with categorical crossentropy loss, RMSprop optimizer and accuracy metric
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
# Fit the model on the training data, with specified batch size, epochs and verbosity
# Also pass the test data for validation
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, y_test))
# Evaluate the model on the test data and print the test loss and accuracy
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
Epoch 1/5
469/469 [==============================] - 9s 18ms/step - loss: 0.2549 - accuracy: 0.9214 - val_loss: 0.1087 - val_accuracy: 0.9664
Epoch 2/5
469/469 [==============================] - 8s 18ms/step - loss: 0.1056 - accuracy: 0.9681 - val_loss: 0.0850 - val_accuracy: 0.9724
Epoch 3/5
469/469 [==============================] - 9s 18ms/step - loss: 0.0735 - accuracy: 0.9773 - val_loss: 0.0675 - val_accuracy: 0.9797
Epoch 4/5
469/469 [==============================] - 8s 16ms/step - loss: 0.0573 - accuracy: 0.9819 - val_loss: 0.0659 - val_accuracy: 0.9809
Epoch 5/5
469/469 [==============================] - 9s 18ms/step - loss: 0.0485 - accuracy: 0.9850 - val_loss: 0.0710 - val_accuracy: 0.9796
Test loss: 0.07101037353277206
Test accuracy: 0.9796000123023987
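The `to_categorical` call above just one-hot encodes the integer labels; the same matrix can be built with plain NumPy, sketched here for three classes:

```python
import numpy as np

labels = np.array([0, 2, 1, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing by the labels picks out one such row per sample
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```

Each row sums to 1, matching the shape the softmax output layer and categorical crossentropy loss expect.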