
Practical 01¶

Programs to understand the control structures of Python.

  1. Python program to print "Hello Python"
  2. Python program to find the area of a triangle
  3. Python Program to Check Leap Year
  4. Python Program to Find the Sum of Natural Numbers
  5. Python Program to Print all Prime Numbers between an Interval

1. Python program to print "Hello Python"¶

In [ ]:
print("Hello Python")
Hello Python

2. Python program to find the area of a triangle¶

In [ ]:
# import math module
import math

# take inputs
a = float(input('Enter first side: '))
b = float(input('Enter second side: '))
c = float(input('Enter third side: '))

# calculate the semi-perimeter
s = (a + b + c) / 2

# calculate the area using heron's formula
area = math.sqrt(s * (s - a) * (s - b) * (s - c))
print('The area of the triangle is %0.2f' % area)
Enter first side: 15
Enter second side: 30
Enter third side: 45
The area of the triangle is 0.00
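Note that the sample inputs 15, 30, 45 violate the triangle inequality (15 + 30 = 45), so the printed area of 0.00 is correct: the sides form a degenerate, flat triangle. A minimal sketch that validates the sides before applying Heron's formula (the `triangle_area` helper is illustrative, not part of the assignment):

```python
import math

def triangle_area(a, b, c):
    # Heron's formula is only meaningful when the triangle inequality holds
    if a + b <= c or a + c <= b or b + c <= a:
        raise ValueError("The given sides do not form a valid triangle")
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

print(triangle_area(3, 4, 5))  # classic 3-4-5 right triangle, area 6.0
```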

3. Python Program to Check Leap Year¶

In [ ]:
year = int(input("Enter a year: "))

if year % 4 == 0:
    if year % 100 == 0:
        if year % 400 == 0:
            print(year, "is a leap year.")
        else:
            print(year, "is not a leap year.")
    else:
        print(year, "is a leap year.")
else:
    print(year, "is not a leap year.")
Enter a year: 2009
2009 is not a leap year.
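The nested conditionals above can be collapsed into a single boolean expression; this `is_leap` helper is a sketch of the same leap-year rule:

```python
def is_leap(year):
    # Divisible by 4, except century years, which must also be divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap(2009))  # False
print(is_leap(2000))  # True
```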

4. Python Program to Find the Sum of Natural Numbers¶

In [ ]:
# Take input from the user
num = int(input("Enter a number: "))
total = 0  # avoid shadowing the built-in sum()

# use a while loop to count down to zero
while num > 0:
    total += num
    num -= 1

print("The sum is", total)
Enter a number: 14
The sum is 105
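The loop can be cross-checked against the closed-form Gauss formula: the sum of the first n natural numbers is n(n+1)/2. A minimal sketch:

```python
def sum_natural(n):
    # Closed-form alternative to the loop above
    return n * (n + 1) // 2

print(sum_natural(14))  # 105, matching the loop's result
```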

5. Python Program to Print all Prime Numbers between an Interval¶

In [ ]:
# define a function
def prime_numbers(start, end):
    for num in range(start, end + 1):
        if num > 1:
            for i in range(2, num):
                if (num % i) == 0:
                    break
            else:
                print(num)


# take input from the user
start = int(input("Enter the start of the interval: "))
end = int(input("Enter the end of the interval: "))
print("Prime numbers between", start, "and", end, "are:")
prime_numbers(start, end)
Enter the start of the interval: 5
Enter the end of the interval: 50
Prime numbers between 5 and 50 are:
5
7
11
13
17
19
23
29
31
37
41
43
47
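The inner loop above tests every candidate divisor up to num - 1; trial division only needs divisors up to the square root, since any factor above it pairs with one below it. A sketch using `math.isqrt` (the `is_prime` helper is illustrative):

```python
import math

def is_prime(n):
    # Trial division up to the integer square root of n
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

print([n for n in range(5, 51) if is_prime(n)])
```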

Practical 02¶

Develop Programs to learn different data-types (string “ ”, set { }, list [ ], dictionary { : }, tuple ( )) in Python.

  1. Python Program to Remove Punctuation from a String
  2. Python Program to Illustrate Different Set Operations – Union, Intersection, Difference, Symmetric Difference
  3. Python Program to demonstrate list slicing
  4. Python Program to compare two lists
  5. Python Program to Check If a List is Empty
  6. Python Program Concatenate Two Lists
  7. Python Program to Merge Two Dictionaries
  8. Python Program to Iterate Over Dictionaries Using for Loop
  9. Python Program to Sort a Dictionary by Value
  10. Python Program to Find the size of a Tuple
  11. Python Program to find Sum of Tuple’s elements (numbers)
  12. Python Program to Count the Number of Each Vowel

1. Python Program to Remove Punctuation from a String¶

In [ ]:
# Define a string of punctuation characters to remove from the input string
punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''

# Prompt the user to enter a string
my_str = input("Enter a string:- ")

# Initialize an empty string to hold the input string with punctuation removed
no_punct = ""

# Iterate over each character in the input string
for char in my_str:
    # If the character is not in the punctuation string, add it to the no_punct string
    if char not in punctuations:
        no_punct = no_punct + char

# Print the input string with punctuation removed
print(no_punct)
Enter a string:- Hello ! My name is Parth.
Hello  My name is Parth
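An alternative sketch using the standard library: `str.translate` with `string.punctuation` removes the same characters in a single pass instead of a character-by-character loop:

```python
import string

# str.maketrans('', '', chars) builds a table that deletes every char in chars
text = "Hello ! My name is Parth."
no_punct = text.translate(str.maketrans('', '', string.punctuation))
print(no_punct)  # Hello  My name is Parth
```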

2. Python Program to Illustrate Different Set Operations – Union, Intersection, Difference, Symmetric Difference¶

In [ ]:
# define two sets
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# union of two sets
print("Union of two sets:", set1 | set2)

# intersection of two sets
print("Intersection of two sets:", set1 & set2)

# difference of two sets
print("Difference of two sets:", set1 - set2)

# symmetric difference of two sets
print("Symmetric difference of two sets:", set1 ^ set2)
Union of two sets: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection of two sets: {4, 5}
Difference of two sets: {1, 2, 3}
Symmetric difference of two sets: {1, 2, 3, 6, 7, 8}

3. Python Program to demonstrate list slicing¶

In [ ]:
# Define a list of numbers
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Print the entire list
print("Original list:", numbers)

# Print the first three elements of the list
print("First three elements:", numbers[:3])

# Print the last three elements of the list
print("Last three elements:", numbers[-3:])

# Print every other element of the list
print("Every other element:", numbers[::2])

# Print the elements of the list in reverse order
print("Reversed list:", numbers[::-1])
Original list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
First three elements: [1, 2, 3]
Last three elements: [8, 9, 10]
Every other element: [1, 3, 5, 7, 9]
Reversed list: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

4. Python Program to compare two lists¶

In [ ]:
# Define two lists of numbers
list1 = [1, 2, 3, 4, 5]
list2 = [1, 2, 3, 4, 6]

# Check if the two lists are equal
if list1 == list2:
    print("The two lists are equal")
else:
    print("The two lists are not equal")

# Check if the two lists have the same elements (order doesn't matter)
if set(list1) == set(list2):
    print("The two lists have the same elements")
else:
    print("The two lists do not have the same elements")
The two lists are not equal
The two lists do not have the same elements

5. Python Program to Check If a List is Empty¶

In [ ]:
# Define a list of numbers
my_list = [1, 2, 3]

# Check if the list is empty using the len() function
if len(my_list) == 0:
    print("The list is empty")
else:
    print("The list is not empty")
The list is not empty
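Empty containers are falsy in Python, so the idiomatic emptiness check avoids `len()` entirely; a minimal sketch:

```python
# A plain truth test is the idiomatic way to check for an empty list
my_list = []
if not my_list:
    print("The list is empty")
else:
    print("The list is not empty")
```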

6. Python Program Concatenate Two Lists¶

In [ ]:
# Define two lists of numbers
list1 = [1, 2, 3]
list2 = [4, 5, 6]

# Concatenate the two lists using the + operator
concatenated_list = list1 + list2

# Print the concatenated list
print("Concatenated list:", concatenated_list)
Concatenated list: [1, 2, 3, 4, 5, 6]

7. Python Program to Merge Two Dictionaries¶

In [ ]:
# Define two dictionaries
dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}

# Merge the two dictionaries using the update() method
merged_dict = dict1.copy()
merged_dict.update(dict2)

# Print the merged dictionary
print("Merged dictionary:", merged_dict)
Merged dictionary: {'a': 1, 'b': 2, 'c': 3, 'd': 4}

8. Python Program to Iterate Over Dictionaries Using for Loop¶

In [ ]:
# Define a dictionary of key-value pairs
my_dict = {'a': 1, 'b': 2, 'c': 3}

# Iterate over the dictionary using a for loop
for key, value in my_dict.items():
    print(key, value)
a 1
b 2
c 3

9. Python Program to Sort a Dictionary by Value¶

In [ ]:
# Define a dictionary of key-value pairs
my_dict = {'apple': 5, 'banana': 2, 'orange': 4, 'pear': 3}

# Sort the dictionary by value using the sorted() function and a lambda function
sorted_dict = dict(sorted(my_dict.items(), key=lambda item: item[1]))

# Print the sorted dictionary
print("Sorted dictionary:", sorted_dict)
Sorted dictionary: {'banana': 2, 'pear': 3, 'orange': 4, 'apple': 5}

10. Python Program to Find the size of a Tuple¶

In [ ]:
# Define a tuple of numbers
my_tuple = (1, 2, 3, 4, 5)

# Find the size of the tuple using the len() function
tuple_size = len(my_tuple)

# Print the size of the tuple
print("Size of the tuple:", tuple_size)
Size of the tuple: 5

11. Python Program to find Sum of Tuple’s elements (numbers)¶

In [ ]:
# Define a tuple
my_tuple = (10, 20, 30, 40, 50)

# Initialize a variable to store the sum
sum_of_elements = 0

# Iterate through the tuple and add each element to the sum
for element in my_tuple:
    sum_of_elements += element

# Print the sum of elements
print("Sum of elements in the tuple:", sum_of_elements)
Sum of elements in the tuple: 150

12. Python Program to Count the Number of Each Vowel¶

In [ ]:
# Define a string of vowels
vowels = 'aAeEiIoOuU'

# Prompt the user to enter a string
my_str = input("Enter a string: ")

# Convert the string to lowercase
my_str = my_str.lower()

# Initialize a dictionary to hold the vowel counts
vowel_counts = {}

# Iterate over each character in the string
for char in my_str:
    # If the character is a vowel, increment its count in the dictionary
    if char in vowels:
        if char in vowel_counts:
            vowel_counts[char] += 1
        else:
            vowel_counts[char] = 1

# Print the vowel counts
for vowel, count in vowel_counts.items():
    print(vowel, count)
Enter a string: Hello, this is an example text.
e 4
o 1
i 2
a 2
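A shorter variant using `collections.Counter`, which replaces the manual dictionary bookkeeping above (shown with the same sample sentence):

```python
from collections import Counter

# Counter tallies every character; filter the tally down to vowels afterwards
text = "Hello, this is an example text.".lower()
vowel_counts = {ch: n for ch, n in Counter(text).items() if ch in 'aeiou'}
print(vowel_counts)
```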

Practical 03¶

Develop Programs to learn concept of functions scoping, recursion and list mutability

  1. Python Program to demonstrate use of local, nonlocal & global variable
  2. Python Program to Make a Simple Calculator
  3. Python Program to Find Factorial of Number Using Recursion
  4. Python Program to Display Fibonacci Sequence Using Recursion

1. Python Program to demonstrate use of local, nonlocal & global variable¶

In [ ]:
# Global variable
global_variable = 10

def example_function():
    # Local variable
    local_variable = 5
    print("Inside the function:")
    print("Local variable:", local_variable)  # Access local variable
    print("Global variable:", global_variable)  # Access global variable

example_function()

print("\nOutside the function:")
# Trying to access local_variable outside the function will result in an error
print("Global variable:", global_variable)  # Access global variable

# Nested function with nonlocal
def outer_function():
    outer_variable = 20

    def inner_function():
        nonlocal outer_variable  # Access and modify the outer_variable
        outer_variable = 30
        print("Inside inner_function - nonlocal variable:", outer_variable)

    inner_function()
    print("Inside outer_function - nonlocal variable:", outer_variable)

outer_function()

print("\nOutside the outer_function:")
# Trying to access outer_variable outside the function will result in an error
Inside the function:
Local variable: 5
Global variable: 10

Outside the function:
Global variable: 10
Inside inner_function - nonlocal variable: 30
Inside outer_function - nonlocal variable: 30

Outside the outer_function:
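For completeness, the `global` keyword is the module-scope counterpart of `nonlocal`: it lets a function rebind a module-level name. A minimal sketch:

```python
counter = 0

def increment():
    # Without 'global', the assignment below would create a new local variable
    global counter
    counter += 1

increment()
increment()
print(counter)  # 2
```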

2. Python Program to Make a Simple Calculator¶

In [ ]:
# Define a function to add two numbers
def add(num1, num2):
    return num1 + num2

# Define a function to subtract two numbers
def subtract(num1, num2):
    return num1 - num2

# Define a function to multiply two numbers
def multiply(num1, num2):
    return num1 * num2

# Define a function to divide two numbers
def divide(num1, num2):
    return num1 / num2

# Prompt the user to enter two numbers and an operation
num1 = float(input("Enter the first number: "))
num2 = float(input("Enter the second number: "))
operation = input("Enter the operation (+, -, *, /): ")

# Perform the selected operation on the two numbers
if operation == '+':
    result = add(num1, num2)
elif operation == '-':
    result = subtract(num1, num2)
elif operation == '*':
    result = multiply(num1, num2)
elif operation == '/':
    result = divide(num1, num2)
else:
    print("Invalid operation selected.")
    result = None

# Print the result of the operation
if result is not None:
    print("Result:", result)
Enter the first number: 15
Enter the second number: 34
Enter the operation (+, -, *, /): +
Result: 49.0
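The if/elif chain can also be written as a dictionary dispatch; this sketch uses functions from the standard `operator` module in place of the hand-written ones:

```python
import operator

# Map each operation symbol to a function; .get() handles invalid input
operations = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

op = operations.get('+')
if op is not None:
    print("Result:", op(15.0, 34.0))  # Result: 49.0
else:
    print("Invalid operation selected.")
```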

3. Python Program to Find Factorial of Number Using Recursion¶

In [ ]:
# Define a function to calculate the factorial of a number
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

# Prompt the user to enter a number
num = int(input("Enter a number: "))

# Calculate the factorial of the number using the factorial() function
result = factorial(num)

# Print the result
print("Factorial of", num, "is", result)
Enter a number: 14
Factorial of 14 is 87178291200
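For comparison, the standard library provides an iterative `math.factorial`, which sidesteps Python's recursion-depth limit for large inputs:

```python
import math

# Built-in factorial, no recursion involved
print(math.factorial(14))  # 87178291200
```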

4. Python Program to Display Fibonacci Sequence Using Recursion¶

In [ ]:
# Define a function to calculate the nth Fibonacci number
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

# Prompt the user to enter the number of terms to display
num_terms = int(input("Enter the number of terms to display: "))

# Display the Fibonacci sequence using the fibonacci() function
for i in range(num_terms):
    print(fibonacci(i))
Enter the number of terms to display: 10
0
1
1
2
3
5
8
13
21
34
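The plain recursion above recomputes the same subproblems exponentially many times; memoizing with `functools.lru_cache` makes each value be computed once. A sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    # Each fibonacci(n) is cached after its first computation
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print([fibonacci(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```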

Practical 04¶

Develop Programs to understand working of exception handling and assertions.

  1. Python Program to depict else clause with try-except
  2. Python Program to demonstrate finally
  3. Python Program to depict Raising Exception
  4. Python Program using assertions

1. Python Program to depict else clause with try-except¶

In [ ]:
def divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        print("Error: Cannot divide by zero!")
    else:
        print(f"The result of {a} divided by {b} is {result}")

divide(10, 2) # Output: The result of 10 divided by 2 is 5.0
divide(5, 0) # Output: Error: Cannot divide by zero!
The result of 10 divided by 2 is 5.0
Error: Cannot divide by zero!

2. Python Program to demonstrate finally¶

In [ ]:
try:
    x = int(input("Enter a number: "))
    y = int(input("Enter another number: "))
    result = x / y
except ZeroDivisionError:
    print("Error: Cannot divide by zero!")
else:
    print(f"The result of {x} divided by {y} is {result}")
finally:
    print("This will always execute, regardless of whether an exception was raised or not.")
Enter a number: 24
Enter another number: 12
The result of 24 divided by 12 is 2.0
This will always execute, regardless of whether an exception was raised or not.

3. Python Program to depict Raising Exception¶

In [ ]:
def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero!")
    else:
        return a / b

try:
    result = divide(10, 0)
except ZeroDivisionError as e:
    print(e)
else:
    print(f"The result is {result}")
Cannot divide by zero!

4. Python Program using assertions¶

In [ ]:
def check_positive_number(number):
    assert number > 0, "Number must be positive"
    return number

try:
    num = int(input("Enter a positive number: "))
    result = check_positive_number(num)
    print(f"You entered a positive number: {result}")
except ValueError:
    print("Invalid input. Please enter a valid positive number.")
except AssertionError as e:
    print(f"Assertion error: {e}")
Enter a positive number: 0
Assertion error: Number must be positive

Practical 05¶

Develop programs to demonstrate use of NumPy

Refer : https://numpy.org/doc/stable/user/quickstart.html

  1. Print Numpy version and configuration information.
  2. Create a numpy array with numbers ranging from 50 to 100. Also print attributes of the created object.
  3. Create a null vector of size 10.
  4. Create a vector of size 10 initialized with random integers with a seed equal to last four digits of your enrolment number.
  5. Create a matrix of size 5x5 initialized with random integers with a seed equal to last four digits of your enrolment number.
  6. Create an identity matrix of size 5x5.
  7. Do the following
    • Create a numpy array mat1 of size 5x2 with numbers ranging from 1 to 10.
    • Create another numpy array mat2 of size 2x5 with floating point numbers between 10 and 20.
    • Calculate the matrix product of mat1 and mat2.
    • Print vector containing minimum and maximum in each row of mat1.
    • Print vector containing mean and standard deviation in each column of mat2.
    • Perform vstack and hstack on mat1 and mat2.
    • Split mat1 horizontally into 2 parts and split mat2 vertically into 2 parts.
  8. Create a vector vec of size 20 initialized with double numbers ranging from 20 to 40 and do the following.
    • Reshape it to a 5x4 matrix mat.
    • Print every 4th element of vec.
    • Print elements of vec less than or equal to 25.
    • In vec assign every number at even location to 1.
    • In vec assign from start position to 10th position every second element to 0.
    • Reverse elements of vec.
    • In mat print each row in the second column and print each column in the second row.
    • Apply some universal functions such as sin,sqrt,cos,exp on vec.
    • Print transpose and inverse of mat.
  9. Demonstrate shallow copy and deep copy.

1. Print Numpy version and configuration information.¶

In [ ]:
import numpy as np

print("NumPy version:", np.__version__)
print("\nNumPy configuration:")
np.show_config()
NumPy version: 1.23.5

NumPy configuration:
openblas64__info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
blas_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
openblas64__lapack_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
lapack_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL

2. Create a numpy array with numbers ranging from 50 to 100. Also print attributes of the created object.¶

In [ ]:
arr = np.arange(50, 101)
print(arr)
print("Shape of array:", arr.shape)
print("Data type of array elements:", arr.dtype)
print("Number of dimensions of array:", arr.ndim)
[ 50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100]
Shape of array: (51,)
Data type of array elements: int64
Number of dimensions of array: 1

3. Create a null vector of size 10.¶

In [ ]:
null_vector = np.zeros(10)
print(null_vector)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

4. Create a vector of size 10 initialized with random integers with a seed equal to last four digits of your enrolment number.¶

In [ ]:
np.random.seed(7133)
random_vector = np.random.randint(0, 100, size=10)
print(random_vector)
[20 86 93 78 39 95 92 60 11 16]
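Newer NumPy code typically uses the `Generator` API, which scopes the seed to one generator object instead of the global state that `np.random.seed` mutates; note the draws differ from the legacy output above:

```python
import numpy as np

# default_rng returns an independent Generator seeded locally
rng = np.random.default_rng(7133)
random_vector = rng.integers(0, 100, size=10)  # upper bound is exclusive
print(random_vector)
```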

5. Create a matrix of size 5x5 initialized with random integers with a seed equal to last four digits of your enrolment number.¶

In [ ]:
np.random.seed(7133)
matrix = np.random.randint(0, 100, size=(5, 5))
print(matrix)
[[20 86 93 78 39]
 [95 92 60 11 16]
 [45  8 19 16 65]
 [93 95 31 26 83]
 [24 65 22  5 95]]

6. Create an identity matrix of size 5x5.¶

In [ ]:
identity_matrix = np.identity(5)
print(identity_matrix)
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]

7. Do the following¶

Create a numpy array mat1 of size 5x2 with numbers ranging from 1 to 10.¶

In [ ]:
mat1 = np.arange(1, 11).reshape(5, 2)
print(mat1)
[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]

Create another numpy array mat2 of size 2x5 with floating point numbers between 10 and 20.¶

In [ ]:
mat2 = np.random.uniform(10, 20, size=(2, 5))
print(mat2)
[[13.14036652 16.57157439 13.4856098  19.84901782 15.93328273]
 [17.91219832 19.03925499 10.25847916 17.49775131 14.59391677]]

Calculate the matrix product of mat1 and mat2.¶

In [ ]:
mat1 = np.arange(1, 11).reshape(5, 2)
mat2 = np.random.uniform(10, 20, size=(2, 5))

mat_product = np.dot(mat1, mat2)
print(mat_product)
[[ 49.38993502  48.41417736  49.04024609  35.40692619  41.32757539]
 [117.31621038 114.0877628  116.48572366  81.26452379 100.82181511]
 [185.24248575 179.76134824 183.93120123 127.12212138 160.31605483]
 [253.16876111 245.43493369 251.3766788  172.97971898 219.81029456]
 [321.09503648 311.10851913 318.82215637 218.83731658 279.30453428]]

Print vector containing minimum and maximum in each row of mat1.¶

In [ ]:
mat1 = np.arange(1, 11).reshape(5, 2)

min_max_vec = np.column_stack((np.min(mat1, axis=1), np.max(mat1, axis=1)))
print(min_max_vec)
[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]

Print vector containing mean and standard deviation in each column of mat2.¶

In [ ]:
mat2 = np.random.uniform(10, 20, size=(2, 5))

mean_std_vec = np.column_stack((np.mean(mat2, axis=0), np.std(mat2, axis=0)))
print(mean_std_vec)
[[15.3786871   1.5405699 ]
 [15.70136899  2.36388109]
 [14.84204549  1.71781312]
 [13.49937687  1.68200076]
 [12.94592071  0.28293471]]

Perform vstack and hstack on mat1 and mat2.¶


In [ ]:
# Assuming you have two NumPy arrays mat1 and mat2
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])

# Vertical Stack (vstack)
vertical_stack_result = np.vstack((mat1, mat2))

# Horizontal Stack (hstack)
horizontal_stack_result = np.hstack((mat1, mat2))

print("Original mat1:")
print(mat1)
print("Original mat2:")
print(mat2)

print("Vertical Stack Result:")
print(vertical_stack_result)

print("Horizontal Stack Result:")
print(horizontal_stack_result)
Original mat1:
[[1 2]
 [3 4]]
Original mat2:
[[5 6]
 [7 8]]
Vertical Stack Result:
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
Horizontal Stack Result:
[[1 2 5 6]
 [3 4 7 8]]

Split mat1 horizontally into 2 parts and split mat2 vertically into 2 parts.¶

In [ ]:
mat1 = np.arange(1, 11).reshape(5, 2)
mat2 = np.random.uniform(10, 20, size=(2, 5))

# Split mat1 horizontally into 2
mat1_split_h = np.hsplit(mat1, 2)
print("mat1 split horizontally into 2:\n", mat1_split_h)

# Split mat2 vertically into 2
mat2_split_v = np.vsplit(mat2, 2)
print("\nmat2 split vertically into 2:\n", mat2_split_v)
mat1 split horizontally into 2:
 [array([[1],
       [3],
       [5],
       [7],
       [9]]), array([[ 2],
       [ 4],
       [ 6],
       [ 8],
       [10]])]

mat2 split vertically into 2:
 [array([[15.78556787, 17.50629083, 14.88323419, 11.78221015, 14.87120469]]), array([[14.81754071, 16.19864425, 16.63558727, 17.08566406, 16.16187844]])]

8. Create a vector vec of size 20 initialized with double numbers ranging from 20 to 40 and do the following.¶

In [ ]:
import numpy as np

# Create vector vec of size 20 initialized with double numbers ranging from 20 to 40
vec = np.linspace(20, 40, 20, dtype=np.float64)

Reshape it to a 5x4 matrix mat.¶

In [ ]:
# Reshape vec to a 5x4 matrix mat
mat = vec.reshape(5, 4)

Print every 4th element of vec.¶

In [ ]:
# Print every 4th element of vec
print(vec[3::4])
[23.15789474 27.36842105 31.57894737 35.78947368 40.        ]

Print elements of vec less than or equal to 25.¶

In [ ]:
# Print elements of vec less than or equal to 25
print(vec[vec <= 25])
[20.         21.05263158 22.10526316 23.15789474 24.21052632]

In vec assign every number at even location to 1.¶

In [ ]:
# Assign 1 to every element at an even position (1-based), i.e. every odd zero-based index
vec[1::2] = 1

In vec assign from start position to 10th position every second element to 0.¶

In [ ]:
# In vec assign from start position to 10th position every second element to 0
vec[:10:2] = 0

Reverse elements of vec.¶

In [ ]:
# Reverse elements of vec
vec = vec[::-1]

In mat print each row in the second column and print each column in the second row.¶

In [ ]:
# In mat print each row in the second column and print each column in the second row
print(mat[:, 1])
print(mat[1, :])
[1. 1. 1. 1. 1.]
[0. 1. 0. 1.]

Apply some universal functions such as sin, sqrt, cos, exp on vec.¶

In [ ]:
# Apply some universal functions such as sin,sqrt,cos,exp on vec
print(np.sin(vec))
print(np.sqrt(vec))
print(np.cos(vec))
print(np.exp(vec))
[ 0.84147098  0.94843344  0.84147098 -0.75588615  0.84147098 -0.17836339
  0.84147098  0.93759646  0.84147098 -0.77682669  0.84147098  0.
  0.84147098  0.          0.84147098  0.          0.84147098  0.
  0.84147098  0.        ]
[1.         6.24078268 1.         6.06976979 1.         5.89379692
 1.         5.71240571 1.         5.52506251 1.         0.
 1.         0.         1.         0.         1.         0.
 1.         0.        ]
[ 0.54030231  0.31697636  0.54030231  0.65470308  0.54030231 -0.98396469
  0.54030231  0.3477253   0.54030231  0.62971446  0.54030231  1.
  0.54030231  1.          0.54030231  1.          0.54030231  1.
  0.54030231  1.        ]
[2.71828183e+00 8.21537118e+16 2.71828183e+00 1.00074405e+16
 2.71828183e+00 1.21904249e+15 2.71828183e+00 1.48495972e+14
 2.71828183e+00 1.80888310e+13 2.71828183e+00 1.00000000e+00
 2.71828183e+00 1.00000000e+00 2.71828183e+00 1.00000000e+00
 2.71828183e+00 1.00000000e+00 2.71828183e+00 1.00000000e+00]

Print transpose and inverse of mat.¶

In [ ]:
mat = np.array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])

# Transpose of the matrix
transpose_mat = np.transpose(mat)

# Inverse of the matrix
try:
    inverse_mat = np.linalg.inv(mat)
except np.linalg.LinAlgError:
    inverse_mat = "Matrix is not invertible"

# Print the original matrix, its transpose, and its inverse
print("Original Matrix:")
print(mat)

print("\nTranspose of the Matrix:")
print(transpose_mat)

print("\nInverse of the Matrix:")
print(inverse_mat)
Original Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Transpose of the Matrix:
[[1 4 7]
 [2 5 8]
 [3 6 9]]

Inverse of the Matrix:
Matrix is not invertible

9. Demonstrate shallow copy and deep copy.¶

In [ ]:
import copy

# Create a list of lists
original_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Shallow copy
shallow_copy = copy.copy(original_list)

# Deep copy
deep_copy = copy.deepcopy(original_list)

# Modify the original list
original_list[0][0] = 0

# Print the original list, shallow copy, and deep copy
print("Original list:", original_list)
print("Shallow copy:", shallow_copy)
print("Deep copy:", deep_copy)
Original list: [[0, 2, 3], [4, 5, 6], [7, 8, 9]]
Shallow copy: [[0, 2, 3], [4, 5, 6], [7, 8, 9]]
Deep copy: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
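NumPy draws the same distinction for arrays: a view shares the underlying buffer with the original (shallow), while `copy()` allocates new storage (deep). A minimal sketch:

```python
import numpy as np

original = np.array([1, 2, 3, 4])
view = original.view()   # shares the same data buffer
deep = original.copy()   # independent storage

original[0] = 0
print("View:", view)   # reflects the change
print("Copy:", deep)   # unaffected
```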

Practical 06¶

Python File Handling to load text & image data

  1. Create File with two columns X1 and Y1. Generate 20 rows with random data for values of X1 and Y1.
  2. Demonstrate concept of streaming and sampling using above generated file. Fetch every even record from file and Fetch any 5 random records from file to demonstrate sampling.
  3. Download and load image data from the following URL. https://unsplash.com/photos/pypeCEaJeZY

In [ ]:
import pandas as pd
import numpy as np

# Generate 20 random numbers for X1 and Y1
X1 = np.random.rand(20)
Y1 = np.random.rand(20)

# Create a DataFrame
df = pd.DataFrame({'X1': X1, 'Y1': Y1})

# Save the DataFrame to a csv file
df.to_csv('data.csv', index=False)

1. Create File with two columns X1 and Y1. Generate 20 rows with random data for values of X1 and Y1.¶

In [ ]:
# Read the csv file
df = pd.read_csv('data.csv')

# Display the data
print(df)
          X1        Y1
0   0.825299  0.580684
1   0.443933  0.539453
2   0.520570  0.920555
3   0.589448  0.593961
4   0.770221  0.073119
5   0.196157  0.889510
6   0.692413  0.109250
7   0.449811  0.395146
8   0.644630  0.755827
9   0.026888  0.861718
10  0.497937  0.891846
11  0.562219  0.354087
12  0.319644  0.610723
13  0.560438  0.129359
14  0.804205  0.090613
15  0.095709  0.943955
16  0.597264  0.396839
17  0.666371  0.968839
18  0.998911  0.449929
19  0.512409  0.218404

2. Demonstrate concept of streaming and sampling using above generated file. Fetch every even record from file and Fetch any 5 random records from file to demonstrate sampling.¶

In [ ]:
# Read the csv file
df = pd.read_csv('data.csv')

# Fetch every even record from the file
even_records = df[df.index % 2 == 0]
print("Even records:\n", even_records)

# Fetch any 5 random records from the file
random_records = df.sample(n=5)
print("\nRandom records:\n", random_records)
Even records:
           X1        Y1
0   0.825299  0.580684
2   0.520570  0.920555
4   0.770221  0.073119
6   0.692413  0.109250
8   0.644630  0.755827
10  0.497937  0.891846
12  0.319644  0.610723
14  0.804205  0.090613
16  0.597264  0.396839
18  0.998911  0.449929

Random records:
           X1        Y1
6   0.692413  0.109250
9   0.026888  0.861718
10  0.497937  0.891846
2   0.520570  0.920555
15  0.095709  0.943955
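Reading the whole file and then indexing is not true streaming; `pd.read_csv` with `chunksize` processes the file in fixed-size pieces, which scales to files too large for memory. A sketch that regenerates the data.csv layout used above so it is self-contained:

```python
import numpy as np
import pandas as pd

# Recreate a 20-row file with columns X1 and Y1 (same layout as above)
pd.DataFrame({'X1': np.random.rand(20), 'Y1': np.random.rand(20)}).to_csv('data.csv', index=False)

total_rows = 0
for chunk in pd.read_csv('data.csv', chunksize=5):
    total_rows += len(chunk)  # each chunk is a 5-row DataFrame

print("Rows processed:", total_rows)  # Rows processed: 20
```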

3. Download and load image data from the following URL. https://unsplash.com/photos/pypeCEaJeZY¶

In [ ]:
import requests
from PIL import Image
from io import BytesIO
from IPython.display import display

# URL of the image
url = "https://images.unsplash.com/photo-1542744173-05336fcc7ad4"

# Send a HTTP request to the URL of the image
response = requests.get(url)

# Load the image
img = Image.open(BytesIO(response.content))

# Display the image
display(img)

Practical 07¶

Develop Programs to demonstrate use of Pandas for Conditioning Your Data

URL for test.xml file. https://drive.google.com/file/d/1FqOWhY2XNYkHwCBYOjhAILCzVUo9QEp6

URL for indian_food.csv file. https://drive.google.com/file/d/1CNAdqFZ-Amji8kOMd4GovivK8UKVLQ-p

  1. Read the xml file test.xml and create a dataframe from it and do the following.
    • Find and print duplicate records.
    • Remove duplicates and save data in other dataframe.
  2. Read the csv file indian_food.csv. Consider value -1 for missing or NA values.(Replace -1 with NaN when reading a csv file.)
    • Print the first and last 10 records of dataframe, also print column names and summary of data. Print information about data such as data types of each column.
    • Convert columns with name course,diet,flavor_profile,state,region to categorical data type & print data type for dataframe using info function.
    • Categories are defined as follows.

      Course ['dessert' 'main course' 'starter' 'snack']

      Flavor_profile ['sweet' 'spicy' 'bitter' 'sour']

      State ['West Bengal' 'Rajasthan' 'Punjab' 'Uttar Pradesh' 'Odisha' 'Maharashtra' 'Uttarakhand' 'Assam' 'Bihar' 'Andhra Pradesh' 'Karnataka' 'Telangana' 'Kerala' 'Tamil Nadu' 'Gujarat' 'Tripura' 'Manipur' 'Nagaland' 'NCT of Delhi' 'Jammu & Kashmir' 'Chhattisgarh' 'Haryana' 'MadhyaPradesh' 'Goa']

      Region ['East' 'West' 'North' nan 'North East' 'South' 'Central']

    • Print name of items with course as dessert.
    • Print count of items with flavor_profile with sweet type.
    • Print name of items with cooking_time < prep_time.
    • Print summary of data grouped by diet column.
    • Print average cooking_time & prep_time for vegetarian diet type.
    • Insert a new column with column name as total_time which contains sum of cooking_time & prep_time into existing dataframe.
    • Print name,cooking_time,prep_time,total_time of items with total_time >=500
    • Print count of items with various flavor_profile per region.
    • Find & print records with missing data in the state column.
    • Fill missing data in the state column with -.
  3. Write regular expression,

    To extract phone numbers (+dd-dddd-dddd) from the following text

    “Hey my number is +01-555-1212 & his number is +01-770-1410”

    To extract email addresses from the following text.

    “You can contact to abcd@gmail.co.in or to xyzw@yahoo.in”

  4. Demonstrate stemming & stop word removal using the nltk library for the content given below

    Most of the world will make decisions by either guessing or using their gut. They will be either lucky or wrong.

    The goal is to turn data into information and information into insight.

  5. Using the 20 Newsgroups dataset, create and demonstrate a bag-of-words model. Also convert the raw newsgroup documents into a matrix of TF-IDF features.

1. Read the xml file test.xml and create a dataframe from it and do the following.¶

Find and print duplicate records.¶

In [ ]:
import pandas as pd  # Importing pandas library for data manipulation and analysis
import xml.etree.ElementTree as ET  # Importing ElementTree from xml.etree for parsing and creating XML data
tree = ET.parse('test.xml')  # Parsing the XML file
root = tree.getroot()  # Getting the root element of the XML document
data = []  # Initializing an empty list to store the data
for record in root:  # Iterating over each record in the root element
    row = {}  # Initializing an empty dictionary to store each row of data
    for item in record:  # Iterating over each item in the record
        row[item.tag] = item.text  # Adding the item's tag as the key and the item's text as the value to the row dictionary
    data.append(row)  # Appending the row dictionary to the data list
df = pd.DataFrame(data)  # Creating a DataFrame from the data list
print(df)  # Printing the DataFrame
  Category Quantity  Price
0      NaN      NaN    NaN
1        A        3  24.50
2        B        1  89.99
3        A        5   4.95
4        A        3  66.00
5        B       10    .99
6        A        3  24.50
7        A       15  29.00
8        B        8   6.99
9        A       15  29.00

Remove duplicates and save data in other dataframe.¶

In [ ]:
duplicates = df[df.duplicated()]  # Finding duplicate rows in the DataFrame
print("Duplicate Records:")  # Printing a string for clarity
print(duplicates)  # Printing the duplicate rows
df_no_duplicates = df.drop_duplicates()  # Removing duplicate rows from the DataFrame
print("DataFrame without Duplicates:")  # Printing a string for clarity
print(df_no_duplicates)  # Printing the DataFrame without duplicates
Duplicate Records:
  Category Quantity  Price
6        A        3  24.50
9        A       15  29.00
DataFrame without Duplicates:
  Category Quantity  Price
0      NaN      NaN    NaN
1        A        3  24.50
2        B        1  89.99
3        A        5   4.95
4        A        3  66.00
5        B       10    .99
7        A       15  29.00
8        B        8   6.99

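As a side note, `drop_duplicates` keeps the first copy of each duplicated row by default, while `duplicated()` only flags the later copies. Passing `keep=False` marks every copy, which is handy for inspecting duplicate groups before dropping them. A minimal sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "A"],
    "Quantity": [3, 1, 3, 5],
})

# keep=False marks every copy of a duplicated row, not just the later ones,
# so rows 0 and 2 (identical) are both selected.
all_copies = df[df.duplicated(keep=False)]
print(all_copies)
```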
2. Read the csv file indian_food.csv. Consider value -1 for missing or NA values. (Replace -1 with NaN when reading the csv file.)¶

In [ ]:
import pandas as pd

# Read the CSV file
df_csv = pd.read_csv('indian_food.csv', na_values=-1)

Print the first and last 10 records of the dataframe; also print column names and a summary of the data. Print information about the data, such as the data types of each column.¶

In [ ]:
# Print the first 10 records
print(df_csv.head(10))

# Print the last 10 records
print(df_csv.tail(10))

# Print column names
print(df_csv.columns)

# Print summary of data
print(df_csv.describe(include='all'))

# Print information about data
print(df_csv.info())
             name                                        ingredients  \
0      Balu shahi                    Maida flour, yogurt, oil, sugar   
1          Boondi                            Gram flour, ghee, sugar   
2  Gajar ka halwa       Carrots, milk, sugar, ghee, cashews, raisins   
3          Ghevar  Flour, ghee, kewra, milk, clarified butter, su...   
4     Gulab jamun  Milk powder, plain flour, baking powder, ghee,...   
5          Imarti                          Sugar syrup, lentil flour   
6          Jalebi  Maida, corn flour, baking soda, vinegar, curd,...   
7      Kaju katli                     Cashews, ghee, cardamom, sugar   
8        Kalakand                        Milk, cottage cheese, sugar   
9           Kheer                    Milk, rice, sugar, dried fruits   

         diet  prep_time  cook_time flavor_profile   course          state  \
0  vegetarian       45.0       25.0          sweet  dessert    West Bengal   
1  vegetarian       80.0       30.0          sweet  dessert      Rajasthan   
2  vegetarian       15.0       60.0          sweet  dessert         Punjab   
3  vegetarian       15.0       30.0          sweet  dessert      Rajasthan   
4  vegetarian       15.0       40.0          sweet  dessert    West Bengal   
5  vegetarian       10.0       50.0          sweet  dessert    West Bengal   
6  vegetarian       10.0       50.0          sweet  dessert  Uttar Pradesh   
7  vegetarian       10.0       20.0          sweet  dessert            NaN   
8  vegetarian       20.0       30.0          sweet  dessert    West Bengal   
9  vegetarian       10.0       40.0          sweet  dessert            NaN   

  region  
0   East  
1   West  
2  North  
3   West  
4   East  
5   East  
6  North  
7    NaN  
8   East  
9    NaN  
                  name                                        ingredients  \
245         Pani Pitha  Tea leaves, white sesame seeds, dry coconut, s...   
246             Payokh  Basmati rice, rose water, sugar, clarified but...   
247  Prawn malai curry      Coconut milk, prawns, garlic, turmeric, sugar   
248           Red Rice  Red pepper, red onion, butter, watercress, oli...   
249             Shukto  Green beans, bitter gourd, ridge gourd, banana...   
250          Til Pitha            Glutinous rice, black sesame seeds, gur   
251            Bebinca  Coconut milk, egg yolks, clarified butter, all...   
252             Shufta  Cottage cheese, dry dates, dried rose petals, ...   
253          Mawa Bati  Milk powder, dry fruits, arrowroot powder, all...   
254             Pinaca  Brown rice, fennel seeds, grated coconut, blac...   

               diet  prep_time  cook_time flavor_profile       course  \
245      vegetarian       10.0       20.0            NaN  main course   
246      vegetarian        NaN        NaN          sweet      dessert   
247  non vegetarian       15.0       50.0          spicy  main course   
248      vegetarian        NaN        NaN            NaN  main course   
249      vegetarian       10.0       20.0          spicy  main course   
250      vegetarian        5.0       30.0          sweet      dessert   
251      vegetarian       20.0       60.0          sweet      dessert   
252      vegetarian        NaN        NaN          sweet      dessert   
253      vegetarian       20.0       45.0          sweet      dessert   
254      vegetarian        NaN        NaN          sweet      dessert   

               state      region  
245            Assam  North East  
246            Assam  North East  
247      West Bengal        East  
248              NaN         NaN  
249      West Bengal        East  
250            Assam  North East  
251              Goa        West  
252  Jammu & Kashmir       North  
253   Madhya Pradesh     Central  
254              Goa        West  
Index(['name', 'ingredients', 'diet', 'prep_time', 'cook_time',
       'flavor_profile', 'course', 'state', 'region'],
      dtype='object')
              name              ingredients        diet   prep_time  \
count          255                      255         255  225.000000   
unique         255                      252           2         NaN   
top     Balu shahi  Gram flour, ghee, sugar  vegetarian         NaN   
freq             1                        2         226         NaN   
mean           NaN                      NaN         NaN   35.386667   
std            NaN                      NaN         NaN   76.241081   
min            NaN                      NaN         NaN    5.000000   
25%            NaN                      NaN         NaN   10.000000   
50%            NaN                      NaN         NaN   10.000000   
75%            NaN                      NaN         NaN   20.000000   
max            NaN                      NaN         NaN  500.000000   

         cook_time flavor_profile       course    state region  
count   227.000000            226          255      231    241  
unique         NaN              4            4       24      6  
top            NaN          spicy  main course  Gujarat   West  
freq           NaN            133          129       35     74  
mean     38.911894            NaN          NaN      NaN    NaN  
std      49.421711            NaN          NaN      NaN    NaN  
min       2.000000            NaN          NaN      NaN    NaN  
25%      20.000000            NaN          NaN      NaN    NaN  
50%      30.000000            NaN          NaN      NaN    NaN  
75%      45.000000            NaN          NaN      NaN    NaN  
max     720.000000            NaN          NaN      NaN    NaN  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 255 entries, 0 to 254
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   name            255 non-null    object 
 1   ingredients     255 non-null    object 
 2   diet            255 non-null    object 
 3   prep_time       225 non-null    float64
 4   cook_time       227 non-null    float64
 5   flavor_profile  226 non-null    object 
 6   course          255 non-null    object 
 7   state           231 non-null    object 
 8   region          241 non-null    object 
dtypes: float64(2), object(7)
memory usage: 18.1+ KB
None

Convert the columns named course, diet, flavor_profile, state, region to categorical data type & print the data types for the dataframe using the info function.¶

In [ ]:
# Define a list of column names that we want to convert to 'category' data type.
columns_to_convert = ['course', 'diet', 'flavor_profile', 'state', 'region']

# Convert the data type of the specified columns to 'category'.
df_csv[columns_to_convert] = df_csv[columns_to_convert].astype('category')

# Print the string "Data Types (After Conversion):" to the console.
print("Data Types (After Conversion):")

# Print the data types of the columns in the DataFrame to the console, after the conversion.
print(df_csv.dtypes)
Data Types (After Conversion):
name                object
ingredients         object
diet              category
prep_time          float64
cook_time          float64
flavor_profile    category
course            category
state             category
region            category
dtype: object

Categories are defined as follows.¶

Course ['dessert' 'main course' 'starter' 'snack']

Flavor_profile ['sweet' 'spicy' 'bitter' 'sour']

State ['West Bengal' 'Rajasthan' 'Punjab' 'Uttar Pradesh' 'Odisha' 'Maharashtra' 'Uttarakhand' 'Assam' 'Bihar' 'Andhra Pradesh' 'Karnataka' 'Telangana' 'Kerala' 'Tamil Nadu' 'Gujarat' 'Tripura' 'Manipur' 'Nagaland' 'NCT of Delhi' 'Jammu & Kashmir' 'Chhattisgarh' 'Haryana' 'MadhyaPradesh' 'Goa']

Region ['East' 'West' 'North' nan 'North East' 'South' 'Central']

In [ ]:
# Define a dictionary where the keys are column names and the values are lists of categories for each column.
category_definitions = {
    'course': ['dessert', 'main course', 'starter', 'snack'],
    'flavor_profile': ['sweet', 'spicy', 'bitter', 'sour'],
    'state': [
        'West Bengal', 'Rajasthan', 'Punjab', 'Uttar Pradesh', 'Odisha',
        'Maharashtra', 'Uttarakhand', 'Assam', 'Bihar', 'Andhra Pradesh',
        'Karnataka', 'Telangana', 'Kerala', 'Tamil Nadu', 'Gujarat',
        'Tripura', 'Manipur', 'Nagaland', 'NCT of Delhi',
        'Jammu & Kashmir', 'Chhattisgarh', 'Haryana', 'Madhya Pradesh', 'Goa'
    ],
    'region': ['East', 'West', 'North', 'North East', 'South', 'Central']
}

# Loop over the items in the dictionary.
for column, categories in category_definitions.items():
    # Assert that all categories are in the categories of the column in the DataFrame.
    assert all(cat in df_csv[column].cat.categories for cat in categories)

# Print the last column name to the console.
print(column)

# Print the last list of categories to the console.
print(categories)
region
['East', 'West', 'North', 'North East', 'South', 'Central']
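The cell above only asserts that the expected categories appear in the data; to pin a column to exactly a given category list (including values that never occur), `cat.set_categories` can be used. A toy sketch:

```python
import pandas as pd

# Toy flavor series; 'sour' never occurs in the data but is still a valid category.
s = pd.Series(["sweet", "spicy", "sweet", "bitter"], dtype="category")
s = s.cat.set_categories(["sweet", "spicy", "bitter", "sour"])

# Unused categories are kept, and value_counts reports them with count 0.
print(list(s.cat.categories))
print(s.value_counts().to_dict())
```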

Print name of items with course as dessert.¶

In [ ]:
# Filter the DataFrame to only include rows where the 'course' column is 'dessert'.
desserts = df_csv[df_csv['course'] == 'dessert']

# Print the string "Dessert Items:" to the console.
print("Dessert Items:")

# Print the 'name' column of the filtered DataFrame to the console.
print(desserts['name'])
Dessert Items:
0          Balu shahi
1              Boondi
2      Gajar ka halwa
3              Ghevar
4         Gulab jamun
            ...      
250         Til Pitha
251           Bebinca
252            Shufta
253         Mawa Bati
254            Pinaca
Name: name, Length: 85, dtype: object

Print count of items with flavor_profile of sweet type.¶

In [ ]:
# Filter the DataFrame to only include rows where the 'flavor_profile' column is 'sweet'.
sweet_items = df_csv[df_csv['flavor_profile'] == 'sweet']

# Print the string "Count of Sweet Items:" and the number of sweet items to the console.
print("Count of Sweet Items:", len(sweet_items))
Count of Sweet Items: 88
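Equivalently, `value_counts` tallies every flavor profile in one call, so the sweet count becomes a single lookup (toy data for illustration):

```python
import pandas as pd

# Toy flavor column; value_counts returns per-value counts, sorted by frequency.
flavors = pd.Series(["sweet", "spicy", "sweet", "sour", "sweet"])
counts = flavors.value_counts()

print(counts["sweet"])
```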

Print name of items with cooking_time < prep_time.¶

In [ ]:
# Filter the DataFrame to only include rows where the 'cook_time' column is less than the 'prep_time' column.
fast_cooking_items = df_csv[df_csv['cook_time'] < df_csv['prep_time']]

# Print the string "Items with Cooking Time < Prep Time:" to the console.
print("Items with Cooking Time < Prep Time:")

# Print the 'name' column of the filtered DataFrame to the console.
print(fast_cooking_items['name'])
Items with Cooking Time < Prep Time:
0               Balu shahi
1                   Boondi
14                  Phirni
29               Misti doi
33               Ras malai
35                 Sandesh
46          Obbattu holige
48                Poornalu
54               Kajjikaya
66          Chak Hao Kheer
81           Chicken Tikka
94                 Khichdi
96           Kulfi falooda
104                   Naan
109              Pani puri
114            Pindi chana
122       Tandoori Chicken
123    Tandoori Fish Tikka
124                   Attu
128                   Dosa
129               Idiappam
130                   Idli
144            Masala Dosa
151              Pesarattu
155                  Puttu
157                Sandige
158                  Sevai
178          Kutchi dabeli
202      Sabudana Khichadi
207                Surnoli
212          Lilva Kachori
Name: name, dtype: object

Print summary of data grouped by diet column.¶

In [ ]:
# Group the DataFrame by the 'diet' column and calculate summary statistics for each group.
diet_summary = df_csv.groupby('diet').describe()

# Print the string "Summary of Data Grouped by Diet:" to the console.
print("Summary of Data Grouped by Diet:")

# Print the summary statistics of the grouped DataFrame to the console.
print(diet_summary)
Summary of Data Grouped by Diet:
               prep_time                                                      \
                   count       mean        std  min   25%   50%   75%    max   
diet                                                                           
non vegetarian      19.0  41.842105  74.259502  5.0  10.0  10.0  17.5  240.0   
vegetarian         206.0  34.791262  76.570389  5.0  10.0  11.0  20.0  500.0   

               cook_time                                                     
                   count     mean        std   min   25%   50%   75%    max  
diet                                                                         
non vegetarian      19.0  40.0000  22.422707  15.0  30.0  35.0  42.5  120.0  
vegetarian         208.0  38.8125  51.213850   2.0  20.0  30.0  45.0  720.0  

Print average cooking_time & prep_time for vegetarian diet type.¶

In [ ]:
# Filter the DataFrame to only include rows where the 'diet' column is 'vegetarian'.
vegetarian_diet = df_csv[df_csv['diet'] == 'vegetarian']

# Calculate the mean of the 'cook_time' column for the filtered DataFrame.
average_cooking_time = vegetarian_diet['cook_time'].mean()

# Calculate the mean of the 'prep_time' column for the filtered DataFrame.
average_prep_time = vegetarian_diet['prep_time'].mean()

# Print the string "Average Cooking Time for Vegetarian Diet:" and the average cooking time to the console.
print("Average Cooking Time for Vegetarian Diet:", average_cooking_time)

# Print the string "Average Prep Time for Vegetarian Diet:" and the average preparation time to the console.
print("Average Prep Time for Vegetarian Diet:", average_prep_time)
Average Cooking Time for Vegetarian Diet: 38.8125
Average Prep Time for Vegetarian Diet: 34.79126213592233
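The same averages also fall out of a single `groupby` call without filtering first; a sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "diet": ["vegetarian", "vegetarian", "non vegetarian"],
    "cook_time": [30.0, 50.0, 40.0],
    "prep_time": [10.0, 20.0, 15.0],
})

# One groupby yields both averages for every diet type at once.
means = df.groupby("diet")[["cook_time", "prep_time"]].mean()
print(means)
```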

Insert a new column named total_time, containing the sum of cooking_time & prep_time, into the existing dataframe.¶

In [ ]:
# Add the 'cook_time' and 'prep_time' columns to create a new 'total_time' column in the DataFrame.
df_csv['total_time'] = df_csv['cook_time'] + df_csv['prep_time']

Print name, cooking_time, prep_time, total_time of items with total_time >= 500.¶

In [ ]:
# Filter the DataFrame to only include rows where the 'total_time' column is greater than or equal to 500.
total_time_gt_500 = df_csv[df_csv['total_time'] >= 500]

# Print the string "Items with Total Time >= 500:" to the console.
print("Items with Total Time >= 500:")

# Print the 'name', 'cook_time', 'prep_time', and 'total_time' columns of the filtered DataFrame to the console.
print(total_time_gt_500[['name', 'cook_time', 'prep_time', 'total_time']])
Items with Total Time >= 500:
            name  cook_time  prep_time  total_time
29     Misti doi       30.0      480.0       510.0
62     Shrikhand      720.0       10.0       730.0
114  Pindi chana      120.0      500.0       620.0
155        Puttu       40.0      495.0       535.0

Print count of items with each flavor_profile per region.¶

In [ ]:
# Group the DataFrame by the 'region' and 'flavor_profile' columns, calculate the size of each group, and unstack the resulting series into a DataFrame, filling missing values with 0.
flavor_profile_per_region = df_csv.groupby(['region', 'flavor_profile']).size().unstack(fill_value=0)

# Print the string "Count of Items with Flavor Profile per Region:" to the console.
print("Count of Items with Flavor Profile per Region:")

# Print the DataFrame of counts of items with each flavor profile per region to the console.
print(flavor_profile_per_region)
Count of Items with Flavor Profile per Region:
flavor_profile  bitter  sour  spicy  sweet
region                                    
Central              0     0      2      1
East                 0     0      6     22
North                2     0     35     10
North East           0     0     13      7
South                0     0     30     19
West                 2     1     41     23
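`pd.crosstab` builds the same region-by-flavor counts table as the groupby/unstack combination above; a toy sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "East", "West", "East"],
    "flavor_profile": ["sweet", "spicy", "sweet", "sweet"],
})

# crosstab counts every (region, flavor) combination directly,
# filling absent combinations with 0.
table = pd.crosstab(df["region"], df["flavor_profile"])
print(table)
```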

Find & print records with missing data in the state column.¶

In [ ]:
# Filter the DataFrame to only include rows where the 'state' column is missing (NaN).
missing_state_data = df_csv[df_csv['state'].isna()]

# Print the string "Records with Missing Data in the State Column:" to the console.
print("Records with Missing Data in the State Column:")

# Print the DataFrame of records with missing state data to the console.
print(missing_state_data)
Records with Missing Data in the State Column:
               name                                        ingredients  \
7        Kaju katli                     Cashews, ghee, cardamom, sugar   
9             Kheer                    Milk, rice, sugar, dried fruits   
10            Laddu                            Gram flour, ghee, sugar   
12        Nankhatai  Refined flour, besan, ghee, powdered sugar, yo...   
94          Khichdi  Moong dal, green peas, ginger, tomato, green c...   
96    Kulfi falooda  Rose syrup, falooda sev, mixed nuts, saffron, ...   
98   Lauki ki subji  Bottle gourd, coconut oil, garam masala, ginge...   
109       Pani puri      Kala chana, mashed potato, boondi, sev, lemon   
111           Papad       Urad dal, sev, lemon juice, chopped tomatoes   
115    Rajma chaval  Red kidney beans, garam masala powder, ginger,...   
117          Samosa  Potatoes, green peas, garam masala, ginger, dough   
128            Dosa  Chana dal, urad dal, whole urad dal, blend ric...   
130            Idli  Split urad dal, urad dal, idli rice, thick poh...   
144     Masala Dosa  Chana dal, urad dal, potatoes, idli rice, thic...   
145         Pachadi  Coconut oil, cucumber, curd, curry leaves, mus...   
149         Payasam            Rice, cashew nuts, milk, raisins, sugar   
154           Rasam  Tomato, curry leaves, garlic, mustard seeds, h...   
156          Sambar  Pigeon peas, eggplant, drumsticks, sambar powd...   
158           Sevai                     Sevai, parboiled rice, steamer   
161         Uttapam    Chana dal, urad dal, thick poha, tomato, butter   
162            Vada  Urad dal, ginger, curry leaves, green chilies,...   
164            Upma   Chana dal, urad dal, ginger, curry leaves, sugar   
231      Brown Rice                   Brown rice, soy sauce, olive oil   
248        Red Rice  Red pepper, red onion, butter, watercress, oli...   

           diet  prep_time  cook_time flavor_profile       course state  \
7    vegetarian       10.0       20.0          sweet      dessert   NaN   
9    vegetarian       10.0       40.0          sweet      dessert   NaN   
10   vegetarian       10.0       40.0          sweet      dessert   NaN   
12   vegetarian       20.0       30.0          sweet      dessert   NaN   
94   vegetarian       40.0       20.0          spicy  main course   NaN   
96   vegetarian       45.0       25.0          sweet      dessert   NaN   
98   vegetarian       10.0       20.0          spicy  main course   NaN   
109  vegetarian       15.0        2.0          spicy        snack   NaN   
111  vegetarian        5.0        5.0          spicy        snack   NaN   
115  vegetarian       15.0       90.0          spicy  main course   NaN   
117  vegetarian       30.0       30.0          spicy        snack   NaN   
128  vegetarian      360.0       90.0          spicy        snack   NaN   
130  vegetarian      360.0       90.0          spicy        snack   NaN   
144  vegetarian      360.0       90.0          spicy        snack   NaN   
145  vegetarian       10.0       25.0            NaN  main course   NaN   
149  vegetarian       15.0       30.0          sweet      dessert   NaN   
154  vegetarian       10.0       35.0          spicy  main course   NaN   
156  vegetarian       20.0       45.0          spicy  main course   NaN   
158  vegetarian      120.0       30.0            NaN  main course   NaN   
161  vegetarian       10.0       20.0          spicy        snack   NaN   
162  vegetarian       15.0       20.0          spicy        snack   NaN   
164  vegetarian       10.0       20.0          spicy        snack   NaN   
231  vegetarian       15.0       25.0            NaN  main course   NaN   
248  vegetarian        NaN        NaN            NaN  main course   NaN   

    region  total_time  
7      NaN        30.0  
9      NaN        50.0  
10     NaN        50.0  
12     NaN        50.0  
94     NaN        60.0  
96     NaN        70.0  
98     NaN        30.0  
109    NaN        17.0  
111    NaN        10.0  
115  North       105.0  
117    NaN        60.0  
128  South       450.0  
130  South       450.0  
144  South       450.0  
145  South        35.0  
149  South        45.0  
154  South        45.0  
156  South        65.0  
158  South       150.0  
161  South        30.0  
162  South        35.0  
164    NaN        30.0  
231    NaN        40.0  
248    NaN         NaN  

Fill missing data in the state column with -.¶

In [ ]:
# The 'state' column is categorical, so '-' must first be added as a valid category.
# (Calling astype(str) before fillna would turn NaN into the string 'nan', leaving nothing to fill.)
df_csv['state'] = df_csv['state'].cat.add_categories('-').fillna('-')

3. Write regular expressions¶

To extract phone numbers (+dd-ddd-dddd) from the following text

“Hey my number is +01-555-1212 & his number is +01-770-1410”

To extract email addresses from the following text.

“You can contact to abcd@gmail.co.in or to xyzw@yahoo.in”

In [ ]:
import re  # Importing the regular expression module

# A string containing two phone numbers
text = "Hey my number is +01-555-1212 & his number is +01-770-1410"

# Using a regular expression to find all phone numbers in the string
phone_numbers = re.findall(r'\+\d{2}-\d{3}-\d{4}', text)

# Printing a message to the console
print("Extracted Phone Numbers:")

# Printing the extracted phone numbers to the console
print(phone_numbers)

# A string containing two email addresses
text2 = "You can contact abcd@gmail.co.in or xyzw@yahoo.in"

# Using a regular expression to find all email addresses in the string
email_addresses = re.findall(r'\S+@\S+', text2)

# Printing a message to the console
print("Extracted Email Addresses:")

# Printing the extracted email addresses to the console
print(email_addresses)
Extracted Phone Numbers:
['+01-555-1212', '+01-770-1410']
Extracted Email Addresses:
['abcd@gmail.co.in', 'xyzw@yahoo.in']
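The `\S+@\S+` pattern above works for this text but would also swallow trailing punctuation (e.g. a sentence-final period). A slightly stricter sketch:

```python
import re

text = "You can contact abcd@gmail.co.in or xyzw@yahoo.in."

# Restricting the characters allowed around the @ keeps sentence punctuation
# out of the match, unlike the looser \S+@\S+ pattern.
pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
emails = re.findall(pattern, text)
print(emails)
```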

4. Demonstrate stemming & stop word removal using the nltk library for the content given below¶

Most of the world will make decisions by either guessing or using their gut. They will be either lucky or wrong.

The goal is to turn data into information and information into insight.

In [ ]:
import nltk  # Importing the Natural Language Toolkit (NLTK)
nltk.data.path.append('/root/nltk_data')  # Adding the path to NLTK data
nltk.download('punkt')  # Downloading the Punkt Tokenizer Models
nltk.download('stopwords')  # Downloading the Stopwords Corpus
from nltk.corpus import stopwords  # Importing the stopwords corpus
from nltk.stem import PorterStemmer  # Importing the Porter Stemmer

# A string containing a sentence
text = "Stemming and stop word removal are common text preprocessing techniques."

# Tokenizing the sentence into words
words = nltk.word_tokenize(text)

# Getting a set of English stop words
stop_words = set(stopwords.words('english'))

# Filtering out the stop words from the tokenized words
filtered_words = [word for word in words if word.lower() not in stop_words]

# Creating a Porter Stemmer object
stemmer = PorterStemmer()

# Stemming the filtered words
stemmed_words = [stemmer.stem(word) for word in filtered_words]

# Printing the original tokenized words
print("Original Words: ", words)

# Printing the words after stop word removal
print("Filtered Words (Stop Word Removal): ", filtered_words)

# Printing the words after stemming
print("Stemmed Words: ", stemmed_words)
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Original Words:  ['Stemming', 'and', 'stop', 'word', 'removal', 'are', 'common', 'text', 'preprocessing', 'techniques', '.']
Filtered Words (Stop Word Removal):  ['Stemming', 'stop', 'word', 'removal', 'common', 'text', 'preprocessing', 'techniques', '.']
Stemmed Words:  ['stem', 'stop', 'word', 'remov', 'common', 'text', 'preprocess', 'techniqu', '.']
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.

5. Using the 20 Newsgroups dataset, create and demonstrate a bag-of-words model. Also convert the raw newsgroup documents into a matrix of TF-IDF features.¶

In [ ]:
from sklearn.datasets import fetch_20newsgroups  # Importing the 20 newsgroups dataset from sklearn
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer  # Importing CountVectorizer and TfidfVectorizer

# Fetching the 20 newsgroups dataset, removing headers, footers, and quotes
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))

# Creating a CountVectorizer object
count_vectorizer = CountVectorizer()

# Transforming the newsgroups data into a document-term matrix using CountVectorizer
X_count = count_vectorizer.fit_transform(newsgroups.data)

# Creating a TfidfVectorizer object
tfidf_vectorizer = TfidfVectorizer()

# Transforming the newsgroups data into a document-term matrix using TfidfVectorizer
X_tfidf = tfidf_vectorizer.fit_transform(newsgroups.data)

# Selecting a document index
document_index = 0

# Printing the original document
print("Original Document:")
print(newsgroups.data[document_index])

# Printing the Bag-of-Words (BoW) representation of the document
print("\nBoW Representation:")
print(X_count[document_index])

# Printing the TF-IDF representation of the document
print("\nTF-IDF Representation:")
print(X_tfidf[document_index])
Original Document:


I am sure some bashers of Pens fans are pretty confused about the lack
of any kind of posts about the recent Pens massacre of the Devils. Actually,
I am  bit puzzled too and a bit relieved. However, I am going to put an end
to non-PIttsburghers' relief with a bit of praise for the Pens. Man, they
are killing those Devils worse than I thought. Jagr just showed you why
he is much better than his regular season stats. He is also a lot
fo fun to watch in the playoffs. Bowman should let JAgr have a lot of
fun in the next couple of games since the Pens are going to beat the pulp out of Jersey anyway. I was very disappointed not to see the Islanders lose the final
regular season game.          PENS RULE!!!



BoW Representation:
  (0, 24810)	3
  (0, 114159)	1
  (0, 110739)	1
  (0, 29449)	1
  (0, 89588)	8
  (0, 93532)	5
  (0, 52374)	1
  (0, 26465)	3
  (0, 96770)	1
  (0, 39025)	1
  (0, 22362)	2
  (0, 116790)	10
  (0, 73447)	1
  (0, 25769)	1
  (0, 71770)	1
  (0, 96012)	1
  (0, 101388)	1
  (0, 79249)	1
  (0, 44272)	2
  (0, 22945)	1
  (0, 31039)	3
  (0, 98219)	1
  (0, 118091)	1
  (0, 25260)	1
  (0, 102231)	1
  :	:
  (0, 65467)	2
  (0, 95129)	1
  (0, 32176)	1
  (0, 108815)	1
  (0, 74531)	1
  (0, 60784)	1
  (0, 86996)	1
  (0, 40171)	1
  (0, 56396)	1
  (0, 109383)	1
  (0, 29827)	1
  (0, 98002)	1
  (0, 91152)	1
  (0, 68989)	1
  (0, 25803)	1
  (0, 126472)	1
  (0, 124216)	1
  (0, 44988)	1
  (0, 88021)	1
  (0, 107582)	1
  (0, 67699)	1
  (0, 76006)	1
  (0, 53470)	1
  (0, 56382)	1
  (0, 105100)	1

TF-IDF Representation:
  (0, 105100)	0.08272635762898674
  (0, 56382)	0.06548149992571357
  (0, 53470)	0.07970882460224814
  (0, 76006)	0.08520403767167795
  (0, 67699)	0.10133321981346254
  (0, 107582)	0.047734013015898615
  (0, 88021)	0.029186309768862773
  (0, 44988)	0.11435087413757543
  (0, 124216)	0.046797861842828435
  (0, 126472)	0.03431094478562303
  (0, 25803)	0.06762450917821239
  (0, 68989)	0.09928167974488611
  (0, 91152)	0.03854632657683808
  (0, 98002)	0.13907784133493692
  (0, 29827)	0.08892315942651909
  (0, 109383)	0.05105878014696189
  (0, 56396)	0.07131187545939574
  (0, 40171)	0.06922150866167542
  (0, 86996)	0.06301015973631587
  (0, 60784)	0.0271997995927387
  (0, 74531)	0.05641076322334231
  (0, 108815)	0.046785253119473694
  (0, 32176)	0.12719876177804612
  (0, 95129)	0.09279827218104486
  (0, 65467)	0.045003949086708295
  :	:
  (0, 102231)	0.12606018701082405
  (0, 25260)	0.021079342325985767
  (0, 118091)	0.050494985289006714
  (0, 98219)	0.13122965511734613
  (0, 31039)	0.1787890311357878
  (0, 22945)	0.0582631519062005
  (0, 44272)	0.19818164703955615
  (0, 79249)	0.09799294739501874
  (0, 101388)	0.07788730502152183
  (0, 96012)	0.08438145186248154
  (0, 71770)	0.06356353510772614
  (0, 25769)	0.03649294356115389
  (0, 73447)	0.08082262581632702
  (0, 116790)	0.1812172345919807
  (0, 22362)	0.07108512930815473
  (0, 39025)	0.09292472333557168
  (0, 96770)	0.0648969397428524
  (0, 26465)	0.08914949300938675
  (0, 52374)	0.08665300135000918
  (0, 93532)	0.49835327837061033
  (0, 89588)	0.1703687096002322
  (0, 29449)	0.13122965511734613
  (0, 110739)	0.03797468797425647
  (0, 114159)	0.05473671466917944
  (0, 24810)	0.13884931659983238
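The sparse `(row, column) value` pairs above are hard to read in isolation; the underlying bag-of-words idea is simply per-document token counts over a shared vocabulary, which can be sketched with the standard library (a toy illustration, ignoring CountVectorizer's lowercasing and tokenization details):

```python
from collections import Counter

docs = [
    "the pens beat the devils",
    "the devils lost the game",
]

# Build a shared, sorted vocabulary, then count each document's tokens
# against it; this is the dense form of CountVectorizer's sparse matrix.
vocab = sorted({word for doc in docs for word in doc.split()})
bow = [[Counter(doc.split())[word] for word in vocab] for doc in docs]

print(vocab)
print(bow)
```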

Practical 08¶

Develop programs to visualize data using matplotlib.

  1. Line Plot
  2. Scatter Plot
  3. Bar Chart
  4. Histogram
  5. Pie Chart

In [ ]:
import matplotlib.pyplot as plt
import numpy as np

1. Line Plot¶

In [ ]:
# Line Plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(9, 3))
plt.plot(x, y)
plt.show()

2. Scatter Plot¶

In [ ]:
# Scatter Plot
x = np.random.rand(50)
y = np.random.rand(50)
plt.figure(figsize=(9, 3))
plt.scatter(x, y)
plt.show()

3. Bar Chart¶

In [ ]:
# Bar Chart
x = ['A', 'B', 'C', 'D', 'E']
y = [3, 7, 2, 5, 8]
plt.figure(figsize=(9, 3))
plt.bar(x, y)
plt.show()

4. Histogram¶

In [ ]:
# Histogram
data = np.random.randn(1000)
plt.figure(figsize=(9, 3))
plt.hist(data, bins=30)
plt.show()

5. Pie Chart¶

In [ ]:
# Pie Chart
sizes = [215, 130, 245, 210]
labels = ['A', 'B', 'C', 'D']
plt.figure(figsize=(9, 3))
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.show()
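The five plot types above can also be drawn on a single figure with `plt.subplots`; a minimal sketch (the 1×5 layout, sample data, and output filename are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs headless
import matplotlib.pyplot as plt
import numpy as np

# All five plot types from this practical, side by side on one figure
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
x = np.linspace(0, 10, 100)
axes[0].plot(x, np.sin(x));                              axes[0].set_title("Line")
axes[1].scatter(np.random.rand(50), np.random.rand(50)); axes[1].set_title("Scatter")
axes[2].bar(['A', 'B', 'C'], [3, 7, 2]);                 axes[2].set_title("Bar")
axes[3].hist(np.random.randn(1000), bins=30);            axes[3].set_title("Histogram")
axes[4].pie([215, 130, 245, 210], labels=list("ABCD"));  axes[4].set_title("Pie")
fig.tight_layout()
fig.savefig("all_plots.png")
```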

Practical 09¶

Demonstration of scikit-learn and other machine learning libraries such as Keras and TensorFlow.

  1. Scikit-Learn for Classification (e.g., Iris dataset)
  2. Using Keras for Neural Networks

1. Scikit-Learn for Classification (e.g., Iris dataset)¶

In [ ]:
# Import the function to load iris dataset from sklearn.datasets
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()

# Store the feature matrix (X) and response vector (y)
# The feature matrix 'X' is a multidimensional array containing the features that the model will learn from
# The response vector 'y' is an array containing the target variable that the model will predict
X = iris.data
y = iris.target

# Store the feature and target names
# 'feature_names' is a list of the names of each feature
# 'target_names' is a list of the names of each target class
feature_names = iris.feature_names
target_names = iris.target_names

# Print the feature and target names
print("Feature names:", feature_names)
print("Target names:", target_names)

# Print the type of the feature matrix 'X'
print("\nType of X is:", type(X))

# Print the first 5 rows of the feature matrix 'X'
print("\nFirst 5 rows of X:\n", X[:5])
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

Type of X is: <class 'numpy.ndarray'>

First 5 rows of X:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
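The cell above only loads and inspects the dataset. A short sketch of actually training a classifier on it, using `train_test_split` and `KNeighborsClassifier`; the 70/30 split, `n_neighbors=3`, and `random_state=1` are illustrative choices, not from the notebook:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Split the 150 iris samples into train and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Fit a k-nearest-neighbours classifier and score it on held-out data
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
acc = accuracy_score(y_test, knn.predict(X_test))
print("Test accuracy:", acc)
```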

2. Using Keras for Neural Networks¶

In [ ]:
# Import necessary modules from Keras
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

# Set the batch size, number of classes, and number of epochs
batch_size = 128
num_classes = 10
epochs = 5

# Load the MNIST dataset and split it into training and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape the data to fit the model
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# Convert the data to float32 type
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalize the pixel values (between 0 and 1)
x_train /= 255
x_test /= 255

# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Create a Sequential model
model = Sequential()

# Add a dense layer with 512 units, ReLU activation and input shape of 784
model.add(Dense(512, activation='relu', input_shape=(784,)))

# Add a dropout layer to prevent overfitting
model.add(Dropout(0.2))

# Add another dense layer with 512 units and ReLU activation
model.add(Dense(512, activation='relu'))

# Add another dropout layer
model.add(Dropout(0.2))

# Add a dense output layer with 10 units (one for each class) and softmax activation
model.add(Dense(num_classes, activation='softmax'))

# Compile the model with categorical crossentropy loss, RMSprop optimizer and accuracy metric
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])

# Fit the model on the training data, with specified batch size, epochs and verbosity
# Also pass the test data for validation
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_test, y_test))

# Evaluate the model on the test data and print the test loss and accuracy
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
Epoch 1/5
469/469 [==============================] - 9s 18ms/step - loss: 0.2549 - accuracy: 0.9214 - val_loss: 0.1087 - val_accuracy: 0.9664
Epoch 2/5
469/469 [==============================] - 8s 18ms/step - loss: 0.1056 - accuracy: 0.9681 - val_loss: 0.0850 - val_accuracy: 0.9724
Epoch 3/5
469/469 [==============================] - 9s 18ms/step - loss: 0.0735 - accuracy: 0.9773 - val_loss: 0.0675 - val_accuracy: 0.9797
Epoch 4/5
469/469 [==============================] - 8s 16ms/step - loss: 0.0573 - accuracy: 0.9819 - val_loss: 0.0659 - val_accuracy: 0.9809
Epoch 5/5
469/469 [==============================] - 9s 18ms/step - loss: 0.0485 - accuracy: 0.9850 - val_loss: 0.0710 - val_accuracy: 0.9796
Test loss: 0.07101037353277206
Test accuracy: 0.9796000123023987
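The `keras.utils.to_categorical` call used above simply builds a one-hot matrix from integer labels. A NumPy sketch of the same transformation (the `to_one_hot` name is ours, not a Keras API):

```python
import numpy as np

def to_one_hot(labels, num_classes):
    # One row per label, with a single 1.0 in the column
    # given by the label value -- same result as to_categorical
    out = np.zeros((len(labels), num_classes), dtype="float32")
    out[np.arange(len(labels)), labels] = 1.0
    return out

Y = to_one_hot(np.array([0, 2, 1]), num_classes=3)
print(Y)
```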