The concept of data types refers to the classes or categories of data in a programming language. In Python, every value belongs to exactly one data type. For example, 1 belongs to the 'int' data type, 1.5 belongs to the 'float' data type, and so on.
In Python, everything is an object, and every object is an instance of some class, which is its data type. So 1 is an instance of the 'int' class, and 1.5 is an instance of the 'float' class. A variable is simply a name that refers to such an object. Note: We will learn about variables later.
In the category of numbers in Python, there are three data types: integers, floating-point numbers, and complex numbers. Earlier versions of Python (Python 2.x) had an additional data type called "long" for integers too large to fit in the fixed-size "int" type. Python 3.x removed this distinction: "long" was merged into "int", which now handles integers of any size.
The first data type is the integer, represented by the built-in "int" type. In Python 3.x, there is no fixed upper limit on integer values; the only limit is the memory available on the system. For example, if one system has 10 KB of memory free and another has 100 KB, the second system will be able to hold a larger integer.
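As a quick illustration of this unbounded range, the sketch below computes a value far larger than a 64-bit integer could hold. The variable name is only illustrative.

```python
# Python 3 integers grow as needed; there is no fixed 32- or 64-bit ceiling.
big = 2 ** 100

print(big)               # 1267650600228229401496703205376
print(type(big))         # <class 'int'>
print(big.bit_length())  # 101 bits are needed to store this value
```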
We can use the built-in functions type() and isinstance() to determine the data type of any value or variable. type() returns the class of the value, while isinstance() checks whether a value or variable belongs to the given data type and returns True or False accordingly. For example:
>>> a = 10
>>> type(a)
<class 'int'>
>>> type(10)
<class 'int'>
>>> isinstance(10, int)
True
>>> isinstance(7.11, int)
False
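In Python, integer objects are immutable. This means that once an integer object is created, its value cannot be changed. Instead, a mathematical operation on an integer produces a different integer object, and the variable is rebound to it. Here's an example to demonstrate this. In CPython, id() returns the object's memory address, and small integers such as 5 are cached, which is why y below ends up referring to the same object that x first referred to: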
# create an integer object
x = 5
# print the id of the object
print(id(x))
# output: memory address of the integer object 5
# perform a mathematical operation on the object
x = x + 1
# print the id of the object again
print(id(x))
# output: a different memory address
# create another integer object with the same value
y = 5
# print the id of the second object
print(id(y))
# output: the same memory address as the first object (CPython caches small integers)
Some common use cases of Integers in machine learning
Floating-point numbers are represented by the "float" type. The key difference between "float" and "int" is that a float can hold values that fall between two integers, like 2.7. The presence of a decimal point distinguishes this data type. Very large or very small numbers can be written in scientific notation, which uses the letter "e" or "E" followed by a positive or negative integer giving the power of 10 by which the number is multiplied.
>>> a = 10.0
>>> type(a)
<class 'float'>
>>> b = 7.11
>>> type(b)
<class 'float'>
>>> isinstance(a, float)
True
>>> isinstance(a, int)
False
### Scientific Notation example
>>> 7.11e11
711000000000.0
The maximum value of a floating-point number is approximately 1.8 x 10³⁰⁸. Python treats float literals beyond this maximum as infinity (inf).
>>> 1.79e308
1.79e+308
>>> 1.8e308
inf
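To look up the exact limit rather than the approximation, one option is sys.float_info from the standard library. The sketch below assumes the usual 64-bit IEEE 754 floats that mainstream Python builds use; the variable names are illustrative.

```python
import math
import sys

largest = sys.float_info.max   # largest finite float on this platform
print(largest)                 # 1.7976931348623157e+308

overflowed = largest * 10      # multiplying past the limit yields infinity
print(overflowed)              # inf
print(math.isinf(overflowed))  # True
```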
Python has built-in support for complex numbers, which are represented by the "complex" type. A complex number consists of two floating-point numbers, the real part and the imaginary part, written as the real part, a plus or minus sign, and the imaginary part followed by the letter "j". The letter "j" represents the square root of -1, which defines the imaginary component of a complex number.
>>> a = 3 + 4j
>>> type(a)
<class 'complex'>
Python provides several built-in tools for working with complex numbers: the abs() function, the real and imag attributes, and the conjugate() method.
Here are some examples of using these functions:
z = 3 + 4j
print(abs(z)) # prints 5.0
print(z.real) # prints 3.0
print(z.imag) # prints 4.0
print(z.conjugate()) # prints (3-4j)
Python also provides operators for working with complex numbers, including addition, subtraction, multiplication, division, and exponentiation. These operators work just like their counterparts for real numbers, but they take into account the imaginary part of the numbers. Here's an example of using the addition operator with complex numbers:
z1 = 3 + 4j
z2 = 1 - 2j
z3 = z1 + z2
print(z3) # prints (4+2j)
Python's cmath module provides additional functions for working with complex numbers, such as sqrt, exp, and log. These functions can be useful for more advanced calculations involving complex numbers. We would suggest exploring some examples of this.
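As a starting point for that exploration, here is a small sketch using a few cmath functions. The variable names are illustrative, and the Euler identity result is only approximately -1 because of floating-point rounding.

```python
import cmath

# Square root of a negative number is defined in the complex plane.
root = cmath.sqrt(-1)
print(root)  # 1j

# Polar form: magnitude (distance from the origin) and phase (angle in radians).
z = 3 + 4j
magnitude, phase = cmath.polar(z)
print(magnitude)  # 5.0

# Euler's identity: e^(i*pi) is approximately -1.
euler = cmath.exp(1j * cmath.pi)
print(abs(euler + 1) < 1e-12)  # True
```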
Complex numbers are not commonly used in machine learning and data science because the majority of the data and algorithms used in these fields involve real numbers. But there are some areas where complex numbers can be useful. One such area is signal processing, particularly in the analysis of audio and image data.
Here, complex numbers represent the Fourier transform of a signal, a common technique for analyzing the signal's frequency components. Each complex number encodes both the amplitude and the phase of one frequency component.
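To make this concrete, the following is a minimal sketch of a naive discrete Fourier transform written with cmath. The function name dft and the sample signal are assumptions made for illustration; real projects would normally use an optimized FFT routine from a numerical library instead.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform, O(n^2), for illustration only."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Made-up sample data: one full cycle of a cosine wave sampled at 8 points.
samples = [math.cos(2 * math.pi * t / 8) for t in range(8)]

spectrum = dft(samples)                      # list of complex numbers
amplitudes = [abs(c) for c in spectrum]      # strength of each frequency
phases = [cmath.phase(c) for c in spectrum]  # phase angle of each frequency

# The energy is concentrated in bin 1 (and its mirror image, bin 7).
print([round(a, 6) for a in amplitudes])
```

Each entry of spectrum is one complex number, and abs() and cmath.phase() recover its amplitude and phase respectively.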
In Python, we represent a sequence of characters as a string, denoted by the built-in "str" type. The boundaries of a string are defined by either single quotes or double quotes (' ' or " "). Depending on our system's memory, a string can store many characters, and it can also be empty.
>>> a = 'Single quote string'
>>> b = "Double quote string"
>>> type(a)
<class 'str'>
>>> type(b)
<class 'str'>
### empty string
>>> ''
''
But what if we have a single quote present as a character? For example: 'We represent a single quote using ' as a character'.
>>> 'We represent a single quote using ' as a character'
SyntaxError: invalid syntax
As shown, this produces a syntax error: the opening single quote gets paired with the single quote before the word "as", and the characters after it no longer belong to any string. To avoid this error, we have two fixes: enclose the string in double quotes, or escape the inner single quote with a backslash (\').
>>> "We represent a single quote using ' as a character"
"We represent a single quote using ' as a character"
Placing a backslash in front of the quote character makes Python treat it as normal and forget its special meaning. There are several other examples as well, where the escape sequence changes the behaviour of the normal/special characters in the strings, like:
>>> print("anb")
anb
### Placing backslash before n, makes it a newline character
>>> print("a\nb")
a
b
### Placing backslash before t, makes it a tab character
>>> print("a\tb")
a       b
### Placing backslash before backslash removes the special meaning of backslash
>>> print("a\\nb")
a\nb
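The same escaping works for the quote character itself, which is the backslash fix for the single-quote problem described earlier. A minimal sketch, with illustrative variable names:

```python
# Escaping the inner single quote lets it live inside a single-quoted string.
escaped = 'We represent a single quote using \' as a character'
print(escaped)  # We represent a single quote using ' as a character

# The same idea applies to double quotes inside a double-quoted string.
quoted = "She said \"hi\""
print(quoted)   # She said "hi"
```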
Here are some critical insights about Python strings:
Python strings are immutable, which means once a string is created, it cannot be modified. But we can create a new string by concatenating two or more strings using the + operator.
first_name = 'Enjoy'
last_name = 'Algorithms'
full_name = first_name + ' ' + last_name
print(full_name) # prints "Enjoy Algorithms"
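Immutability can be observed directly: assigning to a character position raises a TypeError, so any change has to produce a new string. The sketch below uses illustrative variable names.

```python
greeting = "hello"

try:
    greeting[0] = "H"  # strings do not support item assignment
except TypeError as error:
    message = str(error)
    print(message)     # 'str' object does not support item assignment

# Build a new string instead; the original is left untouched.
capitalized = "H" + greeting[1:]
print(capitalized)  # Hello
print(greeting)     # hello
```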
Python provides the built-in len() function and several string methods, including lower, upper, strip, split, and join. These allow you to perform common string operations, such as getting the length of a string, converting a string to lowercase or uppercase, removing whitespace from the beginning and end of a string, splitting a string into a list of substrings, and joining a list of strings into a single string.
Here are some examples:
my_string = "Hello, World!"
print(len(my_string)) # prints 13
print(my_string.lower()) # prints "hello, world!"
print(my_string.upper()) # prints "HELLO, WORLD!"
print(my_string.strip()) # prints "Hello, World!" (no surrounding whitespace to remove)
print(my_string.split(',')) # prints ['Hello', ' World!']
print(' '.join(['Hello', 'World'])) # prints "Hello World"
Python provides support for regular expressions, which are powerful tools for working with strings. The re module provides functions for searching, replacing, and manipulating strings using regular expressions. Note: We would suggest exploring the use cases of regular expressions.
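As one possible starting point, the sketch below shows searching, capturing, and replacing with the re module. The sample text and variable names are made up for illustration.

```python
import re

text = "Order 66 shipped on 2022-05-04 to Room 101"  # made-up sample text

# Find every run of digits in the text.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '2022', '05', '04', '101']

# Search for a date and capture its parts with groups.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year = date_match.group(1)
print(year)     # 2022

# Replace every digit with a placeholder character.
masked = re.sub(r"\d", "#", text)
print(masked)   # Order ## shipped on ####-##-## to Room ###
```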
In machine learning and data science, strings are often used to represent text data, which is a common type of data in natural language processing (NLP) tasks such as sentiment analysis, language translation, and text classification.
In Python 3, we have a boolean data type that can take either of two values: True (with capital T) or False (with capital F). We can check the type of the variable as:
>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>
This data type is used to represent the truth value of a statement. In Python, we use a single "=" to assign a value to a variable and a double "==" to check whether two values are equal.
>>> a = 5
>>> a == 5
True
In Python, bool is a subclass of int, and the boolean values True and False behave like the integers 1 and 0, respectively. This means you can perform arithmetic operations on boolean values, such as adding or multiplying them:
print(True + True) # prints 2
print(False * 10) # prints 0
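This integer behaviour is handy for counting. The sketch below sums a list of comparison results to compute a simple accuracy score; the predictions and labels are made-up values used only for illustration.

```python
print(issubclass(bool, int))  # True

# Made-up model predictions and true labels.
predictions = [1, 0, 1, 1]
labels = [1, 0, 0, 1]

# Each comparison yields True or False; summing counts the True values.
matches = [p == y for p, y in zip(predictions, labels)]
correct = sum(matches)
accuracy = correct / len(matches)

print(correct)   # 3
print(accuracy)  # 0.75
```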
Python provides several operators for working with boolean values, including and, or, and not. These operators allow you to combine boolean expressions and create more complex conditions:
x = 5
y = 10
z = 15
print(x < y and y < z) # prints True
print(x < y or y > z) # prints True
print(not(x == y)) # prints True
Python also provides several functions for working with boolean values, such as all and any, which allow you to check if all or any of the elements in an iterable are true:
my_list = [True, False, True]
print(all(my_list)) # prints False
print(any(my_list)) # prints True
In Python, we have the flexibility to convert values or variables from one data type to another, but only when the conversion is valid. For example, 2.0 is a floating-point number, and we can convert it into an integer like this:
## Float to int conversion
>>> a = int(2.0)
>>> type(a)
<class 'int'>
## Int to float conversion
>>> a = float(2)
>>> type(a)
<class 'float'>
## Float to int conversion example
>>> a = int(7.11)
>>> a
7
In the third example of float-to-int conversion, we used int(7.11) to convert a floating-point number to an integer. int() does not round; it discards the fractional part (truncating toward zero), so the result is 7. The result would be the same whether the float value was 7.11 or 7.9.
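The difference between dropping the fractional part and rounding shows up most clearly with negative numbers. The following sketch compares int() with math.floor() and round(); the variable names are illustrative.

```python
import math

truncated_positive = int(7.9)    # fractional part dropped
truncated_negative = int(-7.9)   # moves toward zero, not downward
floored_negative = math.floor(-7.9)  # always moves downward
rounded = round(7.9)             # nearest integer

print(truncated_positive)  # 7
print(truncated_negative)  # -7
print(floored_negative)    # -8
print(rounded)             # 8
```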
This type of conversion is also possible for strings, but only when the string represents a valid number. In other words, strings such as '2022' can be converted to integers or floating-point numbers. Let's see some examples.
>>> a = '2022'
>>> type(a)
<class 'str'>
>>> int(a)
2022
>>> float(a)
2022.0
But when the string does not represent a valid number, the conversion raises a ValueError like this:
>>> a = '1 1'
>>> int(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1 1'
That means such a conversion is not allowed. In other words, every float or int value can be converted to a string, but not every string can be converted to an int or float.
These conversions are very helpful in machine learning and deep learning, where numeric data often arrives as text and must be converted before computation.
Now that we know the basic data types in Python, let's understand the concept of expressions and variables.
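When the input may or may not be numeric, one common pattern is to attempt the conversion and handle the ValueError. The helper below, to_number, is a hypothetical function written for this illustration, not a built-in.

```python
def to_number(text):
    """Hypothetical helper: parse text as int, then float; None if neither works."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue  # try the next converter
    return None

print(to_number("2022"))  # 2022
print(to_number("7.11"))  # 7.11
print(to_number("1 1"))   # None
```

Trying int first keeps whole numbers as integers, and float is used only as a fallback.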
Expressions in Python are combinations of values, variables, and operators that the interpreter evaluates to produce a value. In other words, they are the building blocks of a Python program and are used to specify the computation that the computer should perform. Expressions can include arithmetic operations, logical operations, bitwise operations, etc. The result of an expression is a value that can be used in further computations or stored for later use.
For example, let's look at some basic arithmetic operations.
# Numbers are operands and mathematical symbols are operators
>>> 5 + 4.99
9.99
>>> 12*7
84
## Division of integer data types results in a float value
>>> 12/6
2.0
>>> 10 - 5
5
In Python, operator precedence determines the order in which operations are performed in an expression. This order is based on standard mathematical conventions, such as the PEMDAS rule (Parentheses, Exponents, Multiplication and Division, and Addition and Subtraction).
When evaluating an expression using the PEMDAS rule, operations inside parentheses are done first, followed by exponents, then multiplication and division (from left to right), and finally addition and subtraction (from left to right). This ensures that mathematical expressions are evaluated in a predictable and consistent manner, just as in mathematics.
In the case of a tie between two operators with the same precedence in an expression, the associativity rule is used to resolve the tie. The associativity rule states that all operators, except for exponentiation (**), follow left-to-right associativity, so such an expression is evaluated from left to right. Exponentiation groups from right to left instead.
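A few short expressions make the associativity rule visible. This is a minimal sketch with illustrative variable names.

```python
# Division is left-associative: (100 / 10) / 2
left_to_right = 100 / 10 / 2
print(left_to_right)  # 5.0

# Subtraction is left-associative: (10 - 4) - 3
subtraction = 10 - 4 - 3
print(subtraction)    # 3

# Exponentiation is right-associative: 2 ** (3 ** 2) = 2 ** 9
right_to_left = 2 ** 3 ** 2
print(right_to_left)  # 512
```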
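For example, the expression (4 + 3) - 3**2 + 6/2 * 7 can be evaluated as follows, following the order of operations defined by Python's operator precedence: the parentheses give 7, the exponent gives 9, the division gives 3.0, the multiplication gives 21.0, and the remaining subtraction and addition run from left to right, so 7 - 9 = -2 and then -2 + 21.0 = 19.0.
So the final result of the expression (4 + 3) - 3**2 + 6/2 * 7 is 19 (Python displays it as 19.0, because division always produces a float).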
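The result can be checked by running the expression directly. The sketch below also spells out the intermediate values; the variable names are illustrative.

```python
result = (4 + 3) - 3**2 + 6/2 * 7
print(result)  # 19.0

# The same computation, one precedence level at a time.
parentheses = 4 + 3      # 7
exponent = 3 ** 2        # 9
division = 6 / 2         # 3.0
product = division * 7   # 21.0
stepwise = parentheses - exponent + product
print(stepwise)  # 19.0
```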
The following is the operator precedence table in Python (Increasing precedence from top to bottom).
A variable is a name that refers to a stored value. For example, in the following code snippet, "temp_variable" is a variable that stores the value 6, which has the int data type. This variable can then be used elsewhere in the program, carrying the value 6.
temp_variable = 6
Variables are useful because they allow us to change a value in one place and reflect that change throughout the code. For example, if we want to change the value 6 to 7 and we have used the number 6 directly in many places in the code, we would have to change it in all of those places. However, if we have used a variable with a value of 6 and then used that variable in many places, we could change the variable's value once, and it would update in all places where the variable is used. This makes it easier to maintain and update our code.
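A small sketch of this idea follows: the value is defined once and reused in two calculations. The names side_length, perimeter, and area are chosen only for illustration.

```python
side_length = 6  # change this single line to 7 to update every result below

perimeter = 4 * side_length
area = side_length ** 2

print(perimeter)  # 24
print(area)       # 36
```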
Key Note: It is considered good practice to use descriptive and meaningful names for variables, especially in Machine Learning and Data Science domains where codebases become large. This makes it easier to understand and track the usage of each variable, making the code more organized and maintainable.
In this introductory blog on Python, we covered the basics of data types. We explored three primary data types (numbers, booleans, and strings), expressions and variables in Python and learned about their usage in Machine Learning and Data Science.
If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!