Introduction to Data Structures
In computer science, data structures is a very important subject even it is more important than your language. Data structures are very important where there is a lot of data to make code efficient and solve complex problems.
- Data Structures = Data + Structures
- Data: Raw facts in the figure.
- What is Data?
- Structure: Organization of data inside computer memory.
- When data become information?
- There is a distinction between data and an information, let's try to understand what is that exectly.
- The difference between Data and Information.
- When I say just a collection of characters like above, then this is just a data. I wo'nt be able to understand it properly. Right? .Could you able to tell me what exactly i have written over here? No Right? . But when I process this data and lets say I reversed this whole string, means last character become the first character, second last character becomes the second character and so on.then it become an information
- Because I can read this and We can understand what is written.Here I have written my name is JASPREET, So It is quite understandable. So we can say that Now here ,data becomes an information because we can extract some meaning from it.
- So if data is arranged in a systematic way then its get a structure and become meaningful here i have arranged data that it become meaningful for me there fore i can say that this meaningful data is called information. Now it is not difficult for us to understand that data needs to be managed in such a way so that it can produce some meaningful information. to provide an appropriate way to structure the data, We need to know about 'Data structures'.
What is Data Structure?
A data structure is a particular way of organizing data(store, retrieve ,update) in a computer memory so that it can be used effectively.
data structure is an organized way to store information on a computer that allows us to process the information quicker, while a proper algorithm provides a finite process to reach a specific output.
data structure uses various algorithms while structuring data as below
- In which programming language should we implement data structures?
- First of all, let me tell you that data structure is language independent, you can implement data structure in any programming language. But the recommendation is that you first start implementing the data structure in C or C++.
- Because it would be easier in C/C++.
- C/C++ are most powerful langauges.
- Programs written in C run faster than programs written in other languages.
- You also get the freedom to do memory management yourself.
- In big A-grade companies, performance-oriented work is mostly done in C/C++. It means we should have good command of C/C++.
Types of data structures
Basically, Data structures can be broadly classified into two categories:
- Linear data structures
- Non-linear data structures
Operations on data structures
- Traversing(पार करना)(to visit data element exactly one time)(Preorder Traversal, Inorder Traversal, Postorder Traversal )
- Merging(Combining a data structure of the same type to form a new data structure of the same type.
Data Organization:A sequential view of the data.
Need of Data Organization
There are a lot of benefits of organizing data.
1. The first and foremost benefit is it decreases the time consumed to search for data.
2. Organizing data also helps in reducing the data loss and reduces errors.
3. Data organization also helps you to understand why the data was collected and what the proper use of it is.
4. Data organization can be of various types, depending on the requirement of the user. Sometimes, the repeated values in the data are collected together to know the mode of the data or sometimes the data is organized in increasing or decreasing order, to find the median of the given set of data.
The data structures can also be classified on the basis of the following characteristics:
What is an Algorithm ?
An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain predefined task. Algorithm is not the complete code or program, it is just the core logic(solution) of a problem, which can be expressed either as an informal high level description as pseudocode or using a flowchart.
informative text written in plain English. It has no syntax like any of the programming language and thus can’t be compiled or interpreted by the computer.
Every Algorithm must satisfy the following properties:
Input- There should be 0 or more inputs supplied externally to the algorithm.
Output- There should be atleast 1 output obtained.
Definiteness- Every step of the algorithm should be clear and well defined.
Finiteness- The algorithm should have a finite number of steps.
Correctness- Every step of the algorithm must generate a correct output.
An algorithm is said to be efficient and fast, if it takes less time to execute and consumes less memory space. The performance of an algorithm is measured on the basis of following properties :
Time Complexity: Amount of total time required by the program to run till its completion.
Time complexity = Compile time + Run time
Compile time is a fixed part
Run time is a variable part
Time Complexity is most commonly estimated by counting the number of elementary steps performed by any algorithm to finish execution.
The time complexity of algorithms is expressed using the big O notation.
O(expression) It represents the worst-case time-complexity(longest running time performed by an algorithm)
Space Complexity: Amount of space or memory taken by an algorithm to run the input.
Its the amount of memory space required by the algorithm, during the course of its execution. Space complexity must be taken seriously for multi-user systems and in situations where limited memory is available.
An algorithm generally requires space for following components :
Instruction Space: Its the space required to store the executable version of the program. This space is fixed, but varies depending upon the number of lines of code in the program.
Data Space: Its the space required to store all the constants and variables(including temporary variables) value.
Environment Space: Its the space required to store the environment information needed to resume the suspended function.
To learn about Space Complexity in detail, jump to the Space Complexity tutorial.
Time Complexity is a way to represent the amount of time required by the program to run till its completion. It's generally a good practice to try to keep the time required minimum, so that our algorithm completes it's execution in the minimum time possible. We will study about Time Complexity in details in later sections.
Type of datastructure
Basic types of Data Structures
1. Primitive Data Structures
Integer, Float, Boolean, Char etc, all are Primitive data structures
Complex Data Structures
2. Non-primitive Data Structures
Linked List, Tree, Graph, Stack, Queue etc.
2.1 Linear Data Structures: data item represent in sequential manner but necessary to store in sequence.
Array, Stack, Queue(collection of similar kind of homogenious data or data item)
Stack is a linear data structure which follows a particular order in which the operations are performed. The order may be LIFO(Last In First Out) or FILO(First In Last Out)
Operations on Stack mainly two(push and pop)
Insertion(push) and Deletion(pop) happen on the same end.
Time Complexities of operations on stack: O(1) Access time.
It means that the execution time of the algorithm does not depend on the size of the input. Its execution time is constant.also called constant time complexity
We do not run any loop in any of these operations.
Queue is also linear data structure which follows a particular order in which the operations are performed. The order is FIFO(First In First Out)
2.2 Non-linear Data Structures
B-tree and Binary tree
A binary tree is used when the records or data is stored in the RAM instead of disk as the accessing speed of RAM is much higher than the disk.
Binary tree may or may not have all of its child nodes on the same level.
A binary tree can have have at most two children( no child node or one child node or two child). Child node in a binary tree on the left is identified as the left child and on the right is identified as the right child.
A node without children is called a leaf node.
binary tree which is a finite set of elements that is either empty or further divided into sub-trees.
One of the most important applications of the Binary tree is in the searching algorithm.
algorithmic technique in which one tries to reduce the search space in half in the hope of finding the answer quickly. It is a divide and conquer approach.
How binary search actually works?
Binary search algorithm assumes the input data to be sorted. It takes following steps to find some key in the input data.
To search a key in the search space i.e. some list of data points, we need to find the mid point in the data and check if data present there. If data found, then then we stop the further iteration. In this case, time complexity will be O(1), best case. Otherwise, we will move to next step.
Find if data will be present in left or right by comparing search key with current data item.
Repeat 1 and 2 until there is match of there is no further points to search.
As we can see from the above steps, binary search algorithm break the break into half in each iteration.
So how many times we need to divide by 2 until with have only one element-
So, in average and worst case, time complexity of binary search algorithm is o(log(n)).
Total BST with n keys
BST(n) = Catalan number Cn = (2n)! / ((n + 1)! * n!)
Catalan numbers are a sequence of positive integers, where the nth term in the sequence, denoted Cn, is found in the following
formula: Cn = (2n)! / ((n + 1)!n!)
For n = 0, 1, 2, 3, … values of Catalan numbers are 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, …. So are numbers of Binary Search Trees.
Total BT with n keys
BT(n) = BST(n) * n!
A Binary Tree node contains following parts.
- Pointer to left child
- Pointer to right child
Note: Number of unlabeled Binary Tree with n nodes is = number of Binary Search Trees with n nodes.
Types of Binary tree three types
full binary tree
A full binary tree also called proper binary tree or 2-tree is a tree in which all the node other than the leaves has exact two children.
Total no of node = A full binary tree with n leaves, will have (2n - 1) nodes.
complete binary tree
A complete binary tree is a binary tree, which is completely filled at every level(means all levels of the tree are fully filled), with the possible exception of the bottom level, which is filled from left to right.
threaded Binary tree
There are two ways to represent binary trees. These are:
Using Linked lists
B-tree is used when the data is stored in the disk it reduces the access time by reducing the height of the tree and increasing the branches in the node.
B-tree must have all of its child nodes on the same level whereas binary tree does not have such constraint.
B-tree can have more than two children.
Technique for sorting unsorted list/array in ascending or descending order.
Bubble Sort (unsortedlist/array)
sorting technique used to sort array elements in ascending order, start by comparing the first element of the array with the second element if the first element is greater than the second element, it will swap both the elements, and then move on to compare the second and the third element, and so on.
If we have total n elements, then we need to repeat this process for n-1 times.
It is known as bubble sort, because with every complete iteration the largest element in the given array, bubbles up towards the last place , just like a water bubble rises up to the water surface.
Sorting takes place one-by-one step and comparing it with the adjacent element and swapping them if required.
Time Complexity: O(n2)
Space Complexity : O(1)
Heap Sort (unsorted list/array)
sorting technique used to sort the data element both ascending or descending order.
we create Heap data structure from the given array and then utilizing the Heap datastructure to sort the array.
Shape Property:Heap data structure is always a Complete Binary Tree, which means all levels of the tree are fully filled.
- Heap Property:All nodes are eithergreater than or equal to or less than or equal to each of its children. If the parent nodes are greater than their child nodes, heap is called a Max-Heap, and if the parent nodes are smaller than their child nodes, heap is called Min-Heap.
Time Complexity: O(n log n)
Space Complexity : O(1)