What actually is index in DBMS,in what cases its good to create index and when not?
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
Most commonly index is implemented using B+ tree.
So next time you think of index,think of its just a B+ tree.
Helps you in searching .
Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done.
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
Primary Index − Primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Now let me tell intuitively, what is Indexing.
Indexing is the data structure which stores our data for fast retrieval. Also if there are more indexes insert and update(basically write) operation becomes slow.
Because apart from indexing, additional time is required to create indexing of that row(aka inserting new row in the index)
Columns - candidateID(PK),email,Name,Age
Primary Key is by default indexed
Suppose I have created an index on column “email”, because I know, queries have to be selected on basis of email. So whenever I write query like
select * from candidates where email="firstname.lastname@example.org";
Mysql will not scan the rows and comparing the emails.
Instead it will directly go the location where details about the candidate having email "email@example.com" are stored.
And thus it saves time of accessing the rows
How these details are stored and how it helps us in fast retrieving?
It can be stored HashMap or B-Tree, in which B-Tree is for clustered indexing(sorting) for the operations like ORDER BY.
INDEXING is way to optimize performance of a database by minimizing the number of disk accesses required when a query is processed.
An index or database index is data structure which is used to quickly locate and access the data in a database table.
Indexes are created using some database coloumns.
- The first coloumn is the search key that contains a copy of the primary key or
candidate key of the table .these values are stored in sorted order so that the corresponding data can be accessed quickly(note that the data may or may not be
stored in sorted order).
- The second coloumn is the Data Reference which contains a set of pointers holding the address of disc block where that particular key value can be found.
There are two kinds of indices:
- ORDERED INDICES : Indices are based on a sorted ordering of the values
- HASH INDICES: indices are based on the values being disturbed uniformly across a range of buckets.the bucket to which a value is assigned is determined by the function called a hash function.
There is no comparision between both techniques,it depends on the database application on which it is being applied.
- ACCESS TYPES: e.g value based search,range,accesses,etc.
- ACCESS TIME:time to find particular data element or set of elements.
- INSERTION TIME:time taken to find the appropriate space and insert a new data
- DELETION TIME:time taken to find an item and delete it as well as update the index structure.
- SPACE OVERHEAD:additional space required by the index.
4)non clustered indexing
CASES TO CREATE AN INDEX:
Most database software includes indexing technology that enables sublinear time lookup to improve performance,as linear search is insufficient for large databases.
Suppose a database contains N data items and one must be retrieved based on the value of one of the fields .Asimple implementation retrieves and examines each item according to the test.if there is only one matching item ,this can stop when it finds the single item,but if there are multiple matches,it must test everything.this means that the number of operations in the worst case is 0(N) or linear time.since databases may contain many objects and since lookup is a common operation,it is often desirable to improve performance.
POLICING THE DATABASE CONSTRAINTS
indexes are good to create if we take the case of database constraints .indexes are used to create police database constraints,such as UNIQUE,EXCLUSION,PRIMARY KEY and FORIEGN KEY.an index may be declared as UNIQUE,which create an implicit constraint on the underlying table.databases systems usually implicitly create an index on a set of coloumns declared PRIMARY KEY,and some are capable of using an already existing index to police this constraint.many database system requires that both referencing and referenced sets of coloumns in a FORIEGN KEY constraint are indexed ,thus improving performance of inserts,updates and deletes to the tables participating in the constraint.
CASES NOT TO CREATE AN INDEX:
- If the indexed coloumn is never searched on and the table is heavily updated you dont the benifit of performance that indeces are for.in contrary you might suffer performance hit.
- one circumstance under which an index is pretty much unconditionally bad is if there is another index which uses the same coloumns (in the same order) as a prefix:
CREATE INDEX ix_good ON some table(col1, col2, col3);
CREATE INDEX ix_bad ON some table(col1, col2);
the bad index is a waste of disk space and slows down modify operations to no benifit.
- if the index coloumn is not used for searches there is no point in defining it.
if the values in that coloumn keep changing very frequently it will be extra work for database server(for re-indexing)
if there are too many inserts and deletes from table it will be extra work for server.