Data And Tools

What is business intelligence?

Business intelligence is a technology-driven process for analyzing data and presenting actionable information that helps executives and managers make business decisions. In simple terms, it covers the analysis, reporting, budgeting, and presentation of your business data. The goal of using business intelligence is to improve your organizational processes and financial position so that you can manage your business better.

Business intelligence software is a tool that makes it possible to create value from big data. Examples of business intelligence tools include data warehouses, data recovery tools, and cloud data service tools.

Data warehouse:

A data warehouse integrates data from multiple heterogeneous sources to support analytical reporting, structured or ad hoc queries, and decision making.

The process of constructing and using a data warehouse is called data warehousing.

What is ETL?

ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool extracts data from different relational database management system (RDBMS) source systems, transforms the data by applying calculations, concatenations, and so on, and then loads the data into the data warehouse system.
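
The three steps can be sketched in a few lines of Python. This is only a minimal illustration, assuming two in-memory SQLite databases stand in for the source system and the data warehouse; the table names, column names and sample rows are made up for the example.

```python
import sqlite3


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Move order data from a source system into a warehouse table."""
    # Extract: read the raw rows from the (hypothetical) source table.
    rows = source.execute(
        "SELECT first_name, last_name, unit_price, quantity FROM orders"
    ).fetchall()

    # Transform: concatenate the name fields and calculate the order total.
    transformed = [
        (f"{first} {last}", round(price * qty, 2))
        for first, last, price, qty in rows
    ]

    # Load: write the transformed rows into the warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS order_totals (customer TEXT, total REAL)"
    )
    warehouse.executemany("INSERT INTO order_totals VALUES (?, ?)", transformed)
    warehouse.commit()
    return len(transformed)


# Made-up source data for the demonstration.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders "
    "(first_name TEXT, last_name TEXT, unit_price REAL, quantity INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [("Ada", "Lovelace", 2.5, 4), ("Alan", "Turing", 10.0, 3)],
)

warehouse = sqlite3.connect(":memory:")
loaded = run_etl(source, warehouse)
result = warehouse.execute(
    "SELECT customer, total FROM order_totals ORDER BY customer"
).fetchall()
print(loaded, result)
```

A real ETL tool adds scheduling, error handling and incremental loads on top of this basic pattern.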

What are OLAP and OLTP?

OLAP (Online Analytical Processing) servers are based on a multidimensional data model. They give managers insight into information through fast, consistent and interactive access to it.

OLAP Operations:

  • Roll up
  • Drill down
  • Slice and Dice
  • Pivot
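
To make these operations concrete, here is a small sketch in plain Python. It assumes a tiny made-up sales data set with four dimensions (year, quarter, region, product) and one measure (amount); the helper names aggregate and select are hypothetical and not part of any OLAP product.

```python
from collections import defaultdict

# Each record is one cell of a tiny made-up sales cube:
# (year, quarter, region, product, amount).
SALES = [
    (2023, "Q1", "East", "Laptop", 100),
    (2023, "Q1", "West", "Laptop", 150),
    (2023, "Q2", "East", "Phone", 80),
    (2023, "Q2", "West", "Phone", 120),
    (2024, "Q1", "East", "Laptop", 200),
    (2024, "Q1", "West", "Phone", 50),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}


def aggregate(rows, *dims):
    """Sum the amount grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[4]
    return dict(totals)


def select(rows, **conditions):
    """Keep rows whose dimension values are in the allowed sets."""
    return [
        row for row in rows
        if all(row[DIMS[d]] in allowed for d, allowed in conditions.items())
    ]


# Roll up: summarize to a coarser level (year only).
by_year = aggregate(SALES, "year")

# Drill down: move to a finer level (year and quarter).
by_quarter = aggregate(SALES, "year", "quarter")

# Slice: fix one dimension to a single value.
east_only = select(SALES, region={"East"})

# Dice: restrict several dimensions at once.
east_laptops_2023 = select(SALES, region={"East"}, product={"Laptop"}, year={2023})

# Pivot: rotate the view so regions become rows and products become columns.
pivot = defaultdict(dict)
for (region, product), total in aggregate(SALES, "region", "product").items():
    pivot[region][product] = total

print(by_year)      # {(2023,): 450, (2024,): 250}
print(dict(pivot))
```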

OLTP (Online Transaction Processing) is an online database modifying system that is used for order entry, retail sales, and financial transactions.

OLTP Operations:

  • Insert
  • Delete
  • Update
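
The following sketch shows the three operations as short transactions. It assumes an in-memory SQLite database and a made-up orders table; a production OLTP system would use a full database server, but the pattern is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER)"
)

# Each "with conn" block is one short transaction: it commits on success
# and rolls back if an exception is raised inside the block.
with conn:
    # Insert: record two new orders.
    conn.execute("INSERT INTO orders (item, quantity) VALUES (?, ?)", ("keyboard", 1))
    conn.execute("INSERT INTO orders (item, quantity) VALUES (?, ?)", ("mouse", 2))

with conn:
    # Update: change the quantity of an existing order.
    conn.execute("UPDATE orders SET quantity = ? WHERE item = ?", (3, "mouse"))

with conn:
    # Delete: cancel an order.
    conn.execute("DELETE FROM orders WHERE item = ?", ("keyboard",))

remaining = conn.execute("SELECT item, quantity FROM orders").fetchall()
print(remaining)  # [('mouse', 3)]
```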

OLAP vs OLTP:

Parameters     | OLAP                                                    | OLTP
---------------+---------------------------------------------------------+-----------------------------------------------------------
Process        | OLAP is an online analysis and data retrieving process. | It is an online transactional system.
Characteristic | A large volume of data.                                 | Large numbers of short online transactions.
Functionality  | Online database query management system.                | Online database modifying system.
Operations     | Only read and rarely write.                             | Allow read/write operations.
Method         | OLAP uses the data warehouse.                           | OLTP uses traditional DBMS.
Query          | Mostly select operations.                               | Insert, Update, and Delete information from the database.

Facts and Dimensions:

The fact table mainly consists of business facts and foreign keys that refer to primary keys in the dimension tables. A dimension table is a table in the star schema of a data warehouse that describes one dimension. It contains dimension keys, values and attributes.
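
A small star schema might look like the sketch below. It assumes an in-memory SQLite database; the table names (fact_sales, dim_product, dim_date), the columns and the sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: a key plus descriptive attributes.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: numeric facts plus foreign keys to the dimension tables.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        units_sold INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Hardware'), (2, 'Antivirus', 'Software');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 2, 2000.0),
        (1, 20240201, 1, 1000.0),
        (2, 20240101, 5, 250.0);
""")

# Join the fact table to a dimension and aggregate by one of its attributes.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)  # [('Hardware', 3000.0), ('Software', 250.0)]
```

The fact table stays narrow and numeric, while the descriptive text lives once in each dimension table.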

Layers of Data Warehouse Architecture:

  • Bottom tier – The data warehouse database server, also called a relational database server. Back end tools and utilities feed data into the bottom tier by performing the extract, clean, load and refresh functions.
  • Middle tier – Contains an OLAP server, which can be implemented in one of two ways: as Relational OLAP (ROLAP), an extended relational database management system that maps operations on multidimensional data to standard relational operations, or as Multidimensional OLAP (MOLAP), which directly implements multidimensional data and operations.
  • Top tier – The front end client layer, which holds query tools, reporting tools, analysis tools and data mining tools.
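
To show what mapping a multidimensional operation to a relational one can mean in the ROLAP case, the sketch below turns a roll-up request into a SQL GROUP BY query. It is a simplified illustration, assuming an in-memory SQLite table named sales; the roll_up function and the dimension names are hypothetical.

```python
import sqlite3

# Dimension names are checked against this set before being placed in SQL.
ALLOWED_DIMENSIONS = {"year", "region", "product"}


def roll_up(conn, dimensions):
    """Translate 'total sales along these dimensions' into a GROUP BY query."""
    unknown = set(dimensions) - ALLOWED_DIMENSIONS
    if not dimensions or unknown:
        raise ValueError(f"invalid dimensions: {list(dimensions)}")
    cols = ", ".join(dimensions)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2023, "East", "Laptop", 100),
        (2023, "West", "Laptop", 150),
        (2024, "East", "Phone", 80),
        (2024, "East", "Laptop", 200),
    ],
)

by_year = roll_up(conn, ["year"])
by_year_region = roll_up(conn, ["year", "region"])
print(by_year)  # [(2023, 250), (2024, 280)]
```

A MOLAP server would instead keep the aggregated cells in a multidimensional array and look them up directly.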

Big data tools and their uses:

Hadoop – Allows the distributed processing of large data sets across clusters of computers. It is designed to scale up from single servers to thousands of machines.

Uses:

  1.      Flexibility in data processing
  2.      Allows for faster data processing
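
As a rough illustration of how work can be split into pieces and combined again, the sketch below imitates the map, shuffle and reduce steps of MapReduce, the programming model Hadoop is known for. It runs on one machine in plain Python and does not use any Hadoop API; the function names and the sample text are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# On a cluster, each chunk could be handled by a different machine.
chunks = ["big data tools", "big data sets", "data warehouse"]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each chunk is mapped independently, adding machines lets more chunks be processed at the same time.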

HPCC – Delivers a single platform, a single architecture and a single programming language for data processing.

Uses:

  1.      High redundancy and availability
  2.      Provides enhanced scalability and performance
  3.      Automatically optimizes code for parallel processing

Cassandra – Provides effective management of large amounts of data.

Uses:

  1.      Great linear scalability
  2.      High fault tolerance
  3.      Built-in high availability

Storm – A real-time framework for data stream processing that supports any programming language.

Uses:

  1.      Great horizontal scalability.
  2.      Built-in fault tolerance.
  3.      Auto restart on crashes.

MongoDB – A cross-platform, open source NoSQL database with rich features that can be used from many programming languages.

Uses:

  1.      Stores any type of data.
  2.      Data partitioning across multiple nodes and data centres.

Understanding structured text data:

The term structured data refers to data that has a defined length and format. For example, structured data includes numbers, dates, and groups of words and numbers called strings. Structured data usually resides in a relational database. This format is eminently searchable, both with human-generated queries and via algorithms that use field names and data types, such as alphabetical or numeric, currency or date.
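
The short sketch below shows why named, typed fields make structured data easy to search. The Invoice record and its sample values are hypothetical; in practice the same data would usually sit in a relational table.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """One structured record: every field has a fixed name and type."""
    invoice_id: int      # numeric
    customer: str        # string
    amount: Decimal      # currency
    issued: date         # date


invoices = [
    Invoice(1, "Acme Ltd", Decimal("120.50"), date(2024, 1, 15)),
    Invoice(2, "Globex", Decimal("75.00"), date(2024, 2, 3)),
    Invoice(3, "Acme Ltd", Decimal("310.00"), date(2024, 3, 20)),
]

# Because the field names and types are known, a query is a simple comparison.
large_q1 = [
    inv.invoice_id
    for inv in invoices
    if inv.amount > Decimal("100") and inv.issued < date(2024, 4, 1)
]
acme_ids = sorted(inv.invoice_id for inv in invoices if inv.customer == "Acme Ltd")
print(large_q1, acme_ids)  # [1, 3] [1, 3]
```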

Understanding unstructured text data:

Unstructured data has an internal structure but is not structured via pre-defined data models or schemas. It may be textual or non-textual, and human- or machine-generated. It may also be stored in a non-relational (NoSQL) database. For example, social media posts, website content and mobile data are typical human-generated unstructured data.
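
As a small example of that internal structure, the sketch below pulls mentions, hashtags and links out of a social media post with regular expressions. The post text and the field names are made up, and the patterns are deliberately simple rather than a complete parser.

```python
import re

# A made-up social media post: free text with no pre-defined schema.
post = (
    "Loving the new dashboard from @acme_bi! #analytics #BI "
    "Details: https://example.com/launch"
)

# The text still has an internal structure that patterns can recover.
extracted = {
    "mentions": re.findall(r"@(\w+)", post),
    "hashtags": re.findall(r"#(\w+)", post),
    "urls": re.findall(r"https?://\S+", post),
}
print(extracted)
```

Once extracted, these values can be stored in typed fields and queried like structured data.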