Data Fundamentals Part 2: Structured Data and How It’s Stored

Welcome back to part 2 of this data fundamentals series. You can find part 1 here.

When I first started working with data, I remember being surprised at how dirty most data was. You might have learned about how organized everything is supposed to be. In reality, a lot of the data that powers reports, apps, and dashboards does not neatly in databases, waiting to be queried and modeled. Instead, spreadsheets and mysterious systems are spitting out numbers, and things might seem much more chaotic.

But hey! This is where data engineers come in. Understanding how to structure our data in a better way helps you with everything you’ll do as a data analyst, engineer, or BI developer. We need to understand how data is structured, stored, and linked together. And why we store data in different ways, depending on its use case.


Structured vs. Unstructured Data

Before diving into databases, let’s clear up one of the first big distinctions you’ll encounter: structured vs. unstructured data.

  • Structured data lives in these kinds of tables — think Excel sheets or SQL databases. It has rows and columns, with consistent formats. Examples: customer lists, sales transactions, inventory tables.
  • Unstructured data is messier: emails, PDFs, images, social media posts.. It doesn’t fit neatly into rows and columns, but it often contains valuable insights if you know how to extract them.

Most BI tools (Power BI, Tableau, Qlik) are optimized for structured data. Knowing the difference helps you choose the right approach for cleaning and analyzing your data. Let’s go a bit more into details.

Structured Data

Structured data is organized in rows and columns — like a spreadsheet.
Each column has a defined type (text, number, date), and every row represents a single observation or record.

Examples include:

  • Sales transactions
  • Customer profiles
  • Inventory lists
  • Financial records

Structured data is predictable, consistent, and easy to query with SQL. This is the data you’ll model into facts and dimensions, clean with transformations, and visualize in dashboards.

Unstructured Data

Unstructured data, on the other hand, doesn’t fit into neat rows and columns. It’s messy — flexible, but hard to standardise.
Examples include:

  • Emails or chat messages
  • Images and videos
  • Log files or JSON events
  • PDFs or text documents

These data types require different tools (like data lakes, search indexes, or machine learning models) to make sense of.

💡 In practice, companies often work with both. For example, structured customer data from a CRM might be enriched with unstructured support ticket text from an email system — combining them gives richer insights.


Databases: Where Structured Data Lives

Most of your structured data will live in something called a database — a digital system for storing, organizing, and retrieving information efficiently.

Think of a database as a highly organized digital filing cabinet. Inside it are tables, which store data about specific subjects, like Customers, Orders, or Products.

Tables, Rows, and Columns

Inside each database:

  • Tables → like Excel sheets, they hold collections of data.
  • Rows → each row is a single record (one transaction, one product, one customer).
  • Columns → each column describes a property of that record (name, price, date).

Example:

Customers
CustomerIDNameCountry
1AliceDenmark
2BobGermany
Orders
OrderIDCustomerIDDateAmount
100112023-01-0159.99
100222023-01-03120.50

Notice that CustomerID appears in both tables.

  • In Customers, it’s a primary key — a unique ID for each customer.
  • In Orders, it’s a foreign key — linking back to the customer who placed the order.

This relationship allows us to answer questions like:

  • “Which customer placed each order?”
  • “How many orders came from Germany?”
  • “What’s the total revenue by country?”

Relational databases (like PostgreSQL, SQL Server, or MySQL) are built around these relationships — which is why we call them relational.


Keys & Relationships

Keys and relationships are one of the most important elements of data fundamentals out there.

  • primary key uniquely identifies each row in a table. This could for example be a user id, which is supposed to be unique.
  • foreign key links one table to another, creating relationships. Let’s say we have an orders table, which includes a column called userId containing a foreign key to the users table. This is used to link the tables together.

Relationships are the backbone of your data model. Without them, your reports won’t know how to connect the dots.

Once you understand these building blocks, everything else — from SQL joins to Power BI models — starts to click into place.


Schemas: How Tables Relate

As datasets grow, it’s not enough to just have tables — they need structure. That’s where schemas come in.

A schema defines how tables connect and what each one represents.
For example:

  • DimCustomer → stores descriptive info about customers (name, region, segment).
  • DimProduct → stores details about products (name, category, price).
  • FactSales → stores actual transactions (sales amount, quantity, date).

Together, these form a star schema, one of the most common and efficient ways to model data for analytics.

💡 At this stage, you don’t need to master all schema types — just know that good structure is what turns raw data into something reliable, fast, and reusable.


LTP vs OLAP

Historically, there are two type of data models: OLTP and OLAP. Which one is best depends on your use case. Usually, most of the data you consume comes in OLTP form. Let’s discuss the difference:

  • OLTP (Online Transaction Processing) systems are built for speed and reliability in day-to-day operations. Think of a bank ATM, a webshop checkout, or a CRM system — they record individual transactions. This data is heavily normalised. We will explain what this means later, but basically the data is optimised for INSERT statements, and not so much for analysis work and complex SELECT statements.
  • OLAP (Online Analytical Processing) systems are designed for analysis. They aggregate and summarize data so you can answer questions like “What were our monthly sales by region?”. This data is more denormalized meaning that data from different OLTP tables is joined so that analysts do not have to waste resources on running many joins.

OLTP is for doing business. OLAP is for understanding it. BI tools like Power BI sit firmly in the OLAP world.

Data Fundamentals 101: OLTP vs OLAP

In traditional database design, normalization is the process of organizing data to reduce redundancy and improve integrity. It breaks data into smaller related tables — for example, separating customer details from orders.

While normalization is great for transactional systems (OLTP), analytical systems (OLAP) often benefit from denormalized structures like star schemas. We’ll explore that trade-off in more detail later in the series.


Transactional vs Dimensional Data

  • Transactional data is raw and granular — every click, purchase, or login. We often call this a fact table, e.g. data that we measure.
  • Dimensional data organizes those events into a model that’s easier to analyse. A star schema is a common structure, with a central fact table (e.g., sales) surrounded by dimension tables (e.g., product, customer, date).

Dimensional modeling helps you build scalable, reliable reports. It’s the difference between chaos and clarity. We will explore these concepts much more in another part.


Granularity

Granularity is the “zoom level” of your data:

  • High granularity = detailed (e.g., every transaction).
  • Low granularity = summarized (e.g., total sales per month).

Choosing the right granularity affects performance, accuracy, and how your reports behave over time.


Why Structure Matters

Well-modeled, structured data:

  • Makes reporting consistent across teams.
  • Ensures transformations (ETL/ELT) are traceable.
  • Reduces errors when joining or aggregating data.
  • Lays the foundation for advanced analytics later.

When I first worked on a production data model, I realized how crucial this was. A single missing key or inconsistent type could cascade into dozens of broken dashboards. The beauty of structured data is that it keeps the chaos in check.


🛠️ Exercise: Map a Real Dataset

Take a dataset you know — for example, an e-commerce site or your company’s sales data.

  1. Identify what could be dimensions (people, products, places).
  2. Identify what could be facts (transactions, clicks, or events).
  3. Sketch how they might connect through keys.

This exercise helps you think like a data engineer — not just looking at numbers, but at how data relates.


What’s next?

This was the warm-up. You now know the landscape: roles, data flows, and key concepts.

In the next article, you’ll load your first dataset (Superstore), explore it in Power BI, and start recognizing which columns are facts and which are dimensions. If you want to follow along hands-on, make sure you’ve got Power BI Desktop installed.

As we move forward, you’ll gradually step into the shoes of a data engineer — learning how data gets extracted, cleaned, stored, and modeled for analytics.


Like my articles?

You are welcome to comment on this post, or share my post with friends.I would be even more grateful if you support me by buying me a cup of coffee:

Buy me a coffee
Buy me a coffee

Newsletter Updates

Enter your email address below and subscribe to our newsletter

One comment

Leave a Reply

Your email address will not be published. Required fields are marked *