Modern Data Architecture [Part 1]

Agenda

  1. Overview Data Concept
  2. Data Lake Concept
  3. Traditional vs Modern
  4. Reference Modern Data Architecture
  5. Azure Data Reference Architecture
  6. Conclusion

Overview Data Concept

It is now widely accepted that data is one of the most important assets for any company that wants to compete in the age of Industry 4.0. That is why solutions for handling data from business applications come to the forefront once a company has gone through digital transformation.

From a technical point of view, handling data is not easy. Data comes from many sources in many formats, so before doing anything with it, the very first step is to implement a data repository that stores everything that arrives. This article focuses on how to store a large amount of data from many data sources, describes the modern data architecture, and shows how to build it on Azure Cloud.

Data sources

Data Lake Concept

A data lake is a centralized repository that allows developers to store all their structured and unstructured data at any scale. With a data lake, developers can store data as-is, without having to structure it first, and run different types of analytics on it, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions.

Building a data lake from scratch has rarely been a reasonable option, because it requires a lot of storage and compute capacity to be provisioned up front. Nowadays developers don’t need to do that: many cloud platforms provide a data lake as a service, so we only need to launch it and use it. What remains is to design the data lake architecture and the data store structure for our business, leveraging a reference architecture.

Clouds provide Data Lake as a service

Traditional vs Modern

The way a data repository is designed has changed over time. Data sources have changed in size, data structure, protocol, and loading frequency, and data solutions have had to change with them. Today there are basically two main approaches to handling data. Let’s take a look at the differences.

| Schema on Write | Schema on Read |
| --- | --- |
| When data arrives, transform it into a predefined structure and then store it | When data arrives, store it first, whatever its kind or size; when the data needs to be read, define a structure and read according to it |
| Fast results | Slower results |
| Structured | Unstructured |
| SQL | NoSQL |
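
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sensor events, the `readings` table, and the `read_humidity` helper are hypothetical names that do not come from the article, and an in-memory SQLite database stands in for a schema-on-write store.

```python
import json
import sqlite3

# Hypothetical incoming events; the second one carries an extra field.
events = [
    {"device": "sensor-1", "temperature": 21.5},
    {"device": "sensor-2", "temperature": 19.0, "humidity": 40},
]

# Schema on write: the table structure is fixed before any data arrives,
# so each record is reshaped to fit it (the extra "humidity" field is dropped).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE readings (device TEXT NOT NULL, temperature REAL NOT NULL)"
)
warehouse.executemany(
    "INSERT INTO readings (device, temperature) VALUES (?, ?)",
    [(e["device"], e["temperature"]) for e in events],
)
avg_temperature = warehouse.execute(
    "SELECT AVG(temperature) FROM readings"
).fetchone()[0]

# Schema on read: records are stored untouched as raw JSON lines,
# and a structure is applied only when someone reads them.
raw_store = [json.dumps(e) for e in events]


def read_humidity(lines):
    """Apply a structure at read time: keep only records that have humidity."""
    parsed = (json.loads(line) for line in lines)
    return [(r["device"], r["humidity"]) for r in parsed if "humidity" in r]


humidity_rows = read_humidity(raw_store)
print(avg_temperature)
print(humidity_rows)
```

Note that the humidity value is still available on the schema-on-read side because nothing was discarded at write time, while the schema-on-write side answers its predefined question with a single query.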

Based on the two approaches above, there are two data store solutions, traditional (data warehouse) and modern (data lake), compared below.

| Characteristics | Data Warehouse | Data Lake |
| --- | --- | --- |
| Data | Relational from transactional systems, operational databases, and line of business applications | Non-relational and relational from IoT devices, websites, mobile apps, social media, and corporate applications |
| Schema | Designed prior to the DW implementation (schema-on-write) | Written at the time of analysis (schema-on-read) |
| Price/Performance | Fastest query results using higher cost storage | Query results getting faster using low-cost storage |
| Data Quality | Highly curated data that serves as the central version of the truth | Any data that may or may not be curated (i.e. raw data) |
| Users | Business analysts | Data scientists, Data developers, and Business analysts (using curated data) |
| Analytics | Batch reporting, BI and visualizations | Machine Learning, Predictive analytics, data discovery and profiling |

This article does not try to determine which one is better than the other. Instead, please read the next part to see how they can be used together.

Reference Modern Data Architecture

The diagram below shows the overall modern data reference architecture, but not every business needs a data platform with this many components. Depending on the business, a data architect should design an appropriate data platform based on this reference.

Basically, there are 4 main stages:

  1. Ingest data from the data sources into raw data storage.
  2. Transform data from raw data storage into structured data storage.
  3. Run analytics.
  4. Visualize the data.

Reference modern data architecture
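
The four stages can be sketched as a tiny in-memory pipeline. This is a minimal sketch, not an implementation of the reference architecture: the order data, the `raw_zone` dictionary standing in for raw storage, and all function names are assumptions made for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical source data: a CSV export and a JSON event feed.
CSV_SOURCE = "order_id,country,amount\n1,VN,120.5\n2,JP,80\n"
JSON_SOURCE = '[{"order_id": 3, "country": "VN", "amount": "39.5"}]'


def ingest(raw_zone):
    """Stage 1: land source data in raw storage exactly as received."""
    raw_zone["orders.csv"] = CSV_SOURCE
    raw_zone["orders.json"] = JSON_SOURCE


def transform(raw_zone):
    """Stage 2: turn the raw files into one structured, typed record set."""
    rows = list(csv.DictReader(io.StringIO(raw_zone["orders.csv"])))
    rows += json.loads(raw_zone["orders.json"])
    return [
        {
            "order_id": int(r["order_id"]),
            "country": r["country"],
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def analyze(orders):
    """Stage 3: aggregate revenue per country."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["country"]] += order["amount"]
    return dict(totals)


def visualize(totals):
    """Stage 4: render a plain-text bar chart (one '#' per 20 units)."""
    return [
        f"{country} {'#' * int(total // 20)} {total:.1f}"
        for country, total in sorted(totals.items())
    ]


raw_zone = {}
ingest(raw_zone)
orders = transform(raw_zone)
totals = analyze(orders)
chart = visualize(totals)
print("\n".join(chart))
```

Keeping the raw files untouched in stage 1 means stage 2 can be rerun with a different structure later without going back to the sources.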

Data inside a data lake should still be organized according to best practices. Operating on well-structured data reduces cost and makes data retrieval easier and more flexible. The following diagram shows one way to organize data inside a data lake.

Reference Data Lake Structure
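
One common way to keep a data lake well structured is a zone-based, date-partitioned folder layout. The sketch below assumes such a layout; the zone names `raw` and `curated`, the `lake_path` helper, and the example source and dataset names are hypothetical and may differ from the structure in the diagram.

```python
from datetime import date
from pathlib import PurePosixPath

# Assumed zone names; the article's diagram may use different ones.
ZONES = ("raw", "curated")


def lake_path(zone, source, dataset, day, filename):
    """Build a date-partitioned path: zone/source/dataset/year=/month=/day=/file."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return PurePosixPath(
        zone,
        source,
        dataset,
        f"year={day:%Y}",
        f"month={day:%m}",
        f"day={day:%d}",
        filename,
    )


path = lake_path("raw", "crm", "customers", date(2024, 3, 5), "part-0001.json")
print(path)  # raw/crm/customers/year=2024/month=03/day=05/part-0001.json
```

Partitioning by date lets a reader select only the folders for the period it needs instead of scanning the whole dataset, which is where the cost reduction comes from.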

Azure Data Reference Architecture

The diagram below describes how to implement the data platform on Azure Cloud using Azure services. Basically, all the services are PaaS/CaaS/SaaS offerings that leverage the power of the cloud: developers can concentrate on designing the architecture and implementing business logic, and let Azure take responsibility for the high availability and disaster recovery of the platform. For more detail about this reference architecture and the purpose of each service used, please refer to the official link below.

Modern analytics architecture with Azure Databricks - Azure Solution Ideas  | Microsoft Docs
Azure Reference Data Architecture
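
As a small illustration, assuming the data lake in this architecture is Azure Data Lake Storage Gen2 (the article does not name the storage service explicitly), data is usually addressed through an `abfss://` URI. The `abfss_uri` helper and the account and container names below are placeholders, not values from the article.

```python
def abfss_uri(account, container, path):
    """Build an ABFS URI for a path inside an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# "examplelake" and "raw" are placeholder names, not values from the article.
uri = abfss_uri("examplelake", "raw", "/crm/customers/year=2024/")
print(uri)  # abfss://raw@examplelake.dfs.core.windows.net/crm/customers/year=2024/
```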

Conclusion

Not only Azure but also AWS, GCP, and other providers offer appropriate services for building a modern data architecture. This article aims to help you better understand the difference between traditional and modern data stores, how to design a modern data architecture based on a reference, and how to organize data inside a data lake. I hope it will be helpful for people in this field.
