Enterprise Data Architecture

Blog article by Krystian Piecko, CTO and Founder | DataWalk

DataWalk's A-Shaped Model Slashes Data Movement Barriers

Discover how DataWalk's technology enables enterprises to efficiently extract meaningful insights from large datasets at minimal cost.

In today’s data-driven world, traditional enterprise architectures suffer from inefficiencies that limit scalability, agility, and performance while driving up costs. These architectures have long followed a model that can be described as "V-shaped", where most computational tasks happen within a separate service layer. This often leads to significant challenges: computations are not only costly but also complex to scale, making it difficult to keep up with real-time processing demands.

DataWalk provides a modern architecture that can be described as "A-shaped", where computation is brought closer to the data itself, fundamentally reducing these inefficiencies. This article explores the benefits of this approach, with insights into how DataWalk’s technology allows enterprises to extract meaningful insights from large datasets with efficiency and minimal cost.

 
 
 

Understanding Traditional "V-Shaped" Architectures

With traditional architectures, the majority of computational tasks are handled in the service layer, far from where the data resides. The V-shaped architecture model symbolizes this flow, starting from the data layer, moving up to the service layer for processing, and then back down. This setup keeps data and computation separate, providing a cleaner architectural distinction between storage and processing layers.

Figure 1. Illustration of a V-shape architecture

In this architecture, computational tasks such as data analysis, business logic, and application processing are performed away from the actual data storage. Because data is moved to the service layer for processing, the system must rely on robust service layers that can handle significant loads without compromising performance. The result is an increased need for scaling both the service and data layers simultaneously as data volumes grow—simply to support data movement—and this can be costly and complex.
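
The cost of this flow can be made concrete with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the data layer; the `calls` table, its columns, and the function names are hypothetical and chosen only for illustration. The service layer fetches every row and aggregates afterwards, so the number of rows moved grows with the size of the table.

```python
import sqlite3
from collections import Counter


def build_demo_db(n_rows: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory table of synthetic call records (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT, seconds INTEGER)")
    rows = [(f"user{i % 50}", f"user{(i * 7) % 50}", i % 300) for i in range(n_rows)]
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", rows)
    return conn


def v_shaped_total_seconds(conn: sqlite3.Connection):
    """Service-layer aggregation: pull every row up, then compute."""
    rows = conn.execute("SELECT caller, seconds FROM calls").fetchall()
    totals = Counter()
    for caller, seconds in rows:
        totals[caller] += seconds
    # Return the result together with the number of rows that left the data layer.
    return dict(totals), len(rows)


conn = build_demo_db()
totals, rows_moved = v_shaped_total_seconds(conn)
print(f"{len(totals)} callers summarized after moving {rows_moved} rows")
```

In this sketch the answer has one entry per caller, yet the whole table crosses the boundary between layers to produce it.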

The burning question is: why is this V-shaped approach so popular today? 

The answer is not surprising: it addresses a significant challenge in data architecture, namely shaping the data for efficient queries.

Running computations close to data requires specialized infrastructure, as data must be organized and indexed to be query-friendly regardless of the query language. Creating a system where data is immediately usable without heavy reshaping or reindexing has been technically challenging, pushing architects toward the service-layer-based tradeoff (V-shaped), where data shaping is simpler but computationally expensive.

Some attempts have been made to address this challenge through "schema on read," where data is stored in a raw format and structured only as it’s accessed. While schema on read provides flexibility by allowing data to be shaped based on the query needs at runtime, it doesn’t entirely solve the problem. This approach often requires substantial transformations on the fly, which can be heavy and time-consuming, especially with large datasets or complex queries. The computational load remains high, and latency issues persist, making “schema on read” an imperfect solution for real-time analytics or other high-performance requirements.
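
A minimal sketch of that tradeoff, assuming a handful of made-up JSON records (the field names and both functions are hypothetical): the schema-on-read function parses and shapes every raw record each time a question is asked, while the second function pays the shaping cost once and answers later questions from the shaped result.

```python
import json

# Raw, unshaped records as they might sit in storage (illustrative data only).
RAW_EVENTS = [
    '{"type": "call", "from": "A", "to": "B", "seconds": 120}',
    '{"type": "sms", "from": "B", "to": "C"}',
    '{"type": "call", "from": "A", "to": "C", "seconds": 45}',
]


def total_call_seconds_schema_on_read(raw_lines, caller):
    """Schema on read: parse and shape every raw record at query time."""
    total = 0
    for line in raw_lines:
        record = json.loads(line)  # this transformation is repeated on each query
        if record.get("type") == "call" and record.get("from") == caller:
            total += record.get("seconds", 0)
    return total


def shape_once(raw_lines):
    """Shape the data a single time into a caller -> total seconds lookup."""
    shaped = {}
    for line in raw_lines:
        record = json.loads(line)
        if record.get("type") == "call":
            key = record["from"]
            shaped[key] = shaped.get(key, 0) + record.get("seconds", 0)
    return shaped


shaped = shape_once(RAW_EVENTS)
print(total_call_seconds_schema_on_read(RAW_EVENTS, "A"), shaped.get("A", 0))
```

Both paths give the same answer. The difference is where the parsing work lands: on every query, or once up front.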

Yet, as data volumes grow, the V-shaped approach becomes more limiting. Moving data up to the service layer introduces latency, which impacts real-time decision-making. Additionally, maintaining multiple layers of scaled infrastructure adds both cost and complexity. Organizations invest heavily in infrastructure and may still struggle with advanced analytics and rapid processing needs.

Enterprises are increasingly seeking ways to bring computation closer to the data layer to address these challenges—but doing so demands an architecture that can shape and index data effectively at the storage level, which has been prohibitively difficult without specialized technology.

 
 
 

The Drawbacks of V-Shaped Architecture

In the V-shaped architecture model, separating data storage and processing layers becomes a significant bottleneck when combining multiple analytical techniques. As data scientists and analysts seek deeper insights, they often require a blend of approaches such as search, graph analytics, online analytical processing (OLAP), and relational algebra. Integrating these diverse analytical techniques can be problematic in a V-shaped architecture.

The core issue lies in the data movement required to process and correlate information across different layers. For example, executing a complex query that combines graph traversal operations with OLAP computations means data must be transferred back and forth between the data and service layers. This not only increases latency but also complicates the architecture, as different analytical engines and tools need to interoperate seamlessly.

Furthermore, V-shaped architectures struggle with the enterprise agility needed for modern data analysis. As businesses increasingly rely on advanced analytics to drive decision-making, the need to quickly integrate new analytical methods becomes critical. However, in a V-shaped framework, integrating new tools or techniques often requires significant re-engineering of the service layer. This can slow down innovation, as enterprises must navigate complex integration challenges whenever they wish to expand their analytical capabilities.

This matters all the more because enabling agile, continuous, self-service delivery is one of the primary objectives of an enterprise architecture services catalog.

The inherent latency in moving data between layers can impede the seamless application of mixed analytical techniques. For instance, combining search algorithms with graph-based analytics and relational algebra in real-time decision scenarios can introduce delays and reduce the effectiveness of analytical operations. This lag can be detrimental in scenarios where timely insights are crucial.

In addition to performance and agility challenges, the V-shaped architecture introduces complications in data governance and security. Since data moves frequently between storage and processing layers, implementing fine-grained permissions and maintaining strict control over data access becomes more complex. Each layer typically requires its own security protocols, leading to duplicated effort and increased administrative overhead. Sensitive information must be secured across multiple touchpoints, increasing the risk of unauthorized access or data leakage as data travels between layers.

 
 
 

Real-world example:

One of our clients deals with extensive communication data, such as phone calls and text messages, intertwined with other relevant information. Their analysts are continuously engaged in exploratory data analysis. During one of these processes, an analyst recognized the potential value in conducting a search of billions of rows of data to identify specific communication patterns among all involved parties. The analyst was unsure of the query's potential value or the quality of the results. However, with DataWalk’s A-shaped architecture, this operation was executed instantly. The analyst could perform it seamlessly while working on the system and iterate through multiple cycles of learning and refining the question in just minutes.

Now, let's consider the same scenario with a V-shaped architecture. The first challenge would be ensuring the data is available in the search facility, which is not always the case. If it isn't, indexing the data could take a considerable amount of time. Once the data is in the search tool, standard search tools are optimized to return only the top results. If the query yields a large result set, extracting it for further analysis is quite daunting. The tool intended to analyze patterns must be ready for this task, requiring a specific input data format. This often necessitates data transformation, which demands skills, time, and resources. After the initial pipeline execution, the analyst might realize that extending the analysis with additional data could improve the results. This would mean starting over.

Analyzing communication patterns often involves graph visualization, applying graph algorithms like clustering and pathfinding, and generating pivots for aggregations. In a V-shaped architecture, each of these tools operates in isolation, complicating the process. For each tool, the data must be transferred, restructured, and processed separately, which introduces multiple layers of complexity and potential for error. This fragmentation means more time spent on data preparation and less on actual analysis.

This is where the A-shaped architecture truly excels. It integrates these capabilities, allowing seamless transitions between different analytical techniques without the need for extensive data movement or transformation. This minimizes the need to start from scratch and, even if you do, it's far more cost-effective than the V-shaped approach. The A-shaped architecture empowers analysts to focus on deriving insights rather than managing cumbersome processes, significantly enhancing efficiency and effectiveness.
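
The idea of combining techniques inside one engine can be sketched as follows. This is not DataWalk's query interface; it is an illustration that uses SQLite (through Python's sqlite3 module) as a stand-in, with a hypothetical `calls` table. A single statement performs a graph-style traversal (a recursive common table expression), a filter, and a pivot-like aggregation, so no intermediate result is exported to a separate tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [("A", "B", 60), ("B", "C", 30), ("C", "D", 90), ("E", "F", 10), ("A", "B", 20)],
)

QUERY = """
WITH RECURSIVE reachable(party) AS (
    SELECT :start                      -- traversal starts at one party
    UNION                              -- UNION (not UNION ALL) stops on cycles
    SELECT c.callee
    FROM calls AS c
    JOIN reachable AS r ON c.caller = r.party
)
SELECT c.caller, COUNT(*) AS n_calls, SUM(c.seconds) AS total_seconds
FROM calls AS c
WHERE c.caller IN (SELECT party FROM reachable)   -- restrict to the traversed graph
  AND c.seconds >= :min_seconds                   -- filter step
GROUP BY c.caller                                 -- aggregation step
ORDER BY c.caller
"""


def pattern_summary(conn, start, min_seconds):
    """Traverse, filter, and aggregate in one statement inside the engine."""
    params = {"start": start, "min_seconds": min_seconds}
    return conn.execute(QUERY, params).fetchall()


for row in pattern_summary(conn, "A", 25):
    print(row)
```

Changing the question, for example a different starting party or threshold, means rerunning one statement rather than rebuilding an export-and-transform pipeline.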

 
 
 

The fragmentation of a V-shaped architecture also complicates compliance with regulatory requirements, because tracking and auditing data spread across multiple layers is more challenging. Consequently, V-shaped architectures can struggle to ensure consistent access control and regulatory compliance, adding a further layer of risk and complexity.

With growing demands for streamlined data access, advanced analytics, and tight data governance, enterprises are looking for architectures that consolidate security and analytical capabilities directly at the data layer, minimizing both data movement and exposure.

 
 
 

Introducing The A-Shaped Enterprise Data Architecture

In contrast to the V-shaped model, DataWalk's A-shaped architecture shifts the computational focus closer to the data itself by bringing a wide variety of computations directly to the data layer. This minimizes data movement and allows for seamless integration of various analytical techniques.

Figure 2. Illustration of an A-shape architecture

 

In the DataWalk A-shaped architecture, the computational engine is built into the data layer, enabling direct execution of complex queries, graph traversals, relational algebra, search, and many other techniques without the need to shuttle data between disparate layers. This results in a more cohesive and efficient analytical process. DataWalk's platform supports a broad spectrum of operations, including property graphs and semantic web techniques, within a single, unified database.
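
As a rough illustration of computing inside the data layer, the sketch below pushes an aggregation down into the engine so that only the result rows leave it. SQLite (via Python's sqlite3 module) stands in for the data layer here; the `calls` table and the function names are hypothetical, and none of this represents DataWalk's own API.

```python
import sqlite3


def build_demo_db(n_rows: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory table of synthetic call records (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT, seconds INTEGER)")
    conn.executemany(
        "INSERT INTO calls VALUES (?, ?, ?)",
        ((f"user{i % 50}", f"user{(i * 7) % 50}", i % 300) for i in range(n_rows)),
    )
    return conn


def pushed_down_total_seconds(conn: sqlite3.Connection):
    """Aggregate inside the engine; only one row per caller leaves the data layer."""
    rows = conn.execute(
        "SELECT caller, SUM(seconds) FROM calls GROUP BY caller"
    ).fetchall()
    return dict(rows), len(rows)


conn = build_demo_db()
totals, rows_moved = pushed_down_total_seconds(conn)
print(f"{len(totals)} callers summarized after moving {rows_moved} rows")
```

The table holds 10,000 rows, but only one row per caller is returned, which is the kind of reduction in data movement the A-shaped model aims for.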

The A-shaped model offers inherent scalability by keeping computations local to the data, reducing the need for extensive scaling of the service layer. This not only reduces costs but also simplifies the architecture, making it easier to maintain and evolve. With computations occurring where the data resides, enterprises can swiftly adopt and integrate new analytical techniques without requiring extensive re-engineering. This agility allows organizations to remain competitive in rapidly evolving markets by delivering analytical products faster and at lower cost.

Additionally, by keeping computations close to the data, A-shaped architectures enhance data governance and security. Since data does not need to be moved across different layers for processing, organizations can maintain tighter control over their data assets with a unified permissions model, reducing the risk of data breaches or compliance issues.

 
 
 

Advantages of DataWalk’s A-Shaped Architecture 

Embracing an A-shaped architecture with DataWalk unlocks numerous advantages for enterprises:

  1. Reduced Data Movement: As pointed out, an A-shaped architecture significantly reduces the need for data movement across layers by executing computations directly within the data layer. This leads to lower latency, faster query processing, and more responsive analytics. With reduced data movement, you can execute analytical pipelines that were previously infeasible due to data transfer limitations, enabling the exploration of more complex data relationships and insights.
  2. Seamless Integration of Techniques: DataWalk's A-shaped architecture seamlessly integrates diverse analytical techniques. Whether it's search, graph analysis, or OLAP, computations are performed close to the data, enabling complex, multi-faceted analysis without the latency issues inherent in V-shaped architectures. Eliminating extra data transformations further streamlines the process, enhancing both efficiency and speed.
    This can enable you to save weeks in the transition from idea to results, allowing analysts to quickly test hypotheses and refine their analyses without lengthy setup times.
  3. Enhanced Agility: With computations centralized in the data layer, organizations can rapidly adopt and deploy new analytical methods. This agility is crucial in today's dynamic business environment, enabling enterprises to stay ahead of the curve by swiftly adapting to emerging trends and technologies.
  4. Deployment Speed: Reduce time-to-deploy for new analytics solutions by up to 10x.
  5. Time-to-Value (TTV): Project outcomes can be achieved in weeks or months rather than years, allowing businesses to quickly realize the benefits of their analytical investments and make informed decisions sooner.
  6. Improved Data Governance and Security: By minimizing data movement, an A-shaped architecture reduces the risk of data breaches and compliance issues. Data remains within a controlled environment, ensuring that governance policies are easier to enforce and sensitive information is better protected.

DataWalk's A-shaped architecture represents a paradigm shift in enterprise architecture. It offers a robust, scalable, and cost-effective solution for modern data analysis. It empowers enterprises to harness the full potential of their data, driving innovation and delivering insights with unprecedented speed and efficiency.

 
 
 

Balancing the Costs of A-Shaped Architecture

The A-shaped architecture offers a powerful solution by moving computations closer to the data, reducing latency, enhancing scalability, and simplifying data governance. However, as with any architectural choice, this approach comes with its own costs. In IT, much like in nature, nothing is truly "free." Before computation can be brought to the data, the data itself must be moved to a location where this shift in computation can take place. This process requires reshaping the data into a format that supports seamless, direct computation. These preparatory steps—data movement, reshaping, and indexing—are foundational to the effectiveness of A-shaped architecture, but they incur resource demands and costs.
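
The indexing part of that preparation cost can be illustrated with a small sketch, again using SQLite through Python's sqlite3 module as a generic stand-in (the `calls` table, the index name, and the helper function are hypothetical). Building the index is a one-time expense, and the query plan shows what it buys: the lookup no longer scans the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    ((f"user{i % 50}", f"user{(i * 7) % 50}", i % 300) for i in range(10_000)),
)


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the detail text


LOOKUP = "SELECT * FROM calls WHERE caller = ?"

plan_before = query_plan(conn, LOOKUP, ("user7",))
# One-time preparation cost: build an index so lookups can avoid a full scan.
conn.execute("CREATE INDEX idx_calls_caller ON calls (caller)")
plan_after = query_plan(conn, LOOKUP, ("user7",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan reports a scan of the table and the second reports a search that uses the new index.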

DataWalk addresses this part of the equation by offering a solution that integrates the reshaping and preparation of data into its platform. DataWalk brings the data closer to computational resources and efficiently organizes it, enabling direct operations that would be impossible to achieve at a distance. By managing data transformation and indexing internally, DataWalk minimizes external costs and complexities, ensuring that data is prepped and ready for analysis within its unified environment. Although this initial data preparation has an associated cost, DataWalk’s approach is optimized to keep these expenses low and controlled, delivering high-value, low-latency insights that ultimately offset the initial investment in an A-shaped architecture setup.

This comprehensive handling of both computation and data preparation makes DataWalk’s A-shaped architecture uniquely valuable. It ensures that while data movement and reshaping are inevitable, they are executed efficiently and integrated into a single platform, allowing enterprises to gain the benefits of near-data computation without the significant overhead typically associated with such architectural shifts.

 
 
 

Conclusion

By adopting an A-shaped architecture, enterprises can transform their data operations while minimizing latency, cutting costs, and enabling seamless analytical integration. DataWalk offers a powerful solution for organizations seeking to modernize their architecture, bringing computations directly to the data and reducing the need for costly infrastructure overhead. In an era where data is the backbone of decision-making, DataWalk’s A-shaped architecture model empowers enterprises to derive actionable insights with unprecedented speed and efficiency.

 
 
 
