Does your database deliver? How specialization pushes the boundaries of high-performance data processing
Jun 13, 2024 • 9 min

When pushing the limits of performance, specialization is a must. It's not just about having the right tool for the job; it's about refining its design for optimum performance under specific circumstances.
When it comes to data processing in fast-paced, complex environments like supply chain and retail planning, a specialized embedded database ensures the speed, accuracy, and agility required to stay competitive.
Three characteristics define this kind of specialized embedded database:
- It leverages in-memory computing for faster analytics.
- It employs in-database processing to minimize time-consuming data transfers.
- It integrates with other specialized data processing techniques for a balanced response to different business needs.
By applying these three tactics to everyday planning challenges, companies get the most out of their data, gaining insights that help them plan accurately with longer horizons, automate processes for greater efficiency and adaptability, and make more profitable, data-driven decisions.
The challenge: What’s slowing down data processing?
A surprising fact when dealing with data-intensive applications, such as retail planning and analytics software, is that the actual computations seldom cause the performance bottleneck.
Rather, it’s the frequent need to move lots of data from one place to another. Performance is often determined by how quickly data can be found in the data storage and fetched from there to the central processing unit (CPU). It also depends on how efficiently calculation results can be transferred from the underlying database to the application layer. Smart optimization in these areas allows for performance improvements of several orders of magnitude.
Solution #1: Use columnar databases and in-memory computing to accelerate data processing
Considering that fetching data from disk can be a thousand times slower than fetching data from memory, it is no surprise that in-memory computing (IMC) has become the norm for data-intensive applications, especially when on-demand analytics are involved.
Many traditional systems use row-based approaches to data storage, but for high-speed data analysis, columnar databases are best suited to support in-memory data processing. There are two main reasons for this:
- In-memory data processing of large amounts of data requires efficient compressions. Column-store databases typically contain long sequences of similar or even repetitive data that allow data to be compressed far more efficiently than when using row-store databases. For example, a column consisting of retail sales data would typically contain a lot of small integer values and repetitive zeros.
- Accessing data in memory significantly reduces latency when compared to accessing data on disk. As sequential memory access is significantly quicker than fetching data from multiple locations in physical memory, columnar databases deliver superior performance when a lot of data stored in a limited number of columns needs to be processed together. In retail planning and analytics, this is very common. For example, several years of cleansed sales and weather data per store, product, and day are needed to predict the impact of forecasted weather conditions on sales. Columnar databases make it far easier and faster to analyze this data all at once.
Here’s a closer look at how columnar and row-based databases operate.
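To make the difference concrete, here is a minimal Python sketch (illustrative only, not RELEX's actual storage engine) showing the same sales records laid out row by row versus column by column. Summing a single column touches only that column's contiguous values, and the long runs of repeated values are what make simple compression schemes such as run-length encoding so effective:

```python
from dataclasses import dataclass

# Row-based layout: each record keeps all of its fields together.
@dataclass
class SaleRow:
    store: str
    product: str
    day: str
    units_sold: int

rows = [
    SaleRow("S1", "P1", "2024-06-01", 0),
    SaleRow("S1", "P2", "2024-06-01", 3),
    SaleRow("S1", "P3", "2024-06-01", 0),
    SaleRow("S2", "P1", "2024-06-01", 0),
]

# Columnar layout: each column is stored as its own contiguous sequence.
columns = {
    "store":      [r.store for r in rows],
    "product":    [r.product for r in rows],
    "day":        [r.day for r in rows],
    "units_sold": [r.units_sold for r in rows],
}

# Summing one column is a sequential scan over contiguous values,
# with no need to skip over unrelated fields in every row.
total_units = sum(columns["units_sold"])

# The runs of repeated values (zeros, identical dates) are also what makes
# simple compression schemes such as run-length encoding effective.
def run_length_encode(values):
    encoded, prev, count = [], None, 0
    for v in values:
        if v == prev:
            count += 1
        else:
            if prev is not None:
                encoded.append((prev, count))
            prev, count = v, 1
    if prev is not None:
        encoded.append((prev, count))
    return encoded

print(total_units)                               # 3
print(run_length_encode(columns["units_sold"]))  # [(0, 1), (3, 1), (0, 2)]
```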
Data compression is a science of its own, and we will not go into detail here. However, a nice illustration of how seemingly innocuous technical choices can have an outsized impact is that traditional indexing approaches for locating data quickly can result in index tables a thousand times larger than the actual compressed database. This obstacle can be overcome with smart engineering, such as using sparse indices for sorted data.
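The idea behind sparse indexing can be sketched in a few lines of Python. Assuming the column is stored sorted and split into fixed-size blocks, only the first key of each block needs to be indexed, so the index stays a tiny fraction of the data while a lookup still scans just one block. The block size and data below are purely illustrative:

```python
import bisect

BLOCK_SIZE = 4  # tiny for illustration; real systems use much larger blocks

# A sorted column of keys (e.g., product IDs), conceptually split into blocks.
sorted_keys = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]

# Sparse index: keep only the first key of each block, not every key.
sparse_index = [sorted_keys[i] for i in range(0, len(sorted_keys), BLOCK_SIZE)]

def find(key):
    """Binary-search the tiny sparse index first, then scan only the one
    block that can contain the key."""
    block = bisect.bisect_right(sparse_index, key) - 1
    if block < 0:
        return None
    start = block * BLOCK_SIZE
    for pos in range(start, min(start + BLOCK_SIZE, len(sorted_keys))):
        if sorted_keys[pos] == key:
            return pos
    return None

print(find(17))  # 5
print(find(18))  # None
```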
Solution #2: Let the database do the heavy lifting
Columnar databases, effective compression, and in-memory data processing are complemented by further instances of specialization: the database engine itself and the integration between the application and database layers. Together, they maximize performance by minimizing costly data transfers through in-database processing and by making the remaining data transfers as efficient as possible.
- In-database processing means executing data-intensive computations in the database, close to the raw data, to minimize expensive data movement. When data is highly compressed, it is especially advantageous if computations are done not only in the database, but also without ever storing uncompressed data, even temporarily.
A principal tool for data-intensive computations is the database engine. When the database engine is designed to understand the most important and frequent data structures specific to a given domain, data-heavy calculations involving these concepts can be performed in the database, returning only the results of the calculations to the application. This allows for much higher performance than if large amounts of raw data were to be transferred to the application for processing.
This type of database engine understands ubiquitous retail planning and analytics concepts that are typically difficult for out-of-the-box database products to handle, such as promotions, the relationships between supply chain tiers, and reference and replacement products.
- An embedded database design maximizes the throughput between the application and the database. No design eliminates data transfers entirely, but as the application and the embedded database are very tightly integrated and use the same physical memory space, there is no overhead caused by de/serialization (i.e., the translation of data into and from the formats required for storage or transfer).
Sending computations to data (rather than data to computations) and employing embedded rather than external databases maximizes performance in data-intensive applications, especially when on-demand analytics are required. These principles hold true regardless of whether we are considering a single-server or a distributed data processing architecture.
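As a schematic contrast, the sketch below uses Python's built-in SQLite module purely as a stand-in for an embedded, in-process database (it is not RELEX's own engine). It shows the difference between pulling raw rows into the application and pushing the aggregation down to the database so that only the results cross the boundary:

```python
import sqlite3

# SQLite stands in for "an embedded, in-process database": it runs inside the
# application's own process, so results cross no network and need no
# serialization between separate services.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, day TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("S1", "P1", "2024-06-01", 2),
     ("S1", "P1", "2024-06-02", 0),
     ("S2", "P1", "2024-06-01", 5),
     ("S2", "P2", "2024-06-01", 1)],
)

# Anti-pattern: transfer every raw row to the application, then aggregate there.
raw_rows = conn.execute("SELECT store, units FROM sales WHERE product = 'P1'").fetchall()
totals_in_app = {}
for store, units in raw_rows:
    totals_in_app[store] = totals_in_app.get(store, 0) + units

# In-database processing: push the aggregation down, transfer only the results.
totals_in_db = conn.execute(
    "SELECT store, SUM(units) FROM sales WHERE product = 'P1' GROUP BY store"
).fetchall()

print(totals_in_app)  # {'S1': 2, 'S2': 5}
print(totals_in_db)   # e.g., [('S1', 2), ('S2', 5)]
```

The second query returns one small result row per store regardless of how many years of raw transactions sit behind it, which is the whole point of sending computations to the data.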
Solution #3: Combine processing tactics for a more comprehensive approach
In retail planning and analytics, the biggest performance requirements come from data-intensive queries used, for example, when comparing historical promotion uplifts for different stores, promotion types, and product categories. Therefore, in-memory data processing enabled by columnar, compressed databases is key to great performance. However, data processing cannot be optimized solely from the point of view of analytics. Data inserts and updates also need to be handled in an efficient manner.
Two primary ways to make data updates efficient without hampering the analytics performance of a columnar, compressed database are:
- Bundling data updates into batches to minimize the need to compress the same data blocks repeatedly. Batch processing of data is often understood to mean nightly update runs, which is of course a good option when feasible, but batch processing can also be done in near real-time. Updating a database every minute or even several times a minute with the latest business transactions massively improves performance compared to processing each transaction individually.
- Using both row-based and column-based data structures to take advantage of their respective strengths. New or updated data can first be stored in row-based secondary data structures, enabling quick updates and inserts. This data is immediately available for queries, while the primary database structures are updated and recompressed in batches, such as when a batch of business transaction data has been processed. The sketch after this list illustrates the combined write path.
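A toy Python sketch of this hybrid write path, with class and method names invented purely for illustration, might look as follows: inserts land in a small row-based delta buffer that is immediately visible to queries, and a micro-batch merge periodically folds it into the columnar main store.

```python
from collections import defaultdict

class HybridStore:
    """Toy sketch: a columnar "main" store for analytics plus a row-based
    "delta" buffer for fast inserts, merged in micro-batches."""

    def __init__(self, batch_size=3):
        self.main_columns = defaultdict(list)  # columnar, recompressed on merge
        self.delta_rows = []                   # row-based, cheap to append
        self.batch_size = batch_size

    def insert(self, row):
        # Inserts only touch the small row-based buffer -- no recompression.
        self.delta_rows.append(row)
        if len(self.delta_rows) >= self.batch_size:
            self._merge_batch()

    def _merge_batch(self):
        # One batch update amortizes the cost of rewriting/compressing columns.
        for row in self.delta_rows:
            for column, value in row.items():
                self.main_columns[column].append(value)
        self.delta_rows.clear()

    def column(self, name):
        # Queries see both stores, so fresh data is visible before the merge.
        return self.main_columns[name] + [row[name] for row in self.delta_rows]

store = HybridStore(batch_size=3)
for units in (4, 0, 2, 7):
    store.insert({"store": "S1", "product": "P1", "units": units})

print(store.column("units"))  # [4, 0, 2, 7] -- includes the unmerged delta row
print(len(store.delta_rows))  # 1 row still waiting in the delta buffer
```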
Highly compressed columnar databases are not a silver bullet that solves all data processing challenges. They are best used in tandem with other specialized data processing approaches that maximize performance in other ways, and choosing the right combination requires a truly in-depth understanding of what matters most in your use cases.
Real-world applications: How high-performance data processing delivers maximum business value
Much thought, design, and experimentation are needed to deliver the combination of fast on-demand analytics, state-of-the-art calculations, and extreme reliability in a retail setting with thousands of stores, each carrying thousands of products that constantly generate new data.
RELEX Solutions is one of the very few solution providers that have developed their own database and database engine. Why is that?
We aim to equip our customers with the best in retail and supply chain planning. Remarkable data processing performance and specialized technology are critical to achieving this goal. Our exceptional computational performance allows us to do things that have previously been very difficult or even impossible to do.
Let’s illustrate this with some examples.
Expanding planning horizon visibility
At one of our biggest customers, we calculate day-level demand forecasts and replenishment order proposals 280 days in advance for 84 million SKUs in 55 minutes.
Large retailers require this level of data processing speed to update their replenishment plans daily and integrate the latest transaction and planning data, as well as external data such as weather forecasts.
This data processing power also allows calculations to be further refined as forecasts are updated. To support proactive planning, we enable our customers to routinely map all future states of their supply chain on the SKU-store/warehouse-day level to understand factors such as:
- What inventory will be located where.
- What orders will be placed and delivered.
- What volumes will need to be picked and transported.
Our planning solution calculates this every day, several months or even over a year ahead, while other solutions struggle to get this kind of visibility for the upcoming week. The improved visibility helps planners match inventory to demand far more accurately, improving sales, inventory management, and customer satisfaction.
Increasingly sophisticated automation for greater efficiency and adaptability
Having full control of the database engine has enabled us to develop our own query language tailored to retail planning and analytics. This gives our customers' planning experts the flexibility to easily develop their own queries (including custom metrics) to meet all their analytics needs without struggling with clunky SQL.
Furthermore, alongside our custom query language, we’ve developed a graphical user interface that enables our customers to create automated queries that can be run regularly or in response to an exception. We call these automated queries Business Rules, but they can also be considered built-in Robotic Process Automation (RPA).
Our customers typically use the Business Rules Engine to:
- Let the system make autonomous decisions in well-defined circumstances.
- Automatically optimize operations.
- Automate exception management to support business priorities.
- Create automatic routines to manage their master data.
For example, in the case of autonomous prioritization, customers can employ virtual ring-fencing to ensure product availability when the same stock pool supplies both physical stores and online channels. They can also use autonomous issue resolution when, for instance, the system detects that future store orders will exceed projected available inventory. In this scenario, customers can set automated scarcity allocations that protect defined business targets (e.g., maximizing total sales or prioritizing product availability in stores with narrow assortments).
Large retailers can easily have tens or even hundreds of business rules to control different product categories, store types, suppliers, exceptions, etc. Business rules are configured without any programming, enabling our customers to fully control development without being dependent on IT resources, budgets, or RELEX itself as the solution provider.
Moreover, because configurations are separate from the software applications, they do not complicate updates of the underlying software. Even as software versions evolve, customers’ specific configurations remain unchanged, resulting in smoother transitions during upgrades and ensuring business continuity.
Realistic scenario planning for profitable, data-driven decisions
Our digital twin technology supports several parallel working copies, making ultra-granular scenario planning easier than ever before.
When needed, planners can:
- Move instantly into a complete copy – a digital twin – of their current production environment (including all data, parameters, customized business rules, etc.).
- Make any changes they wish to parameters such as delivery schedules, demand forecasts, or the timing of specific deliveries.
- Review the resulting impact on, for example, capacity requirements in any part of the supply chain on any level of detail. Users can run multiple scenarios, compare them to each other and to the current state, and apply any changes they like to the production environment.
This is an essential part of Retail Sales & Operations Planning, where detailed analysis of impacts across the network is an inescapable requirement with far-reaching implications. For instance, when preparing for peak seasons, large food retailers can save millions each year simply by managing the Christmas season more accurately.
Creating a scalable, tomorrow-proof data ecosystem
An embedded database delivers high-speed data processing thanks to its columnar structure, in-memory computing, and in-database processing capabilities. This level of specialization helps companies power the fast, data-driven decisions that daily operations require.
But it doesn’t end there.
We continue to develop sophisticated and automated planning processes that support supply chain, merchandise, and workforce planning in retail.
Remember when we talked about the embedded database’s complementary relationship with other data management strategies?
Our data platform expands on that, weaving the RELEX embedded database into a network of best-in-class data management capabilities. The platform delivers rapid transactional processing, powerful analytics, and scalable data storage and management, creating an adaptable, future-proof data ecosystem.