Session 4 Case Study Retail Case 6a346

Retail Sales

Overview          

Four-step dimensional design process Transaction-level fact tables Additive and non-additive facts Sample dimension table attributes Causal dimensions Degenerate dimensions Extending an existing dimension model Snowflaking dimension attributes Avoiding the “too many dimensions” trap Surrogate keys

1

Four-Step Dimensional Design Process 1.

Select the business process to model.  

2.

not business department or function E.g., purchasing, ordering, shipping, invoicing, inventorying

Declare the grain of the business process.  

Specifies individual fact table row E.g., individual line item on sales ticket, daily snapshot of the inventory levels for a product

Four-Step Dimensional Design Process 3.

Choose the dimensions that apply for each fact table row.  

4.

Identify the numeric (measured) facts that will populate each fact table row.   



Q: How do business people describe the data that results from the business process? E.g., date, product, store, customer, transaction type Q: What are we measuring? Typical facts are numeric additive figures E.g., quantity ordered, dollar cost amount

In making decisions regarding the 4 steps, consider both the requirements as well as the realities of the source data

2

Retail Case Study  

Large grocery chain: 100 grocery stores over 10regions Each store:    



Departments: grocery, frozen foods, dairy, meat, produce, bakery, floral, health/beauty aids, etc. 60,000 products (SKUs = stock keeping units) on shelves 55,000 SKUs with UPCs 5,000 SKUs without UPCs but with assigned SKU numbers

Data is collected:  

from cash s into a point-of-sale (POS) system at back door where vendors make deliveries

Retail Case Study – Cont’d 

Management concerns     

Logistics of ordering, stocking, and selling products Maximizing profit Product pricing Lowering cost of acquisition and overhead Use of promotions to increase sales  temporary price reductions  newspaper ads  grocery store displays  coupons

3

Step 1. Select the Business Process 



Decide what business process to model, by combining an understanding of the business requirements with an understanding of data realities. The first dimensional model built should be the one   

 

with the most impact, that answers the most pressing business questions, is readily accessible for data extraction.

In retail case study: POS retail sales Business Question: What products are selling in which stores on what days and under what promotional conditions?

Step 2. Declare the Grain 



What level of data detail should be made available in the dimensional model? Choose the most atomic information captured by the business process. 



Atomic data  Most detailed, cannot be subdivided  Facilitates ad hoc, unexpected usage and ability to drill down to details

Case study grain: individual line item on a POS transaction

4

Step 3. Choose the Dimensions   



A careful grain statement determines the primary dimensions. It is then usually possible to add additional dimensions. If an additional desired dimension violates the grain by causing additional fact rows to be generated, then the grain statement must be revised to accommodate this dimension. Case study dimensions: date, product, store, promotion

Preliminary Retail Sales Schema 

POS Sales Transaction Fact      



Product Dimension  





Promotion Key (PK) Promotion attributes TBD

Date Dimension  



Product Key (PK) Product attributes TBD

Promotion Dimension 



Date Key (FK) Product Key (FK) Store Key (FK) Promotion Key (FK) POS Transaction Number Other facts TBD

Date Key (PK) Date attributes TBD

Store Dimension  

Store Key (PK) Store attributes TBD

5

Step 4. Identify the Facts  

Picking the business measurements for the fact table: true to the grain. Case study - Facts collected by POS system:  

Sales quantity, sales price/unit, sales $ amount, standard cost $ amount Gross Profit = cost – sales 



Recommendation: Include in fact table even though it can be calculated. Eliminates the possibility of error.

For non-additive measurements such as percentages and ratios (e.g., gross margin) store the numerator (gross profit) and denominator ($ revenue) in the fact table. The ratio can be calculated in a data access tool for any slice of the fact table. Caution: Calculate the ratio of the sums, not the sum of the ratios

Date Dimension  

Ubiquitous in every data mart Use verbose, self-explanatory values rather than coded values. They are used as column headers in reports. By decoding in the database, we ensure consistency across different application environments. 

 



E.g., Holiday Indicator – use values: Holiday, Nonholiday; as opposed to Y/N

Date Key should be an integer rather than a date data type Data warehouses need an explicit date dimension table to describe fiscal periods, seasons, holidays, weekends, and other calendar calculations that are not ed by the SQL date function. If transaction time is of interest, we may need a separate Time Dimension table

6

Product Dimension     

Describes every SKU in the store Fill this dimension with as many descriptive attributes as possible. “Robust dimension attributes deliver robust analytic slicing and dicing capabilities.” Hierarchies = groups of attributes Merchandise hierarchy  



SKUs roll up to brands to categories to departments. Each is a many-to-one relationship

Although there will be redundancy, no need to normalize. Given the relative size of the dimension (as compared to the fact table) space saving is minimal.

Store Dimension 



The store dimension: Store Key (PK), Store Name, Store Number (Natural Key), Store Address, … Possible to represent multiple hierarchies in a dimension table 



Store to any geographic attribute (e.g., ZIP, county, state) Store to store district to region

7

Promotion Dimension  



Describes the promotion conditions under which a product is sold Called a “causal dimension” – describes factors thought to cause a change in product sales (price reductions, ads, displays, coupons) Could keep all 4 causal mechanisms in a single dimension  



… or split into 4 separate dimensions  



They are highly correlated, so not much difference in space requirements More efficient browsing for finding out how various promotions are used together May be more understandable to business istration may be more straightforward

To avoid null keys in the fact table (violation of referential integrity), for line items not being promoted include a row in the promotion dimension to indicate “No Promotion in Effect”

Factless Fact Table   

Q: Which products were under promotion but did not sell? Cannot answer yet. POS sales fact table has only products that were sold Answer: Create Promotion Coverage Factless Fact Table  



Factless Fact Table = has no measurement metrics Contains date, product, store, and promotion keys

Two-step process to answer Q:   

Query Promotion Coverage table: products under promotion on given date From POS Sales Fact table: products sold Answer is the set difference of above

8

Degenerate Dimension (DD) 

  



Dimension keys used in fact table without corresponding dimension tables In case study: POS Transaction # Still useful for grouping by transaction Common DDs: order numbers, invoice numbers Fact table primary key: Product Key and POS Transaction Number

Retail Schema Extensibility 





Original schema extends gracefully because POS transaction data was modeled at its most granular level. Premature aggregation limits ability to extend if new dimensions do not apply to higher grain Case study new dimensions:   

Frequent Shopper Clerk Time of Day

9

Schema Extensibility 

Dimensional models can handle extensions without invalidating existing applications: New dimension attributes – simply add columns to dimension table. If new attribute is only available after point in time, populate old dimension records with something like “Not Available” New dimensions – add foreign field keys to fact table New measured facts – add to fact table. If not at the same grain, then need separate fact table Dimension becoming more granular – create new dimension. May imply more granular fact table, in which case, may have to rebuild the fact table. Addition of a completely new data source involving existing and new dimensions – usually needs new fact table



  



Resisting Dimension Normalization 

Snowflaking = Dimension table normalization 



  





Redundant attributes are removed from the denormalized dimension table and are placed in normalized secondary dimension tables Fully snowflaked schema = 3NF ER diagram

The dimension tables must not be normalized, and should remain as flat tables. Numerous tables and s usually translate into slower query performance. Efforts to normalize any of the tables in a dimensional database solely in order to save disk space are a waste of time. Disk space savings gained by normalizing the dimension tables are typically less than one percent of the total disk space needed for the overall schema. Normalized dimension tables destroy the ability to browse within a dimension or across dimensions (e.g., list package types for each brand in a category). SQL needed becomes too complex. The fact table is naturally normalized.

10

Too Many Dimensions 





Too many dimensions increase space requirements for the fact table. A very large number of dimensions typically means that several dimensions are not completely independent and should be combined. A single hierarchy should not be captured in separate dimensions.

Surrogate Keys 

 

Surrogate keys are integers assigned sequentially as needed to populate a dimension. They serve to dimension tables to the fact table. Avoid embedding intelligence in the data warehouse keys. Benefits: 

  

Surrogate keys buffer the DW environment from operational changes. What happens when operations decide to recycle numbers after some period of inactivity? Fine for operational systems, but problematic for DW if it is using numbers as a PK. Can more easily integrate data from multiple operational systems, even if they lack consistent source keys. Performance advantages because small size of surrogate keys leads to smaller fact tables Surrogate keys are used to one of the primary techniques for handling changes in dimension table attributes.

11

Inventory

Overview  

   

Value chain implications Inventory periodic snapshot model, transaction and accumulating snapshot models Semi-additive facts Enhanced inventory facts Data Warehouse bus architecture and matrix Conformed dimensions and facts

12

Value Chain 





The value chain identifies the natural, logical flow of an organization’s primary activities. Operational source systems produce transactions or snapshots at each step in the value chain. They generate interesting performance metrics along the way. Each business process generates one or more fact tables.

Inventory Models 

Inventory periodic snapshot 



Inventory transactions 



Inventory level of each product measured daily (or weekly) – represented as a separate row in a fact table As products move through the warehouse, all transactions with impact on inventory levels are recorded

Inventory accumulating snapshot 

One fact table row for each product updated as the product moves through the warehouse

13

Inventory Periodic Snapshot Model 

Business need 

Analysis of daily quantity-on-hand inventory levels by product and store



Business process



Granularity



Dimensions









Retail store inventory Daily inventory by product at each store Date, product, store

Fact 

Quantity on hand

Inventory Periodic Snapshot Model Challenge 

Very dense (huge) fact table 

   

As opposed to retail sales, which was sparse because only about 10% of products sell each day

60,000 items in 100 stores = 6,000,000 rows If 14 bytes per row: 84MB per day One-year period: 365 x 84MB = 30GB Solution: Reduce snapshot frequencies over time   

Last 60 days at daily level Weekly snapshots for historical data For a 3-year period =208 snapshots vs. 3x365=1095 snapshots; reduction by a factor of 5

14

Semiadditive Facts Inventory levels (quantity on hand) are additive across products or stores, but NOT across dates = semi-additive facts Compare to Retail Sales:







once the product is sold it is not counted again

Static level measurements (inventory, balances…) are not additive across date dimension; to aggregate over time use average over number of time periods.



Enhanced Inventory Facts     

Number of turns = total quantity sold / daily average quantity on hand Days’ supply = final quantity on hand / average quantity sold Gross profit = value at latest selling price - value at cost Gross margin = gross profit / value at latest selling price GMROI (Gross Margin Return On Inventory)   



Need additional facts: 



GMROI = number of turns * gross margin measures effectiveness of inventory investment high = lot of turns and more profit, low = low turns and low profit quantity sold, value at cost, value at latest selling price

GMROI is not additive and, therefore, is not stored in enhanced fact table. It is calculated from the constituent columns.

15

Inventory Transactions Model 

Record every transaction that affects inventory            

Receive product Place product into inspection hold Release product from inspection hold Return product to vendor due to inspection failure Place product in bin Authorize product for sale Pick product from bin Package product for shipment Ship product to customer Receive product from customer Return product to inventory from customer return Remove product from inventory

Inventory Transactions Model - Con’t     

Dimensions: date, warehouse, product, vendor, inventory transaction type. The transaction-level fact table contains the most detailed information possible about the inventory. It is useful for measuring the frequency and timing of specific transaction types. It is impractical for broad data warehouse questions that span dates or products. To give a more cumulative view of a process, some form of snapshot table often accompanies a transaction fact table.

16

Inventory Accumulating Snapshot Model  

Build one record in the fact table for each product delivery to the warehouse Track disposition of a product until it leaves the warehouse       



 

Receiving Inspection Bin placement Authorization to sell Picking Boxing Shipping

The philosophy of the inventory accumulating snapshot fact table is to provide an updated status of the product shipment as it moves through above milestones. Rarely used in long-running, continuously replenished inventory processes. More on this in chapter 5.

Value Chain Integration      

Both business and IT organizations are interested in value chain integration Desire to look across the business to better evaluate overall performance Data marts may correspond to different business processes Need to look consistently at dimensions shared between business processes Need an integrated data warehouse architecture If dimension table attributes in various marts are identical, each mart is queried separately; the results are then outer-ed based on a common dimension attribute = drill across

17

Data Warehouse Bus Architecture    

Cannot built the enterprise data warehouse in one step. Building isolated pieces will defeat consistency goal. Need an architected incremental approach  data warehouse bus architecture. By defining a standard bus interface for the data warehouse environment, separate data marts can be implemented by different groups at different times. The separate data marts can be plugged together and usefully coexist if they adhere to the standard.

Data Warehouse Bus Architecture – Cont’d 



During architecture phase, team designs a master suite of standardized dimensions and facts that have uniform interpretation across the enterprise. Separate data marts are then developed adhering to this architecture.

18

Data Warehouse Bus Matrix

 

The rows of the bus matrix correspond to business processes  data marts Separate rows should be created if:   



the sources are different, the processes are different, or a row represents more than what can be tackled in a single implementation iteration.

Creating the DW bus matrix is a very important up-front deliverable of a DW implementation. The DW bus matrix is a hybrid resource: technical design tool, project management tool, and communication tool.

Conformed Dimensions 

Conformed dimensions are:  



Conformed dimensions have consistent    



identical, or strict mathematical subsets of the most granular, detailed dimension. Dimension keys Attribute column names Attribute definitions Attribute values

If two marts have dimensions (e.g., customer, product) that are not conformed, then they cannot be used together

19

Types of Dimension Conformity 

Mean same thing  



Rolled-up level of granularity 



Roll-up dimensions conform to the base-level atomic dimension if they are a strict subset of that atomic dimension. (see Fig. 3.9)

Dimension subset at same level of granularity 



Single shared table or physical copy Consistent data content, data interpretation, presentation

At same level but one represents only a subset of rows

Combination of above

Centralized Dimension Authority 

The major responsibility of the centralized dimension authority is to:   

 

establish, maintain, and publish the conformed dimensions to all client data marts.

90% of up-front data architecture effort Political challenge

20

Conformed Facts 





In general, facts table data is not duplicated explicitly in multiple data marts. If facts live in more than one location, then their definitions and equations must be the same and they must be called the same. If it is impossible to conform a fact exactly, then different names should be given to different interpretations. This will make it less likely that incompatible facts will not be used in a calculation.

Procurement

21

Overview  



Value chain reinforcement Blended versus separate transaction schema Slowly changing dimension techniques

Procurement Case Study 

Procurement involves a wide range of activities:    



Negotiation of contracts Issuing of purchase requisitions & POs Tracking receipts Authorizing payments

Common analytic requirements:  

 

Which materials/products are purchased most frequently? Who supplies them? At what prices? Across the enterprise, are there opportunities to negotiate contracts by consolidating suppliers, single sourcing? Are we purchasing from the preferred vendors? How are vendors performing?

22

Procurement Transactions 1.

Business process to model: Procurement 

2. 3.

Grain: 1 row per procurement transaction Key dimensions:



Transaction date Product Vendor (1 row for each vendor) Contract (1 row for each set of negotiated with a vendor) Procurement transaction type



Transaction date and product are conformed dimensions

   

4.

Transactions: purchase requisitions, purchase orders, shipping notifications, receipts, and payments

Measured facts:  

Procured units Transaction amount

Multiple- vs. Single-Transaction Fact Tables: Issues 



Business s view purchase orders, shipping notices, warehouse receipts, and vendor payments as separate & unique processes Several procurements transactions come from different source systems   



Several transaction types have different dimensionality 

 

Purchasing system: purchase requisitions and purchase orders Warehousing system: shipping notices and warehouse receipts s payable system: vendor payments E.g., discounts applicable to vendor payments but not to other types of transactions

Control numbers such as PO # generated during procurement process are candidates for degenerate dimensions Design decision: 

Build separate fact tables for each transaction type, or Build a blended transaction fact table with a transaction type dimension



No simple answer.



23

Multiple- vs. Single-Transaction Fact Tables - Con’t. 

Questions to ask:      



What are the s’ analytic requirements? How do s usually analyze data? Do they analyze multiple transaction types together or do they usually look at a single transaction type? Are there multiple unique business processes? Yes  leaning towards separate tables Are multiple source systems involved? Yes  leaning towards separate tables What is the dimensionality of the facts? Do some dimensions apply only to some transaction types? Yes  leaning towards separate tables

Solution: multiple transaction fact tables

Multiple Fact Tables  

Multiple fact tables for procurement processes Advantages:   



Richer, more descriptive dimensions and attributes Simplified staging activities, since operational data exist in separate source systems Loading data into separate fact tables will be less complex than attempting to integrate from multiple sources

Disadvantages: 

More time to manage and ister: more tables to load, index, and aggregate

24

Slowly Changing Dimensions 





We have assumed dimensions to be independent of time, but some dimensions (other than natural keys) may change slowly with time Need to track change, without full-blown normalized structure; without making every dimension time-dependent For each attribute in our dimension tables, we must specify a strategy to handle change

Techniques for Dealing with Dimension Change – Type 1: Overwrite the Value 





Overwrite the old attribute value in the dimension row, replacing it with the current value.  the attribute always reflects the most recent assignment The type 1 response is easy to implement, but it does not maintain any history of prior attribute values Question: Is there a business need for retaining the old attribute value?

25

Techniques for Dealing with Dimension Change – Type 2: Add a Dimension Row  

Create a new dimension row reflecting the new attribute Two separate surrogate keys – one for old row and one for new 

   

Could also use a “most recent row indicator” to tell us which of the two rows is the current Fact table is again untouched Adding a dimension row is the primary technique for accurately tracking SCD attributes Advantage:  



Product Key is used as the primary key instead of the SKU number, which is the natural key and is the same for both rows

New dimension row automatically partitions history in fact table – pre-change fact rows use the pre-change surrogate key No need to revisit preexisting aggregation tables

Disadvantage:  

Accelerated dimension table growth Does not allow us to associate the new attribute value with the old fact file or vice versa

Techniques for Dealing with Dimension Change – Type 3: Add a Dimension Column   

Add a new dimension column containing the old attribute value (E.g., “Prior Department”) Overwrite the old value with the new More appropriate when there is a need to associate new attribute values with old fact history 

   

E.g., business need to track both old and new values of department attribute both forward and backward

Management can use either value for analysis Allows for observing new and historical fact data by either the new or prior attribute values Used less frequently Inappropriate to track numerous intermediate attribute values

26

Hybrid Slowly Changing Dimension Techniques 

Two approaches that combine the basic SCD techniques: 





Predictable changes with multiple version overlays Unpredictable changes with singleversion overlay

These approaches provide more flexibility at the cost of greater complexity

Predictable Changes with Multiple Version Overlays   

Used in cases of sales organization realignments Example: Over a 5-year period the sales organization is reorganized five times. At first sight, candidate for Type 2 approach (add dimension row), but more complex business requirements. E.g.,   

 

Report each year’s sales using the district map for that year Report each year’s sales using the district map from an arbitrary different year Report an arbitrary span of years’ sales using a single district map from a chosen year.

Type 3 is also inappropriate because >2 district maps Because changes are predictable, an extension of Type 3 is possible  Multiple District columns: 

Current District; District 2001; District 2002; …

27

Unpredictable Changes with SingleVersion Overlay 



Preserve historical accuracy surrounding unpredictable attribute changes while ing the ability to report historical data according to the current values Issue a new dimension row (type 2) to capture the change and add a new dimension column to track the historical value (type 3). Also, overwrite “Current Department” value (Type 1).

More Rapidly Changing Dimensions 



Break off the rapidly changing attributes into one or more separate dimensions Two foreign keys in fact table: 1. 2.

Primary dimension table Rapidly changing attribute(s)

28

Session 4 Case Study Retail Case 6a346

Overview 4q3b3c

More details 26j3b

Related Documents 171j1w

Session 4 Case Study Retail Case 6a346

Session 4 Case Study Questions Trendsetter Inc 673c2n

Retail Canadian Tire Case Study 5c3z3d

Abbott Case Study-retail Case.pdf 2y1lz

Ccna 4 Case Study w164

Case Study 4 1v5o4m

More Documents from "Vikas" 2e4j6g

Tikona Tariff Plan National(india) 1w6q2y

15 Bollywood Movies That Were Way Ahead Of Their Time 5i1w6l

Amr Constructions Ltd..pdf i1wc

Rare Astro Yogas Part 3 3w391u

Haryan Return Help Utility 1kt40

Supply Chain Plan Ppt 5z6s2a