Discusión y conclusiones - adicciones vol. 33, nº 4 · 2021 índice

5.5.2.1 Data versus Aggregations The most exciting advancement to ASo cubes in the past four years without question has got to be the creation of Data Slices, which provides every Essbase developer the ability to do true incremental data loads and true regional data clears. Those two abilities totally change how ASo cubes are currently used, and offer new and exciting possibilities for how they might be used in the future. The reality of real-time or near real-time updates to ASo cubes is now within reach. to understand why this would create such stupefying excitement (it literally makes me giddy to discuss it), you would need to think back first to where ASo cubes have come from. In the first release of ASo cubes, what was the behavior of the cube if you did AnythIng at all to it? Why, thanks for asking. The database would unceremoniously (and mostly without warning) just dump the data. This was frustrating beyond belief.

Picture yourself completing a benign task that you did all the time in BSo cubes;

something simple, such as adding a new member to the metrics dimension. The member would be added, the formula completed, and off to Excel^® you went to check the results.

you whipped a query right up and it would come back as #mISSIng. “hmmm,” you would say, “I must have done something wrong in my mDx statement (since, of course, I am brand new to this).” Accordingly, you would go look at the formula and study its

Figure 5.10 Creating a Query Hint. (From Oracle Essbase Administration Services. With permission.)

construction, going back and forth with mDx references before finally determining that it was written correctly. “hmmm,” you would pause again, “I must have done something wrong with my query.” Back to Excel you would go to adjust the query only to have it return #mISSIng again. off you would go to database properties to do some checking on your aggregations only to find that you have 0 cells loaded and 0 cells aggregated on this cube. “What the …,” you would literally scream, “I hate this tool. It just dumped the data.” yep it did—all the time.

That is the place where ASo cube developers started—with a new technology that carried with it a great deal of excitement, but within a tool that felt very fragile. If you disturbed the cube in almost any way, the data had to be reloaded. Those early days in ASo cube development found developers creating load files that needed no Load rules so that reloads were quick and easy, because they were done over and over and over again. It certainly is no wonder that many shunned this new storage kernel after only a few attempts to use it.

Fast forward to present day ASo cube development. now, not only can you update dimensionality without disturbing the data, you can incrementally update the data-base without disturbing the main dataset by using Data Slices. Incremental data load times are proportionate to the size of the data that is being loaded, not to the size of the existing database and this is a very significant differentiation. Just because the database is enormous does not mean adding data to it will take an excessive amount of time.

Present day, not only can you incrementally update the data, but you do not have to rerun the aggregations because they update automatically as needed. The system does it for you. how fantastic is that? I feel a bit like I am in a television infomercial right now and want to yell “kABoom.” Seriously, that is how exciting this new development in ASo cube processing is.

This section will walk through a sequential explanation of how Data Slices work, and the steps involved in using them in a summary fashion. once that is completed, the journey will continue by exploring a week of time and processing in a real-life pro-duction set of cubes that are using these specific techniques (this is very fresh, as those cubes went into production while this book was being written). This will allow for com-plete access to production code, production situations, and production pros and cons.

In addition, some assurance can be felt in the knowledge that these strategies and tech-nologies are currently being used successfully and, hopefully, that knowledge will help to alleviate any fears or angst over trying it out.

The usage of Data Slices is a very simple concept. In its most basic form, the steps to use a Data Slice include:

• Data is loaded into the main part of the database and aggregated using normal processes.

• A new load is defined and when the data is loaded from the buffer, the system is told to situate it next to the existing database as a Data Slice, instead of merging it directly into the existing database.

• When the Data Slice is loaded, the system automatically creates the required views for that slice and the system completes this task prior to making the new Data Slice visible to user queries.

• multiple slices can be added during the same process.

• Eventually, query performance will degrade as the slices pile up; a fact that will be seen in the statistics provided in EAS on the cost of querying the slices.

• once it is determined that querying the slices is too expensive, there are two options to resolve this issue:

• merge together the incremental slices into a single incremental slice; or

• merge all of the slices with the existing database to recreate a single unit

• Aggregations will be dropped and the merge process completed (no worries, the system will issue a reminder if the aggregations were not dropped before the process was started).

The decision regarding which of these two options is the best choice is based on numerous factors that might include size of the main database and current manage-ment process for the cube as a whole. There are many factors to consider that are unique to each and every installation.

5.5.2.2 Use Case Example This use case example is real and will discuss (demonstrate where appropriate) and provide production code for each of the following tasks:

• Creating the main database

• Creating a Slice

• updating dimensions

• merging Slices

• Clearing Slices

Note: I felt it was incredibly important in this section to provide real production code as we found a significant amount of issues with code that we tried to use from various sources when we were first beginning to use Data Slices. I will say that we were in an early release of this feature and the newer documentation (all sources) appears to be much more accurate and complete than the sources that we were using.

This use case example involves a set of five cubes being used in retail for analysis (sorry in advance about having to be so generic in descriptions; it would be a lot more interesting to be able to disclose the details). The requirements for this implementation had some unique characteristics that no previous implementation had:

1. Essbase was to be the database of record; data from 25 disparate sources would be provided. The staging tables would hold one week only and then be cleared.

no other database in the corporate data warehouse would contain the complete two years of history plus the one year building.

2. The data would need to exist at a very granular level for one dimension; this would create a dimension that went down six levels and had a million plus members (with potential for growth).

3. Weekly load volumes were estimated at 50 million records to start, growing to approximately 300 million records when the implementation was fully deployed.

4. most data would be created in the staging area on the weekend, but an additional data load to the staging area would occur on monday and on Thursday. This data would need to be loaded while the cube was in use, i.e., no downtime for the load and aggregation process; the cubes would need to be up monday through Friday.

5. Dimensional updates would be on the weekends only (the cubes could be taken offline at that time).

6. Essbase Classic Add-In templates would be used for reporting; no ad hoc report-ing (thank you) and the SLA for report retrievals was less than 30 seconds.

After reviewing these requirements, frankly, I was scared and a little bit leery of whether this could even be done with Essbase. mind you, I was well past those early skeptical days I discussed at the beginning of this chapter. I knew what could be done with ASo cubes, but this exceeded anything I had ever attempted. I did not want to be cutting edge. The project was too high profile to end up not working. So, what were the concerns? First, being the database of record was very concerning, especially since it looked like it was going to be a very large database. not having a corporate data ware-house to fall back on to reload all of history, should something go wrong, was very scary. Also, Essbase being the database of record places some additional pressures and responsibilities on me as the architect. For instance, it is the architect’s job to design a fail-proof back-up plan so that in every instance the data is safe and can be easily recovered. Another responsibility is to make sure there is a method predetermined for extracting the data if any down-stream systems later determine that they need this data.

The second concern was the size of the largest dimension. I had never worked with a million member dimension in a cube; I had only read about them in case studies and white papers. I could not find anyone amongst my peers using something this large in real life. I had no idea what to expect, and no one to ask. The final concern was that, like it or not, I was going to be on the cutting edge (no, bleeding edge) of relatively new technology. I was going to have to take a leap of faith and use these new Data Slices in my design to make this all work. What about volume? no, I had already done cubes with as much volume (or more), so the volume was not a worry. I knew Essbase could handle that. to try and mitigate some of these concerns (and risks), I did two things before I committed to this implementation.

First, I built a Proof of Concept (PoC) using a six-dimension model with 10 million members in the largest dimension and 24 metrics. This was pretty close to what I under-stood was needed. This model was loaded with two years of data. Existing elements in the data warehouse were used to provide real distribution within the ASo cube.

We felt this would make the queries (using default aggregations only) as close to the final experience as was possible. The results were simply astounding. The speed with which we could query this cube blew our socks off. no one could actually even believe it. Completing this PoC confirmed two critical pieces of information:

1. The existing shared platform could handle this size of a cube (although I would eventually order more memory and disk space).

2. There was a good chance the SLAs being requested could be met (barring any-thing dramatic changing in the rollout, like 10 dimensions instead of 6, or 200 metrics instead of 24 in the final design).

The second thing I did to try and mitigate my risk was to call my oracle partners.

I asked that they please find me at least one company that had a model this big running in production. oh, and did I mention I preferred the reference also be running Essbase on AIx? I did not need to talk to them (there can be much political angst when you make that kind of request), I just wanted an outline of their dimensionality (number of members in each dimension) and an assurance that they were running in production.

It took them a few days to come back to me and they found exactly one. This allowed me the freedom to tell my management (with some modicum of honesty) that I was not the first person to do this.

The final solution had five cubes. one cube held the million member dimension; the rest were all significantly smaller. For clarity and ease of reference, this case study will just reference the largest cube—cube1—and just one of the smaller cubes—cube2—

because the small ones were all handled in exactly the same manner. For a reference point, table 5.2 provides the dimensionality and member counts for cube1.

In contrast, table 5.3 reflects dimensionality and member counts of cube2 (the larg-est of the four smaller cubes).

The process that was built is fully scripted in shell scripts calling maxL commands.

running on an AIx LPAr, the server is a dual core 64-bit with three (3) CPu and 28 gb of memory. none of the processes described are done manually; everything is fully automated, although any of it can be run on demand as needed. Assuming that the first week has occurred already, the weekly process that was put into place is as follows:

• Saturday morning:

• Drop aggregations on cube1 and cube2.

• Export Data on cube1 and cube2.

• merge Slices.

• re-aggregate cube2.

• Sunday morning:

• rebuild Dimensions on cube1 and cube2.

• Aggregate cube1.

Table 5.2 Case Study Cube1 Dimension Members and Counts

Dimensions of cube1 Total Stored

metrics 171 169

Table 5.3 Case Study Cube2 Dimension and Member Counts

Dimensions of cube2 Total Stored

metrics 41 40

• Sunday Afternoon:

• Add Current Week Data Slices to cube1 and cube2.

• monday morning:

• Add Data Slice to cube2.

• Thursday Evening:

• Add Data Slice to cube2.

• Saturday morning: Start the Process All over Again.

This is the final process and in its current state, it is not exactly how it existed when it was originally put into production. A few steps had to be altered as discoveries were made on what does and does not work well with slices in real life with production vol-umes. A review of this information and discovery process is included in the next few pages. In addition, as each component of the process is discussed, the code relevant to that part of the process is also provided.

5.5.2.2.1 The Saturday Morning Process In the original deployment, the starting concept was how very cool it would be to keep the aggregations each week and not have to rebuild them. After all, this is an advantage of using Data Slices, and aggregations take up the most time in the ASo cube-building process. In real life, there were two complications with keeping the aggregations. The first complication was that the system required you to drop the aggregations to merge the slices. Because the team had no idea how long you could legitimately go without merging slices before performance was affected, and the process needed to be automated, and there was an entire weekend that could be used for processing, proceeding with caution and merging weekly seemed to be the best solution. The second complication was more practical: dimension builds take forever and a day if you perform them on a large aggregated cube. It is really not the dimension build that takes so long, but the restructure. This was an initial mistake that was made out of ignorance. In the original process, the reaggregation of all the cubes was done on Saturday nights, and the dimension builds were completed on Sunday mornings when the files were available. on cube1 the dimension builds were excessive even when the cube had only one or two weeks of data, and they got worse as data was added. Each week that was added caused the build and restructure process to extend one to two hours more. It was very quickly recognized that it would not take many weeks to blow the timeline right out of the water, and by year’s end this step in the weekly update would be taking until Wednesday to finish. hence, a change was made to the process to accommodate these real life lessons learned. The small cubes are still aggregated on Saturday because the system can handle the time these take from a resource perspective.

The servers are much busier on Sundays and getting the small cubes out of the way is one less task that has to be done. The code used to complete the first Saturday task to drop the aggregations is:

alter database ${APP_NAME}.${DB_NAME} clear aggregates;

The second task on Saturday mornings is to export the data. The export is a critical part of the disaster recovery (Dr) plan and it has already been implemented several times. Essentially, if the EtL (extract, transform, and load) folks provide the wrong data and too many steps are processed, there may come a moment that the need to revert

back to the starting point is identified. The lesson that was learned the hard way has been that although calculations can be written to clear specific cube regions, on cube1, in particular, it is much faster in some instances to reload the cube to that week’s start-ing point on Saturday mornstart-ing and reprocess it completely. The exports are also a key part of the Dr plan, if the primary backups fail for some reason. If the cube is of any size at all, dropping the aggregates seems to positively affect the export speed. The code used to complete the second Saturday task to export the data is:

export database ${APP_NAME}.${DB_NAME} level0 data to data_file '${APP_NAME}.export.txt';

The third task on Saturday morning is to merge the data slices. While the export might be slightly more efficient after the merge, the gains in efficiency did not warrant the assumed risk. Essentially, the export needs to be completed prior to altering the cube in any significant way to avoid risk of any type. The code used to complete the third Saturday task to merge all data slices into the main data slice is:

alter database ${APP_NAME}.${DB_NAME} merge all data;

It is good to note here that there are options with regards to how data is merged.

There is another variant of the command to merge all data slices into the main slice and remove zero values:

alter database ${APP_NAME}.${DB_NAME} merge all data remove_zero_

values;

In this use case, that variation of the command can never be used because meaning-ful zeros are loaded into the database. It would be detrimental, to say the least, to remove those zeros programmatically.

There is also a variant of the command to merge the incremental data slices into a single slice:

alter database ${APP_NAME}.${DB_NAME} merge incremental data;

In document adicciones vol. 33, nº 4 · 2021 índice · index (página 65-69)