Dreaded nested IF’s, why you should have a dates table, and an interesting daily SSIS job.

I love committing sins and I’m sure this is one of them.

We have a daily SSIS job that runs to populate our OLAP database which triggers nightly around 2am. This job is kind of cool in as much as it looks like this:

  1. Trigger stored procedure in database, aka the Daily Parent, which then runs about 20 different stored procedures to update the “base tables” (i.e. the raw data). Takes about 20-30 minutes.
  2. Run (2) stored procedures at the same time that both update very large tables which are built using functions and the data in the base tables. Each job takes 30 minutes to run, or they take 30 minutes to run together.
  3. Run the Daily Child, which is structurally identical to the Parent. It triggers another series of stored procedures which then build tables from the tables that run in step 2, these packages are designed for doing complex reporting on and are psuedo-cubes.

Now, I know what you’re saying. That sounds horrible. But wait, there’s more.

We have decided we now need to run a weekly job every Saturday, although upon discussing it and wishing to not complicate our lives in the future we also decided to add a monthly job. These will not repeat the pattern above, but instead simply mimic the logic in Step 1 (i.e. it will run (1) stored procedure that will then trigger a series of jobs in order, but there will be no children, and no concurrency.)

Oh, and for fun we still want the daily job to run, and for this weekly or monthly job to run after it finishes. For added fun we decided to it all inside the same SSIS package instead of making 2 new ones in order to have better insights into logging, etc.

Shit, I forgot about the most fun part. If the Saturday is the first Saturday in a month, then run the monthly job and not the weekly job, but if its not the first Saturday in the month then just run the weekly job.

You see, those jobs that run concurrently only take 30 minutes because they’re only doing some work, and only recalculating the last 90 days of data plus any new data that came into the base tables, however data can change after 90 days, and over time these tables will be incorrect compared to the live views which actively hit the raw base tables. Running a full year at a time can take about 2 hrs, and we’d like to run two years at a time each week, unles its the first Saturday and we want to actually blow up the tables and refresh them which would take quite awhile.

Now in order to do this all we need to do is figure out whether its a Saturday today, and if so whether its the first Saturday of the month. And at first I thought about doing some manipulation such as select datepart(dw, getdate()) and working from there, but I thought that might be a little difficult for a future coder to inherit compared to just using our dates table, which is very handy for things like this, and very easy to pull into your database.

In the end I think a nested loop here is inevitable, and in conclusion we now have this monstrosity which I have brought to life and now lives in production.

IF (SELECT DayOfWeekNum FROM DIM.Dates WHERE DateKey = CAST(getdate() AS date)) = 7 BEGIN IF (SELECT 1 FROM DIM.Dates WHERE DateKey = CAST(getdate() AS date) AND DayOfWeekNum = 7 AND DayOfMonthNum BETWEEN 1 AND 7) = 1 BEGIN --I am a Saturday SELECT 3 END ELSE BEGIN --I am the first Saturday of the month SELECT 4 END END ELSE BEGIN --I am not a Saturday. Stop job SELECT 2 END 

TL;DR: I suck at SQL

…but wait, there’s more! What about if our beloved DBA’s need to take a Saturday off for maitenance? What if they want us to not run it on a Tuesday? Do we have to go manually stop the job? Nope. We built a logging table where we can schedule the job to simply kill itself before it starts by inserting the date you want, so anyone who has write access to the OLAP (me, my immediate peer, and the DBA team only) can just:

insert into table select '2021-06-21' 

or

insert into table select getdate() 

And poof, job will run, log that it ran, and then stop without touching any of the other sprocs.

Now you may all tell me how horrible that would be to inherit 🙂

submitted by /u/stiffupperleg
[link] [comments]


Go to Source of this post
Author Of this post: /u/stiffupperleg
Title Of post: Dreaded nested IF’s, why you should have a dates table, and an interesting daily SSIS job.
Author Link: {authorlink}