Here is how you would filter a SQL query by last run date of the job in Scribe Insight:
1. Get the file path for the scribe dts. Easiest way to do this is navigate to the collaborations folder and shift + right click and select “copy as path.”
2. Create a calculated variable called LastRunDateTime.
3. Set the formula for the calculated variable to be
GETLASTRUNDATE( "C:\Program Files (x86)\Scribe\Collaborations\CustomerTest.dts" ) (replace the file path with the one copied in step 1).
4. Call that in your query as follows:
SELECT TASKID, FLOWID, STEPID, STARTTIME, ENDTIME, STATUS, LOCKBY, LOCKTIME, USERID, USERKEY, USERKEY2, PRIORITY, DEADLINEFLAG, DEADLINE,
AVAILABLEDATE, FILE_NO, FOLDER_NO, DESCRIPTION, DRAWER, PACKAGETYPE, FROM_USERID, DATE_INITIATED, LOCKPERIOD, FROM_FLOWSTEP, DOCTYPE,
FROM dbo.IR_TASK_DATA (nolock)
where ENDTIME >= :LastRunDateTime AND File_NO <> ' ' and File_NO is not null
ORDER BY ENDTIME
Considerations and shortcomings of this approach
- If you lose your scribeinternal database, your process will fall apart—this approach depends on the executionlog tables. Backup, backup, backup.
- The last run date approach can be problematic with related records, for example when accounts and contacts are created. If you have an account load running off of last run date and you have a contact load running based on last run date, you risk data loads failing if the account and contacts are created after the account load has run but the contact load has not run. In this case your contact will fail because the parent customer doesn’t exist. I recommend combining these into one step—getting all contact and parent account information in one query where the modified on the contact is greater than last run date or the modified on the account is greater than last run date—that way you won’t have any dependency/timing issues causing records to fail to load.
- The last run date approach can be problematic if rows fail. Since it is only getting records that have changed since the last time the job ran, if during a load a row fails for some reason, next time the job runs it will not retry that row. I recommend adding a “retry bit” to the source data table, and if for some reason a row fails, you can update the source for retry. Then, in your DTS, have your source query include records that were modified after the last run date OR records where retry = 1/yes. This is one way to eliminate the timing issues in the previous point—if a row fails because a dependency is not present, flag the row for retry and by the time the job runs again, the dependency will probably be there.
- Make sure that if you are reading from a MSCRM database using the lastrundate approach, you are reading from the filtered views, not the base views or tables. Keep in mind that the modified date field will be 4+ hours later in the base tables and views so if you are reading from the base tables and views your last run date logic will not work dependably.
Post by: Joel Lindstrom,