As you may know, attachment files are stored in the database by default. The file contents are converted to Base64-encoded text and stored in the BLOB of the PC_DATA_WORKATTACH table (Class: Data-WorkAttach-File). Base64 encoding increases the size by about 33%. If your application is going to host a massive volume of attachment files, storing them in the database will slow down your system because of the query, size, and encoding/decoding overhead. In that case, using another content storage option such as a Repository is recommended. For the Repository type, you can choose from JFrog Artifactory, Amazon S3, File System, or Microsoft Azure. In my project, we used File System (NAS) and ran into an unavoidable issue. In this post, I will share what the issue was and how we solved it.
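As a quick check of the 33% figure, here is a minimal Python sketch that Base64-encodes a block of random bytes and compares the sizes. The 3,000,000-byte payload is just a stand-in for an attachment, not a value taken from Pega.

```python
import base64
import os

# Hypothetical payload standing in for an attachment file.
raw = os.urandom(3_000_000)

# Base64 turns every 3 input bytes into 4 output characters.
encoded = base64.b64encode(raw)

overhead = len(encoded) / len(raw) - 1
print(f"raw={len(raw):,} bytes, encoded={len(encoded):,} bytes, overhead={overhead:.1%}")
```

Every 3 bytes of input become 4 characters of output, which is where the roughly one-third growth comes from.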
1. Issue with using Repository (File System)
In a Repository rule, you can specify only one resource path. Multiple resource paths are not supported.
This means that all attachment files are stored in a single directory, with no hierarchical structure. However, most NAS products limit the number of files that can be saved in a single directory. For example, the one we used could not hold more than 4,000,000 files in a directory. This is a hard limit, and we knew this ceiling would be hit within a couple of months after go-live. There was not much we could do on the infrastructure side, so we had to resolve this issue on the Pega side.
2. Environment example and use case
An operator attaches a Word file "PegaInstallation.docx" to a case "W-1". Both the Pega node and the NAS run Linux. The Pega node mounts the NAS directory "/share" on its local "/mnt/nas" directory (Command: mount -t nfs 192.168.3.200:/share /mnt/nas). On the Pega side, create a Repository rule "MyRepo" and set /mnt/nas as the Resource Path. Also set MyRepo and the /attachments directory in the Application rule.
Let me explain how things work in the above scenario. The file "PegaInstallation.docx" is stored on the NAS, under the /share/attachments directory, and is renamed to "PegaInstallation_W-1.docx". But where exactly is the destination path managed when an attachment file is saved? The answer is the "Application.pyRepoStorageTarget" page on the operator's clipboard (see below).
The attachment file is stored at /share/attachments/PegaInstallation_W-1.docx.
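To make the relationship between the mount point, the folder, and the final path concrete, here is a rough Python sketch. It is not Pega code. The function names are made up for illustration, and the stored file name is simply taken as given from the example above.

```python
from pathlib import PurePosixPath

MOUNT_POINT = PurePosixPath("/mnt/nas")  # Resource Path in the Repository rule
NAS_EXPORT = PurePosixPath("/share")     # directory exported by the NAS


def path_on_node(folder_id: str, stored_name: str) -> PurePosixPath:
    """Path as seen from the Pega node (hypothetical helper)."""
    # Strip the leading slash so the folder is joined under the mount point.
    return MOUNT_POINT / folder_id.lstrip("/") / stored_name


def path_on_nas(node_path: PurePosixPath) -> PurePosixPath:
    """Same file as seen from the NAS itself (hypothetical helper)."""
    return NAS_EXPORT / node_path.relative_to(MOUNT_POINT)


node_path = path_on_node("/attachments", "PegaInstallation_W-1.docx")
print(node_path)
print(path_on_nas(node_path))
```

The first path is what the Pega node writes to, and the second is the same file from the NAS point of view.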
Our approach was to create a subdirectory for each day (i.e. YYYYMMDD) under the attachments directory and store attachment files there. Files in a subdirectory do not count toward the 4,000,000-file limit of the parent directory, and there is no way that the limit is reached in a single day. To do this, we overrode the Code-Security.ApplicationProfileSetup activity and added a Property-Set step that appends the current date to Application.pyRepoStorageTarget.pyFolderId.
This activity is an extension point that is executed when an operator logs on. If your application requires any special initialization processing for authenticated operators, you can override this rule in your application's ruleset to perform it. I've also noticed that this activity is executed every time a case is created. With this customization, the clipboard changes as below when you log on or create a new case:
The attachment file is now stored at /share/attachments/20210913/PegaInstallation_W-1.docx.
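The idea behind the date-based folder can be sketched in a few lines of Python. This is only an illustration of the string being built, not the actual Property-Set expression in Pega, and `daily_folder_id` is a hypothetical name.

```python
from datetime import date


def daily_folder_id(base_folder_id: str, today: date) -> str:
    """Append a YYYYMMDD subdirectory to the configured folder (hypothetical helper)."""
    return f"{base_folder_id.rstrip('/')}/{today.strftime('%Y%m%d')}"


print(daily_folder_id("/attachments", date(2021, 9, 13)))  # /attachments/20210913
```

Because the date changes every day, each day's files land in their own directory and no single directory keeps growing.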
You may wonder whether this attachment can still be opened without any issue from the next day onward. That is no problem, because the actual storage path is recorded in the BLOB ("pyContentLocation") of the PC_DATA_WORKATTACH table (Class: Data-WorkAttach-File). When an attachment file is opened, the system reads this value. The value on the operator's clipboard, "Application.pyRepoStorageTarget.pyFolderId", is used only when writing, not when reading.
Initially I was going to create the subdirectory for each day manually, but I found that the system creates it automatically if it doesn't exist. So there is no need to prepare this directory structure in advance.
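The two points above (the location is recorded at write time, and a missing directory is created on demand) can be imitated outside Pega with a small Python sketch. A temporary directory stands in for /mnt/nas, and the function names are made up. This only mirrors the behavior described here, not how Pega implements it.

```python
import tempfile
from pathlib import Path


def save_attachment(root: Path, folder_id: str, name: str, data: bytes) -> str:
    """Write the file and return the location that would be recorded with it."""
    target_dir = root / folder_id.strip("/")
    # Create the day folder (and any parents) only if it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / name).write_bytes(data)
    return f"{folder_id.rstrip('/')}/{name}"


def open_attachment(root: Path, recorded_location: str) -> bytes:
    """Read using the recorded location only; today's folder ID is not consulted."""
    return (root / recorded_location.lstrip("/")).read_bytes()


def demo():
    with tempfile.TemporaryDirectory() as tmp:  # stands in for /mnt/nas
        root = Path(tmp)
        location = save_attachment(
            root, "/attachments/20210913", "PegaInstallation_W-1.docx", b"demo"
        )
        # A day later the clipboard would point at /attachments/20210914,
        # but reading still follows the location recorded at write time.
        return location, open_attachment(root, location)


location, content = demo()
print(location)
```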
You may also wonder: if the Code-Security.ApplicationProfileSetup activity is also executed every time a case is created, doesn't overwriting the value with "Application.pyRepoStorageTarget.pyFolderId + ..." add another unnecessary subdirectory each time, in a loop? No. The value is not saved to the database; it is only overwritten in memory. Every time a case is created, the entire Application page on the clipboard is initialized before this activity is called. So the path does not keep growing, and there is nothing to worry about.
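Here is a small Python simulation of why the path does not pile up. A plain dict plays the role of the Application page, and all names are hypothetical. The `reinitialize` flag exists only to show what would happen if the page were not rebuilt each time.

```python
from datetime import date

BASE_FOLDER_ID = "/attachments"  # value configured in the Application rule


def fresh_application_page() -> dict:
    """Stand-in for the Application page being rebuilt on the clipboard."""
    return {"pyRepoStorageTarget": {"pyFolderId": BASE_FOLDER_ID}}


def application_profile_setup(page: dict, today: date) -> None:
    """Stand-in for the overridden activity: append today's date in memory."""
    target = page["pyRepoStorageTarget"]
    target["pyFolderId"] = f"{target['pyFolderId']}/{today.strftime('%Y%m%d')}"


def folder_after_cases(count: int, today: date, reinitialize: bool = True) -> str:
    page = fresh_application_page()
    for _ in range(count):
        if reinitialize:
            # The page is reset before the activity runs for each new case.
            page = fresh_application_page()
        application_profile_setup(page, today)
    return page["pyRepoStorageTarget"]["pyFolderId"]


today = date(2021, 9, 13)
print(folder_after_cases(3, today))                      # /attachments/20210913
print(folder_after_cases(3, today, reinitialize=False))  # /attachments/20210913/20210913/20210913
```

The second call shows the nested path you would get without the reset, which is the loop this paragraph rules out.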