Document versioning in Liferay and Amazon S3
Posted 9 years ago by janhaj
Liferay Portal is an enterprise collaboration platform with a document management system. It lets multiple users access the same files, so a mistake made while editing a file is a risk for all of them. What if someone deletes an important part of a text? Is it lost for good? No, it isn’t. We can rely on Liferay’s document versioning system to address this concern. But what if you use an external system as your document repository (such as Amazon S3) that provides its own versioning capability too? Which versioning should you rely on then? This article gives some insight into Liferay and Amazon S3 versioning.
Liferay has a built-in versioning system that’s enabled by default. Every edit of a document is saved as a new revision. We can look at previous versions at any time and even revert to one of them as needed. I prepared a small example of how it works:
- Create test document (100kB, version 1.0)
- Upload new version (400kB, version 1.1)
- Upload new version (500kB, version 1.2)
- Revert to version 1.1 (400kB, version 2.0)
- Upload new version (600kB, version 2.1)
- Upload new version (1MB, version 2.2)
In summary, if we upload a new version of the document, Liferay increments the minor version number (e.g.: 1.0 -> 1.1). If we revert to a previous version, Liferay increments the major version number and resets the minor version number (e.g.: 1.2 -> 2.0).
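The numbering rule can be written down as a few lines of Python. This is only a sketch of the behaviour observed in the example, not Liferay’s actual code; the function name `next_version` is made up for illustration.

```python
def next_version(current: str, revert: bool = False) -> str:
    """Return the version label that follows `current`.

    A plain upload bumps the minor number; a revert bumps the major
    number and resets the minor number to 0.
    """
    major, minor = (int(part) for part in current.split("."))
    if revert:
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"


# Replay the example: two uploads, one revert, two more uploads.
version = "1.0"
history = [version]
for is_revert in (False, False, True, False, False):
    version = next_version(version, revert=is_revert)
    history.append(version)

print(history)  # ['1.0', '1.1', '1.2', '2.0', '2.1', '2.2']
```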
How does Liferay store files and revisions in the storage repository? The file system structure in Liferay isn’t identical to your local file system, so you won’t find a file on the storage by looking for its file name. Every file has its own uniquely named directory that holds all revisions of the file, each revision stored as a separate file. Revision file names are based on their version numbers, such as 1.0, 1.1 and so on. The screenshot below shows how files are physically stored on the drive.
As you can see in the screenshot, Liferay saves full files, not differential updates. I looked through the configuration options and checked the documentation too, and it seems that differential updates are not supported today. Even when reverting to a previous version, the new version with the incremented major version number is created as a copy of the reverted file, so we store the same file twice (under different version numbers). Each update eats into your storage capacity.
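To put a number on it, here is a quick calculation using the sizes from the example earlier in this article. It assumes 1 MB = 1000 kB and ignores any file system overhead.

```python
# (version label, size in kB) for each revision from the example.
# 1 MB is treated as 1000 kB here, which is an assumption.
revisions = [
    ("1.0", 100),
    ("1.1", 400),
    ("1.2", 500),
    ("2.0", 400),   # revert to 1.1, stored again as a full copy
    ("2.1", 600),
    ("2.2", 1000),
]

total_kb = sum(size for _, size in revisions)
latest_kb = revisions[-1][1]
history_kb = total_kb - latest_kb

print(f"all revisions: {total_kb} kB")    # all revisions: 3000 kB
print(f"latest only:   {latest_kb} kB")   # latest only:   1000 kB
print(f"history cost:  {history_kb} kB")  # history cost:  2000 kB
```

So a document whose current version is 1 MB occupies about three times that once its history is included.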
Amazon S3 versioning
Amazon S3 also provides a versioning capability. It can be enabled in the S3 Management Console, which you might know from previous articles introducing Amazon S3. Versioning can be enabled only for a whole bucket, and once enabled it can’t be disabled, only suspended.
How does it work on Amazon S3? It works differently from Liferay. The Amazon S3 versioning system is based on file names. Unlike in Liferay, you can find a file by its file name and see all the associated versions.
I tried to reproduce the same steps in the S3 Management Console that I did in Liferay. The console gives you the option to hide versions; when you do, you always see only the latest version. We are interested in versions, so we show them. Under each file name you can see a list of versions with their date and time of creation. Amazon S3 doesn’t do differential updates either, so, as you can see, the file sizes match those we saw in the Liferay example.
Note: There isn’t a revert button in Amazon S3. To revert, you need to download the old revision and upload it again.
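The same switch is available through the API. Below is a minimal sketch that assumes you pass in a boto3 S3 client; the helper name `set_bucket_versioning` is made up for illustration, while `put_bucket_versioning` is the boto3 call it wraps.

```python
def set_bucket_versioning(s3_client, bucket: str, enabled: bool) -> str:
    """Enable or suspend versioning on a bucket and return the status sent.

    There is no "Disabled" status to send: once versioning has been
    enabled on a bucket, it can only be suspended.
    """
    status = "Enabled" if enabled else "Suspended"
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": status},
    )
    return status
```

With boto3 installed and credentials configured, it could be called as `set_bucket_versioning(boto3.client("s3"), "my-liferay-bucket", True)`, where the bucket name is a placeholder.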
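The version list shown in the console can also be fetched in code. This sketch assumes a boto3 S3 client is passed in and wraps its `list_object_versions` call; the helper name and the returned dictionary keys are my own choices. A single call returns at most one page of results, so a file with very many versions would need pagination, which is left out here.

```python
def list_versions(s3_client, bucket: str, key: str) -> list[dict]:
    """Return version id, size and timestamp for every stored version of `key`."""
    response = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    return [
        {
            "version_id": v["VersionId"],
            "size": v["Size"],
            "last_modified": v["LastModified"],
            "is_latest": v["IsLatest"],
        }
        for v in response.get("Versions", [])
        if v["Key"] == key  # Prefix can also match other, longer keys
    ]
```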
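The download-and-upload step can be scripted. The sketch below assumes a boto3 S3 client and uses its `get_object` and `put_object` calls; the helper name `restore_version` is hypothetical. It reads the whole object into memory, so it is only suitable for small files.

```python
def restore_version(s3_client, bucket: str, key: str, version_id: str) -> None:
    """Make an old version current again by re-uploading its content."""
    old = s3_client.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    body = old["Body"].read()  # loads the whole object into memory
    # Uploading under the same key creates a new, latest version.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
```

As in Liferay, the restored content is stored again as a full copy, so a revert costs space here too.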
Enable one, or combine Liferay and Amazon S3 versioning?
Can we choose which versioning system to use? Yes and no. It’s up to you whether to enable Amazon S3 versioning, but according to the available documentation Liferay’s versioning can’t be disabled out of the box (you could customize it programmatically if needed). So what does it look like if we use Amazon S3 versioning together with Liferay versioning?
To understand this, you need to know one thing. Liferay uses temporary files called PWC (private working copy). A PWC is created whenever we perform an operation on a document that needs exclusive rights, such as edit or check-in. The existence of this file means that someone is editing the document in Liferay at that moment. The PWC is deleted once all changes are finished. See the example and screenshot below.
Example scenario: We have uploaded version 1.0 (100kB) in Liferay and want to update this file with a new version of a different size (400kB). While the new version is uploading, the previous version is temporarily copied to a PWC file (which means no one else can edit the document). After the upload finishes, the old PWC is deleted and Liferay creates a new PWC file with the new content. Once that’s done, Liferay creates file 1.1 with the new content and deletes the PWC file. Interestingly, Amazon S3 versioning picks up the PWC file too and keeps versioning it, which consumes space without adding value.
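The effect can be illustrated with a small simulation. The class below is an in-memory stand-in for a versioning-enabled bucket, not real S3, and the keys `"1.0"`, `"1.1"` and `"PWC"` are simplified placeholders for whatever names Liferay actually writes. It relies on one S3 behaviour: with versioning on, a delete only adds a delete marker and the older versions stay stored.

```python
class VersionedBucket:
    """Tiny in-memory stand-in for a versioning-enabled bucket (not real S3)."""

    def __init__(self):
        self.versions = []  # every write is kept: (key, size_kb or None)

    def put(self, key, size_kb):
        self.versions.append((key, size_kb))

    def delete(self, key):
        # With versioning on, a delete only adds a marker; old data stays.
        self.versions.append((key, None))

    def stored_kb(self):
        return sum(size for _, size in self.versions if size is not None)


bucket = VersionedBucket()
bucket.put("1.0", 100)   # initial upload
bucket.put("PWC", 100)   # previous version copied to a PWC
bucket.delete("PWC")     # old PWC removed
bucket.put("PWC", 400)   # new PWC with the new content
bucket.put("1.1", 400)   # final revision written
bucket.delete("PWC")     # PWC removed again

liferay_kb = 100 + 400   # what Liferay itself keeps: 1.0 and 1.1
pwc_leftover_kb = bucket.stored_kb() - liferay_kb

print(bucket.stored_kb(), "kB retained by the bucket")  # 1000 kB retained by the bucket
print(pwc_leftover_kb, "kB of that is leftover PWC data")  # 500 kB of that is leftover PWC data
```

Under these assumptions, a single update doubles the space used compared with Liferay’s own versioning.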
While providing important benefits, version control consumes space. This needs to be taken into account especially with Amazon S3, a storage service that you pay for on demand. Consider carefully whether you require version control. If it is not important to you, turn it off to save space and, consequently, money. Both Liferay and Amazon S3 provide their own version control system, and each can be switched independently of the other. Amazon’s can be enabled or suspended via the S3 Management Console; Liferay’s, however, can be turned off only programmatically. If you require version control, do not use both simultaneously, because that consumes space unnecessarily. Leave only Liferay’s turned on, because 1. it is turned on by default and turning it off isn’t particularly easy, and 2. it uses less space than Amazon S3 versioning, which also keeps the temporary Liferay files that Liferay normally deletes.
This is our opinion based on a rough analysis. We haven’t done a detailed feature-by-feature comparison of the two versioning systems, so you may discover additional advantages or disadvantages of either one. Let us know if you find some!