
Document versioning in Liferay and Amazon S3

Liferay Portal is an enterprise collaboration platform with a document management system. It allows multiple users to access the same files, so a mistake made while editing a file represents a risk for all of them. What if someone deletes an important part of a text? Is it lost for good? No, it isn’t. We can rely on Liferay’s versioning system to address this concern. But what if you use an external system that provides its own versioning capability, such as Amazon S3, as your document repository? Which versioning should you rely on then? The article below gives some insight into Liferay and Amazon S3 versioning.

Liferay versioning

Liferay has a built-in versioning system that is enabled by default. Every edit of a document is saved as a new revision. We can look at previous versions at any time and even revert to one of them as needed. I prepared a small example to show how it works:

  1. Create a test document (100kB, version 1.0)
  2. Upload new version (400kB, version 1.1)
  3. Upload new version (500kB, version 1.2)
  4. Revert to version 1.1 (400kB, version 2.0)
  5. Upload new version (600kB, version 2.1)
  6. Upload new version (1MB, version 2.2)

In summary, if we upload a new version of the document, Liferay increments the minor version number (e.g.: 1.0 -> 1.1). If we revert to a previous version, Liferay increments the major version number and resets the minor version number (e.g.: 1.2 -> 2.0).
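
The two rules above are simple enough to replay in a few lines. The following Python sketch only models the numbering behavior observed in the example; it is not Liferay’s own code, and the helper names `on_upload` and `on_revert` are hypothetical.

```python
# Model of the version-numbering rules observed in the example:
# an upload bumps the minor number, a revert bumps the major number
# and resets the minor number to 0. Not Liferay's actual implementation.


def parse(version: str) -> tuple[int, int]:
    """Split a version string such as "1.2" into (major, minor)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def on_upload(current: str) -> str:
    """A new upload increments the minor version (1.0 -> 1.1)."""
    major, minor = parse(current)
    return f"{major}.{minor + 1}"


def on_revert(current: str) -> str:
    """A revert increments the major version and resets the minor (1.2 -> 2.0)."""
    major, _ = parse(current)
    return f"{major + 1}.0"


# Replay the six steps of the example: (action, size in kB).
steps = [("create", 100), ("upload", 400), ("upload", 500),
         ("revert", 400), ("upload", 600), ("upload", 1000)]

history = []
version = None
for action, size_kb in steps:
    if action == "create":
        version = "1.0"
    elif action == "upload":
        version = on_upload(version)
    else:
        version = on_revert(version)
    history.append((version, size_kb))

for version, size_kb in history:
    print(f"{version}: {size_kb} kB")
```

In this model the new major number does not depend on which revision you revert to: reverting from 1.2 yields 2.0, just as in step 4 of the example.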

How does Liferay store files and their revisions in the storage repository? Note that Liferay’s file system structure is not identical to your local file system: you won’t find a file on the storage by looking for its file name. Every file has its own directory with a unique name, which holds all revisions of the file, each revision stored as a separate file. The revision files are named after their version numbers, such as 1.0, 1.1 and so on. The screenshot below shows how the files are physically stored on the drive.

Screenshot: list of stored revision files

As you can see in the screenshot, Liferay saves full files, not differential updates. I looked through the configuration options and checked the documentation too, and it seems that differential updates are currently not supported. Even when you revert to a previous version, the new version with the incremented major version number is created as a full copy of the reverted file, so the same content is stored twice (under different version numbers). Every update eats into your storage capacity.
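
A quick back-of-the-envelope calculation makes the cost of full copies concrete. The sketch below simply adds up the revision sizes from the six-step example above; it assumes 1MB = 1000kB and ignores metadata and any file system overhead.

```python
# Revision sizes (kB) from the six-step Liferay example.
# Assumption: 1 MB is counted as 1000 kB to keep the arithmetic simple.
revisions_kb = {
    "1.0": 100,
    "1.1": 400,
    "1.2": 500,
    "2.0": 400,   # revert to 1.1: a second full copy of the same 400 kB file
    "2.1": 600,
    "2.2": 1000,
}

# Every revision is a full file, so the storage used is the plain sum.
total_kb = sum(revisions_kb.values())
latest_kb = revisions_kb["2.2"]
overhead_kb = total_kb - latest_kb  # what the version history costs

print(f"total stored: {total_kb} kB")
print(f"latest version only: {latest_kb} kB")
print(f"cost of the history: {overhead_kb} kB")
```

Under these assumptions the document occupies 3000kB on storage although its current version is only 1000kB.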

Amazon S3 versioning

Amazon S3 also provides a versioning capability, which can be enabled in the S3 Management Console that you might know from our previous articles introducing Amazon S3. Versioning can be enabled only for a whole bucket, and once enabled it can’t be disabled, only suspended.

Screenshot: S3 Management Console
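
The console is not the only way to switch versioning on: the same bucket-level setting is exposed through the S3 API. The sketch below assumes the boto3 SDK’s `put_bucket_versioning` call, and the bucket name `my-liferay-bucket` is a hypothetical placeholder. To keep it runnable without AWS credentials, the client is passed in and replaced here by a recording stub.

```python
from typing import Any

# Bucket versioning has only two settable states: once enabled,
# it can be suspended but never switched back to "off".
VALID_STATES = {"Enabled", "Suspended"}


def set_bucket_versioning(s3_client: Any, bucket: str, status: str) -> None:
    """Enable or suspend versioning for a whole bucket.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    if status not in VALID_STATES:
        raise ValueError(f"status must be one of {sorted(VALID_STATES)}")
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": status},
    )


class RecordingClient:
    """Stand-in for boto3.client("s3") that records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def put_bucket_versioning(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


client = RecordingClient()
set_bucket_versioning(client, "my-liferay-bucket", "Enabled")
print(client.calls)
```

Against a real account you would pass `boto3.client("s3")` instead of the stub. The sketch rejects any status other than Enabled or Suspended, mirroring the rule that versioning can be suspended but not disabled.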

How does it work on Amazon S3? Differently from Liferay: Amazon S3 versioning is based on file names (object keys). Unlike in Liferay, you can find a file by its name and see all of its versions.

I tried to reproduce in the S3 Management Console the same steps that I performed in Liferay. Amazon S3 lets you hide versions; when they are hidden, you always see only the latest version. We are interested in versions, so let’s show them. Under each file name, you can see a list of its versions with their creation date and time. Amazon S3 doesn’t do differential updates either, so, as you can see, the file sizes match those we saw in the Liferay example.

Screenshot: S3 Management Console with versions shown
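
The same "show versions" view can be built from the API. boto3’s `list_object_versions` returns the versions of the keys under a prefix in a `Versions` list; the sketch below groups them by file name, oldest first. The sample response, key, version IDs and dates are invented for illustration, and pagination (the `IsTruncated` flag on large listings) is ignored.

```python
from collections import defaultdict
from datetime import datetime


def versions_by_key(response: dict) -> dict[str, list[dict]]:
    """Group the 'Versions' entries of a ListObjectVersions response by key,
    oldest first, similar to the console's 'show versions' view."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for version in response.get("Versions", []):
        grouped[version["Key"]].append(version)
    for entries in grouped.values():
        entries.sort(key=lambda v: v["LastModified"])
    return dict(grouped)


# Hand-made sample shaped like a boto3 list_object_versions() response.
sample = {
    "Versions": [
        {"Key": "test.txt", "VersionId": "v3", "IsLatest": True,
         "LastModified": datetime(2024, 1, 1, 10, 2), "Size": 500_000},
        {"Key": "test.txt", "VersionId": "v1", "IsLatest": False,
         "LastModified": datetime(2024, 1, 1, 10, 0), "Size": 100_000},
        {"Key": "test.txt", "VersionId": "v2", "IsLatest": False,
         "LastModified": datetime(2024, 1, 1, 10, 1), "Size": 400_000},
    ]
}

for key, entries in versions_by_key(sample).items():
    print(key)
    for v in entries:
        marker = " (latest)" if v["IsLatest"] else ""
        print(f"  {v['LastModified']:%Y-%m-%d %H:%M}  {v['Size']} B{marker}")
```

With a real bucket, the `sample` dictionary would be replaced by the result of `boto3.client("s3").list_object_versions(Bucket=..., Prefix=...)`.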

Note: There is no revert button in the S3 Management Console. To restore a revision, you need to download it and upload it again.
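
If you work through the API rather than the console, there is an alternative to downloading and re-uploading: AWS documents restoring a previous version by copying it onto the same key, which leaves the existing versions in place and stores the copy as the new latest version. The sketch below assumes boto3’s `copy_object`; the bucket, key and version ID are hypothetical, and a single copy request of this kind is limited to objects of up to 5 GB.

```python
from typing import Any


def restore_version(s3_client: Any, bucket: str, key: str, version_id: str) -> dict:
    """'Revert' an object by copying one of its older versions on top of it.

    With versioning enabled, S3 keeps every existing version and stores the
    copy as the new latest version, much like Liferay creating 2.0 from 1.1.
    """
    return s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
    )


class RecordingClient:
    """Stand-in for boto3.client("s3") that records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def copy_object(self, **kwargs: Any) -> dict:
        self.calls.append(kwargs)
        return {"VersionId": "new-version-id"}  # invented placeholder response


client = RecordingClient()
restore_version(client, "my-liferay-bucket", "test.txt", "v2")
print(client.calls[0]["CopySource"])
```

Like a revert in Liferay, this stores the restored content as a full additional copy, so it also consumes space.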

Enable one, or combine Liferay and Amazon S3 versioning?

Is it possible to choose which versioning system to use? Yes and no. Enabling Amazon S3 versioning is up to you, but according to the available documentation, Liferay’s versioning can’t be disabled out of the box (you could customize it programmatically if needed). So what happens if we use the Amazon S3 versioning system together with the Liferay versioning system?

To understand this, you need to know one thing: Liferay uses temporary files called PWCs (private working copies). A PWC is created whenever we perform an operation on a document that requires exclusive rights, such as editing or check-in. The existence of this file means that someone is currently editing the document in Liferay. The PWC is deleted once all changes are finished. See the example and screenshot below.

Example scenario: We have uploaded version 1.0 (100kB) to Liferay and want to update the file with a new, larger version (400kB). While the new version is uploading, the previous version is temporarily copied to a PWC file (which means nobody else can edit the document). After the upload finishes, the old PWC is deleted and Liferay creates a new PWC file containing the new version. Once that’s done, Liferay creates file 1.1 with the new version and deletes the PWC file. Interestingly enough, Amazon S3 versioning notices the PWC file too and keeps versioning it, which keeps consuming space without adding any value.
Screenshot: Liferay and Amazon S3 versioning used together
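
To see why this adds up, remember that deleting an object in a versioned bucket only adds a delete marker; the data of the earlier versions stays stored. The toy model below replays the example scenario in a simplified bucket, once with versioning and once without. The key names (`doc/1.0`, `doc/pwc`, ...) are hypothetical and do not reflect how Liferay actually names its objects in S3, and delete markers are counted as zero bytes.

```python
from collections import defaultdict


class ToyBucket:
    """Simplified model of an S3 bucket. With versioning, a delete only adds
    a delete marker (None), so earlier versions stay stored and billed."""

    def __init__(self, versioning: bool) -> None:
        self.versioning = versioning
        self.versions: dict[str, list] = defaultdict(list)  # key -> sizes in kB

    def put(self, key: str, size_kb: int) -> None:
        if self.versioning:
            self.versions[key].append(size_kb)  # keep every version
        else:
            self.versions[key] = [size_kb]      # overwrite in place

    def delete(self, key: str) -> None:
        if self.versioning:
            self.versions[key].append(None)     # delete marker, data remains
        else:
            self.versions.pop(key, None)        # data is really gone

    def stored_kb(self) -> int:
        return sum(s for sizes in self.versions.values() for s in sizes if s is not None)


def replay(bucket: ToyBucket) -> int:
    """Replay the upload of version 1.1 as described in the example scenario."""
    bucket.put("doc/1.0", 100)   # version 1.0 uploaded
    bucket.put("doc/pwc", 100)   # previous version copied to a PWC
    bucket.delete("doc/pwc")     # upload finished, old PWC deleted
    bucket.put("doc/pwc", 400)   # new PWC holding the new version
    bucket.put("doc/1.1", 400)   # version 1.1 created
    bucket.delete("doc/pwc")     # PWC deleted
    return bucket.stored_kb()


with_versioning = replay(ToyBucket(versioning=True))
without_versioning = replay(ToyBucket(versioning=False))
print(with_versioning, without_versioning)  # 1000 500
```

In this model, S3 versioning doubles the stored data for a single update (1000kB instead of 500kB), and each further update would leave more noncurrent PWC versions behind in the same way.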

Summary

While version control provides important benefits, it also consumes storage space. This needs to be taken into account especially with Amazon S3, where you pay for storage on demand. Consider carefully whether you need version control; if it is not important to you, turn it off to save space and, consequently, money. Both Liferay and Amazon S3 provide their own version control systems, and each can be controlled individually: Amazon’s can be enabled or suspended via the S3 Management Console, whereas Liferay’s can be turned off only programmatically. If you do need version control, do not use both at the same time, as that consumes unnecessary space. Keep only Liferay’s versioning turned on, for two reasons: 1. it is turned on by default and turning it off isn’t particularly easy, and 2. it uses less space than Amazon S3 versioning, which also keeps versions of temporary Liferay files that Liferay normally deletes.

This is our opinion based on a rough analysis. We haven’t done a detailed feature-by-feature comparison of the two versioning systems, so you may discover additional advantages or disadvantages of using either of them. Let us know if you find any!

Resources used

Amazon S3 User Guide
Liferay 6.1 User Guide
