How to identify/handle duplicate assets in AEM (Adobe Experience Manager) Assets?
In this blog post, let us explore how to identify/handle the duplicate assets in Adobe Experience Manager Assets.
By default, when you upload an asset whose name already exists in the same folder, a popup is displayed that lets you keep both files, replace the existing one, or create a new version of the existing file. This check is based on the name only; it does not compare the binary to identify duplicate files.
Duplicate Detection:
Duplicate detection at the binary level is disabled by default and can be enabled if required. Once it is enabled, if you attempt to upload an asset that already exists in Adobe Experience Manager Assets, the duplicate detection feature reports the asset as a duplicate.
The duplicate detection feature can be enabled through the "Day CQ DAM Create Asset" OSGi config - http://localhost:4502/system/console/configMgr/com.day.cq.dam.core.impl.servlet.CreateAssetServlet (make sure the OSGi configuration change is applied through code rather than only through the console)
When a user attempts to upload an asset that exists in AEM, the system checks for the conflict and indicates it. Assets are identified using the SHA-1 hash stored at jcr:content/metadata/dam:sha1, which means duplicate assets are detected irrespective of their filenames.
The SHA-1 hash is generated from the asset binary during asset creation and stored under jcr:content/metadata/dam:sha1.
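To illustrate why a content hash catches duplicates regardless of the file name, here is a minimal Python sketch. It is not AEM's own code: the file names and byte contents are made-up sample data, and it assumes the value in dam:sha1 is the hex form of the SHA-1 digest of the binary.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the given bytes as a lowercase hex string."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical uploads: two different names with identical bytes, one distinct file.
uploads = {
    "hibernate4.png": b"same image bytes",
    "hibernate4-copy.png": b"same image bytes",
    "other.png": b"different image bytes",
}

# Map each file name to the hash of its content.
digests = {name: sha1_hex(content) for name, content in uploads.items()}

# Identical content yields an identical hash, whatever the file is called.
same = digests["hibernate4.png"] == digests["hibernate4-copy.png"]
print(same)
```

Because only the bytes go into the hash, renaming a file does not change its digest, which is what makes name-independent detection possible.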
Now, when you upload the same image again, the duplicate detection feature reports it as a duplicate asset. The feature checks assets across the whole DAM repository, irrespective of the name.
The duplicate asset can be kept or skipped from the upload (Delete).
The duplicate detection feature helps avoid uploading a file that already exists in the system.
MSM for Assets:
MSM (Multi Site Manager) for Assets can be used to reuse the same assets across different websites while customizing the required metadata.
Using MSM for Assets, you can:
- Create assets once and then make copies of these assets to reuse in other areas of the site.
- Keep multiple copies in synchronization and update the original primary copy once to push the changes to the child copies.
- Make local changes by temporarily or permanently suspending the linking between parent and child assets.
MSM maintains a live relationship between the source asset and its live copies so that:
- Changes to the source assets are applied (rolled out) to live copies as well, that is the live copies are synchronized with the source.
- You can update the live copies by suspending the live relationship or removing the inheritance for a few limited fields. Modifications to the source are then no longer applied to the live copy.
To create the live copy, select the asset, then select Create > Live Copy.
Select the destination folder where the live copy asset should be stored.
Enter the title, change the Name if required, and select the rollout config.
If required, inheritance can be broken on the copied asset so that local changes can be made. While inheritance is intact, source changes are reflected on the copied assets on rollout/synchronization.
You can completely remove the relationship between a source and a live copy using the Detach action. The live copy becomes a stand-alone asset or folder after it is detached. You can also undo all the local modifications and revert the asset to the state of its source.
Please note that a live copy asset is also considered a duplicate and is reported.
Identify the Existing Duplicate:
The steps below can be followed to identify existing duplicate assets.
Get all the asset files under a specific folder, together with their SHA-1 values, through Query Builder:
http://localhost:4502/bin/querybuilder.json?p.hits=selective&path=/content/dam&p.properties=jcr:path%20jcr:content/metadata/dam:sha1&p.limit=-1&property=jcr:content/metadata/dam:sha1&property.operation=exists
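If you want to run the same query against a different folder or host, a small helper can assemble the URL. The sketch below reuses the parameters of the URL above; the function name build_sha1_query_url is made up for illustration. urlencode writes spaces as "+" and percent-encodes ":" and "/", which decodes to the same parameter values.

```python
from urllib.parse import urlencode


def build_sha1_query_url(host: str, dam_path: str) -> str:
    """Build a Query Builder URL listing assets under dam_path that have dam:sha1."""
    params = [
        ("p.hits", "selective"),
        ("path", dam_path),
        # Properties to return for each hit, separated by a space.
        ("p.properties", "jcr:path jcr:content/metadata/dam:sha1"),
        ("p.limit", "-1"),
        # Only match assets where the SHA-1 property exists.
        ("property", "jcr:content/metadata/dam:sha1"),
        ("property.operation", "exists"),
    ]
    return f"{host}/bin/querybuilder.json?{urlencode(params)}"


url = build_sha1_query_url("http://localhost:4502", "/content/dam")
print(url)
```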
Use the query below to also identify whether an asset is a live copy:
http://localhost:4502/bin/querybuilder.json?p.hits=selective&path=/content/dam&p.properties=jcr:path%20/jcr:content/cq:LiveSyncConfig/cq:master%20jcr:content/metadata/dam:sha1&p.limit=-1&property=jcr:content/metadata/dam:sha1&property.operation=exists
This returns the output as JSON.
Convert the JSON to CSV - use any JSON to CSV converter; I am using https://konklone.io/json/
Download the CSV and open it in Excel.
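As an alternative to an online converter, the conversion can be done locally. The Python sketch below assumes each entry in "hits" carries jcr:path at the top level and the hash nested under jcr:content / metadata / dam:sha1; check this against your actual response and adjust the lookups if the shape differs. The hash values in the sample are placeholders, and hits_to_csv is a made-up name.

```python
import csv
import io
import json


def hits_to_csv(querybuilder_json: str) -> str:
    """Flatten Query Builder hits into CSV text with path and SHA-1 columns."""
    hits = json.loads(querybuilder_json).get("hits", [])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["jcr:path", "jcr:content/metadata/dam:sha1"])
    for hit in hits:
        # Assumed shape: the hash is nested under jcr:content -> metadata.
        metadata = hit.get("jcr:content", {}).get("metadata", {})
        writer.writerow([hit.get("jcr:path", ""), metadata.get("dam:sha1", "")])
    return out.getvalue()


# Sample response with placeholder hash values.
sample = json.dumps({
    "hits": [
        {"jcr:path": "/content/dam/test/hibernate4.png",
         "jcr:content": {"metadata": {"dam:sha1": "abc123"}}},
        {"jcr:path": "/content/dam/email/hibernate4-2.png",
         "jcr:content": {"metadata": {"dam:sha1": "abc123"}}},
    ]
})
csv_text = hits_to_csv(sample)
print(csv_text)
```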
Now let’s highlight the cells with duplicate values. Select the column "jcr:content/metadata/dam:sha1", then click Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values, and click OK in the dialog that appears.
To filter by the highlight color, select one of the duplicated SHA1 values, then Right-Click > Filter > Filter by Selected Cell's Color.
Now you should be able to see all the files which are duplicates
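The same grouping can be done without a spreadsheet. This is a rough Python sketch that takes (path, SHA-1) pairs and keeps only the hashes shared by more than one asset; the rows are sample data with placeholder hashes, and find_duplicates is a made-up name.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Group asset paths by SHA-1 and keep only hashes shared by 2+ assets."""
    by_hash = defaultdict(list)
    for path, sha1 in rows:
        if sha1:  # skip rows that have no hash value
            by_hash[sha1].append(path)
    return {sha1: sorted(paths) for sha1, paths in by_hash.items() if len(paths) > 1}


# Sample (path, sha1) rows; hash values are placeholders.
rows = [
    ("/content/dam/test/hibernate4.png", "abc123"),
    ("/content/dam/email/hibernate4-2.png", "abc123"),
    ("/content/dam/test/unique.png", "def456"),
]
duplicates = find_duplicates(rows)
print(duplicates)
```

Keep in mind that live copies share the hash of their source, so a group may contain intentional copies as well as accidental ones.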
You can then perform the required operations on the duplicated assets,
e.g., delete the duplicated asset, or delete it and create a live copy asset in its place.
Delete a specific asset - curl -u admin:admin -k -X DELETE http://localhost:4502/content/dam/email/hibernate4-2.png
Create Live Copy - curl -u admin:admin 'http://localhost:4502/bin/wcmcommand' --data-raw '_charset_=utf-8&cmd=createLiveCopy&destPath=%2Fcontent%2Fdam%2Femail&title=test5&label=hibernate4-sc1.png&srcPath=%2Fcontent%2Fdam%2Ftest%2Fhibernate4.png'
You can create a script file with multiple curl commands to delete the duplicate assets and/or create the live copies if required. You can also write a Groovy script to delete the assets and/or create live copies for them.
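A script file like that can be generated rather than typed by hand. The Python sketch below only prints curl DELETE commands and does not run them. It takes a mapping of SHA-1 to asset paths, and its rule for which copy survives (keep the first path listed, delete the rest) is an assumption you should replace with your own. The host, the admin:admin credentials and the sample hash are placeholders, and delete_commands is a made-up name.

```python
import shlex
from urllib.parse import quote


def delete_commands(duplicates, host="http://localhost:4502", user="admin:admin"):
    """Return curl DELETE commands for every path except the first in each group."""
    commands = []
    for _sha1, paths in sorted(duplicates.items()):
        # Assumption: the first listed path is the copy to keep.
        for path in paths[1:]:
            url = host + quote(path)  # percent-encode spaces and similar characters
            commands.append(
                f"curl -u {shlex.quote(user)} -X DELETE {shlex.quote(url)}"
            )
    return commands


# Sample duplicate group with a placeholder hash value.
sample_duplicates = {
    "abc123": [
        "/content/dam/test/hibernate4.png",
        "/content/dam/email/hibernate4-2.png",
    ],
}
script_lines = delete_commands(sample_duplicates)
print("\n".join(script_lines))
```

Review the generated commands before running them, since deleting an asset that is still referenced from pages will break those references.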