Eliminate Sitecore Item Duplication Issues in Data Exchange Framework with this Fun PowerShell Script!



Background

First of all, I know what you are all going to say: how is it possible to have Sitecore item duplication if every Sitecore item has its own unique ID? It caught me by surprise as well, while I was working with a third-party API that behaved as it pleased and whose data I was importing through the Sitecore Data Exchange Framework. The Data Exchange Framework is Sitecore's framework for moving data between Sitecore and other systems, for example turning records from a third-party API into Sitecore items. If you are interested in learning more about it, see Sitecore's Data Exchange Framework documentation.

The Problem

Let me explain the problem I was having. The items created by this platform had the following fields: Id, Time, Reference, Exchange, Date, Location, headline, and many others. The troublemaker was the Id field (not to be confused with the ItemID that every Sitecore item has OOTB). Every time I executed the Data Exchange Framework job, a new item with a new Sitecore item ID was created. However, since the Id from the API was not used as a unique key, the same record was imported again and the item was duplicated. Here is an example:



As you can see in both figures above, the Id of the item is the same, 1275993, but the Sitecore item IDs are different: {AA8B1D00-D898-483A-9F7B-51088B262F64} and {258C2805-33C1-4945-A939-ECC0D2C0536B}, respectively.
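To make the problem concrete, here is a minimal sketch in plain PowerShell (no Sitecore required). It models the two items above as hypothetical objects that carry only the API Id and the Sitecore item ID quoted above, and uses the standard `Group-Object` cmdlet, which is not part of the script in this post. The point is that the Sitecore IDs are all distinct, so only grouping by the API field reveals the duplication.

```powershell
# Hypothetical stand-ins for the two Sitecore items shown in the figures above:
# same API Id, different Sitecore item IDs
$sampleItems = @(
    [pscustomobject]@{ ItemID = '{AA8B1D00-D898-483A-9F7B-51088B262F64}'; Id = '1275993' }
    [pscustomobject]@{ ItemID = '{258C2805-33C1-4945-A939-ECC0D2C0536B}'; Id = '1275993' }
)

# The Sitecore item IDs are all different, so they cannot reveal the duplication...
$uniqueItemIds = @($sampleItems.ItemID | Sort-Object -Unique)

# ...but grouping by the API Id shows that both items represent the same record
$duplicateGroups = @($sampleItems | Group-Object -Property Id | Where-Object { $_.Count -gt 1 })

foreach ($group in $duplicateGroups) {
    Write-Output "Id $($group.Name) appears $($group.Count) times: $($group.Group.ItemID -join ', ')"
}
```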

This happened every time the job ran and added the item again, so I decided to write a Sitecore PowerShell Extensions (SPE) script to remove the duplicated items.

The Solution

The solution: a PowerShell script that removes the duplicates under the root path where the platform inserts its items. First, I will explain how the code works part by part, and at the end I will give you the whole script so you can copy and paste it into the Sitecore PowerShell Extensions ISE.

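# This function creates an item info object for the given item
function CreateItemInfo($item) {
    return @{
        ID      = $item.ID.ToString()
        # $item["__Updated"] is the raw field value (for example 20230412T153045Z);
        # InvariantCulture keeps the parse independent of the server's culture
        Updated = [DateTime]::ParseExact($item["__Updated"], "yyyyMMdd'T'HHmmss'Z'", [System.Globalization.CultureInfo]::InvariantCulture)
        Path    = $item.Paths.Path
    }
}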
This one is pretty simple and straightforward: it creates an item info object (a hashtable) holding the item's ID, its last-updated date parsed from the raw __Updated field value, and its path, all of which are used later on.
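Since everything later hinges on the Updated value, here is a small standalone sketch of just the parsing step. It assumes `__Updated` holds a value in the `yyyyMMddTHHmmssZ` shape that the format string above expects; the two sample timestamps are made up, and `ConvertFrom-SitecoreIsoDate` is a hypothetical helper name used only for illustration.

```powershell
# Hypothetical helper: parses a raw __Updated value such as "20230412T153045Z".
# 'T' and 'Z' are quoted so ParseExact treats them as literal characters.
function ConvertFrom-SitecoreIsoDate([string]$value) {
    return [DateTime]::ParseExact($value, "yyyyMMdd'T'HHmmss'Z'", [System.Globalization.CultureInfo]::InvariantCulture)
}

# Made-up raw values for two duplicates of the same record
$olderUpdated = ConvertFrom-SitecoreIsoDate "20230412T153045Z"
$newerUpdated = ConvertFrom-SitecoreIsoDate "20230413T080000Z"

# DateTime values compare chronologically, which is what the script relies on
# when it decides which duplicate to keep
Write-Output ($newerUpdated -gt $olderUpdated)   # True
```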
# This function recursively collects items, grouped by the specified field value
function Get-Duplicates($item, $fieldName, $hash) {
    if (-not (Test-Path -Path $item.Paths.Path)) {
        return
    }

    foreach ($child in Get-ChildItem -Path $item.Paths.Path) {
        # $child[$fieldName] returns the field value as a string
        $irId = $child[$fieldName]

        # Skip items without a value for the field (folders, for example);
        # otherwise they would all be grouped under "" and treated as duplicates
        if (-not [string]::IsNullOrWhiteSpace($irId)) {
            if (-not $hash.ContainsKey($irId)) {
                $hash.Add($irId, (New-Object "System.Collections.Generic.Dictionary``2[System.String,PSObject]"))
            }
            $hash[$irId][$child.ID.ToString()] = CreateItemInfo $child
        }

        # Descend into the child's own subtree
        Get-Duplicates $child $fieldName $hash
    }
}
This recursively walks all the child items under the root path. Note that you pass in $item and $fieldName: this is so you can compare the items by any field you wish. For the purposes of this solution, we will refer to the field name as "Id", since that was the field causing the issues.
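If you want to try the traversal outside Sitecore, the same idea works on an in-memory tree. In this sketch each node is a hypothetical hashtable with `ID`, `Fields`, and `Children` standing in for a Sitecore item, and `Add-ToGroups` is an illustrative name, not an SPE cmdlet. It groups every descendant by the chosen field in a table of tables (outer key: field value, inner key: node ID) and skips nodes with an empty value, so folders are never grouped with each other.

```powershell
# Recursively walks a tree of hypothetical nodes and groups them by a field value.
# Outer key: field value; inner key: node ID.
function Add-ToGroups($node, [string]$fieldName, $groups) {
    foreach ($child in $node.Children) {
        $value = $child.Fields[$fieldName]

        # Nodes without a value (for example folders) are not grouped at all
        if (-not [string]::IsNullOrWhiteSpace($value)) {
            if (-not $groups.ContainsKey($value)) {
                $groups[$value] = @{}
            }
            $groups[$value][$child.ID] = $child
        }

        # Descend into the child's subtree
        Add-ToGroups $child $fieldName $groups
    }
}

# Hypothetical tree: node 'b' is nested in a folder but shares its Id with node 'a'
$tree = @{
    ID = 'root'; Fields = @{}; Children = @(
        @{ ID = 'a'; Fields = @{ Id = '1275993' }; Children = @() }
        @{ ID = 'folder'; Fields = @{}; Children = @(
            @{ ID = 'b'; Fields = @{ Id = '1275993' }; Children = @() }
            @{ ID = 'c'; Fields = @{ Id = '1276000' }; Children = @() }
        ) }
    )
}

$groups = @{}
Add-ToGroups $tree 'Id' $groups

# Only field values that occur more than once are duplicates
foreach ($key in $groups.Keys) {
    if ($groups[$key].Count -gt 1) {
        Write-Output "Duplicate Id $key on nodes: $(($groups[$key].Keys | Sort-Object) -join ', ')"
    }
}
```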
# This function keeps the most recently updated item for each field value and removes the rest
function RemoveOldDuplicates($hash) {
    $totalItems = $hash.Count
    $processedItems = 0

    foreach ($key in $hash.Keys) {
        $newestItem = $null

        # Find the most recently updated item for this field value
        foreach ($itemInfo in $hash[$key].Values) {
            if ($null -eq $newestItem -or $itemInfo.Updated -gt $newestItem.Updated) {
                $newestItem = $itemInfo
            }
        }

        # Remove every other item that shares this field value
        foreach ($itemInfo in $hash[$key].Values) {
            if ($itemInfo.ID -ne $newestItem.ID) {
                Write-Host "Removing item id: $($itemInfo.ID) because it is an older duplicate"
                Remove-Item -Path $itemInfo.Path
            }
        }

        $processedItems++
        $percentComplete = ($processedItems / $totalItems) * 100
        Write-Progress -Activity "Removing items" -Status "$processedItems of $totalItems field values processed" -PercentComplete $percentComplete
    }
}

And here is where all the magic happens: if the script finds duplicates for the field (in our case "Id"), it keeps only the most recently updated item and removes the rest. It decides which item is the newest by using Sitecore's OOTB "__Updated" field. That is the only limitation of this script, since which duplicate survives depends entirely on __Updated, but I think many people will find it helpful. I am also using a hash table keyed by the field value, so each item is grouped with a single lookup instead of being compared against every other item in nested foreach loops, which would make the script take forever. Believe me, I have been there.
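Before running anything destructive, the keep-the-newest rule can be checked as a dry run. This standalone sketch uses hypothetical item infos shaped like the ones CreateItemInfo returns (ID, Updated, Path), plus a `FieldValue` property standing in for the dictionary key; the IDs, paths, and dates are made up, and `Get-OlderDuplicates` is an illustrative name. It only reports what would be removed and never deletes anything.

```powershell
# Hypothetical helper: returns the item infos that would be removed, i.e. every
# item in a group except the one with the latest Updated value. Deletes nothing.
function Get-OlderDuplicates($itemInfos) {
    # One pass: group the item infos by the duplicated field value
    $groups = @{}
    foreach ($info in $itemInfos) {
        if (-not $groups.ContainsKey($info.FieldValue)) {
            $groups[$info.FieldValue] = New-Object System.Collections.Generic.List[object]
        }
        $groups[$info.FieldValue].Add($info)
    }

    foreach ($key in $groups.Keys) {
        # Newest first; every entry after the first is an older duplicate
        $sorted = @($groups[$key] | Sort-Object -Property Updated -Descending)
        if ($sorted.Count -gt 1) {
            $sorted[1..($sorted.Count - 1)]
        }
    }
}

# Hypothetical item infos (made-up IDs, paths, and dates)
$itemInfos = @(
    [pscustomobject]@{ FieldValue = '1275993'; ID = 'item-1'; Updated = [DateTime]'2023-04-12T15:30:45'; Path = '/sitecore/content/sample/item-1' }
    [pscustomobject]@{ FieldValue = '1275993'; ID = 'item-2'; Updated = [DateTime]'2023-04-13T08:00:00'; Path = '/sitecore/content/sample/item-2' }
    [pscustomobject]@{ FieldValue = '1276000'; ID = 'item-3'; Updated = [DateTime]'2023-04-12T09:00:00'; Path = '/sitecore/content/sample/item-3' }
)

$toRemove = @(Get-OlderDuplicates $itemInfos)
foreach ($info in $toRemove) {
    Write-Output "Would remove $($info.ID) at $($info.Path)"
}
```

The script itself makes the same newest-wins choice in RemoveOldDuplicates and then calls Remove-Item on each older item's path.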

Now, as promised, here is the complete code:
# This function finds and removes duplicate items under the given root path
function FindAndRemoveDuplicateItems($rootPath, $fieldName) {
    # Outer key: value of the comparison field; inner key: Sitecore item ID
    $hash = New-Object "System.Collections.Generic.Dictionary``2[System.String,System.Collections.Generic.Dictionary``2[System.String,PSObject]]"
    $root = Get-Item -Path $rootPath

    # First pass: collect every descendant of the root, grouped by field value
    Get-Duplicates $root $fieldName $hash

    # A field value shared by more than one item means there are duplicates
    $duplicateGroups = @($hash.Values | Where-Object { $_.Count -gt 1 })
    if ($duplicateGroups.Count -eq 0) {
        return "No duplicates found for $rootPath"
    }

    # Second pass: delete the older duplicates once, after the traversal is done
    Write-Host "Removing old duplicates"
    RemoveOldDuplicates $hash
    return "Duplication Deletion Complete for $rootPath"
}

# This function creates an item info object for the given item
function CreateItemInfo($item) {
    return @{
        ID      = $item.ID.ToString()
        # $item["__Updated"] is the raw field value (for example 20230412T153045Z);
        # InvariantCulture keeps the parse independent of the server's culture
        Updated = [DateTime]::ParseExact($item["__Updated"], "yyyyMMdd'T'HHmmss'Z'", [System.Globalization.CultureInfo]::InvariantCulture)
        Path    = $item.Paths.Path
    }
}

# This function recursively collects items, grouped by the specified field value
function Get-Duplicates($item, $fieldName, $hash) {
    if (-not (Test-Path -Path $item.Paths.Path)) {
        return
    }

    foreach ($child in Get-ChildItem -Path $item.Paths.Path) {
        # $child[$fieldName] returns the field value as a string
        $irId = $child[$fieldName]

        # Skip items without a value for the field (folders, for example);
        # otherwise they would all be grouped under "" and treated as duplicates
        if (-not [string]::IsNullOrWhiteSpace($irId)) {
            if (-not $hash.ContainsKey($irId)) {
                $hash.Add($irId, (New-Object "System.Collections.Generic.Dictionary``2[System.String,PSObject]"))
            }
            $hash[$irId][$child.ID.ToString()] = CreateItemInfo $child
        }

        # Descend into the child's own subtree
        Get-Duplicates $child $fieldName $hash
    }
}

# This function keeps the most recently updated item for each field value and removes the rest
function RemoveOldDuplicates($hash) {
    $totalItems = $hash.Count
    $processedItems = 0

    foreach ($key in $hash.Keys) {
        $newestItem = $null

        # Find the most recently updated item for this field value
        foreach ($itemInfo in $hash[$key].Values) {
            if ($null -eq $newestItem -or $itemInfo.Updated -gt $newestItem.Updated) {
                $newestItem = $itemInfo
            }
        }

        # Remove every other item that shares this field value
        foreach ($itemInfo in $hash[$key].Values) {
            if ($itemInfo.ID -ne $newestItem.ID) {
                Write-Host "Removing item id: $($itemInfo.ID) because it is an older duplicate"
                Remove-Item -Path $itemInfo.Path
            }
        }

        $processedItems++
        $percentComplete = ($processedItems / $totalItems) * 100
        Write-Progress -Activity "Removing items" -Status "$processedItems of $totalItems field values processed" -PercentComplete $percentComplete
    }
}

# Use this function to find and remove duplicate items by passing the root path and the field to compare by
FindAndRemoveDuplicateItems "/sitecore/content/restofrootpath/duplicatetree" "Id"

To use this script, change the arguments on its last line: the first argument is the root path that contains the duplicates, and the second is the field that is duplicated. In this case it is "Id", but you can change it to any other field that suits your needs.

I hope this was helpful and happy coding!!! 😊😊😊 

