Fast Algorithm To Find Unique Items in JavaScript Array

September 6, 2009


When I needed to remove duplicate items from a very large array, I found that the classic method was poorly optimised and took much longer than desired. So I devised a new algorithm that can remove the duplicates from a large array in a fraction of the original time.

The fastest method to find unique items in array

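This method is somewhat cheeky in its implementation. It uses a JavaScript object and adds every item in the array to it as a key. An object cannot hold the same key twice, so repeated items simply overwrite each other, and we capitalise on that. Keep in mind that object keys are always strings, so every item is converted to a string when it is used as a key.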

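Array.prototype.unique = function() {
    var o = {}, i, l = this.length, r = [];
    // Use each item as an object key; duplicates overwrite the same key.
    // Storing the item itself as the value preserves its original type.
    for (i = 0; i < l; i += 1) o[this[i]] = this[i];
    // Collect the surviving values, skipping any inherited properties.
    for (i in o) if (o.hasOwnProperty(i)) r.push(o[i]);
    return r;
};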
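To see what this method returns in practice, here is the same two-pass logic as a standalone function; the name hashSieve is ours, used so the example does not extend Array.prototype. One caveat worth knowing: modern engines enumerate integer-like object keys in ascending numeric order, so an array of integers comes back sorted rather than in its original order, while string keys keep their insertion order.

```javascript
// Hypothetical standalone version of the hash-sieving method above.
function hashSieve(arr) {
    var o = {}, i, l = arr.length, r = [];
    // Pass 1: every item becomes an object key; duplicates overwrite it.
    for (i = 0; i < l; i += 1) o[arr[i]] = arr[i];
    // Pass 2: read the surviving values back out.
    for (i in o) if (Object.prototype.hasOwnProperty.call(o, i)) r.push(o[i]);
    return r;
}

console.log(hashSieve([3, 1, 2, 1]));    // [1, 2, 3] (integer keys come out sorted)
console.log(hashSieve(["b", "a", "b"])); // ["b", "a"] (insertion order kept)
```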

Some Thoughts On This Algorithm

This approach is sometimes classified as a “Hash Sieving” method and is related to a modified “Hash Sorting Algorithm”: every item in the array serves as its own hash key, and inserting an item whose key is already present replaces the existing entry. As such, it can be applied in any programming language that offers a hash table, for faster sieving of very large arrays.

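The algorithm makes two linear passes, one over the array and one over the resulting keys, so it takes at most about 2n steps, i.e. O(n) time, assuming object property insertion and lookup take constant time. This is far better than what we observe for the classic method described below.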
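Engines that support ES2015 offer a built-in hash-based collection, Set, which applies the same sieving idea without the string-key caveats. This is a swapped-in technique rather than the method above: a Set keeps the first occurrence of each value in its original position and treats 1 and "1" as different values. A minimal sketch, assuming an ES2015+ environment (uniqueWithSet is a hypothetical name):

```javascript
// Deduplicate with a Set (ES2015+). Set compares values with SameValueZero,
// so 1 and "1" stay distinct, and iteration follows insertion order.
function uniqueWithSet(arr) {
    return Array.from(new Set(arr));
}

console.log(uniqueWithSet([3, 1, "1", 2, 1])); // [3, 1, "1", 2]
```

Like the object-key version, this does a single hash insertion per item, so in practice it runs in roughly linear time.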

About the classic method

The classic (and most popular) method of finding unique items in an array runs two nested loops to compare each element with the rest of the elements. Consequently, its time complexity is quadratic, O(n²).

This becomes a problem when you have to find unique items within an array of 10,000 items.

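Array.prototype.unique = function() {
    var a = [], l = this.length;
    for (var i = 0; i < l; i++) {
        // Scan ahead for a later duplicate of this[i]; if one exists,
        // skip this[i] (advance i) and restart the scan from the new i.
        for (var j = i + 1; j < l; j++)
            if (this[i] === this[j]) j = ++i;
        // this[i] now has no later duplicate, so keep it.
        a.push(this[i]);
    }
    return a;
};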

Comparing the above two algorithms

Test data: an array of N random integers.

Sample (N)   Classic (average)   New (average)   Classic (best)   New (best)
50           0.43                0.25            0.01             0.02
100          0.60                0.30            0.09             0.16
500          9.57                0.87            0.1              0.2
1000         24.44               1.51            0.21             0.31
5000         584.28              7.74            0.4              1.0
10000        2360.90             15.03           0.7              1.8
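The post does not include the benchmark code behind these numbers, and their time unit is not stated. The sketch below is a hypothetical harness for producing comparable measurements: randomInts, hashUnique, classicUnique and timeIt are names we chose, and the two unique functions are standalone copies of the methods above. Absolute timings will vary by engine and machine.

```javascript
// Hypothetical benchmark harness; not the code used for the table above.
function randomInts(n, max) {
    var a = [];
    for (var i = 0; i < n; i++) a.push(Math.floor(Math.random() * max));
    return a;
}

// Standalone copy of the hash-sieving method.
function hashUnique(arr) {
    var o = {}, r = [], i;
    for (i = 0; i < arr.length; i += 1) o[arr[i]] = arr[i];
    for (i in o) if (Object.prototype.hasOwnProperty.call(o, i)) r.push(o[i]);
    return r;
}

// Standalone copy of the classic nested-loop method.
function classicUnique(arr) {
    var a = [], l = arr.length;
    for (var i = 0; i < l; i++) {
        for (var j = i + 1; j < l; j++)
            if (arr[i] === arr[j]) j = ++i;
        a.push(arr[i]);
    }
    return a;
}

// Average wall-clock milliseconds per run. Date.now() has millisecond
// resolution, so small inputs may report 0.
function timeIt(fn, data, runs) {
    var start = Date.now();
    for (var k = 0; k < runs; k++) fn(data);
    return (Date.now() - start) / runs;
}

[50, 100, 500, 1000, 5000, 10000].forEach(function (n) {
    var data = randomInts(n, n); // values drawn from [0, n)
    console.log(n, timeIt(classicUnique, data, 5), timeIt(hashUnique, data, 5));
});
```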

Conclusion

This method of finding unique items within an array is particularly useful for large arrays such as those found in real-life situations. When many items in an array are duplicates, there is not much difference in performance; in fact, the classic algorithm wins by a small margin. However, as the array gets more random, the runtime of the classic algorithm grows manyfold.

38 responses

Andy L

September 6, 2009 at 5:53 pm

You work wonders at times. This is such a simple trick and yet so effective. Perhaps no one thinks about performance and perfection of algorithms as much as you do!
By the way, in line 3 of your Hash Sieving algorithm, why did you do o[this[i]] = this[i];?

1. Shamasis Bhattacharya
  
  September 6, 2009 at 6:03 pm
  
  Do not flatter me that much! 😛
  
  o[this[i]] = this[i]; preserves the data type of the items within the JavaScript array. This is because JavaScript object keys are always strings, and we would not want to needlessly convert a numeric array into a string array! By the way, if you are not bothered about the data type of the unique array, you can use a modified version of the algorithm that always returns strings and is faster due to less overhead.
  
  Array.prototype.strUnique = function() {
      var o = {}, i, l = this.length, r = [];
      for (i = 0; i < l; i++) o[this[i]] = null;
      for (i in o) r.push(i);
      return r;
  };
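The difference in return types is easy to see with standalone copies of both versions (typedUnique and stringUnique are our names for them, not part of the original code):

```javascript
// Stores the item itself as the value, so numbers stay numbers.
function typedUnique(arr) {
    var o = {}, i, r = [];
    for (i = 0; i < arr.length; i++) o[arr[i]] = arr[i];
    for (i in o) if (Object.prototype.hasOwnProperty.call(o, i)) r.push(o[i]);
    return r;
}

// Returns the keys themselves, which JavaScript always stores as strings.
function stringUnique(arr) {
    var o = {}, i, r = [];
    for (i = 0; i < arr.length; i++) o[arr[i]] = null;
    for (i in o) if (Object.prototype.hasOwnProperty.call(o, i)) r.push(i);
    return r;
}

console.log(typedUnique([1, 2, 2, 3]));  // [1, 2, 3]
console.log(stringUnique([1, 2, 2, 3])); // ["1", "2", "3"]
```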
  
JavascriptBank

September 8, 2009 at 6:45 am

Very cool & good tip, thank you very much for sharing.
Can I share this snippet on my http://www.javascriptbank.com/?

Awaiting your response. Thanks

1. Shamasis Bhattacharya
  
  September 10, 2009 at 12:48 am
  
  Sure. Sharing is caring! Care back for me by retaining my link and attribution. 🙂
  
Kevin N

November 10, 2009 at 4:06 am

I’m trying to use this (the modified string-only function) inside an embedded JS tool (I believe it uses Rhino) and I’m having difficulty. Instead of removing the duplicates, I want to append 1, 2, 3…n to the end of the duplicate strings (a space, then an integer, so Kevin,Kevin becomes Kevin,Kevin 1). I’m new to JS in general and not sure I’m creating the array correctly, so I may ask some stupid questions.
var urlarray = new Array(URLName.getString());
Should that work? As I understand it, from there I can call this function on the array?
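On the array-creation question: when new Array() receives a single argument that is not a number, it creates a one-element array containing that argument, so a whole string becomes one item. If the string is comma-separated (an assumption; we do not know what URLName.getString() actually returns), split(",") produces the individual items. The variable names below are illustrative only:

```javascript
var s = "Kevin,Kevin,Shamasis"; // stand-in for URLName.getString()

var wrong = new Array(s); // one element: "Kevin,Kevin,Shamasis"
var right = s.split(","); // three elements: "Kevin", "Kevin", "Shamasis"

console.log(wrong.length, right.length); // 1 3
```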

1. Shamasis Bhattacharya
  
  February 18, 2011 at 6:54 pm
  
  Kevin,
  
  To append a number to the duplicates within a JavaScript array, the following code would be useful:
  
  Array.prototype.markDuplicates = function() {
      var o = {}, i, l = this.length, r = [];
      for (i = 0; i < l; i += 1) {
          if (o[this[i]] >= 0) {
              o[this[i]] += 1;
              r.push(this[i] + ' ' + o[this[i]]);
          } else {
              o[this[i]] = 0;
              r.push(this[i]);
          }
      }
      return r;
  };
  
  // Usage:
  var myArr = ['Kevin', 'Kevin', 'Shamasis', 'Kevin'],
      newArr = myArr.markDuplicates(); // ['Kevin', 'Kevin 1', 'Shamasis', 'Kevin 2']
  
Nilton

November 16, 2009 at 3:11 am

Reverse for loops are faster in SpiderMonkey.
As for Kevin’s question:
Array.prototype.toUnique = function() {
    var o = {}, i, r = [], n = {}, modified = 0;
    // Walk the array backwards, counting occurrences in n.
    for (i = this.length - 1; i >= 0; --i) {
        if (n[this[i]]) {
            // Seen before: store it under a numbered key, e.g. "Kevin 1".
            modified = 1;
            o[this[i] + " " + n[this[i]]] = this[i] + " " + n[this[i]]++;
        } else {
            o[this[i]] = this[i];
            n[this[i]] = 1;
        }
    }
    if (!modified) return this;
    for (i in o) r.push(o[i]);
    return r;
};

1. Shamasis Bhattacharya
  
  February 18, 2011 at 6:58 pm
  
  I did not want to do a reverse so as to maintain the original order.
  
  1. Relic
    
    October 22, 2011 at 10:27 pm
    
    Then ya just need to use .unshift() instead of .push()…. problem solved!
    
  2. Shamasis Bhattacharya
    
    February 10, 2012 at 9:29 am
    It was actually very stupid on my part to think in this direction! The fastest algo actually does not take into account array order. We could simply do the negative looping here.
    
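    Array.prototype.unique = function () {
        var o = {}, i = this.length, r = [];
        // i-- tests the old value, so the loop visits indices length-1 down to 0
        // and stops cleanly on an empty array.
        while (i--) o[this[i]] = this[i];
        for (i in o) r.push(o[i]);
        return r;
    };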
    
  3. Neumonicom
    
    February 10, 2012 at 10:27 am
    
    That will only work for values that are object literals. It will not work for objects, as their internal .toString() method will convert them to something like [object Object]. Thus, if you have multiple objects with the same constructor, you will continually overwrite the last one.
    
  4. Neumonicom
    
    February 10, 2012 at 10:28 am
    
    Sorry, I meant primitives instead of object literals in the first line. ‘a 1 0 2’.
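The object problem described in this exchange is easy to reproduce: every plain object converts to the same key string, "[object Object]", so the hash-sieving method keeps only one of them. sieveByKey below is a hypothetical standalone copy of that method:

```javascript
function sieveByKey(arr) {
    var o = {}, i, r = [];
    for (i = 0; i < arr.length; i++) o[arr[i]] = arr[i]; // object keys are strings
    for (i in o) if (Object.prototype.hasOwnProperty.call(o, i)) r.push(o[i]);
    return r;
}

var people = [{ name: "tom" }, { name: "sam" }];
console.log(String(people[0]));         // "[object Object]"
console.log(sieveByKey(people).length); // 1: the "sam" object overwrote "tom"
```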
    
  5. Neumonicom
    
    February 10, 2012 at 10:37 am
    
    This is probably the fastest and most reliable way to create a unique array whose members can be any kind of object. The only downside is that it will sort the results. Perhaps there is a workaround?
    
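    Array.prototype.unique = function(){
        // Note: this comparator never returns a negative value, so it is not a
        // consistent ordering and equal items are not guaranteed to end up adjacent.
        this.sort(function(a,b){
            if(a===b)return 0;
            return 1;
        });
        var length = this.length;
        // Walk backwards, removing any item equal to its left neighbour.
        while(length--)if(this[length] === this[length-1])this.splice(length,1);
        return this;
    }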
    
  6. Shamasis Bhattacharya
    
    February 10, 2012 at 10:40 am
    
    This will fail on Google Chrome on arrays with many items. The WebKit insertion sort is not stable for equality sort. 😦
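For reference, sort-based deduplication works when the comparator is consistent, i.e. it returns a negative number, zero or a positive number that agrees with a real ordering. For numbers, a - b does that, so equal values end up adjacent and one scan removes the repeats. A minimal sketch, assuming a numeric array (sortedUnique is a hypothetical name); it costs O(n log n), returns the values in sorted order, and copies the input instead of mutating it:

```javascript
function sortedUnique(nums) {
    // Copy first so the caller's array is not reordered.
    var s = nums.slice().sort(function (a, b) { return a - b; });
    var r = [];
    for (var i = 0; i < s.length; i++) {
        // After sorting, a duplicate can only sit right next to its twin.
        if (i === 0 || s[i] !== s[i - 1]) r.push(s[i]);
    }
    return r;
}

console.log(sortedUnique([3, 1, 2, 3, 1])); // [1, 2, 3]
```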
    
Joshua Kalis

March 18, 2011 at 11:12 pm

I don’t like augmenting the default types much, so I just made a function.

function unique(ar) {
    var f = {},   // hash of items already seen
        i = 0,
        l = ar.length,
        r = [];
    while (i < l) {
        // Keep the item only the first time its key is seen.
        !f[ar[i]] && r.push(ar[i]);
        f[ar[i++]] = 1;
    }
    return r;
}

I have it as a method on a properly namespaced object and use it like this:

var result = Utils.array.unique(array_name);

Trav

July 11, 2011 at 6:47 am

Hi,

warning… js novice.

I have a JS array with 50,000 objects in it. I need to find unique object fields in the array.

e.g.

var data = [{name: 'tom', age: 12}, {name: 'sam', age: 13}, {name: 'tom', age: 20}];

So I need to find unique items based on an object field, like:

var result = data.unique('name');

so result would now hold:

result = [{name: 'tom'}, {name: 'sam'}];

I have this working using jQuery.inArray, but it is very slow with large data sets.

Can you help me?
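The same hash-sieving idea extends to this case by using the chosen field, instead of the whole object, as the key. A minimal sketch (uniqueByField is a hypothetical name, written as a plain function rather than an Array.prototype method); it keeps the first value seen for each key and, like the other object-key versions, treats 12 and "12" as the same value:

```javascript
function uniqueByField(arr, field) {
    var seen = Object.create(null); // no inherited keys such as "constructor"
    var r = [];
    for (var i = 0; i < arr.length; i++) {
        var v = arr[i][field];
        if (!(v in seen)) {
            seen[v] = true;
            var item = {};
            item[field] = v; // keep only the requested field, as in the example
            r.push(item);
        }
    }
    return r;
}

var data = [{ name: "tom", age: 12 }, { name: "sam", age: 13 }, { name: "tom", age: 20 }];
console.log(uniqueByField(data, "name")); // [{ name: "tom" }, { name: "sam" }]
```

Each object is visited once with a constant-time lookup, which avoids the repeated linear scans of jQuery.inArray.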

Nemoniccom

July 27, 2011 at 6:42 am

I’m not so sure this is the best method. What about doing the following:

Array.prototype.unique = function(){
Var Len = this.length,
Elems = [],
Elemslen = 0′
Table ={},
I;
For(I=0;I+<Len;I++){
If(!Table[I]){
Table[I] = 1;
This[Elemslen++] = This[I];

}
}

This.length = Elemslen;
}

1. Neumoniccom
  
  July 27, 2011 at 9:38 am
  
  Sorry, I originally tried to type this on an iPad. What a pain! The code should be as follows:
  
  Array.prototype.unique = function(){
    var len = this.length,
        elems = [],
        elemsLen = 0,
        table = {},
        i;
    for(i=0;i<len;i++){
      if(!table[this[i]]){
        table[this[i]] = 1;
        this[elemsLen++] = this[i];
      }
    }
  
    this.length = elemsLen;
    return this;
  }
  
Izayoi400

August 1, 2011 at 5:54 pm

what i use:

var arr1 = [1,2,3,2,5,1,2,1,2,3,6,1,2,1,2,3];
var blah = "";
for (each1 in arr1) {
    if (blah.search(arr1[each1]) == -1) {
        blah = blah + arr1[each1];
    }
}
alert(blah);
var newblah = blah.split("");
alert(newblah);

Izayoi400

August 1, 2011 at 5:56 pm

sorry hopefully this will be formatted better…

var arr1 = [1,2,3,2,5,1,2,1,2,3,6,1,2,1,2,3];
var blah = "";
for (each1 in arr1) {
    if (blah.search(arr1[each1]) == -1) {
        blah = blah + arr1[each1];
    }
}
alert(blah);
var newblah = blah.split("");
alert(newblah);

1. Neumoniccom
  
  August 1, 2011 at 10:33 pm
  
  This isn’t a very good method. You should never use the for-in loop for arrays. That loop should only be used on objects where the number of properties / methods isn’t known (idea taken from Nicholas C. Zakas’ book “High Performance JavaScript”, pp. 62-63). You decrease performance by using the for-in loop because it has to access all the properties / members of the object and its prototype chain.
  
  Also note that calling several methods (e.g. split, search) will degrade performance.
  
  This is what I was originally trying to post. For string and numeric values I believe it is the most straightforward approach. With the prototype method below, you can call unique on any array like this:
  
  var a = [4,5,4,3,45,3,12,1,234,5,4,3,2,1];
  a.unique();
  
  Here is the prototype method for the Array object:
  
  Array.prototype.unique = function(){
    var len = this.length,
        elems = [],
        elemsLen = 0,
        table = {},
        i;
    for(i=0;i<len;i++){
      if(!table[this[i]]){
        table[this[i]] = 1;
        this[elemsLen++] = this[i];
      }
    }
    this.length = elemsLen;
    return this;
  }
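One concrete reason to avoid for-in on arrays, given the subject of this post: once a method such as unique is added to Array.prototype, for-in enumerates that method's name along with the indices. A small demonstration (it adds a throwaway placeholder Array.prototype.unique and removes it again at the end):

```javascript
Array.prototype.unique = function () { return this; }; // placeholder method

var keys = [];
for (var k in [10, 20]) keys.push(k);
console.log(keys); // ["0", "1", "unique"]

// An index-based loop visits only the elements.
var values = [];
var arr = [10, 20];
for (var i = 0; i < arr.length; i++) values.push(arr[i]);
console.log(values); // [10, 20]

delete Array.prototype.unique; // clean up
```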
  
Fast Algorithm To Find Unique Items in JavaScript Array | Robin Jakobsson

August 19, 2011 at 4:18 pm

[…] Fast Algorithm To Find Unique Items in JavaScript Array. via @LeaVerou This entry was posted in Web development and tagged algorithm, javascript by Robin. Bookmark the permalink. […]

vikasrao

August 19, 2011 at 2:26 pm

I had the same need and came up with this solution a while ago, good to know it actually has a name 🙂 : http://vikasrao.wordpress.com/2011/06/09/removing-duplicates-from-a-javascript-object-array/

Msitchen

September 28, 2011 at 3:52 pm

This should be done with a hash table for O(n)

Gregor

October 7, 2011 at 9:37 pm

Thanks for teaching me the term “Hash Sieving”. I’ve been using this trick for 15 years but didn’t know it had a name.

1. Shamasis Bhattacharya
  
  October 10, 2011 at 12:20 am
  
  You are right, we have used this technique for centuries now! However, in JS it enhances performance, unlike in many other languages, where the overhead of dictionaries and other similar structures is comparatively higher.
  
  Agree?
  
Brandon Benvie

October 28, 2011 at 9:56 am

Array.prototype.unique = function(){
return this.filter(function(s, i, a){ return i == a.lastIndexOf(s); });
}

When there are no duplicates, each item scans on average about 50% of the array at most. The more items are filtered out, the more efficient it is. It usually doesn’t beat the hash method, but it has its place. It wins on style points, though, because sometimes looking pretty is better than being smart.
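Note that the test i == a.lastIndexOf(s) keeps the last occurrence of each value, so the output order can differ from a first-occurrence variant that uses indexOf instead. Both variants compare with strict equality, so 1 and "1" stay distinct, and both are O(n²) in the worst case because each indexOf or lastIndexOf call is a linear scan. Standalone sketches (uniqueKeepLast and uniqueKeepFirst are our names):

```javascript
// Keeps an item only if no equal item appears later in the array.
function uniqueKeepLast(arr) {
    return arr.filter(function (s, i, a) { return i === a.lastIndexOf(s); });
}

// Keeps an item only if no equal item appears earlier in the array.
function uniqueKeepFirst(arr) {
    return arr.filter(function (s, i, a) { return i === a.indexOf(s); });
}

console.log(uniqueKeepLast([1, 2, 1, 3]));  // [2, 1, 3]
console.log(uniqueKeepFirst([1, 2, 1, 3])); // [1, 2, 3]
```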

1. Shamasis Bhattacharya
  
  February 10, 2012 at 9:04 am
  
  Yeah! I always tend to have a fascination towards the “coolKid” variants. Sadly, the overhead of an in-loop function call scares me.
  
Aurelio Jargas

February 10, 2012 at 4:43 am
Thanks for the post!

Just note that the two versions you posted do not behave identically. The fastest version considers 1 and “1” equal and removes one of them; the classic version keeps both in the output.
```
[1,2,3,"1","4"].unique_fastest()

["1", 2, 3, "4"]

[1,2,3,"1","4"].unique_classic()

[1, 2, 3, "1", "4"]
```
1. Shamasis Bhattacharya
  
  February 10, 2012 at 9:02 am
  
  You are correct in noting that. 🙂 In fact, to retain the data type, it becomes a bit more complex: we can create a hash for each type. Then again, we would lose the original array item order. To retain the order, the amount of computation required would defeat the whole purpose of this method! 😦
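One way to keep 1 and "1" apart while still using a plain object as the hash is to prefix each key with the item's type and to build the result in the same pass, so the original order survives. A sketch, assuming primitive values other than symbols (typeAwareUnique is a hypothetical name). Distinct objects still collide, since they all stringify to "[object Object]", and the extra string concatenation adds some per-item cost:

```javascript
function typeAwareUnique(arr) {
    var seen = Object.create(null), r = [];
    for (var i = 0; i < arr.length; i++) {
        var key = typeof arr[i] + ":" + arr[i]; // e.g. "number:1" vs "string:1"
        if (!(key in seen)) {
            seen[key] = true;
            r.push(arr[i]); // first occurrence, original type and order
        }
    }
    return r;
}

console.log(typeAwareUnique([1, 2, 3, "1", "4", 1])); // [1, 2, 3, "1", "4"]
```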
  
2. Neumonicom
  
  February 10, 2012 at 10:52 am
  
  Consider this variation as well. It preserves any data type without sorting the results. Try running some tests on it to see if it’s faster. One thing worth noting here is that native methods have been optimized, so avoiding as much conditional logic and as many loops as possible is a good thing:
  
  Array.prototype.unique = function(){
    var len = this.length,
        a = [],
        item,
        i = 0;
  
    for(;i<len;){
      item = this[i++];
      if(a.indexOf(item) === -1)a.push(item);
    }
    return a;
  }
  
Fast Algorithm To Find Unique Items in JavaScript Array | x443

July 6, 2012 at 3:23 am

[…] Fast Algorithm To Find Unique Items in JavaScript Array. […]

JavaScript unique method for array prototype | t1u

August 21, 2012 at 12:27 am

[…] This is quite common, which is plainly needlessly complex. I also found this, as an alternative, other than the poster being rather smug about his fast algorithm I still feel there is room for […]

skrat

October 3, 2012 at 2:06 pm

Nice. The problem with all the unique implementations I’ve seen is that they rely on toString instead of allowing a custom comparison function.

1. Shamasis Bhattacharya
  
  October 3, 2012 at 6:41 pm
  
  That’s true. Providing a way to pass in a comparator function would have been super (while acknowledging the function-call overhead).
  
  1. skrat
    
    October 3, 2012 at 6:45 pm
    
    My implementation for completeness, I wonder how the performance compares. http://pastie.org/4902499
    
  2. Shamasis Bhattacharya
    
    October 3, 2012 at 7:11 pm
    
    Even before performance, I assume that you are not putting cross-browser limitations as a concern. Right?
    
  3. skrat
    
    October 3, 2012 at 7:14 pm
    
    That’s right, I don’t care for IE < 9, in other words, the way to handle this is shims.
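A version that accepts a comparison function, as suggested in this exchange, cannot use the object-key shortcut, because an arbitrary equality test gives no key to hash on. A minimal sketch (uniqueWith is a hypothetical name) that keeps the first of each group of equal items; it performs O(n²) comparisons, which is the price of the flexibility:

```javascript
function uniqueWith(arr, equals) {
    var r = [];
    for (var i = 0; i < arr.length; i++) {
        var dup = false;
        // Compare against every item already kept.
        for (var j = 0; j < r.length; j++) {
            if (equals(arr[i], r[j])) { dup = true; break; }
        }
        if (!dup) r.push(arr[i]);
    }
    return r;
}

var people = [{ id: 1, n: "a" }, { id: 2, n: "b" }, { id: 1, n: "c" }];
var byId = uniqueWith(people, function (x, y) { return x.id === y.id; });
console.log(byId.map(function (p) { return p.n; })); // ["a", "b"]
```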
    