Data structures Q & A


A very beautiful question ----

Given an unsorted array of size n. Array elements are in range from 1 to n. One number from set {1, 2, …n} is missing and one number occurs twice in array. Find these two numbers.

  1. Traverse through the array and use element at position 'i'. Say val = arr[i]
  2. Change value of arr[val] to some negative value say -999.
  3. If you encounter -999 again then val is the number that is repeated.
  4. Traverse the array again and the index at which value is not -999, that number is missing.

Complexity: O(n)


How to find two strings are anagrams of each other in the most optimal way?


Traverse the array.
While traversing, use absolute value of every element as index and make the value at this index as negative to mark it visited.
If something is already marked negative then this is the repeating element.
To find missing, traverse the array again and look for a positive value.
I will post the code for the same soon…


Assuming that strings contain only English characters.

  1. Maintain an array of 26 elements (or 52 if you are considering both upper case and lower case) arr[26]. Initialize this array to 0. Its indices represent alphabets. 0 -> 'a', 1 -> 'b', 2 -> 'c' ...
  2. Traverse through first string and increment array element corresponding to the character occurring in the string.
  3. Traverse through second string and decrease the array element value by 1 corresponding to the character occurring in second string.
  4. Traverse the array. If strings are anagrams then all values must be 0.

Space Complexity - O(1)
Time Complexity - O(n)


Create a vector with the 26 possible letters. Traverse first string adding 1 to the positions of each letter in the vector.

Then traverse the 2nd string substracting 1 from the vector for each letter in the string.
(if you find a zero in the vector then you can return false as you know that for the current letter the 2nd string has more occurrences than the 1st one)

Finally sum all the elements in the vector, if the sum is 0 then you have

  1. Take a newarray of size N. Initialize every element of this array to 0.
  2. for(i = 1; i <= N; i++)
    newarray[i] = newarray[i] + 1;
  3. find j in newarray, such that, newarray[j] > 1. Find k in newarray, such that, newarray[k] = 0.
  4. j is the element which is repeated twice & k is the element which is missing.


int number_needed(string a, string b) {
auto count = 0;
vector freq(26, 0);
for (auto c : a) { ++freq[c - ‘a’]; }
for (auto c : b) { --freq[c - ‘a’]; }
for (auto val : freq) { count += abs(val); }
return count;

freq is the vector created. It is declared a size of 26 values, all set to 0.
The first two (for) loops go through a string, char by char and use the freq vector to keep track of each char’s occurrence.
++freq is increasing the value of the vector at the index [c minus ‘a’]. What you are seeing is ASCII interpretation for the index. Since ‘a’ is 97, if c was ‘a’, then the index would be [0] because 97-97. Same continues for ‘b’ which would be 98-97 indicating an index of [1].
cool_shark makes good use of one vector by using --freq to account for char matches between the two strings. Note that this is good programming practice instead of using nested loops which would have given a runtime of O(n^2) instead of its current linear runtime.


This Question can be solved in a more efficient way in just one scan.
Here is the logic:

Let ‘A’ be the repeating number and ‘B’ is the missing number.

Sum(Actual) = Sum(1...N) + A - B

Sum(Actual) - Sum(1...N) = A - B. 

Sum(Actual Squares) = Sum(1^2 ... N^2) + A^2 - B^2

Sum(Actual Squares) - Sum(1^2 ... N^2) = (A - B)(A + B) 

= (Sum(Actual) - Sum(1...N)) ( A + B)

Sum(1…N) = N*(N+1)/2
Sum(Actual) = Sum of all elements of the array
Sum(Actual Squares) = Sum of all square of the elements of the array

Now just calculate ‘A’ & ‘B’ from the above equations.