
Find The Common Occurrences Of Words In Two String Values

Suppose I have two strings which may look like below:

var tester = 'hello I have to ask you a doubt';
var case = 'hello better explain me the doubt';

This case both strings conta

Solution 1:

You can use a first regular expression as a tokenizer to split the tester string into a list of words, then use those words to build a second regular expression that matches any word in the list. For example:

var tester = "a string with a lot of words";

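function getMeRepeatedWordsDetails ( sentence ) {
  // Append a space so the trailing \W can match after the last word.
  sentence = sentence + " ";
  // Tokenizer: every run of non-whitespace characters is a word.
  var regex = /[^\s]+/g;
  // Alternation of all words in tester, each followed by a non-word character.
  var regex2 = new RegExp ( "(" + tester.match ( regex ).join ( "|" ) + ")\\W", "g" );
  // match() returns null when nothing matches, so fall back to an empty array.
  var matches = sentence.match ( regex2 ) || [];
  var words = {};
  for ( var i = 0; i < matches.length; i++ ) {
    // Strip the non-word character captured by the trailing \W.
    var match = matches [ i ].replace ( /\W/g, "" );
    var w = words [ match ];
    if ( ! w )
      words [ match ] = 1;
    else
      words [ match ]++;
  }
  return words;
}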

console.log ( getMeRepeatedWordsDetails ( "another string with some words" ) );

The tokenizer is the line:

var regex = /[^\s]+/g;

When you do:

tester.match ( regex )

you get the list of words contained in tester:

[ "a", "string", "with", "a", "lot", "of", "words" ]

With such an array we build a second regular expression that matches all the words; regex2 has the form:

/(a|string|with|a|lot|of|words)\W/g
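
A hedged sketch of how that pattern could be assembled, with two additions that are not in the original answer: duplicate words are dropped, and regular expression metacharacters are escaped so that a word such as c++ is matched literally. The helper names escapeRegExp and buildWordPattern are hypothetical, and the sketch assumes the input contains at least one word.

```javascript
// Hypothetical helper: escape characters that are special inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the second regular expression from the words of `text`.
// Assumes `text` contains at least one non-whitespace word.
function buildWordPattern(text) {
  var tokens = text.match(/[^\s]+/g) || [];
  // Keep only the first occurrence of each word.
  var unique = tokens.filter(function (word, index) {
    return tokens.indexOf(word) === index;
  });
  // Same shape as regex2: an alternation followed by one non-word character.
  return new RegExp("(" + unique.map(escapeRegExp).join("|") + ")\\W", "g");
}

console.log(buildWordPattern("a string with a lot of words").source);
// (a|string|with|lot|of|words)\W
```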

The trailing \W requires a non-word character after each word, so that the a element does not match the start of a longer word such as another. It does not anchor the start of the match, though, so a can still match the last letter of a word that ends in a. Applying regex2 to sentence yields another array containing only the words listed in regex2, that is, the words that occur in both tester and sentence. The for loop then counts the entries of the matches array and turns it into the object you requested.

But beware that:

  • you have to append at least one space to sentence, otherwise the \W in regex2 cannot match after the last word: sentence = sentence + " "
  • you have to strip the extra character that the \W captures from each match: match = matches [ i ].replace ( /\W/g, "" )
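
Both caveats come from the trailing \W. One way to sidestep them, sketched here as an alternative rather than as the answer's own method, is to skip the second regular expression and compare the tokens directly using a Set. The function name countCommonWords is hypothetical, and the sketch assumes that words are separated by whitespace only and compared exactly (case-sensitive, punctuation included).

```javascript
// Hypothetical function: counts how often each word of `reference` occurs in `sentence`.
// Assumption: words are whitespace-separated and compared exactly.
function countCommonWords(reference, sentence) {
  var tokenizer = /[^\s]+/g;
  // Set of distinct words in the reference string.
  var referenceWords = new Set(reference.match(tokenizer) || []);
  // Null-prototype object, so words like "constructor" cannot collide with inherited keys.
  var counts = Object.create(null);
  (sentence.match(tokenizer) || []).forEach(function (word) {
    if (referenceWords.has(word)) {
      counts[word] = (counts[word] || 0) + 1;
    }
  });
  return counts;
}

var common = countCommonWords(
  "hello I have to ask you a doubt",
  "hello better explain me the doubt"
);
console.log(Object.keys(common)); // the words found in both strings
```

With the two strings from the question, the result counts hello and doubt once each. No trailing space or cleanup step is needed, because whole tokens are compared rather than matched inside the sentence.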
