Case study of content comparison across multiple sites

  • Requirement – A federal government agency with a large departmental site and an extensive portfolio of smaller sites, the result of machinery-of-government changes, was charged by senior management with identifying duplicated and overlapping content and reducing the number of individual sites.
  • Solution – A Codifynd compendium covering every site in the departmental portfolio was generated.  Initial prototyping was undertaken on a cluster of thematically consistent content nodes from a single site.  Computational linguistic methods were then applied to compare the prototype cluster with the entire data set and compute node divergence and content similarity.  By graphing node proximity based on content similarity, the department was able to identify overlapping pages across its portfolio of sites and develop recommendations for merging and deprecating redundant content.
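
The comparison step in the solution can be illustrated with a minimal sketch. The case study does not say which computational linguistic methods were used, so TF-IDF weighting with cosine similarity is assumed here purely as a common stand-in. The node identifiers, page text, threshold and function names are all hypothetical and are not part of Codifynd.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each node id to a sparse {term: weight} TF-IDF vector.

    Assumption: a smoothed IDF, ln((1 + n) / (1 + df)) + 1, is used so that
    terms appearing in every document still carry a small weight.
    """
    tokens = {node: tokenize(text) for node, text in docs.items()}
    n = len(docs)
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))  # count each term once per document
    vectors = {}
    for node, toks in tokens.items():
        tf = Counter(toks)
        vectors[node] = {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def overlap_candidates(docs, cluster, threshold=0.5):
    """Compare each prototype-cluster node with every node outside the cluster.

    Returns (cluster_node, other_node, similarity) tuples at or above the
    threshold, most similar first. The threshold value is an arbitrary
    illustration, not a recommended setting.
    """
    vectors = tfidf_vectors(docs)
    results = []
    for seed in cluster:
        for node, vec in vectors.items():
            if node in cluster:
                continue
            score = cosine(vectors[seed], vec)
            if score >= threshold:
                results.append((seed, node, score))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical pages from three sites in a portfolio.
docs = {
    "dept/grants/apply": "How to apply for a community grant and check your eligibility",
    "agency-a/funding/apply": "Apply for a community grant: check your eligibility and how to apply",
    "agency-b/contact": "Contact our office by phone or email during business hours",
}
cluster = ["dept/grants/apply"]  # prototype cluster drawn from a single site

for seed, node, score in overlap_candidates(docs, cluster):
    print(f"{seed} overlaps {node} (similarity {score:.2f})")
```

Pairs that clear the threshold are candidates for review rather than automatic mergers. The same scores could be used as edge weights when graphing node proximity.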