Terminalfour Search

Use this content type to add Terminalfour Search to a page and display site search results.

Screenshot: example of the Terminalfour Search content type.

Content Type Details

  • ID: 326
  • Name: Terminalfour Search
  • Minimum user level: Administrator
  • Compatible with page layouts: Full Width

Content Type Elements Details

Name | Description | Size | Type | Required | Conditionally Shown
Name | The Name element | 80 characters | Plain Text | Yes | No
Title | Add the title; this is the main H1 heading on the page | 80 characters | Plain Text | Yes | No
No Results Content | Add content that is displayed when no results are returned. See WYSIWYG content options. | 1000 characters | HTML | Yes | No

Overview

Terminalfour Search is powered by three distinct components:

  1. Terminalfour Search Crawler: Responsible for visiting pages and fetching content.
  2. Site Search Dashboard: Responsible for the user interface, ranking, and display logic.
  3. Terminalfour Search content type: Adds the search results page to the website.

User Interface & Implementation

The Site Search is accessed via the magnifying glass icon in the website header. Clicking this reveals an overlay with a search input.

  • Front-end Code: Located in the Handlebars Partial hccHeaderHTML.
  • Live Example: See the Site Search page.

Data Collection (The Crawler)

The Terminalfour Search Crawler is configured to crawl the website daily to fetch metadata and content.

  • Configuration: Managed via the houston-cc-main configuration, which includes URL settings, exclusions, and metadata mappings.
  • Indexing Logic: The crawler must be able to "find" a page to index it. If a page is hidden from navigation and has no inbound links, the crawler will not discover it.
  • Robots.txt: The robots.txt file added by the Robots File content type is configured to allow the crawler access. For example:
    User-agent: terminalfour-nutch-spider
    Allow: /
    Crawl-delay: 0.5

Metadata

The crawler picks up metadata from pages. For a full list of how fields are assigned, see metadata.

Automatic Data Mapping

The crawler automatically collects the following data fields and pushes them to Site Search; they are visible in the Site Search Dashboard:

Field | Description
host | Host name of the URL
url | The full page URL
id | The unique identifier (URL)
content | The content extracted from the body of the page
tstamp | Timestamp of when the URL was last fetched
urlDepth | How many clicks deep the page is from the root
url_keywords | Keywords extracted from the URL string
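The crawler's exact derivation of these fields is not documented here, but as a rough illustration (function names and the example URL are hypothetical, not Terminalfour's implementation), urlDepth and url_keywords could be computed from a URL like this:

```python
from urllib.parse import urlparse

def url_depth(url: str) -> int:
    """How many clicks deep the page is from the root: count non-empty path segments."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

def url_keywords(url: str) -> list[str]:
    """Keywords extracted from the URL string: split path segments on hyphens."""
    path = urlparse(url).path
    words = []
    for segment in path.split("/"):
        words.extend(word for word in segment.split("-") if word)
    return words

url = "https://www.example.edu/admissions/how-to-apply/"
print(url_depth(url))     # 2
print(url_keywords(url))  # ['admissions', 'how', 'to', 'apply']
```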

Content Exclusion & Visibility

There are several ways to prevent a page from being indexed or appearing in search results, depending on your access level:

Method | Effect | Access Level
'Remove from crawl' element | Check the 'Remove from crawl' element on a section's 'General' tab; this adds a robots noindex meta tag. The page is not indexed, but links on the page are still followed. | Moderator
'Hide from search' element | Check the 'Hide from search' element on a section's 'General' tab; this adds a sectionDisplay false meta tag. The page is crawled, but hidden from the search results interface. | Moderator
Metadata tab 'Robots' element | Set a custom robots meta tag, e.g. noindex, nofollow, to prevent indexing and stop the crawler from following links. | Moderator
Robots.txt Disallow rules | Add Disallow rules for the crawler to the robots.txt file using the Robots File content type to stop the crawler visiting specific areas/folders entirely, e.g. Disallow: /component-library/* | Administrator
URL Filter Regex | Add exclusion rules directly in the Crawler settings. | Crawler Access
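For illustration, the first three methods above could render in a page's head roughly as follows (exact markup depends on the page layout; the sectionDisplay name is taken from the table above):

```html
<!-- 'Remove from crawl': page is not indexed, links are still followed -->
<meta name="robots" content="noindex">

<!-- 'Hide from search': page is crawled, but hidden from the results interface -->
<meta name="sectionDisplay" content="false">

<!-- Metadata tab 'Robots' element with a custom value -->
<meta name="robots" content="noindex, nofollow">
```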

Search Management & Customization

Search behavior is managed in the Site Search Dashboard, where the following settings can be customized and fine-tuned:

  • Searchable Fields: Define which metadata fields are indexed for searching and which are returned for display in the results list.
  • Ranking: Adjust weighted ranking criteria, e.g. to give more weight to specific metadata fields.
  • Search Rules: Create custom logic to improve results:
    • Promotions: Pin specific internal or external pages to the top of results for certain keywords.
    • Synonyms: Group related terms (e.g., "Student" and "Learner") so they return the same results.
    • Stop Words: Define common words to be ignored by the search engine to improve accuracy.
  • Facets (Filters): Manage the sidebar filters that allow users to narrow down results. There is currently a single facet for Type, which is mapped from the sectionType meta tag.
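For example, a page could expose the sectionType meta tag below (the content value is illustrative), which the Type facet then picks up:

```html
<meta name="sectionType" content="News">
```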