File /program/modules/search/search_view.php

Description

/program/modules/search/search_view.php - interface to the view-part of the search module

This file defines the interface with the search-module for viewing content. The interface consists of this function:

    search_view(&$theme, $area_id, $node_id, $module)

This function is called from /index.php when the node to display is connected to this module.

Functions
search (line 251)

perform an actual search

At this point we have collected the following parameters:

  • string $q: the question, not empty
  • int $r: # of results per page
  • int $o: offset relative to the very first result
  • string $t: a comma-delimited string hinting at hits per module OR empty
What needs to be done is to query the relevant search function from one or more modules and present them to the visitor via $theme.

Note: array $qwords contains 1 or more arrays, each holding the original keyword (in $qwords[i][0]), the lowercase version (in $qwords[i][1]) and the quoted version ready for 'LIKE' in a where-clause (in $qwords[i][2]). This may come in handy when preparing the search context later on.

The original query is also available, in $q. This is used to construct Previous and Next links below the search results overview.

The user requests $r results. However, we try to find $r+1 results. If that succeeds, we know that after the current $r results there will be at least 1 more result (on the next page). Also this further fills the cache with hints (in parameter $t) for the next iteration.

  • return: output written to $theme
void search (object &$theme, array $config, array $qwords, string $q, int $r, int $o, string $t)
  • object &$theme: collects the (html) output
  • array $config: search configuration data
  • array $qwords: list of 1 or more keywords to search for (original AND lowercase each)
  • string $q: original unprocessed query
  • int $r: results per page
  • int $o: offset
  • string $t: comma separated list with hints about # of hits in various modules
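The $r+1 lookahead described above can be illustrated with a minimal sketch. The function name sketch_page_with_lookahead() and the plain array standing in for the result set are assumptions made for illustration only; they are not part of search_view.php.

```php
<?php
// Hypothetical helper: fetch one result more than requested to detect a next page.
function sketch_page_with_lookahead(array $all_hits, int $r, int $o): array
{
    // Ask for $r+1 items starting at offset $o.
    $slice = array_slice($all_hits, $o, $r + 1);
    // If we really got $r+1 items, at least one result exists beyond this page.
    $has_next = (count($slice) > $r);
    return array(
        'hits'     => array_slice($slice, 0, $r), // show at most $r results
        'has_prev' => ($o > 0),                   // anything before this page?
        'has_next' => $has_next,
    );
}

// 25 fake results, 10 per page: first page and last page.
$first = sketch_page_with_lookahead(range(1, 25), 10, 0);
$last  = sketch_page_with_lookahead(range(1, 25), 10, 20);
```

On the first page the extra result is found, so a Next link is warranted; on the last page only 5 results come back and the Next link can be rendered as plain text.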
search_areas (line 750)

search through area titles

  • return: FALSE on error, TRUE on success + results and hits updated (maybe)
bool search_areas (array &$hits, int &$results, array $qwords, int $limit, int $offset, array $config)
  • array &$hits: collects the requested search results if any
  • int &$results: collects the # of hits in this module
  • array $qwords: holds 1 or more keywords to search for (original, utf8lower, quoted)
  • int $limit: maximum number of hits to return
  • int $offset: starting point within this module for returning results
  • array $config: search configuration data (contains scope and area_id)
search_construct_nodeslist (line 593)

construct a helper table with a list of node_id's that are to be searched

prepare a temporary table 'search_nodes' with all the nodes that are to be searched. Nodes that are NOT to be searched are those under embargo, expired or those in areas where we are not allowed to look.

This is quite a complex routine, mainly because MySQL won't let me use the same temporary table twice in the same SQL statement. The work-around is to have more than one temporary table to build a list of searchable nodes.

The strategy is as follows.

1. Determine the areas selected to be searched based on $config

The search module is configured to either
 - search the current (single) area (config['scope'] = 0)
 - search all public areas (config['scope'] = 1)
 - search all areas where the current user has access (config['scope'] = 2)

Note that if the current user is not logged in, the latter two yield the same result. The area selection is kept in $where which looks like

 - ((a.is_active) AND (a.area_id=AA))
 - ((a.is_active) AND (a.is_private=FALSE))
 - ((a.is_active) AND ((a.is_private=FALSE) OR (a.area_id=A) OR (a.area_id=B) OR ...))

2. Select all embargoed/expired nodes in the selected areas (into new temp table '$accu')

This gives a list of otherwise valid nodes that happen to be embargoed/expired and therefore are not to be searched. If this list is empty we can proceed with step 6 below. Note that the temp table '$accu' is always created, even if it is empty.

3. Select all the children from the nodes selected in the previous step into new temp table '$work'

Embargo/expiry affects all descendants of the embargoed/expired nodes. We have to work our way through all offspring of the embargoed/expired nodes to find all the nodes that are not to be searched. If the list of this generation of descendants is empty we can proceed with step 6 below.

4. Add the generation in temp table 2 ('$work') to the family in temp table 1 ('$accu')

This is the workaround with two temp tables. The new generation in $work is added to the list of ancestors in $accu and $work is cleared for the next round.

5. Select the next generation in $work

If there is a next generation it is stored in the existing table $work and we continue with step 4 (again). If there are no new generations at this time, we fall through to step 6.

The loop of steps 4 and 5 eventually leaves all embargoed/expired nodes and all their descendants in temp table 1 '$accu'. Temp table 2 '$work' is no longer required at that point.

6. Invert the selection of nodes currently in temp table 1 '$accu' taking areas into account

This selects all the node_id's in the selected areas (see step 1) that are NOT in temp table 1 '$accu' into temp table 3 ('$nodes').

7. Return the number of records in temp table 3 '$nodes' as the # of searchable nodes.

  • return: FALSE on error, otherwise # of nodes + temporary table 'search_nodes' created
bool|int search_construct_nodeslist (array $config)
  • array $config: search module configuration data
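The generation-by-generation expansion in steps 2 to 6 can be sketched in memory, without temporary tables. This is only an illustration of the strategy: the function name sketch_searchable_nodes(), the $parents map (node_id => parent_id) and the $blocked list are hypothetical stand-ins for the nodes table and the embargoed/expired selection.

```php
<?php
// Hypothetical in-memory model: $parents maps node_id => parent_id (0 = top level),
// $blocked lists the embargoed/expired node_id's found in step 2.
function sketch_searchable_nodes(array $parents, array $blocked): array
{
    $accu = $blocked;   // plays the role of temp table 1 ('$accu')
    $work = $blocked;   // plays the role of temp table 2 ('$work')
    while (!empty($work)) {
        $next = array();
        foreach ($parents as $node_id => $parent_id) {
            // steps 3 and 5: children of the current generation not yet listed
            if (in_array($parent_id, $work, true) && !in_array($node_id, $accu, true)) {
                $next[] = $node_id;
            }
        }
        $accu = array_merge($accu, $next); // step 4: add generation to the family
        $work = $next;                     // continue with the next generation
    }
    // step 6: invert the selection, i.e. keep every node that is NOT in $accu
    return array_values(array_diff(array_keys($parents), $accu));
}

// Node 2 is embargoed; its child 3 must be excluded as well.
$searchable = sketch_searchable_nodes(
    array(1 => 0, 2 => 1, 3 => 2, 4 => 0, 5 => 4, 6 => 1),
    array(2)
);
```

The count of the returned list corresponds to the number of searchable nodes from step 7.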
search_context (line 947)

return a ready-to-use context with highlights

search $content for keywords in $qwords and return highlighted snippets (if any).

Strategy is as follows:

1. prepare the $content: make lowercase, strip HTML, etc. If the string is now shorter than the $window we're done: return the highlighted string.

2. iterate through $qwords and the string, remember the offset and length of any matches in array $hits[]. If there are no hits we're done: return an empty array.

3. step through $hits and attempt to combine $hits within a $window into a single run of hits in $runs[] (storing starting and ending offsets).

4. step through $runs and extract the specified substrings from the string, adding about half a window context before and about half a window context after the run. Store the highlighted strings in an array and return that array.

  • return: array with 0 or more highlighted lowercase context snippets
  • usedby: htmlpage_search()
array search_context (array $qwords, string $content, [int $window = 100], [int $maxruns = 5], [int $maxhits = 10])
  • array $qwords: holds 1 or more keywords to search for (original, utf8lower, quoted)
  • string $content: original content to search
  • int $window: arbitrary base for size of context snippets
  • int $maxruns: arbitrary limit for number of runs to return
  • int $maxhits: arbitrary limit for # of hits/qword
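Step 3 of the strategy above (combining hits into runs) might look roughly like the sketch below. It assumes that two hits belong to the same run when the gap between them is at most $window characters; that interpretation, and the name sketch_hits_to_runs(), are assumptions and not taken from the source.

```php
<?php
// Hypothetical helper: merge hits (offset, length) that lie within $window of each other.
// Returns runs as (start offset, end offset) pairs.
function sketch_hits_to_runs(array $hits, int $window): array
{
    // Sort by starting offset so that neighbouring hits are adjacent.
    usort($hits, function ($a, $b) { return $a[0] <=> $b[0]; });
    $runs = array();
    foreach ($hits as $hit) {
        list($start, $length) = $hit;
        $end = $start + $length;
        $last = count($runs) - 1;
        if ($last >= 0 && $start - $runs[$last][1] <= $window) {
            // Close enough to the previous run: extend that run.
            $runs[$last][1] = max($runs[$last][1], $end);
        } else {
            // Too far away (or the very first hit): start a new run.
            $runs[] = array($start, $end);
        }
    }
    return $runs;
}

// Three hits of length 3; the first two are close together, the third is far away.
$runs = sketch_hits_to_runs(array(array(400, 3), array(10, 3), array(50, 3)), 100);
```

Each run can then be widened by about half a window on either side before the substring is extracted, as described in step 4.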
search_get_places (line 474)

construct an ordered list of places to search perhaps with hints about # of hits

the list of places to search starts with areas and nodes: these are special cases and these places are searched first. After that we give priority to modules htmlpage, newsletter and althing. After that the order is chosen so that the 'dynamic' pages (periodic newsletter, active weblog) are searched first. All current modules (April 2016) are added to the list in a particular order. Additional modules will be added automatically at the end of the list. This could differ between W@S installations, but the order is the same between searches in the same installation of W@S. We do set results to 0 for all modules; when a module of that name exists the results will be set to -1 indicating 'unknown'. This way a non-existing module will not yield errors.

The parameter $hints contains the search results from a previous search. This is basically a comma delimited list of integers where the first integer is the # of hits for the areas search, the second one for the nodes search, the third one for the search in htmlpage, etc. This saves trips to the database if the user moves forward in the results list. We do have to trust the user submitted data though. OTOH: if the numbers in $hints are too high, the search will simply fail and no harm is done.

Note: 1-on-1 means that one module record is linked to a node; 1-on-N means that more than 1 result could emanate from the same node (e.g. different issues of the newsletter, or different posts in the althing). NOP indicates that the module always performs No OPeration: always 0 results and never any hits.

  • return: FALSE on error, array with places to search otherwise
bool|array search_get_places (string $hints)
  • string $hints: comma delimited list of hits per module
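A minimal sketch of how a $hints string could be mapped onto an ordered list of places is shown below. The function name sketch_apply_hints() and the flat list of place names are hypothetical; the real function also checks which modules exist, which is left out here. Because the hints are user-submitted, only plain non-negative integers are accepted and everything else falls back to -1 ('unknown').

```php
<?php
// Hypothetical helper: map a comma-delimited hints string onto an ordered list of places.
function sketch_apply_hints(array $places, string $hints): array
{
    $values = ($hints === '') ? array() : explode(',', $hints);
    $result = array();
    foreach (array_values($places) as $i => $name) {
        // -1 means 'unknown': the number of hits must be (re-)calculated.
        $n = -1;
        if (isset($values[$i]) && preg_match('/^[0-9]+$/', trim($values[$i])) === 1) {
            $n = (int) trim($values[$i]); // accept only plain non-negative integers
        }
        $result[$name] = $n;
    }
    return $result;
}

// Hints for the first two places only; the third place stays 'unknown'.
$places = sketch_apply_hints(array('areas', 'nodes', 'htmlpage'), '3,0');
```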
search_get_qwords (line 725)

construct an array with keywords to look for

construct an array with 0, 1 or more keywords to look for in the database. We prepare different versions: the original casing, the lowercase variant and a version prepared for the LIKE-operator. The second comes in handy when we are highlighting the context in the results list. The third one is handy when performing database queries.

Note: we KISS and do not take quoted strings into account (for now). Note that earlier we already stripped HTML-tags from $q (see search_view_dialog_validate()).

  • todo: better parse of keywords taking quotes into account too.
array search_get_qwords (string $q)
  • string $q: holds a list of words to search for
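The three variants per keyword could be prepared along the lines of the sketch below. The name sketch_get_qwords(), the whitespace split and the backslash escaping of LIKE wildcards are assumptions; the actual function presumably relies on the database layer for quoting.

```php
<?php
// Hypothetical sketch: split $q on whitespace and prepare three variants per keyword.
function sketch_get_qwords(string $q): array
{
    $qwords = array();
    $words = preg_split('/\s+/', trim($q), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Lowercase variant; fall back to strtolower() if mbstring is unavailable.
        $lower = function_exists('mb_strtolower') ? mb_strtolower($word, 'UTF-8') : strtolower($word);
        // Assumption: escape backslash, LIKE wildcards and single quotes with a backslash.
        $escaped = strtr($word, array('\\' => '\\\\', '%' => '\\%', '_' => '\\_', "'" => "\\'"));
        $qwords[] = array($word, $lower, "'%" . $escaped . "%'");
    }
    return $qwords;
}

$qwords = sketch_get_qwords('Foo  50%');
```

An empty or whitespace-only query yields an empty array, which matches the '0, 1 or more keywords' in the description.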
search_highlight (line 1072)

highlight words in a lowercase search context snippet

highlight keywords in $snippet by sandwiching them between '<b>' and '</b>'. We first create a list of (unique) words ordered by length and subsequently use that list for matching. This list is based on the utf8lower variant of the keyword.

  • return: string with highlights inserted
  • todo: what to do with overlapping qwords, e.g. "foo" and "foobar"? The result depends on search/replace execution order: either "<b>foo</b>bar" or "<b>foobar</b>".
  • todo: is there a chance that our static cache $q fails if somehow a different $qwords is used on subsequent calls? Mmmmm...
string search_highlight (array $qwords, string $snippet)
  • array $qwords: holds 1 or more keywords to search for (original, utf8lower, quoted)
  • string $snippet: lowercase version of a context snippet
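One way to sidestep the overlap question in the todo is sketched below. Note that this swaps in a different technique from the search/replace sequence the description mentions: a single regular-expression pass with the keywords as alternatives, longest first. The name sketch_highlight() is hypothetical.

```php
<?php
// Hypothetical sketch: wrap every keyword occurrence in <b>...</b>, longest keyword first.
function sketch_highlight(array $qwords, string $snippet): string
{
    // Collect the unique, non-empty lowercase variants (index 1 of each qword).
    $words = array_unique(array_map(function ($qword) { return $qword[1]; }, $qwords));
    $words = array_filter($words, function ($w) { return $w !== ''; });
    if (empty($words)) {
        return $snippet;
    }
    // Longest first, so "foobar" wins over "foo" at the same position.
    usort($words, function ($a, $b) { return strlen($b) <=> strlen($a); });
    $alternatives = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    // One pass with an alternation never re-matches inside the inserted tags.
    return preg_replace('/(' . implode('|', $alternatives) . ')/', '<b>$1</b>', $snippet);
}

$out = sketch_highlight(
    array(array('Foo', 'foo', "'%Foo%'"), array('Foobar', 'foobar', "'%Foobar%'")),
    'a foobar and a foo'
);
```

With this ordering "foobar" is highlighted as a whole, and a standalone "foo" elsewhere in the snippet is still highlighted.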
search_module (line 418)

kickstarter for calling modules' search function (after loading the source files)

The interface with the modules' search functions is fully documented in htmlpage_search() in htmlpage_search.php.

  • return: FALSE on failure, TRUE on success and $hits, $results updated (maybe)
bool|array search_module (array &$hits, int &$results, array $qwords, int $limit, int $offset, array $module)
  • array &$hits: receives search results (at most $limit)
  • int &$results: receives number of hits (negative value on entry means (re-)calculate total)
  • array $qwords: holds 1 or more keywords to search for (original, utf8lower, quoted)
  • int $limit: maximum number of hits to return (could be 0)
  • int $offset: starting point within this module for returning results
  • array $module: holds module information (name, search_script, module_id, etc.)
search_navbar (line 356)

construct a navigation bar with 'previous' 1 2 3 4 ... 'next'

construct a DIV with 'previous' 1 2 3 4 ... 'next' links to navigate the search results. If there is less than 1 page worth of results, no navigation bar is necessary (an empty array is returned). If we're on the first page, the 'previous' link is text (not a link); if we are on the last page, the 'next' link is text (not a link).

The page numbers 1 2 3 4 ... form a sliding window within all pages; width is determined by the global pagination setting (default 7).

  • return: ready to use HTML lines
array search_navbar (int $count, int $r, int $o, array $params)
  • int $count: total number of hits counted so far (could be grand total)
  • int $r: results per page
  • int $o: offset
  • array $params: basic parameters for the URLs to generate
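The sliding window of page numbers can be sketched as follows. The name sketch_page_window(), the centring of the window on the current page and the explicit $width parameter (standing in for the global pagination setting) are assumptions for illustration; $r is assumed to be positive.

```php
<?php
// Hypothetical helper: compute the page numbers shown in the navigation bar.
function sketch_page_window(int $count, int $r, int $o, int $width = 7): array
{
    $pages = (int) ceil($count / $r);       // total number of pages
    if ($pages <= 1) {
        return array();                      // nothing to navigate
    }
    $current = intdiv($o, $r) + 1;           // 1-based current page
    // Centre the window on the current page, then clamp it to [1, $pages].
    $first = max(1, $current - intdiv($width, 2));
    $last  = min($pages, $first + $width - 1);
    $first = max(1, $last - $width + 1);
    return range($first, $last);
}

// 200 hits, 10 per page: window at the start, in the middle and at the end.
$at_start  = sketch_page_window(200, 10, 0);
$in_middle = sketch_page_window(200, 10, 100);
$at_end    = sketch_page_window(200, 10, 190);
```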
search_nodes (line 842)

search through node titles and link_texts

there are a few possibilities

1 ($results  < 0) && ($limit <= 0): calc $results, return
2 ($results  < 0) && ($limit >  0): calc $results, retrieve $hits, return
3 ($results == 0) && ($limit <= 0): return
4 ($results == 0) && ($limit >  0): return
5 ($results  > 0) && ($limit <= 0): return
6 ($results  > 0) && ($limit >  0): retrieve selected $hits, return

Cases 3, 4 and 5 are easy: we simply trust the caller's cached $results and we do not have to return any $hits.

Case 1 can be done via counting the number of matches, but not retrieving them, e.g. sql == SELECT count(node_id) AS n FROM $table WHERE field LIKE '%expression%';

Case 2 is a little more difficult because we need to retrieve the records AND calculate the total number of hits. The latter is the # of records in the result set and $limit could be (substantially) less than that.

Case 6 is a matter of selecting $limit hits, starting at $offset. No calculations necessary and thus relatively easy.

In order to keep this code simple we combine cases 1 and 2 (partly) and cases 6 and 2. In other words: if necessary we calculate the # of records (case 1, case 2), and if necessary we snatch $limit records at $offset (case 2, case 6). This is two trips to the database in case 2, but only on the initial search (the subsequent runs use the cached # of hits via the $hints).

  • return: FALSE on error, TRUE on success + results and hits updated (maybe)
bool search_nodes (array &$hits, int &$results, array $qwords, int $limit, int $offset, array $config)
  • array &$hits: collects the requested search results if any
  • int &$results: collects the # of hits in this module
  • array $qwords: holds 1 or more keywords to search for (original, utf8lower, quoted)
  • int $limit: maximum number of hits to return
  • int $offset: starting point within this module for returning results
  • array $config: search configuration data (contains scope and area_id)
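The way the six cases collapse into two conditional steps can be sketched like this. The callables $count_fn and $fetch_fn are hypothetical stand-ins for the two database queries (the COUNT query and the LIMIT/OFFSET query); the name sketch_search() is likewise an assumption.

```php
<?php
// Hypothetical sketch: $count_fn and $fetch_fn stand in for the two database queries.
function sketch_search(array &$hits, int &$results, int $limit, int $offset,
                       callable $count_fn, callable $fetch_fn): bool
{
    if ($results < 0) {
        // Cases 1 and 2: no cached total, so count the matches first.
        $results = $count_fn();
    }
    if ($results > 0 && $limit > 0) {
        // Cases 2 and 6: retrieve at most $limit records starting at $offset.
        foreach ($fetch_fn($limit, $offset) as $hit) {
            $hits[] = $hit;
        }
    }
    // Cases 3, 4 and 5 fall through: trust the cached total, return no hits.
    return true;
}

$data = array('a', 'b', 'c');
$count_fn = function () use ($data) { return count($data); };
$fetch_fn = function ($limit, $offset) use ($data) { return array_slice($data, $offset, $limit); };

$hits = array();
$results = -1;   // negative on entry: total unknown (case 2)
sketch_search($hits, $results, 2, 1, $count_fn, $fetch_fn);
```

On a subsequent call with the cached $results (case 6) the counting step is skipped and only the fetch is performed.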
search_show_form (line 197)

display the search form

  • return: output written to $theme
void search_show_form (object &$theme, array $config, array $dialogdef)
  • object &$theme: collects the (html) output
  • array $config: search configuration data
  • array $dialogdef: array that defines the input fields
search_simple (line 1107)

perform a simple linear search in a module table

link tables search_nodes s and nodes n to $table m, using $node as the link field. retrieves hits in $hits, using either "m.$ttl" or "n.title" as title. keys in $fields are plain fieldnames, values are prefixed with 'm.' preventing $DB ambiguities

  • return: FALSE on failure, TRUE on success and $hits, $results updated (maybe)
bool|array search_simple (array &$hits, int &$results, array $qwords, int $limit, int $offset, string $table, array $fields, string $where, string $order, string $node, string $ttl)
  • array &$hits: receives search results (at most $limit)
  • int &$results: receives number of hits (negative value on entry means (re-)calculate total)
  • array $qwords: holds 1 or more keywords to search for (original, utf8lower, quoted)
  • int $limit: maximum number of hits to return (could be 0)
  • int $offset: starting point within this module for returning results
  • string $table: module table linked 1 : 1 to nodes
  • array $fields: data to retrieve, key="$fieldname", value="m.$fieldname"
  • string $where: full where-clause without the word WHERE
  • string $order: sort order
  • string $node: name of field to link to nodes.node_id 1:1
  • string $ttl: name of field to use as search result title
search_view (line 47)

display the content of the search linked to node $node_id

  • return: TRUE on success + output via $theme, FALSE otherwise
  • todo: should we use the token routines here too?
bool search_view (object &$theme, int $area_id, int $node_id, array $module)
  • object &$theme: collects the (html) output
  • int $area_id: identifies the area where $node_id lives
  • int $node_id: the node to which this module is connected
  • array $module: the module record straight from the database
search_view_dialog_validate (line 166)

validate the data entered by the visitor

  • return: TRUE if valid, else FALSE + messages added to dialogdef
bool search_view_dialog_validate (object &$dialogdef)
  • object &$dialogdef: defines the dialog and will hold POSTed values
search_view_get_config (line 121)

retrieve all configuration data for this search

this retrieves all configuration data for this search. Here is a quick reminder of the structure in $config.

'node_id'
'header'
'introduction'
'scope'
'results'

As a bonus we also add the current area $area_id to the resulting array.

  • return: configuration data in an array or FALSE on error
bool|array search_view_get_config (int $node_id, int $area_id)
  • int $node_id: identifies the page
  • int $area_id: identifies the current area
search_view_get_dialogdef (line 143)

construct a dialog definition for the visitor's search form

this defines the searchform, basically a single text input and a [Search] button.

  • return: datadefinition
array search_view_get_dialogdef ()
search_where (line 902)

construct a where-clause to look for keywords in 1 or more fields

Possible outcomes for fields f,f1,f2 and qwords q,q1,q2:

  • (f LIKE %q%)
  • ((f LIKE %q1%) AND (f LIKE %q2%))
  • ((f1 LIKE %q%) OR (f2 LIKE %q%))
  • (((f1 LIKE %q1%) AND (f1 LIKE %q2%)) OR ((f2 LIKE %q1%) AND (f2 LIKE %q2%)))

  • return: ready to use where-clause (including quoting etc.)
  • usedby: htmlpage_search()
string search_where (array $qwords, array $fields)
  • array $qwords: holds 1 or more keywords to search for (original, utf8lower, quoted)
  • array $fields: holds 1 or more fieldnames
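The four outcomes listed above follow from AND-ing the keywords within a field and OR-ing the fields together. A minimal sketch, assuming $qwords[i][2] already holds the quoted '%...%' pattern; the name sketch_where() and the empty-string result for empty input are assumptions:

```php
<?php
// Hypothetical sketch: AND the keywords within a field, OR the fields together.
function sketch_where(array $qwords, array $fields): string
{
    if (empty($qwords) || empty($fields)) {
        return ''; // assumption: nothing to search for yields an empty clause
    }
    $per_field = array();
    foreach ($fields as $field) {
        $terms = array();
        foreach ($qwords as $qword) {
            // $qword[2] is assumed to be quoted already, e.g. '%foo%' including quotes.
            $terms[] = '(' . $field . ' LIKE ' . $qword[2] . ')';
        }
        // Only add an extra pair of parentheses when there is something to combine.
        $per_field[] = (count($terms) > 1) ? '(' . implode(' AND ', $terms) . ')' : $terms[0];
    }
    return (count($per_field) > 1) ? '(' . implode(' OR ', $per_field) . ')' : $per_field[0];
}

$q1 = array('q1', 'q1', "'%q1%'");
$q2 = array('q2', 'q2', "'%q2%'");
$single = sketch_where(array($q1), array('f'));
$double = sketch_where(array($q1, $q2), array('f1', 'f2'));
```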

Documentation generated on Tue, 28 Jun 2016 19:11:48 +0200 by phpDocumentor 1.4.0