File: /program/lib/filelib.php

Description

/program/lib/filelib.php - utilities for manipulating files

Functions
get_mediatype (line 353)

extract the mediatype and -subtype from a full mimetype

This extracts the mediatype and -subtype from a full mimetype, e.g. 'text/plain' from 'text/plain; charset=US-ASCII' (see also get_mimetype() and RFC2616). If $mimetype doesn't look like a mimetype, we return FALSE.

  • return: FALSE on invalid mimetype, otherwise the extracted mediatype and -subtype in lowercase
bool|string get_mediatype (string $mimetype)
  • string $mimetype: the full mimetype to examine, possibly with parameters
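The behaviour described above can be approximated as follows. This is a minimal sketch, not the code from filelib.php: the function name sketch_get_mediatype() is hypothetical, and rejecting leading whitespace is an assumption.

```php
<?php
// Hypothetical sketch; not the actual get_mediatype() implementation.
function sketch_get_mediatype($mimetype) {
    // RFC2616 token: letters, digits and ! # $ % & ' * + - . ^ _ ` | ~
    $token = '[A-Za-z0-9!#$%&\'*+.^_`|~-]+';

    // Look for token "/" token at the start; parameters after it are ignored.
    $pattern = '/^(' . $token . '\/' . $token . ')/';
    if (preg_match($pattern, $mimetype, $matches) !== 1) {
        return FALSE; // does not look like a mimetype
    }
    return strtolower($matches[1]);
}
```

With this sketch, 'text/plain; charset=US-ASCII' yields 'text/plain', and a string without a slash between two tokens yields FALSE.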
get_mimetype (line 290)

determine the mimetype of a file

This routine tries to discover the mimetype of a file. First we try to determine the mimetype via the fileinfo extension. If that doesn't work, we try the deprecated mime_content_type() function. If that doesn't work, we try to shell out to file(1). If that doesn't work, we resort to "guessing" the mimetype based on the extension of the file or else we return the generic 'application/octet-stream'.

Note that in step 3 we shell out and try to execute the file(1) command. The results are checked against a pattern to assert that we are really dealing with a mime type. The pattern is described in RFC2616 (see sections 3.7 and 2.2):

     media-type     = type "/" subtype *( ";" parameter )
     type           = token
     subtype        = token
     token          = 1*<any CHAR except CTLs or separators>
     separators     = "(" | ")" | "<" | ">" | "@"
                    | "," | ";" | ":" | "\" | <">
                    | "/" | "[" | "]" | "?" | "="
                    | "{" | "}" | SP | HT
     CHAR           = <any US-ASCII character (octets 0 - 127)>
     CTL            = <any US-ASCII control character
                      (octets 0 - 31) and DEL (127)>
     SP             = <US-ASCII SP, space (32)>
     HT             = <US-ASCII HT, horizontal-tab (9)>
     <">            = <US-ASCII double-quote mark (34)>

This description means we should look for two tokens containing letters a-z or A-Z, digits 0-9 and these special characters: ! # $ % & ' * + - . ^ _ ` | or ~. That's it.
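The list of special characters can be double-checked by applying the grammar literally: take all US-ASCII octets, drop the CTLs and the separators, then drop letters and digits. The sketch below does just that; the function name sketch_token_specials() is hypothetical and only serves this illustration.

```php
<?php
// Hypothetical helper: derive the non-alphanumeric token characters
// from the RFC2616 definitions quoted above.
function sketch_token_specials() {
    // separators: ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
    $separators = "()<>@,;:\\\"/[]?={} \t";
    $allowed = '';
    for ($octet = 0; $octet <= 127; $octet++) {
        $is_ctl = ($octet <= 31 || $octet === 127);
        $char = chr($octet);
        if (!$is_ctl && strpos($separators, $char) === false) {
            $allowed .= $char; // this character may appear in a token
        }
    }
    // What remains after removing letters and digits are the specials.
    return preg_replace('/[A-Za-z0-9]/', '', $allowed);
}

echo sketch_token_specials(), "\n"; // prints !#$%&'*+-.^_`|~
```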

Note that file(1) may return a mime type with additional parameters, e.g. 'text/plain; charset=US-ASCII'. This fits the pattern, because it starts with a token, a slash and another token.

The optional parameter $name is used to determine the mimetype based on the extension (as a last resort), even when the current name of the file is meaningless. For example, when a file is uploaded, its current name (from $_FILES['file0']['tmp_name']) is something like '/tmp/php4r5dwfw', even though $_FILES['file0']['name'] might read 'S6301234.JPG'. If $name is not specified (i.e. is empty), we construct it from $path.
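The four steps and the role of $name can be sketched as follows. This is an illustration under stated assumptions, not the code from filelib.php: the function name sketch_detect_mimetype(), the 'file -bi' command line, the is_file() guard and the short extension table are all hypothetical (the real routine would rely on the full list from get_mimetypes_array()).

```php
<?php
// Hypothetical sketch of the fallback chain; not the actual get_mimetype().
function sketch_detect_mimetype($path, $name = '') {
    // token "/" token, per RFC2616; trailing parameters are tolerated.
    $token = '[A-Za-z0-9!#$%&\'*+.^_`|~-]+';
    $pattern = '/^' . $token . '\/' . $token . '/';
    $looks_valid = function ($candidate) use ($pattern) {
        return is_string($candidate) && preg_match($pattern, $candidate) === 1;
    };

    if (is_file($path) && is_readable($path)) {
        // Step 1: the fileinfo extension.
        if (class_exists('finfo')) {
            $finfo = new finfo(FILEINFO_MIME_TYPE);
            $result = $finfo->file($path);
            if ($looks_valid($result)) {
                return $result;
            }
        }
        // Step 2: the deprecated mime_content_type() function.
        if (function_exists('mime_content_type')) {
            $result = mime_content_type($path);
            if ($looks_valid($result)) {
                return $result;
            }
        }
        // Step 3: shell out to file(1); the flags are an assumption.
        if (function_exists('exec')) {
            $output = array();
            $status = 1;
            exec('file -bi ' . escapeshellarg($path) . ' 2>/dev/null', $output, $status);
            if ($status === 0 && isset($output[0]) && $looks_valid($output[0])) {
                return $output[0];
            }
        }
    }

    // Step 4: guess from the extension of $name, or of $path if $name is empty.
    if ($name === '') {
        $name = basename($path);
    }
    $extension = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    $known = array( // hypothetical excerpt of the extension table
        'jpg' => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'txt' => 'text/plain',
        'mp3' => 'audio/mpeg',
    );
    return isset($known[$extension]) ? $known[$extension] : 'application/octet-stream';
}
```

For an upload, a call along the lines of sketch_detect_mimetype($_FILES['file0']['tmp_name'], $_FILES['file0']['name']) lets the last step see the extension 'JPG' rather than the meaningless temporary name.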

  • return: mimetype of the file $path
  • todo: there is room for improvement here: the code in step 1 and step 2 is largely untested
  • usedby: send_file_from_datadir()
string get_mimetype (string $path, [string $name = ''])
  • string $path: fully qualified path to the file to test
  • string $name: name of the file, possibly different from $path
get_mimetypes_array (line 73)

return an array with mimetypes keyed by file extension

This routine returns an array with 'known' combinations of (lowercase) file extensions and (lowercase) mime types. This array can be used in two ways.

Example 1: find a mimetype by extension

$mimetypes = get_mimetypes_array();
$mimetype = $mimetypes['jpg']; // this should yield 'image/jpeg'

Example 2: find an extension by mimetype

$mimetypes = get_mimetypes_array();
$extension = array_search('image/jpeg',$mimetypes); // this should yield 'jpg'

Note that in the last example the first matching element is used. This implies that the most common extension for a given mimetype should come first in the array, e.g. 'jpg'=>'image/jpeg' should come before 'jpeg'=>'image/jpeg'.
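The effect of the ordering can be seen with a small stand-in array. The two arrays below are hypothetical excerpts made up for this illustration; the real array returned by get_mimetypes_array() is much longer.

```php
<?php
// Hypothetical excerpt with the preferred extension listed first.
$preferred_first = array(
    'jpg'  => 'image/jpeg',
    'jpeg' => 'image/jpeg',
    'jpe'  => 'image/jpeg',
);
// The same mappings, but with 'jpeg' listed before 'jpg'.
$preferred_later = array(
    'jpeg' => 'image/jpeg',
    'jpg'  => 'image/jpeg',
    'jpe'  => 'image/jpeg',
);

// array_search() returns the key of the first matching value.
$extension_a = array_search('image/jpeg', $preferred_first); // 'jpg'
$extension_b = array_search('image/jpeg', $preferred_later); // 'jpeg'

// array_search() returns FALSE when the mimetype is not in the array.
$extension_c = array_search('image/unknown', $preferred_first); // FALSE
```

Because array_search() returns FALSE for an unknown mimetype, callers would want to compare the result with === before using it as an extension.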

The list below is based on the list of mime types as distributed with the Apache webserver software.

Changes and tweaks to the list below:

  • application/octet-stream: default extension is an empty string ''
  • application/postscript: default extension is ps
  • audio/mpeg: default extension is mp3
  • image/jpeg: default extension is jpg
  • text/plain: default extension is txt
  • video/quicktime: default extension is mov

NOTE
Please do not change the mapping for either the empty extension '' or the binary extension 'bin'. Both extensions must map to 'application/octet-stream', because this is necessary to defeat tricks with uploading files with double extensions (as used in the File Manager).

  • return: array with (lowercase) mimetypes keyed by (lowercase) extension
array get_mimetypes_array ()

Documentation generated on Wed, 11 May 2011 23:45:05 +0200 by phpDocumentor 1.4.0