вторник, 4 декабря 2012 г.

Working with files in JavaScript

Working with files in JavaScript, Part 1: The Basics

Many years ago, I was asked during a job interview at Google what changes I would make to the web in order to provide better experiences. At the top of my list was having some way to work with files other than the <input type="file"> control. Even as the rest of the web was evolving, the way we dealt with files never changed since it was first introduced. Thankfully, with HTML5 and related APIs, we now have far more options for working with files than ever before in the latest versions of desktop browsers (iOS still has no support for the File API).

The File type

The File type is defined in the File API[1] specification and is an abstract representation of a file. Each instance of File has several properties:
  • name – the filename
  • size – the size of the file in bytes
  • type – the MIME type for the file
A File object basically gives you essential information about the file without providing direct access to the file contents. That’s important because reading from files requires disk access, and depending on the size of the file, that process has the potential to take a significant amount of time. A File object is just a reference to a file, and getting data from that file is a separate process altogether.

Getting File references

Of course, access to user files is strictly forbidden on the web because it’s a very obvious security issue. You wouldn’t want to load up a web page and then have it scan your hard drive and figure out what’s there. You need permission from the user in order to access files from their computer. There’s no need for messy permission windows, however, because users grant permission for web pages to read files all the time when they decide to upload something.
When you use a <input type="file"> control, you’re giving the web page (and the server) permission to access that file. So it makes sense that the first place you can retrieve File objects is through a <input type="file"> control.
HTML5 defines a files property for all <input type="file"> controls. This collection is a FileList, which is an array-like structure called FileList containing File objects for each selected file in the control (remember, HTML5 allows multiple file selection in these controls). So at any point in time, you can get access to the files a user has selected using code similar to this:
<input type="file" id="your-files" multiple>
<script>
var control = document.getElementById("your-files");
control.addEventListener("change", function(event) {

    // When the control has changed, there are new files

    var i = 0,
        files = control.files,
        len = files.length;

    for (; i < len; i++) {
        console.log("Filename: " + files[i].name);
        console.log("Type: " + files[i].type);
        console.log("Size: " + files[i].size + " bytes");
    }

}, false);
</script>
This relatively simple code listens for the change event on the file control. When the event fires, it signifies that the file selection has changed, and the code iterates through each File object and outputs its information. Keep in mind that the files property is always accessible from JavaScript, so you don’t have to wait for change to try to read it.

Drag and drop files

Accessing files from form controls still requires the form control and the associated user action of browsing to find the files of interest. Fortunately, HTML5 Drag and Drop[2] provides another way for users to grant access to their files: by simply dragging a file from the desktop into the web browser. All you have to do to take advantage is listen for two events.
In order to read files that are dropped onto an area of the page, you must listen for the dragover and drop events and cancel the default action of both. Doing so tells the browser that you are handling the action directly and it shouldn’t, for example, open an image file.
<div id="your-files"></div>
<script>
var target = document.getElementById("your-files");

target.addEventListener("dragover", function(event) {
    event.preventDefault();
}, false);

target.addEventListener("drop", function(event) {

    // cancel default actions
    event.preventDefault();

    var i = 0,
        files = event.dataTransfer.files,
        len = files.length;

    for (; i < len; i++) {
        console.log("Filename: " + files[i].name);
        console.log("Type: " + files[i].type);
        console.log("Size: " + files[i].size + " bytes");
    }

}, false);
</script>
The event.dataTransfer.files is another FileList object that you can access to get file information. The code is almost exactly the same as using the file form control and the File objects can be accessed in the same way.

Ajax file upload

Once you have a reference to the file, you’re able to do something that’s pretty cool: upload a file via Ajax. This is all possible due to the FormData object, which is defined in XMLHttpRequest Level 2[3]. This object represents an HTML form and allows you to add key-value pairs to be submitted to the server via the append() method:
var form = new FormData();
form.append("name", "Nicholas");
The great thing about the FormData object is that you can add a file directly to it, effectively mimicking a file upload by HTML form. All you have to do is add the File reference with a specific name, and the browser does the rest. For example:
// create a form with a couple of values
var form = new FormData();
form.append("name", "Nicholas");
form.append("photo", control.files[0]);

// send via XHR - look ma, no headers being set!
var xhr = new XMLHttpRequest();
xhr.onload = function() {
    console.log("Upload complete.");
};
xhr.open("post", "/entrypoint", true);
xhr.send(form);
Once the FormData object is passed into send(), the proper HTTP headers are automatically set for you. You don’t have to worry about setting the correct form encoding when using files, so the server gets to act as if a regular HTML form has been submitted, reading file data from the “photo” key and text data from the “name” key. This gives you the freedom to write processing code on the backend that can easily work with both traditional HTML forms and Ajax forms of this nature.
And all of this works in the most recent version of every browser, including Internet Explorer 10.

Up next

You now know the two methods of accessing File information in the browser: through a file upload control and through native drag and drop. There will likely be other ways to access files in the future, but for now, these are the two you need to know. Of course, reading information about files is just part of the problem. The next step is read data from those files, and that’s where part 2 will pick up.

References

  1. File API specification (editor’s draft)
  2. HTML5 Drag and Drop
  3. XMLHttpRequest Level 2

Working with files in JavaScript, Part 2: FileReader

In my previous post, I introduced using files in JavaScript, focusing specifically on how to get access to File objects. These objects contain file metadata obtained only when the user opts to either upload a file or drags and drops a file onto the web page. Once you have files, however, the next step is to read data from them.

The FileReader type

The FileReader type has a single job: to read data from a file and store it in a JavaScript variable. The API is intentionally designed to be similar to XMLHttpRequest since both are loading data from an external (outside of the browser) resource. The read is done asynchronously so as not to block the browser.
There are several formats that a FileReader can create to represent the file data, and the format must be requested when asking the file to be read. Reading is done through calling one of these methods:
  • readAsText() – returns the file contents as plain text
  • readAsBinaryString() – returns the file contents as a string of encoded binary data (deprecated – use readAsArrayBuffer() instead)
  • readAsArrayBuffer() – returns the file contents as an ArrayBuffer (good for binary data such as images)
  • readAsDataURL() – returns the file contents as a data URL
Each of these methods initiates a file read similar to the XHR object’s send() method initiating an HTTP request. As such, you must listen for the load event before starting to read. The result of the read is always represented by event.target.result. For example:
var reader = new FileReader();
reader.onload = function(event) {
    var contents = event.target.result;
    console.log("File contents: " + contents);
};

reader.onerror = function(event) {
    console.error("File could not be read! Code " + event.target.error.code);
};

reader.readAsText(file);
This example simply reads the contents of a file and outputs it in plain text to the console. The onload handler is called when the file is successfully read whereas the onerror handler is called if the file wasn’t read for some reason. The FileReader instance is available inside of the event handler via event.target and it’s recommended to use that instead of referencing the reader variable directly. The result property contains the file contents on success and error contains error information about the failed operation.

Reading data URIs

You can use the same basic setup for reading to a data URI. Data URIs (sometimes called data URLs) are an interesting option if you want to, for example, display an image that was just read from disk. You could do so with the following code:
var reader = new FileReader();
reader.onload = function(event) {
    var dataUri = event.target.result,
        img     = document.createElement("img");

    img.src = dataUri;
    document.body.appendChild(img);
};

reader.onerror = function(event) {
    console.error("File could not be read! Code " + event.target.error.code);
};

reader.readAsDataURL(file);
This code simply inserts an image that was read from disk into a page. Since the data URI contains all of the image data, it can be passed directly into the src attribute of an image and displayed on the page. You could, alternately, load the image and draw it onto a <canvas> as well:
var reader = new FileReader();
reader.onload = function(event) {
    var dataUri = event.target.result,
        context = document.getElementById("mycanvas").getContext("2d"),
        img     = new Image();
 
    // wait until the image has been fully processed
    img.onload = function() {
        context.drawImage(img, 100, 100);
    };
    img.src = dataUri;
};

reader.onerror = function(event) {
    console.error("File could not be read! Code " + event.target.error.code);
};

reader.readAsDataURL(file);
This code loads the image data into a new Image object and then uses that to draw the image onto a canvas (specifying both the width and height as 100).
Data URIs are generally used for this purpose, but can be used on any type of the file. The most common use case for reading a file into a data URI is to display the file contents on a web page immediately.

Reading ArrayBuffers

The ArrayBuffer type[1] was first introduced as part of WebGL. An ArrayBuffer represents a finite number of bytes that may be used to store numbers of any size. The way data is read from an ArrayBuffer is by using a specific view, such as Int8Array, which treats the underlying bytes as a collection of 8-bit signed integers or Float32Array, which treats the underlying bytes as a collection of 32-bit floating point numbers. These are called typed arrays[2], which force you to work with a specific numeric type rather than containing any type of data (as with traditional arrays).
You use an ArrayBuffer primarily when dealing with binary files, to have more fine-grained control over the data. It’s beyond the scope of this post to explain all the ins and outs of ArrayBuffer, just realize that you can read a file into an ArrayBuffer pretty easily if you need it. You can pass an ArrayBuffer directly into an XHR object’s send() method to send the raw data to the server (you’ll have to read this data from the request on the server to reconstruct the file), so long as your browser fully supports XMLHttpRequest Level 2[3] (most recent browsers, including Internet Explorer 10 and Opera 12).

Up next

Reading data from a file using a FileReader is pretty simple. If you know how to use XMLHttpRequest, there’s no reason you can’t also be reading data from files. In the next part of this series, you’ll learn more about using the FileReader events and understanding more about possible errors.

References

  1. ArrayBuffer
  2. Typed Array Specification
  3. XMLHttpRequest Level 2

Working with files in JavaScript, Part 3: Progress events and errors

The FileReader object is used to read data from files that are made accessible through the browser. In my previous post, you learned how to use a FileReader object to easily read data from a file in a variety of formats. The FileReader is very similar to XMLHttpRequest in many ways.

Progress events

Progress events are becoming so common that they’re actually written up in a separate specification[1]. These events are designed to generically indicate the progress of data transfers. Such transfers occur when requesting data from the server, but also when requesting data from disk, which is what FileReader does.
There are six progress events:
  • loadstart – indicates that the process of loading data has begun. This event always fires first.
  • progress – fires multiple times as data is being loaded, giving access to intermediate data.
  • error – fires when loading has failed.
  • abort – fires when data loading has been canceled by calling abort() (available on both XMLHttpRequest and FileReader).
  • load – fires only when all data has been successfully read.
  • loadend – fires when the object has finished transferring data. Always fires and will always fire after error, abort, or load.
Two events, error and load, were discussed in my previous post. The other events give you more fine-grained control over data transfers.

Tracking progress

When you want to track progress of a file reader, use the progress event. The event object for this event contains three properties to monitor the data being transferred:
  • lengthComputable – a boolean indicating if the browser can determine the complete size of the data.
  • loaded – the number of bytes that have been read already.
  • total – the total number of bytes to be read.
The intent of this data is to allow for progress bars to be generated using the information from the progress event. For example, you may be using an HTML5 <progress> element to monitor the progress of reading a file. You can tie the progress value to the actual data using code like this:
var reader = new FileReader(),
     progressNode = document.getElementById("my-progress");

reader.onprogress = function(event) {
    if (event.lengthComputable) {
        progressNode.max = event.total;
        progressNode.value = event.loaded;
    }
};

reader.onloadend = function(event) {
    var contents = event.target.result,
        error    = event.target.error;
 
    if (error != null) {
        console.error("File could not be read! Code " + error.code);
    } else {
        progressNode.max = 1;
        progressNode.value = 1;
        console.log("Contents: " + contents);
    }
};

reader.readAsText(file);
This is similar to the approach that Gmail uses for its drag and drop file upload implementation, where you see a progressbar immediately after dropping a file onto the email. That progressbar indicates how much of the files has been transferred to the server.

Dealing with errors

Even though you’re reading a local file, it’s still possible for the read to fail. The File API specification[2] defines four types of errors:
  • NotFoundError – the file can’t be found.
  • SecurityError – something about the file or the read is dangerous. The browser has some leeway as to when this occurs, but generally if the file is dangerous to load into the browser or the browser has been performing too many reads, you’ll see this error.
  • NotReadableError – the file exists but can’t be read, most likely due to a permissions problem.
  • EncodingError – primarily when trying to read as a data URI and the length of the resulting data URI is beyond the maximum length supported by the browser.
When an error occurs during a file read, the FileReader object’s error property is assigned to be an instance of one of the above mentioned errors. At least, that’s how the spec is written. In reality, browsers implement this as a FileError object that has a code property indicating the type of error that has occurred. Each error type is represented by a numeric constant value:
  • FileError.NOT_FOUND_ERR for file not found errors.
  • FileError.SECURITY_ERR for security errors.
  • FileError.NOT_READABLE_ERR for not readable errors.
  • FileError.ENCODING_ERR for encoding errors.
  • FileError.ABORT_ERR when abort() is called while there is no read in progress.
You can test for the type of error either during the error event or during loadend:
var reader = new FileReader();

reader.onloadend = function(event) {
    var contents = event.target.result,
        error    = event.target.error;
 
    if (error != null) {
        switch (error.code) {
            case error.ENCODING_ERR:
                console.error("Encoding error!");
                break;

            case error.NOT_FOUND_ERR:
                console.error("File not found!");
                break;

            case error.NOT_READABLE_ERR:
                console.error("File could not be read!");
                break;

            case error.SECURITY_ERR:
                console.error("Security issue with file!");
                break;

            default:
                console.error("I have no idea what's wrong!");
        }
    } else {
        progressNode.max = 1;
        progressNode.value = 1;
        console.log("Contents: " + contents);
    }
};

reader.readAsText(file);

Up next

The FileReader object is a fully-featured object with a lot of functionality and a lot of similarities to XMLHttpRequest. By following these last three posts, you should now be able to read data from files using JavaScript and send that data back to the server if necessary. However, the File API ecosystem is quite a bit larger than has been already discussed in this series, and in the next part you’ll learn about a powerful new features designed to work with files.

References

  1. Progress Events
  2. File API

Working with files in JavaScript, Part 4: Object URLs

Up to this point in the blog series, you’ve learned how to use files in the traditional way. You can upload files to the server and you can read file data from disk. These all represent the most common ways of dealing with files. However, there is a completely new way to deal with files that has the capacity to simplify some common tasks. This new way is to use object URLs.

What is an object URL?

Object URLs are URLs that point to files on disk. Suppose, for example, that you want to display an image from the user’s system on a web page. The server never needs to know about the file, so there’s no need to upload it. You just want to load the file into a page. You could, as shown in the previous posts, get a reference to a File object, read the data into a data URI, and then assign the data URI to an <img> element. But think of all the waste: the image already exists on disk, why read the image into another format in order to use it? If you create an object URL, you could assign that to the <img> and access that local file directly.

How does it work?

The File API[1] defines a global object called URL that has two methods. The first is createObjectURL(), which accepts a reference to a File and returns an object URL. This instructs the browser to create and manage a URL to the local file. The second method is revokeObjectURL(), which instructs the browser to destroy the URL that is passed into it, effectively freeing up memory. Of course, all object URLs are revoked once the web page is unloaded, but it’s good to free them up when they’re no longer needed anyway.
Support for the URL object isn’t as good as for other parts of the File API. As of the time of my writing, Internet Explorer 10+ and Firefox 9+ support a global URL object. Chrome supports it in the form of webkitURL while Safari and Opera have no support.

Example

So how would you display an image from disk without reading the data first? Suppose that you’ve given the user a way to select a file and now have a reference to it in a variable called file. You can then use the following:
var URL = window.URL || window.webkitURL,
    imageUrl,
    image;

if (URL) {
    imageUrl = URL.createObjectURL(file);
    image = document.createElement("img");

    image.onload = function() {
        URL.revokeObjectURL(imageUrl);
    };
    
    image.src = imageUrl;
    document.body.appendChild(image);
}
This example creates a local URL variable that normalizes the browser implementations. Assuming that URL is supported, the code goes on to create an object URL directly from file and stores it in imageUrl. A new <img> element is created and given an onload event handler that revokes the object URL (more on that in a minute). Then, the src property is assigned to the object URL and the element is added to the page (you may want to use an already-existing image).
Why revoke the object URL once the image is loaded? After the image is loaded, the URL is no longer needed unless you intend to reuse it with another element. In this example, the image is being loaded into a single element, and once the image has been completely loaded, the URL isn’t serving any useful purpose. That’s the perfect time to free up any memory associated with it.

Security and other considerations

At first glance, this capability is a bit scary. You’re actually loading a file directly from the user’s machine via a URL. There are, of course, security implications to this capability. The URL itself isn’t a big security issue because it’s a URL that’s assigned dynamically by the browser and would be useless on any other computer. What about cross-origin?
The File API disallows using object URLs on different origins. When an object URL is created, it is tied to the origin of the page in which the JavaScript executed, so you can’t use an object URL from www.wrox.com on a page at p2p.wrox.com (an error occurs). However, two pages from www.wrox.com, where one is embedded in the other with an iframe, are capable of sharing object URLs.
Object URLs exist only so long as the document that created them. When the document is unloaded, all object URLs are revoked. So, it doesn’t make sense to store object URLs in client-side data storage to use later; they are useless after the page has been unloaded.
You can use object URLs anywhere the browser would make a GET request, which includes images, scripts, web workers, style sheets, audio, and video. You can never use an object URL when the browser would perform a POST, such as within a <form> whose method is set to “post”.

Up next

The ability to create URLs that link directly to local files is a powerful one. Instead of needing to read a local file into JavaScript in order to display it on a page, you can simply create a URL and point the page to it. This process greatly simplifies the use case of including local files in a page. However, the fun of working with files in JavaScript has only just begun. In the next post, you’ll learn some interesting ways to work with file data.

References

  1. File API

Working with files in JavaScript, Part 5: Blobs

Up to this point, this series of posts has focused on interacting with files specified by the user and accessed via File objects. The File object is actually a more specific version of a Blob, which represents a chunk of binary data. The size and type properties exist on Blob objects and are inherited by File.
In most cases, Blobs and Files can be used in the same places. For example, you can read from a Blob using a FileReader and you can create an object URL from a Blob using URL.createObjectURL().

Slicing

One of the interesting things you can do with Blobs (and therefore, also Files) is to create a new Blob based on a subsection of another one. Since each Blob just represents pointers to data rather than the data itself, you can quickly create new Blob objects pointing to subparts of others. This is accomplished by using the slice() method.
You may be familiar with slice() on strings and arrays, and the one for Blobs behaves in a similar manner. The method accepts three arguments: the offset of the starting byte, the offset of the ending byte, and an optional MIME type to apply to the Blob. If the MIME type isn’t specified, the new Blob has the same MIME type as the original one.
Browser support for slice() isn’t yet ubiquitous, with Firefox supporting it via mozSlice() and webkitSlice() in Chrome (no other browsers support this method currently). Here’s an example:
function sliceBlob(blob, start, end, type) {

    type = type || blob.type;

    if (blob.mozSlice) {
        return blob.mozSlice(start, end, type);
    } else if (blob.webkitSlice) {
        return blob.webkitSlice(start, end type);
    } else {
        throw new Error("This doesn't work!");
    }
}
You can then use this function to, for example, split up a large file to upload it in chunks. Each new Blob being produced is independent from the original even though the data each references has an overlap. The engineers at Flickr use blob slicing to read the Exif information from photos that are uploaded[1] rather than waiting to it on the server. When the file is selected, the Flickr upload page simultaneously starts to upload the file as well as read the Exif information from the photo. This allows them to give a preview of the extracted metadata in the page as the file is being uploaded.

Creating Blobs the old way

Very soon after File objects started appearing in browsers, developers realized that Blob objects were actually quite powerful and so wanted to be able to create them without user interaction. After all, any data can be represented in a Blob, it doesn’t necessarily have to be tied to a file. Browsers quickly responded by creating BlobBuilder, a type whose sole purpose is to wrap some data in a Blob object. This is a non-standard type and has been implemented in Firefox (as MozBlobBuilder), Internet Explorer 10 (as MSBlobBuilder), and Chrome (as WebKitBlobBuilder).
The BlobBuilder works by creating a new instance and calling the append() method with a string, ArrayBuffer, or Blob. Once all of the data has been added, you call getBlob() and pass in an optional MIME type that should be applied to Blob. Here’s an example:
var builder = new BlobBuilder();
builder.append("Hello world!");
var blob = builder.getBlob("text/plain");
The ability to create URLs for arbitrary pieces of data is incredibly powerful, allowing you to dynamically create objects that can be addressed as files in the browser. You could, for example, use a Blob to create a web worker without having a separate file for the worker code. This technique was written up in The Basics of Web Workers[2]:
// Prefixed in Webkit, Chrome 12, and FF6: window.WebKitBlobBuilder, window.MozBlobBuilder
var bb = new BlobBuilder();
bb.append("onmessage = function(e) { postMessage('msg from worker'); }");

// Obtain a blob URL reference to our worker 'file'.
// Note: window.webkitURL.createObjectURL() in Chrome 10+.
var blobURL = window.URL.createObjectURL(bb.getBlob());

var worker = new Worker(blobURL);
worker.onmessage = function(e) {
  // e.data == 'msg from worker'
};
worker.postMessage(); // Start the worker.
This code creates a simple script and then creates an object URL. The object URL is assigned to a web worker in place of a script URL.
You can call append() as many times as you like, building up the contents of the Blob.

Creating Blobs the new way

Because developers kept clamoring for a way to create Blob objects directly, and browsers coming up with BlobBuilder, it was decided to add a Blob constructor. This constructor is now part of the specification and will be the way that Blob objects are created in the future.
The constructor accepts two arguments. The first is an array of parts to combine into a Blob. These would be the same values as passed into the append() method of BlobBuilder and can be any number of strings, Blobs, and ArrayBuffers. The second argument is an object containing properties for the newly-created Blob. There are currently two properties defined, type, which specifies the MIME type of the Blob, and endings, which can be either “transparent” (default) or “native”. Here’s an example:
var blob = new Blob(["Hello world!"], { type: "text/plain" });
As you can see, this is much simpler than using BlobBuilder.
The Blob constructor is currently in the nightly builds of Chrome and will be in Firefox 13. Other browsers have not yet announced plans to implement this constructor, however, it is now part of the File API[3] standard and is expected to be implemented universally.

Conclusion

This is the last part of the series on working with files in JavaScript. As I hope you learned, the File API is incredibly powerful and opens up entirely new ways of working with files in web applications. You no longer need to stick with plain file upload boxes when users need to upload files, and now that you can read the files in the client, that opens up all sorts of possibilities for client-side manipulation. You could resize an image that’s too large before uploading (using FileReader and <canvas>); you could create a text editor that works purely in the browser; you could split up large files to upload piece by piece. The possibilities aren’t quite endless, but are pretty damn close.

References

  1. Parsing Exif client-side using JavaScript by Flickr Team
  2. The Basics of Web Workers by Eric Bidelman
  3. File API – Blob Constructor

JavaScript Регулярные выражения