The Power of the Dark Side

First, let’s cut to the chase: in-room far-field frequency response at the listening position, measured with 1/3 octave warble tones and a Rode NT1-A mic, corrected for mic response.

[Figure: in-room frequency response at the listening position (InRoomFreqResp)]

  • The red line is what you hear – near perfection!
  • The solid blue line is with room treatments, but without EQ
  • The dotted blue line is without room treatment

In short, you can see that room treatment (huge tube traps and copious use of thick RPG acoustic foam) made a huge difference. Then EQ finessed that to something near perfection.

Aside: this FR curve makes me wonder why people often say Magnepans don’t have good bass. Mine are near-flat to 32 Hz (and you can hear 25 Hz) with a level of tautness, speed and clarity that few conventional speakers can match. A subwoofer can go lower, which is great for movies and explosions, but most lack the accuracy and refinement needed for serious music listening.

Now, for the details:

I’ve been an audiophile since my late teen years, long before my income could support the habit. As an engineer and amateur musician I always approached this hobby from a unique perspective. The musician knows what the absolute reference really sounds like – live musicians playing acoustic instruments in the room. The engineer believes objectivity – measurements, blind listening tests, etc. – is the best way to get as close as possible to that sound.

Part of this perspective is being a purist, and one aspect of being a purist is hating equalizers. In most cases, EQ falls into one of 2 categories:

  1. There are flaws in the sound caused by the speakers or room interactions, and instead of fixing them you use EQ as a band-aid. This flattens the response but leaves you with distortions in the phase or time domain, like ringing.
  2. You don’t want to hear what live acoustic music really sounds like, you prefer a euphonically distorted sound and use an EQ to get it.

Equalizers are the dark side of audio. Powerful and seductive, yet in the end they take you away from your goal: experiencing music as close as possible to the real thing. Recently I traveled to the dark side and found it’s not such a bad place. Share my journey, if you dare.

I had my audio room here in Seattle dialed in nicely after building big tube traps, adding thick acoustic foam and arranging the room carefully based on repeated measurements. However, it still had two minor issues:

  1. A slight edge to the midrange. From personal experience I describe it as the sound I hear rehearsing on stage with the musicians, rather than being in the 2nd row of the audience.
  2. The deepest bass was a bit thin, with 30 Hz about -6 dB. I have a harp recording where Heidi Krutzen plays the longest strings, which have a fundamental around 25 Hz. I could hear this in my room, but it was a subtle whisper. It would be nice to hear that closer to a natural level.

My room treatments made a huge improvement in sound (and I have the measurements to prove it). But I don’t know of any room treatment that can fix either of these issues. The sound was very good both objectively (+/- 4 dB from 35 Hz to 20 kHz at listener position) and subjectively, and I enjoyed it for years. Then I got the LCD-2 headphones and Oppo HA-1 DAC. As I listened to my music collection over the next year (a couple thousand discs, takes a while), I discovered a subtle new dimension of natural realism in the music and wanted to experience that in the room.

Since my upstream system was entirely digital, equalization might not be as terrible as any right-thinking purist audiophile would fear. I could equalize entirely in the digital domain, no DA or AD conversion, before the signal reaches the DAC. And since the anomalies I wanted to correct were small, I could use parametric EQ with gradual slope, virtually eliminating any audible side effects.

That was the idea … now I had to come up with an action plan.

After a bit of Googling I found a candidate device: the Behringer DEQ2496. Price was the same on B&H, Adorama and Amazon, and all have a 30-day trial, so I bought one. The DEQ2496 does a lot of things and is complex to use and easy to accidentally “break”. For example, when I first ran the RTA function, it didn’t work. First, the pink noise it generates never played on my speakers. After I fixed that, the microphone I plugged in didn’t work. After I fixed that, the GEQ (graphic equalizer) settings it made were all maxed out (+ / – 15 dB). Finally I fixed that and it worked. All of these problems were caused by config settings in other menu areas. There are many config settings and they affect the various functions in ways that make sense once you understand them, but are not obvious.

NOTE: one easy way around this is to restore the system default settings, saved as the first preset, before using any function for the first time. This won’t fix all of the config settings; you’ll still have to tweak them to get functions to work. But it will reduce the number of settings you’ll have to chase down.

In RTA (real-time analyzer) mode, the DEQ2496 is fully automatic. It generates a pink noise signal, listens to it on a microphone you set up in the room, analyzes the response and creates an EQ curve to make the measured response “flat”. You can then save this GEQ curve in memory. You have two options for flat: truly flat measured in absolute terms, or the 1 dB / octave reduction from bass to treble that Toole & Olive recommend (-9 dB overall across the band). This feature is really cool but has 2 key limitations:

  1. It has no built-in way to compensate for mic response. You can do this manually by entering the mic’s response curve as your custom target response curve, but that is tedious.
  2. It provides only 15 V phantom power to your mic. Most studio condenser mics (including my Rode NT1-A) want 48 V, but aren’t that sensitive to how much voltage they get and work OK with only 15 V. But you always wonder how much of the mic’s frequency response and sensitivity you lose when you give it only 15 V. Perhaps not much, but who knows?

The GEQ settings the DEQ2496 auto-generated were too sharp for my taste, so I looked at the FR curve it measured from the pink noise signal. This roughly matched the FR curve I created by recording 1/3 octave warble tones from Stereophile Test Disc #2. Since both gave similar measurements, I prefer doing it manually because I can correct for the mic’s response, and my digital recorder (Zoom H4) gives the mic full 48 V phantom power.

So the curves match: that’s a nice sanity check – now we’re rolling.

Using the DEQ2496, I created parametric EQ settings to offset the peaks and dips. This enabled me to use gentle corrections – both in magnitude and in slope. I then replayed the Stereophile warble tones and re-measured the room’s FR curve. The first pass was 2 filters that got me 90% of the way there:

  • +4 dB @ 31 Hz, 1.5 octaves wide (slope 5.3 dB / octave)
  • -3 dB @ 1000 Hz, 2 octaves wide (slope 3 dB / octave)
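The slope figures in parentheses are consistent with a simple triangle approximation: the gain falls from its peak to 0 dB over half the filter's width. Here is a minimal Python sketch of that arithmetic. The function name is mine, and real parametric filters are bell-shaped, so treat this as a rough average slope rather than what the DEQ2496 computes internally.

```python
def approx_slope_db_per_octave(gain_db: float, width_octaves: float) -> float:
    """Average slope if the gain falls to 0 dB over half the filter width."""
    return abs(gain_db) / (width_octaves / 2.0)


# The two first-pass filters listed above: (gain in dB, center in Hz, width in octaves)
first_pass = [(4.0, 31, 1.5), (-3.0, 1000, 2.0)]

for gain_db, center_hz, width in first_pass:
    slope = approx_slope_db_per_octave(gain_db, width)
    print(f"{gain_db:+.0f} dB @ {center_hz} Hz, {width} octaves wide: "
          f"about {slope:.1f} dB / octave")
```

Wider filters at the same gain give gentler slopes, which is the tradeoff behind using few, broad corrections.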

These changes affected other areas of the sound, so I ran a couple more iterations to fine tune things. During this process I resisted the urge to hit perfection. Doing so would require many more filters, each steeper than I would like. It’s a simple engineering tradeoff: allowing small imperfections in the response curve allows fewer filters with gentler slope. Ultimately I ended up with near-perfect frequency response measured in-room at the listening position:

  • Absolute linearity: from 30 Hz to 20 kHz, within 4 dB of flat
  • Relative linearity: curve never steeper than 4 dB / octave
  • Psychoacoustic linearity: about -0.8 dB / octave downslope (+3.9 dB @ 100 Hz, -3 dB @ 20 kHz)

The in-room treble response was excellent to begin with, thanks to the Magnepan 3.6/R ribbon tweeters. Some of the first EQs impacted that slightly, reducing the response from 2k to 6k, so I put in a mild corrective boost.

Subjectively, the overall before-after differences are (most evident first):

  • Midrange edge eliminated; mids are completely smooth and natural, yet all the detail is still there.
  • Transition from midrange to treble is now seamless, where before there was a subtle change in voicing.
  • Smoother, more natural bass: ultra-low bass around 30 Hz is part of the music rather than a hint
  • Transition from bass to lower midrange is smoother and more natural.

In other words, audiophile heaven. This is the sound I’ve dreamed of having for decades, since I was a pimpled teenager with sharper ears but less money and experience than I have now. It’s been a long road taken one step at a time over decades to get here and it’s still not perfect. Yet this is another step toward the ideal and now about as close as human engineering can devise. The sound is now so smooth and natural, the stereo stops reminding me it’s there and enables me to get closer to the music, which now has greater emotional impact. And it’s more forgiving of imperfect recordings so I can get more out of some old classics, like Jacqueline DuPre playing Beethoven Trios with Benjamin Britten and Arthur Rubinstein playing the Brahms F minor quintet with the Guarneri.

Throughout this process, I could detect no veil or distortion from the DEQ2496. The music comes through completely transparently. I measured test tones through the DEQ2496 in both pass-through and with EQ enabled; it introduced no harmonic or intermodulation distortion at all. That is, anything it might have introduced was below -100 dB and didn’t appear on my test. This is as expected, given that I’m using it entirely in the digital domain – no DA or AD conversions – and my EQ filters are parametric, small with shallow slope.

While I was at this, I created a small tweak for my LCD-2 headphones. Their otherwise near perfect response has a small dip from 2 to 8 kHz. A little +3 dB centered at 4.5 kHz, 2 octaves wide (3 dB / octave, Q=0.67) made them as close to perfect as possible.
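For anyone whose equalizer asks for Q instead of bandwidth in octaves, the usual conversion is Q = sqrt(2^N) / (2^N - 1), where N is the bandwidth in octaves. A small Python sketch follows. The function name is my own, and different EQs define "bandwidth" slightly differently, so take it as an approximation.

```python
import math


def q_from_octaves(bandwidth_octaves: float) -> float:
    """Convert a filter bandwidth in octaves to the equivalent Q."""
    ratio = 2.0 ** bandwidth_octaves  # upper edge / lower edge frequency
    return math.sqrt(ratio) / (ratio - 1.0)


# The 2-octave headphone filter described above
print(f"2.0 octaves wide: Q = {q_from_octaves(2.0):.2f}")
# The 1.5-octave bass filter from the room correction
print(f"1.5 octaves wide: Q = {q_from_octaves(1.5):.2f}")
```

The 2-octave case works out to exactly 2/3, matching the Q=0.67 quoted for the headphone tweak.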

Overall, I can recommend the DEQ2496. Most importantly, it enabled me to get as close as humanly possible to perfect sound. That in itself deserves a glowing recommendation. But it’s not a magic box. I put a lot of old-fashioned work into getting my audio system in great shape and used the DEQ2496 only to close the last small gap. Like any powerful tool, the DEQ2496 can be used for evil or for good. So to be fair and complete I’ll list my reservations:

  • The DEQ2496 is not a magic band-aid. You still need to acoustically treat and arrange your room first to fix the biggest problems. After you do that, you might be satisfied and not need the DEQ2496.
  • The DEQ2496 is complex to use, creating the risk that you won’t get it to work right or you’ll get poor results.
  • To use the RTA feature you’ll need an XLR mic with wide, flat frequency response.
  • I cannot assess its long term durability, having it in my system for only a few days. Many of the reviews say it dies after a year or two, but they also say it runs hot. Mine does not run hot, so maybe Behringer changed something? Or perhaps mine runs cooler because I’m not using the D-A or A-D converters. It does have a 3 year manufacturer warranty, longer than most electronics.

Housing in San Francisco

Kudos to Eric Fischer for a detailed analysis of Housing in San Francisco.

https://experimental-geography.blogspot.com/2016/05/employment-construction-and-cost-of-san.html

He did a regression and found 3 key features that correlate with housing prices:

  • Housing Supply: how much housing is available on the market
  • Salaries: how much are people in the area earning?
  • Employment: how many people in the area are employed?

Interestingly and surprisingly, the trend of rents over time was quite steady, unaffected by the introduction of policies like rent control. The data and regression suggest that housing follows the basic laws of supply & demand just like other commodities.

Android 6 / Cyanogenmod 13 – Not Yet Ready for Use

Update: not so fast. Today I picked up my phone and certain icons were missing from the home screen. Apps stored on the SD card randomly and intermittently disappear from the home screen (even without rebooting, just waking from sleep). And on booting, I get a system modal error dialog and their icons disappear. The apps are still installed and I can run them if I wait a few moments after booting. Turns out it has the same problem my tablet did before. Adoptable storage is still not working as designed.

On top of that, WiFi mysteriously stopped working and could not be enabled, even after rebooting. And the MAC address reverted to 02:00:00:00:00:00. Looks like CM 13 is not yet ready to use on the Galaxy Note 2. I reverted to my CM 12.1 backups and may try CM 13 again in a few months.

A while ago, I had this to say about Android 6 adoptable storage. Android 6 is great, but this feature just didn’t work. No big deal, I simply kept using the SD card the same old way I’ve been using it for years.

That was a custom port of CM 13 to my old 8″ Galaxy Tab 3. Recently, Cyanogenmod released nightly builds of version 13 for my Galaxy Note 2. Yeah, it’s an old phone. I’ve been using the same phone for almost 4 years. But why not? It’s a fantastic phone, still works like new with great performance and amazing battery life.

I decided to give adoptable storage another try, this time with an official build (albeit a nightly build). Long story short: it works perfectly, with none of the problems I encountered before.

  • The SD card mounts to /storage/emulated/0. This is the same location where internal storage used to be mounted.
  • Because of this, apps like the AOSP camera store their data on the SD card, even when set to “internal”. They’re storing data to the same location, not realizing the SD card is mounted there.
  • Same with the system Downloads directory – nice!
  • Apps set to prefer external storage install to the SD card, as well as additional data they download. All automatically.
  • From the system Apps menu, you can change some apps between external and internal storage.
  • For generic shareable folders like music and videos, I’ve had no problems with permissions.
  • I’ve had no problems with Folder Sync and other apps that read & write the same files across several apps.
  • When you plug the phone into a computer via USB, only 1 directory shows up. It’s the SD card. Previously, 2 directories showed up: internal and external.
  • File Managers like Solid Explorer still detect the SD card and use it correctly.

Simple advice:

  • Use a high quality class 10 or faster card. Cards are so cheap now, the speed and reliability are definitely worth it.
  • Use external storage for apps that accumulate lots of data: mapping apps like Sygic and Droid EFB that download entire states or areas, Evernote which can store GB of data locally, Folder Sync which syncs Dropbox and Box to your local device, Titanium Backup, etc.

Overall, it works automatically and seamlessly, simplifying storage space on the phone. I have not had to do any tweaks.

Ubuntu 16 Has Arrived – But Wait!

Ubuntu 16 was released a few weeks ago. It is the latest version and an LTS release, which means long term support (5 years). The April (.04) release of each even-numbered year is LTS. I’ve been running Ubuntu since version 10 and updated three machines to Ubuntu 16. A year or two ago I switched to the XUbuntu variant because I don’t like the Unity interface and XFCE is faster and lighter on CPU and RAM. My advice is to stick with Ubuntu 14, if that’s what you’re already running. At least for now.

First, if you have a laptop that needs support for power management, you need the version 4 Linux kernel and must already be running Ubuntu 15. Just keep running it. If you have a desktop, you’re probably running Ubuntu 14, which is a solid release and still supported.

Second, Ubuntu 16 has few practical improvements or upgrades that you might notice. The only difference I’ve noticed is that the OpenConnect VPN script is fixed: Ubuntu 15 required a route command after connecting, and Ubuntu 16 does not. Ubuntu 14 never had this bug.

Third, the Ubuntu 16 upgrader is broken and crashes, so if you try to update you’ll have to fix and complete it manually.

Fourth, Ubuntu 16 has a serious bug: a memory leak in the Xorg process. Previously it used 50 – 100 MB of RAM. On Ubuntu 16 it slowly but constantly grows, after a couple of days reaching a couple of GB, until the system starts swapping. You need to log out to kill the Xorg process and start a new one. This bug occurs only on my desktop using open source video drivers. The other desktop with Nvidia binary drivers, and laptop with Intel HD graphics do not have this bug.

Details: I updated two desktops and a laptop, all 64-bit. One desktop has a generic video card using open source drivers. The other has an Nvidia Quadro K600 using the Nvidia binary driver from the Ubuntu repo. The laptop is a 2015 ThinkPad X1 Carbon with Intel HD graphics. Before upgrading, all three were running a fully up-to-date Ubuntu 15.10, which ran great on all of them.

In all cases, I ran do-release-upgrade from a command prompt. It crashed – didn’t fail, but actually crashed – about halfway through, leaving my machine reporting itself as Ubuntu 16, but with apt-get in a broken state. To complete the install, I ran the following sequence of commands:

apt-get -f install
apt-get update
apt-get dist-upgrade
apt-get autoremove

I repeated this sequence until apt-get stopped reporting errors. This completed the install – at least I think so. All the repos seem to have updated, the system is running kernel 4.4.0, system reports itself as Ubuntu 16 and is running fine.
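If you'd rather not retype the sequence by hand, the repeat-until-clean loop can be sketched in Python. This is only an illustration: the function names are mine, and treating a non-zero exit status as "apt-get reported an error" is my assumption about how to detect failure. It needs to run as root, and apt-get will still prompt for confirmation as usual.

```python
import subprocess
from typing import Callable, Sequence

# The same four commands as above, in the same order.
REPAIR_COMMANDS = [
    ["apt-get", "-f", "install"],
    ["apt-get", "update"],
    ["apt-get", "dist-upgrade"],
    ["apt-get", "autoremove"],
]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command attached to the terminal and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def repeat_until_clean(
    commands: Sequence[Sequence[str]] = REPAIR_COMMANDS,
    run: Callable[[Sequence[str]], int] = run_command,
    max_rounds: int = 5,
) -> int:
    """Repeat the whole sequence until every command exits with status 0.

    Returns the number of rounds used, or -1 if errors remain after max_rounds.
    """
    for round_number in range(1, max_rounds + 1):
        exit_codes = [run(cmd) for cmd in commands]
        if all(code == 0 for code in exit_codes):
            return round_number
    return -1
```

Calling repeat_until_clean() from a root Python session runs the sequence up to five times. The max_rounds cap is there so a genuinely broken package doesn't loop forever.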

It’s nice to have VPN work without needing an extra route command. But unless you have a burning need to get on Ubuntu 16, I advise waiting for Canonical to fix the upgrader before installing.

How Strong are Small Airplanes?

The FAA defines 3 categories for small airplanes:

Normal: all standard private and commercial flight maneuvers up to 60° bank angle. Must withstand 3.8 G or more.

Utility: additional flight maneuvers like spins and > 60° bank angles. Must withstand 4.4 G or more.

Acrobatic: any maneuver or bank angle not prohibited by the POH. Must withstand 6.0 G or more.

All certified GA airplanes meet the Normal category, many (like the Cessna 172) meet the Utility category, and some meet Acrobatic. With 3.8 G as the minimum, this means airplanes are built very strong.

You don’t really know how strong the airframe is because the G rating is a minimum. It can handle that G load under normal operation. Certainly it can handle more, but how much more is unknown. If you exceed it, you’re the test pilot – that’s bad, don’t do that.

Being certified Utility doesn’t necessarily mean the airplane can perform any maneuver exceeding 60° of bank. For example many aircraft certified Utility are not approved for spins. Prohibitions like this are listed in the POH.

Airplanes certified for multiple categories may not always satisfy all categories. For example the Cessna 172 is certified Utility only when gross weight is under 2,100 lbs. and CG is forward of a certain point. Otherwise, it’s certified Normal.
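To put the G numbers in perspective: in a level, coordinated turn the load factor is 1 / cos(bank angle). That is the standard textbook relation, not something taken from the category definitions themselves. A short Python sketch, with function names of my own choosing:

```python
import math


def load_factor(bank_deg: float) -> float:
    """G load in a level, coordinated turn at the given bank angle."""
    return 1.0 / math.cos(math.radians(bank_deg))


def bank_for_load_factor(g: float) -> float:
    """Bank angle in degrees at which a level turn pulls the given G load."""
    return math.degrees(math.acos(1.0 / g))


print(f"60 degree bank: {load_factor(60):.1f} G")
for category, limit_g in [("Normal", 3.8), ("Utility", 4.4), ("Acrobatic", 6.0)]:
    bank = bank_for_load_factor(limit_g)
    print(f"{category}: {limit_g} G corresponds to a level turn "
          f"at about {bank:.0f} degrees of bank")
```

So a 60° bank in level flight is a 2 G maneuver, well inside the Normal category limit. This ignores gusts and abrupt control inputs, which can add load on top of the steady turn.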

On the Minimum Wage

Lots of news about the minimum wage lately. I’m disappointed at how poorly people understand it – especially people with some knowledge of economics. This leads to the nearly universal view that it is a policy that benefits the poor. I believe this view is incorrect. Here’s why.

I characterize the minimum wage as a form of welfare, or a policy intended to help the poor. Any such policy should meet 2 basic guidelines.

  1. The benefits should be focused on the poor.
  2. The costs should be paid by the non-poor (middle class or rich).

The minimum wage fails both of these guidelines. I’ll take them in order.

People making the minimum wage are new or inexperienced workers – but not necessarily poor. Many new or inexperienced workers are young people spanning the entire economic spectrum including middle class and wealthy families. High school and college kids working part time jobs during the school year and full time during summers. Minimum wage jobs are frequently taken by middle class or wealthy retired people looking for something to do and a little extra income. Of course there are also some adult heads of households working minimum wage jobs to support themselves or their families. Whatever benefits the minimum wage provides are not focused on the poor, but distributed equally to all of these different people, many of whom aren’t poor and don’t need the benefit.

When we think about what kind of businesses have minimum wage jobs, what comes to mind? Fast food, stores like Wal-Mart and Target, etc. And what do these businesses have in common? They are patronized by the lower and middle classes. Rich people are less likely to eat at McDonalds or shop at Wal-Mart. Increasing the minimum wage makes the products and services these businesses provide more expensive. And the people paying those higher prices are the people who shop there – not the rich, but the lower and middle class.

In short, the minimum wage (1) fails to focus its benefits on the poor, and (2) its cost is not fully paid by the rich, but also borne by the poor and middle class. It fails both legs of the test of charity.

Yet the problems with the minimum wage don’t stop there. It also fails the test of economics.

Minimum wage doesn’t increase the productivity of labor. All it does is make it illegal to sell one’s own labor below a certain price. People whose productivity of labor is below that price will be unemployed. So each person who benefits from the minimum wage does so at the expense of others who can’t get jobs at all. Economists argue over how much a minimum wage raises unemployment – not whether it does.

Minimum wage laws also exacerbate the pernicious effects of discrimination. Suppose you own a fast food restaurant and 2 people apply to flip burgers: one is a black high school dropout, the other is a clean-cut white kid attending college. Who are you going to hire? Remember you have to pay them the same. Normally, a kid attending college doesn’t compete with high school dropouts because his skills enable him to demand a higher rate for his labor. But when minimum wage is high enough, he competes with people having fewer skills and experience. The unemployment rate among young black males is already more than twice as high as the national average. Higher minimum wages will only make that worse, not better.

In this sense, minimum wage laws harm the very people they are supposed to protect – the newest, least experienced workers, especially minorities and otherwise disadvantaged – by forcing them to compete with more skilled and experienced workers for jobs.

Minimum wage laws passed over broad areas like an entire state have another problem: variable cost of living. The cost of living – and wages – are much higher in San Francisco than in Redding; or in Seattle vs. Spokane. The $15 minimum wage proposed for CA is a bad idea for San Francisco, yet in Redding the effects would be even worse.

Finally, a word about the popular phrase living wage. Many of the people working minimum wage jobs are part-time college students, retirees, and others who have independent means of support and don’t need a living wage. Yet those who do need a living wage to support themselves or a family, aren’t working minimum wage jobs very long. A minimum wage job is an entry level job. Minimum wage workers quickly gain on-the-job experience and skills and move on to higher paying jobs. Even if the total number of people working minimum wage jobs is growing, they’re not the same people year over year. The group of minimum wage workers has high churn – a constant influx of new low skilled people entering the job market, as others leave the group moving on to higher paying jobs. The minimum wage doesn’t necessarily help heads of households working entry level jobs because:

  • It makes it harder for them to get a job in the first place.
  • They don’t hold minimum wage jobs very long before they move on to higher paying work.

Alternatives

I don’t like to shoot something down unless I offer an alternative. In this case, a better alternative is refundable tax credits. They already exist, so could be easily expanded. Unlike the minimum wage, this passes both legs of the charity test: focused on people who actually need it, and paid for by general funds, rather than by businesses that cater to the lower class.

We should consider requiring able-bodied people receiving this benefit to do whatever work they are capable of doing – sweeping streets, filling potholes, filing paperwork at the local DMV or public school, etc. This is typically work that the govt would perform, whether directly or through contracts, so this would save taxpayers money while taking few jobs from private industry. And it would benefit the people working by teaching them skills that make them employable. And it is common-sense fairness.

Most Common GA Accident Types

In the past 100 years nobody has created a new way to destroy an airplane. Here are the most common ways, roughly in order of most common first:

  1. Weather: pilot didn’t respect Mother Nature (she doesn’t have to respect you – she was here first).
  2. Fuel: pilot ran out of fuel (airplane engines run better with fuel).
  3. Planning: or lack thereof – over gross weight, out of CG limits, density altitude, VFR into IMC, etc.
  4. Maintenance: pilot departed with known aircraft deficiency (airplanes work best when properly maintained).
  5. Pilot was cognitively impaired (fatigue, drugs, etc.).
  6. Stupidity: pilot intentionally did something stupid (buzzing, “watch this”, etc.).

Every aviation accident I know of falls into at least one of these categories – sometimes more than one. The good news is, improving safety is simple common sense. Don’t do these things! Safety improves one pilot at a time. If you don’t do these things, you’ve improved your safety roughly 10-fold and you’re making GA safer than driving or bicycling.

Why Ignore Unique Words in Vector Spaces?

Lately I’ve been working on natural language processing, learning as I go. My first project is to discover topics being discussed in a set of documents and group the docs by these topics. So I studied document similarity and discovered a Python library called Gensim which builds on top of NumPy and SciPy.

The tutorial starts by mapping documents to points in a vector space. Before we map we do some basic text processing to reduce “noise” – for example stripping stop words. The tutorial casually mentions removing all words that appear only once in the corpus. It’s not obvious to me why one would do this, and the tutorial has no explanation. I did a bunch of googling and found this is commonly done, but could not find any explanation why. Then I thought about it a little more and I think I know why.

We map words into a vector space having one dimension per distinct word in the corpus. A document’s value or position along each dimension (word) is computed. It could be the simple number of times that word appears in that doc – this is the bag of words approach. It could be the TF-IDF for that word in that document, which is more complex to compute but could provide better results, depending on what you are doing. However you define it, you end up with a vector for each document.
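Here is a minimal pure-Python sketch of the bag-of-words mapping, without Gensim. The three tiny documents and the helper name are made up for illustration, and splitting on whitespace stands in for real tokenization.

```python
from collections import Counter

# A toy corpus, invented for illustration.
docs = [
    "brown monkey swim",
    "brown bear swim swim",
    "brown bear",
]

tokenized = [doc.split() for doc in docs]

# One dimension per distinct word in the corpus, in a fixed (sorted) order.
vocabulary = sorted({word for tokens in tokenized for word in tokens})


def bag_of_words(tokens, vocabulary):
    """Count how many times each vocabulary word appears in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vectors = [bag_of_words(tokens, vocabulary) for tokens in tokenized]

print(vocabulary)
for doc, vector in zip(docs, vectors):
    print(vector, "<-", doc)
```

Each document becomes a list of counts with one entry per vocabulary word. Swapping the raw counts for TF-IDF weights changes the numbers but not the shape of the vectors.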

Once you have this, you can do lots of cool stuff. But it’s this intuitive understanding of what the vector space actually is, that makes it clear why we would remove or ignore all words that appear only once in the corpus.

One way to compute document similarity is to measure the angle between the vectors representing documents. Basically, are they pointing in the same direction? The smaller the angle between them, the more similar they are. This is where cosine similarity comes from. If the angle is 0, cosine is 1: they point in the exact same direction. If the angle is 180, cosine is -1: they point in opposite directions. Cosine of 0 means they are orthogonal. The closer to 1 the cosine is, the more similar they are.

Of course, no matter how many dimensions the vector space has, the angle between any 2 vectors lies in a 2-D plane – it can be expressed as a single number.
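Cosine similarity is only a few lines of plain Python. This sketch assumes both vectors have the same length and neither is all zeros; a zero vector would raise a ZeroDivisionError.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))   # orthogonal vectors
print(cosine_similarity([1, 0], [-1, 0]))  # opposite directions
print(cosine_similarity([1, 2], [2, 4]))   # same direction, different length
```

Note that word-count vectors never have negative components, so for bag-of-words documents the cosine always falls between 0 and 1.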

Let’s take a 3-D example: we have 3 words: brown (X axis), swim (Y axis), monkey (Z axis). Across our corpus of many documents, suppose only 1 doc (D1) has the word monkey. The other words each appear in several documents. That means the vectors for every document except D1 lie entirely in the X-Y plane – their Z component is 0. D1 is the only document whose vector sticks out from the X-Y plane.

Now it becomes easy to see why the word monkey does not contribute to similarity. Take any 2 vectors in this example. If both are in the X-Y plane then it’s obvious that the Z axis has no impact on the angle between them. If only one is in the X-Y plane (call it Dx), it means the other (not in the X-Y plane) must be D1. Here, the angle between D1 and Dx is different from the angle between Dx and the projection or shadow of D1 onto the X-Y plane. But, it doesn’t matter because this is true when comparing D1 to every other vector in the set. The relative differences between D1 and each other vector in the set are the same whether we use D1 or the projection of D1 onto the X-Y plane. In other words, using cosine similarity they still rank in the same order nearest to furthest.

Another way to see this is to consider the vector dot product between D1 and D2. As a reminder, the dot product is the sum of the products of each vector’s components in each dimension. Any dimension that has a value of 0 in either vector contributes nothing to the dot product. And of course, every vector except D1 has a 0 for the Z dimension, so the Z component of the dot product will always be 0. The cosine of the angle between any 2 vectors Dj and Dk is equal to their dot product divided by the product of their magnitudes. If we normalize all vectors to unit length, the denominator is always 1 and cosine is the dot product.
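The argument is easy to check numerically. In this sketch the word counts are invented, and it tests only the claim made above: how the other documents rank relative to D1, with and without the monkey dimension.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Axes: brown (X), swim (Y), monkey (Z). Counts are made up for illustration.
d1 = (1, 1, 3)            # the only document containing "monkey"
d1_projected = (1, 1, 0)  # D1 with the monkey dimension dropped
others = {"D2": (2, 1, 0), "D3": (1, 3, 0), "D4": (3, 0, 0)}

full = {name: cosine(d1, vec) for name, vec in others.items()}
flat = {name: cosine(d1_projected, vec) for name, vec in others.items()}

ranking_full = sorted(others, key=full.get, reverse=True)
ranking_flat = sorted(others, key=flat.get, reverse=True)

for name in others:
    ratio = full[name] / flat[name]
    print(f"{name}: full={full[name]:.3f} projected={flat[name]:.3f} ratio={ratio:.3f}")
print("same ranking:", ranking_full == ranking_flat)
```

The dot products are identical in both cases, so each cosine shrinks by the same constant factor (the ratio of D1's projected length to its full length). That is why the order of the other documents relative to D1 is unchanged.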

Because of this, any word that appears exactly once in the corpus can be ignored. It has no effect on the similarity of documents. But we can actually make a stronger statement: any word that appears in a single document can be ignored, no matter how many times it appears in that document. This is a misleading – yet not incorrect – part of the tutorial. It removes words that appear only once in the corpus. It could go further and remove words that appear in only 1 document, even if they occur multiple times.
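The difference between "appears once in the corpus" and "appears in only one document" is the difference between corpus frequency and document frequency. Here is a small pure-Python sketch of both filters, with a toy corpus invented for illustration.

```python
from collections import Counter

# Toy corpus: "monkey" occurs twice, but only in the first document.
tokenized = [
    ["brown", "monkey", "monkey", "swim"],
    ["brown", "bear", "swim"],
    ["brown", "bear", "zebra"],
]

# Total occurrences across the corpus (what the tutorial filters on).
corpus_frequency = Counter(word for tokens in tokenized for word in tokens)

# Number of documents containing each word (the stronger filter).
document_frequency = Counter(
    word for tokens in tokenized for word in set(tokens)
)

once_in_corpus = {w for w, n in corpus_frequency.items() if n == 1}
in_one_document = {w for w, n in document_frequency.items() if n == 1}

# Keep only words that appear in more than one document.
filtered = [
    [word for word in tokens if document_frequency[word] > 1]
    for tokens in tokenized
]

print("once in corpus: ", sorted(once_in_corpus))
print("in one document:", sorted(in_one_document))
print(filtered)
```

The corpus-frequency filter catches only zebra, while the document-frequency filter also catches monkey, which occurs twice but in a single document.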

I haven’t found any explanation for this at all, so I can’t confirm my explanation. But I suspect this is why once-occurring words are often ignored. In fact, sometimes people get better results by ignoring words that occur in more than 1 document, if they occur only in a very small number of docs. The reasoning seems to be that words appearing in a handful of docs from a corpus of thousands or millions, have negligible impact on similarity measures. And each word you ignore reduces the dimensionality of the computations.

Ubuntu Linux and Blu-Ray

Getting Linux to work with Blu-Ray took some custom configuration. The state of Blu-Ray on Linux leaves much to be desired and it doesn’t work out of the box. But it can be made to work if you know what to do. Here’s how I got it to work.

Reading Blu-Rays

This was the easy part. You can do it in 2 ways: VLC and MakeMKV

Blu-Rays don’t play in VLC because of DRM. To play them in VLC you need to download a file of Blu-Ray keys, like here: http://vlc-bluray.whoknowsmy.name/. This may not be the best approach because the file is static. New Blu-Rays are coming out all the time. But it works if you regularly update this file and it has the key for the Blu-Ray you want to play.

MakeMKV is software that reads the data from a Blu-Ray and can write it to your hard drive as an MKV file. It can also stream the Blu-Ray to a port on your local machine. Then you can connect VLC to play the stream from that port. Voilà! You can watch the Blu-Ray on your computer with VLC, even if you don’t have the keys file. MakeMKV is shareware – free for the first 30 days, then you should pay for it.

Writing Blu-Rays

The first challenge in writing Blu-Rays is Ubuntu’s built-in CD writing software, cdrecord. It’s a very old, buggy version, even with the latest repos on Ubuntu 15.10. It works fine for audio CDs, data CDs and DVDs, but not for Blu-Ray. The first step is to replace it with a newer, up-to-date version. The one I used is CDRTools from Brandon Snider: https://launchpad.net/~brandonsnider/+archive/ubuntu/cdrtools.

Whatever front end you use to burn discs (like K3B) works just the same as before, since it uses the apps from the underlying OS, which you’ve now replaced. After this change I could reliably burn dual-layer (50 GB) Blu-Rays on my Dell / Ubuntu 15.10 desktop using K3B. My burner is an LG WH16NS40. It is the bare OEM version and works flawlessly out of the box.

Now you can burn a Blu-Ray, but before you do that you need to format the video & audio and organize into files & directories that a Blu-Ray player will recognize as a Blu-Ray disc. What I’m about to describe works with my audio system Blu-Ray player, an Oppo BDP-83.

The command-line app tsmuxer does this. But it’s a general transcoder that can do more than Blu-Ray, and the command line args to do Blu-Rays are complex. So I recommend also installing a GUI wrapper for it like tsmuxergui.

sudo apt-get install tsmuxer tsmuxergui

Now follow a simple guide to run this app to create the file format & directory structure you need for a Blu-Ray. Here’s the guide I used. Do not select ISO for file output. When I did that, K3B didn’t know what to do with the ISO – my first burn was successful, but all it did was store the ISO file on the disk. Instead select Blu-ray folder. This will create the files & folders that will become the Blu-Ray. Also, you might want to set chapters on the tsmuxer Blu-ray tab. For one big file that doesn’t have chapters, I just set every 10 mins and it works.

When tsmuxer is done, run K3B to burn the files & folders to the blank Blu-Ray. Key settings:

In K3B:
Project type: data
The root directory should contain the folders BDMV and CERTIFICATE
Select cdrecord as the writing app
Select Very large files (UDF) as the file system
Select Discard all symlinks
Select No multisession
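Before burning, it can save a coaster to confirm the output folder has the two top-level directories mentioned above. This is a minimal Python sketch; the function name and the example path are hypothetical, and it checks only that the directories exist, not that their contents are valid.

```python
from pathlib import Path

# The two folders the disc root should contain, per the settings above.
REQUIRED_FOLDERS = ("BDMV", "CERTIFICATE")


def missing_bluray_folders(root):
    """Return the required top-level folders that are absent under root."""
    root = Path(root)
    return [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]
```

For example, missing_bluray_folders("/path/to/tsmuxer/output") returns an empty list when both folders are present, and the names of any that are missing otherwise.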

Then let ‘er rip. Mine burns at about 7-8x, roughly 35 MB / sec. When it’s done, pop the Blu-Ray into your player and grab some popcorn!
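As a sanity check on those numbers: 1x Blu-Ray speed is 36 Mbit/s, or 4.5 MB/s, so 7-8x lands in the low to mid 30s of MB/s. A quick Python sketch of the arithmetic follows. The function names are mine, and the time estimate assumes a constant speed and decimal gigabytes, so real burns will differ somewhat.

```python
BD_1X_MB_PER_S = 4.5  # 1x Blu-Ray speed: 36 Mbit/s = 4.5 MB/s


def speed_mb_per_s(multiplier: float) -> float:
    """Data rate in MB/s for a given Blu-Ray speed multiplier."""
    return multiplier * BD_1X_MB_PER_S


def burn_minutes(size_gb: float, mb_per_s: float) -> float:
    """Rough burn time in minutes at a constant data rate (1 GB = 1000 MB)."""
    return size_gb * 1000.0 / mb_per_s / 60.0


for multiplier in (7, 8):
    print(f"{multiplier}x = {speed_mb_per_s(multiplier):.1f} MB/s")

print(f"Full 50 GB disc at 35 MB/s: about {burn_minutes(50, 35):.0f} minutes")
```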