Subject Headings Galaxy


 

The Library of Congress Subject Headings (abbreviated LCSH) are a controlled vocabulary used to catalog books. For example, the book “The Hound of the Baskervilles” (a Sherlock Holmes tale) has the heading Detective and mystery stories. These headings are an early search technology: before computers, they let librarians and their patrons look up books by content rather than by author or title.

Subject headings have relationships. Detective and mystery stories is broader than Gothic fiction (Literary genre) and Noir fiction. It is narrower than Adventure stories and Fiction. Taken together, the headings and their relationships define a set of objects and connections. Mathematicians call such things graphs or networks.
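To make the graph idea concrete, here is a minimal Python sketch that stores the example relationships above as an adjacency map. The names `BROADER_THAN` and `build_graph` are made up for this illustration and are not part of any LCSH tool or data format.

```python
from collections import defaultdict

# (broader heading, narrower heading) pairs taken from the example above.
BROADER_THAN = [
    ("Detective and mystery stories", "Gothic fiction (Literary genre)"),
    ("Detective and mystery stories", "Noir fiction"),
    ("Adventure stories", "Detective and mystery stories"),
    ("Fiction", "Detective and mystery stories"),
]


def build_graph(pairs):
    """Return an undirected adjacency map: heading -> set of connected headings."""
    graph = defaultdict(set)
    for broader, narrower in pairs:
        graph[broader].add(narrower)
        graph[narrower].add(broader)
    return dict(graph)


graph = build_graph(BROADER_THAN)
for heading in sorted(graph):
    print(heading, "->", sorted(graph[heading]))
```

Each key is one small circle in the picture, and each entry in its set is one line drawn to another circle.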

The LCSH Galaxy shows over 100,000 subject headings and their relationships. As in the picture at left, each heading is shown as a small circle, and connections are shown as lines between headings. With 100,000 headings, the picture becomes considerably more complicated.

 

In the LCSH Galaxy, the position of each point is derived from a smaller set of connections between headings. These connections constitute a minimum spanning tree for the network. From the minimum spanning tree, a program called LGL computes locations for each subject heading to make the lines overlap as little as possible. This problem is difficult, and many lines still overlap in the final picture.
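As a rough illustration of what a minimum spanning tree is, the sketch below uses Kruskal's algorithm on a tiny made-up network. The node names and edge weights are hypothetical, and this is not necessarily how LGL or the galaxy pipeline computes its tree; the text above does not say which algorithm or weights were used.

```python
def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm. weighted_edges is a list of (weight, u, v) tuples."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative of n's component.
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps chains short
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical four-node network with one redundant edge (A-C).
nodes = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (2, "B", "C"), (3, "A", "C"), (1, "C", "D")]
tree = minimum_spanning_tree(nodes, edges)
print(tree)
```

The tree keeps every node connected while dropping the redundant edge, which is what leaves a layout program with fewer lines to untangle.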

 

The final picture of the galaxy uses the distance from the center of the galaxy to color each edge. Red edges are far from the center and yellow edges are close.
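One way to picture this coloring is a linear blend from yellow at the center to red at the rim. The sketch below assumes the distance is measured from the midpoint of each edge and that colors are RGB triples; both are illustrative assumptions, as is the function name `edge_color`.

```python
import math


def edge_color(p, q, center, max_radius):
    """Return an (R, G, B) color for the edge from p to q.

    Yellow (255, 255, 0) at the center, red (255, 0, 0) at max_radius or beyond.
    max_radius must be greater than zero.
    """
    mid_x = (p[0] + q[0]) / 2
    mid_y = (p[1] + q[1]) / 2
    distance = math.hypot(mid_x - center[0], mid_y - center[1])
    t = min(distance / max_radius, 1.0)  # 0.0 at the center, 1.0 at the rim
    green = round(255 * (1 - t))         # fading green turns yellow into red
    return (255, green, 0)


print(edge_color((0, 0), (0, 0), (0, 0), 10))    # edge at the center
print(edge_color((10, 0), (10, 0), (0, 0), 10))  # edge at the rim
```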

Most of the color in the picture is from the headings themselves. Connections between the headings generate groups. The CLUTO program identifies these groups. Each group is assigned a different color.

 

There are far too many subject headings to display all of their titles. Showing only the headings with many books or items in the Library of Congress catalog helps make the image interpretable. In this case, a title is shown only for headings that are used more than 500 times or that have more than 150 related headings.
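The rule above is a simple either-or test. A minimal sketch, with a hypothetical function name and made-up counts that are not real catalog figures:

```python
def should_label(usage_count, related_count, min_usage=500, min_related=150):
    """True if a heading passes either threshold described above."""
    return usage_count > min_usage or related_count > min_related


# (heading, times used, number of related headings); all values are invented.
sample = [
    ("Heading A", 1200, 10),
    ("Heading B", 40, 300),
    ("Heading C", 40, 10),
]
labeled = [name for name, used, related in sample if should_label(used, related)]
print(labeled)
```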

Even with these restrictions, there are still too many labels to show without producing a jumbled mess. To fix the mess, a heuristic algorithm for the maximum independent set problem selects a large set of labels that do not overlap one another.
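Treating each label as a rectangle and each overlap as a conflict turns label selection into an independent set problem. The sketch below shows one simple greedy heuristic: take labels in priority order and keep each one that does not collide with a label already kept. The priority values, rectangle format, and function names are assumptions for illustration, and the heuristic actually used for the galaxy may differ.

```python
def overlaps(a, b):
    """True if two axis-aligned rectangles (x, y, width, height) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def choose_labels(labels):
    """labels: list of (name, priority, rect). Returns names of kept labels."""
    chosen = []
    # Highest priority first, so important labels win conflicts.
    for name, priority, rect in sorted(labels, key=lambda item: -item[1]):
        if all(not overlaps(rect, other) for _, other in chosen):
            chosen.append((name, rect))
    return [name for name, _ in chosen]


# Hypothetical labels: B collides with A, C is clear of both.
labels = [
    ("A", 900, (0, 0, 10, 2)),
    ("B", 700, (5, 1, 10, 2)),
    ("C", 600, (20, 0, 10, 2)),
]
kept = choose_labels(labels)
print(kept)
```

A greedy pass like this guarantees that no kept labels overlap, but it does not guarantee the largest possible set, which is why the problem calls for a heuristic.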

 
The final image shows all of these pieces assembled together. Please explore the galaxy to see more detail.