Stephanie Lemieux is a passionate advocate of taxonomy, search and content organization. The Synaptica team had the chance to talk with Stephanie about her role as President and Principal Consultant at Dovecot Studio. She also speaks, blogs and writes whenever she can to help spread the good taxonomy word, and we asked her about her role as Conference Program Chair for Taxonomy Boot Camp, part of KMWorld.
Tell us about you, your experience and your interests.
SL: Like many who work in taxonomy, my initial training was as a librarian. I went to library school expecting to be a regular librarian, but during the first week of classes, I had a depressing realization that working in a library wasn’t the right fit for me. Luckily, the library science program was starting a whole new concentration in knowledge management (KM) that year.
That first KM class was so eye-opening: the teacher (my first mentor) was an ex-consultant and she brought in all these wonderful crunchy case studies of corporate knowledge sharing and problem solving. I have a background in anthropology, and much of KM work sounded like it was about understanding how people think and how to use that to transfer knowledge between people and organizations — a much better fit for me. Essentially, knowledge management came and rescued me from what could have been a big career mistake!
I took a course called “Knowledge Taxonomies” and that’s when my love affair with language and meaning started. While still in school I met my second mentor, Seth Earley, who brought me in as a taxonomy consultant right after graduation. It was pretty unusual to go straight into consultancy from training, so it was a great opportunity.
Taxonomy was very niche 15 years ago. It certainly wasn’t as widespread as it is today. It meant that if you had any type of experience or formal training you were considered a hot commodity.
What makes a good taxonomy?
SL: A good taxonomy is one that is usable and understandable. A good taxonomy is also tested and proven. If it’s an enterprise taxonomy, then it has to be harmonized across the organization and structured to fit multiple needs. There are a few general rules to follow around grammar, structure, usability and consistency, but there are also lots of context-dependent factors to consider.
There is a big difference between building a taxonomy that is meant to sit in the backend of a machine to drive automated processes and designing for real people to interact with. If you build a taxonomy that is going to be used by people for tagging, searching or browsing, it has to be tailored to that group’s way of looking at the world and account for all the different ways they might think. Usability is critical.
Does Dovecot work in collaboration on projects?
SL: As a small firm we often partner with others in adjacent niches. We have a couple of partners in the academic publishing world. We do a lot of work in digital asset management and we work with companies that focus on marketing and creative operations. We also collaborate with vendors whose systems have deep metadata or taxonomy needs, and they may recommend us to a client who needs support during deployment.
Are there any other types of projects you are involved with?
SL: One thing we’re doing more of lately is tidying up and expanding existing taxonomies. Enough time has now passed since the early 2000s, when people started building taxonomies en masse. Now we are looking at five- or ten-year-old taxonomies that need review and enrichment to serve new needs or to become an enterprise structure.
We sometimes do a more robust metadata streamlining process during re-platforming projects (e.g., digital asset management, content management, etc). This can involve looking at how content has been tagged as well as metadata-related functionality. We apply some quantitative and qualitative criteria to the metadata and help decide what to keep and what to remove. It might also involve improving vocabularies and bringing them up to current ways of working and thinking.
These projects allow us to leverage text analytics a lot more. We can use term extraction on collections to get a better sense of what new subject matters have emerged since the taxonomy was built. Clients are also asking for ways to have this process run continuously in the background. They want to ensure the content being created can be monitored and the terms extracted so that the taxonomy can be kept up-to-date more dynamically.
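As a rough illustration of the term extraction idea described above, here is a minimal Python sketch. It is not the tooling used on these projects: the function name `candidate_terms`, the tiny stopword list, and the `min_docs` threshold are all assumptions made for the example. It counts how many documents each word or two-word phrase appears in and reports the ones that are not already taxonomy labels.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on",
             "with", "is", "are", "our", "we"}

def candidate_terms(documents, taxonomy_labels, min_docs=2):
    """Return (term, document_count) pairs for terms missing from the taxonomy.

    A term is a single word or a two-word phrase containing no stopwords.
    Only terms that occur in at least `min_docs` documents are returned.
    """
    known = {label.lower() for label in taxonomy_labels}
    doc_freq = Counter()
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        grams = set()  # a set, so each term counts once per document
        for i, word in enumerate(words):
            if word in STOPWORDS:
                continue
            grams.add(word)
            # Add the two-word phrase starting here if the next word is not a stopword.
            if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
                grams.add(f"{word} {words[i + 1]}")
        # Skip anything that is already a taxonomy label.
        doc_freq.update(grams - known)
    return [(term, n) for term, n in doc_freq.most_common() if n >= min_docs]

if __name__ == "__main__":
    docs = [
        "Machine learning improves search",
        "We use machine learning for tagging",
        "Search and tagging of recipes",
    ]
    for term, count in candidate_terms(docs, ["Search", "Tagging"]):
        print(term, count)
```

Running something like this on a schedule over newly created content is one simple way to approximate the continuous monitoring clients ask for; the output is a list of candidates for a taxonomist to review, not an automatic update.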
How did you get involved with Taxonomy Boot Camp, part of KMWorld?
SL: I’ve been attending for a long time, back to when the taxonomy world was still pretty niche — many of us “regulars” knew each other and presented regularly. One year Mike Randall, the previous Program Chair, approached me to see if I wanted to become more involved. Of course, I did! This is now my 4th year as Program Chair at the Washington D.C. event.
As Chair, I’m responsible for picking a theme for each year and designing the program. With a team of volunteer reviewers, we evaluate anywhere between 50-100 proposals based on fit for the audience, relevance to the event theme for the year, diversity of viewpoint, and more. From this, we put together the final program. At the actual event, I get involved with introducing speakers and moderating panel debates. I also read all the event feedback after the conference and incorporate that into the next year’s program.
Our theme last year looked at AI and machine learning, a hot topic in KM at the moment as taxonomists prepare for this new world. For 2019 the theme is building strong foundations. If you are building a new taxonomy today, how do you take what we have learned in the last few years and use that to build the most robust and solid taxonomy? How do you build a taxonomy that isn’t just doing one minor function but is part of the information enterprise architecture of your whole organization? Taxonomies play a key role in supporting core business functions like content publishing or business intelligence. Taxonomists need to have a wider perspective because taxonomies are used as foundations for organizational functions. Even if you are being asked to build a taxonomy for web navigation, the reality is it will be used for more — tagging content or as part of a product management information system. The uses can snowball quickly.
Where do you see linked data fitting into the value proposition for enterprise taxonomies?
SL: The value of linked data depends on the use case and subject domain for the taxonomy we are working on. To be honest, we are often doing very bespoke and internal-facing taxonomies that have to be tailored to corporate processes and how employees think and work. We are also often working in very narrow domains where there just isn’t much effort being put into open data in the community at large. We see a bit more value from linked data in medicine and pharmaceuticals, where there is a more vibrant community of data out there that you can leverage.
What are your views on AI and machine learning vs rules-based auto-categorization systems?
SL: I don’t think one is systematically more appropriate than the other. It’s very dependent on the type of client as well as the volume and complexity of the content.
The simpler the content and the lower the volume, the more you can rely on a relatively simple set of rules. This is especially true if the content is highly structured. In these cases, there is not a huge amount of value to be gained from adding complex machine learning algorithms. In a world where the content is unstructured and it’s highly varied, then you will be more reliant on the new work happening in the machine learning space.
Ahren Lehnert and I were working on a project proposal recently for a job site. They were managing millions of job postings and trying to use taxonomy to improve search and personalization. You would think job descriptions would be structured somewhat consistently so that you could accurately extract and normalize titles and job functions, but the reality is very different. Job descriptions are often very bespoke to suit an organization’s culture and needs and it’s hard to get anything out of them programmatically. Teaching a machine or even a human to accurately tag job postings with job functions and titles was a challenge. Machine learning was a key element in making this solution work.
On the other side, we have another client who publishes hundreds of recipes per week. For this use case, using machine learning to manage recipes didn’t add value. All of the metadata elements on a recipe tend to be pretty straightforward and structured. If it’s called Blueberry Pie, then the main ingredient will be blueberry and the dish type is pie. You can get 90% of the way there using simple concept matching and rules on the structured data of the recipe and the recipe instructions. I would instead leverage machine learning for personalization, behavior tracking and discovering how people are interacting with the content, rather than for basic auto-tagging.
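To make the "simple concept matching and rules" approach concrete, here is a minimal Python sketch using the Blueberry Pie example. The vocabularies, the function names `match_concepts` and `tag_recipe`, and the two rules are hypothetical and invented for illustration; they are not the client's actual taxonomy or rule set.

```python
import re

# Hypothetical mini-vocabularies: concept -> labels that should match it.
INGREDIENTS = {
    "blueberry": ["blueberry", "blueberries"],
    "apple": ["apple", "apples"],
    "chicken": ["chicken"],
}
DISH_TYPES = {
    "pie": ["pie", "pies", "tart"],
    "soup": ["soup", "chowder"],
    "salad": ["salad"],
}

def match_concepts(text, vocabulary):
    """Return the sorted concepts that have a label appearing as a whole word in text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(concept for concept, labels in vocabulary.items()
                  if tokens & set(labels))

def tag_recipe(title, ingredient_lines):
    """Tag a recipe using two simple rules over its structured fields."""
    # Rule 1: the dish type is taken from the title.
    dish_type = match_concepts(title, DISH_TYPES)
    # Rule 2: the main ingredient is the ingredient named in the title;
    # if the title names none, fall back to the ingredient list.
    main_ingredient = match_concepts(title, INGREDIENTS)
    if not main_ingredient:
        main_ingredient = match_concepts(" ".join(ingredient_lines), INGREDIENTS)
    return {"dish_type": dish_type, "main_ingredient": main_ingredient}

if __name__ == "__main__":
    print(tag_recipe("Blueberry Pie", ["2 cups blueberries", "1 pie crust"]))
```

Because recipe titles and ingredient lists are already structured, a lookup table plus a couple of precedence rules covers the common cases, and the remaining edge cases can be handled by adding labels or rules rather than training a model.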
What advice would you give others developing their taxonomy project?
SL: Try not to be myopic about what you’re building. Even if you are working from a specific project or use case, the taxonomy that you build might end up being a foundation for a large number of applications and have a life beyond the original aim. Anytime you’re building a taxonomy, you want to put an enterprise-wide hat on as much as possible and go as broad as you can in your discovery. Make sure you are building something that can support multiple functions within an ecosystem.
For example, if you’re working on the marketing side there is a marketing taxonomy ecosystem that includes social media distribution and managing digital assets. If you’re building a taxonomy, question how that taxonomy is likely to flow through the organization. You never want to build a taxonomy for one tiny use case. It seldom turns out that way. Talk to lots of people and understand the full scope of what the taxonomy might be and who it might serve.
What do you think are the biggest challenges in taxonomy?
SL: There are several. The first is ensuring there is enough management support and budget. Too often we work with clients who hire us to build a taxonomy, but once it’s complete they don’t have resources in place to support it. Consider who is going to take care of the taxonomy and care about it. Do you have a headcount before you start this project? Do you have enough money to support it over its lifespan? Where will it sit within the organization?
Another challenge relates to who “owns” the taxonomy. As an enterprise function, knowledge management and taxonomies have to be given the attention and seriousness they deserve. Marketing teams often build taxonomies but don’t always have the resources or clout to make proper governance happen. So, there needs to be an effort to figure out where we can place the taxonomy function so that it gets the budget and attention it deserves.
One more challenge is figuring out how machine learning and AI play into the management and development of taxonomies. We are still in a kind of experimentation mode. A lot of people are talking about machine learning, but not many are giving concrete and reusable examples of how to make this work in an organization. It’s still not completely marketable or operationalizable in our average project.
What do you think are the emerging trends in taxonomy?
SL: For Taxonomy Boot Camp 2019, we received a lot of submissions from the content strategy world. I think content strategists and taxonomists are starting to play more together, especially in the world of personalization which is also a growth area. There is a push to figure out how to use machine learning and behavioral analytics effectively in conjunction with taxonomy.
Synaptica Insights is our popular series of use cases sharing interviews, conversations, news and learning from our customers, partners, influencers, and colleagues. You can view the full series on our website. Synaptica LLC are Diamond Sponsors of KMWorld 2019 and co-located events Enterprise Search & Discovery, Office 365 Symposium, Taxonomy Boot Camp, and Text Analytics Forum. KMWorld takes place in November in Washington D.C. and members of the team will be speaking and moderating at this year’s event.