Efficiency is vital for data science teams working to complete time-sensitive projects. Dedicating their resources to the tedious but essential task of building datasets only moves them further from their goals, including using that data to help solve business challenges. Beyond the time lost that could otherwise be spent on more urgent work, other obstacles can arise when data science teams choose to gather their data without the help of external consultants.

In the following interview, Carl Robinson, Senior Corporate Solutions Director at CCC, discusses some of the key issues data science teams we’ve worked with encounter when building their own datasets, and how a vendor like CCC can help eliminate these issues and make it easier for members of the team to focus on the work that matters most to their projects.

What are the biggest challenges data science teams encounter when creating their own datasets?

We regularly hear from leaders of data science teams that they are concerned about a loss of institutional knowledge. When internal teams develop datasets, they rely on their own staff’s expertise and understanding of how the data was gathered, and this knowledge can be easily lost in the event of personnel changes. This in turn raises questions about the maintenance and further development of the datasets, potentially endangering the completion of the projects relying on them.

On top of this, preparing useful and effective collections of data, which involves both acquiring and normalizing the data, can take a great deal of time and manual effort. That effort grows when teams must also navigate the licensing necessary to use the data. Data licensing can be complex, and the terms of use can vary widely depending on the source.

How can working with a vendor help solve these challenges?

One of the main advantages of working with a vendor such as CCC is the set of methods it uses to maintain accessible knowledge bases of what has been done, how it was done, why it was done, and so forth. In the event of personnel changes within a vendor organization, the processes leading to the creation of datasets for clients are not lost because they were in someone’s head or buried in their files. Vendors specialize in working across distributed teams and have developed robust ways of preserving knowledge so as not to impact clients negatively. Further, we as a vendor can offer our own expertise and insights into the building process, and we can also adapt solutions as needed based on the specific data needs of a team.

A vendor is also able to bring speed, focus, and efficiency to the work of creating valuable datasets, allowing companies to allocate their resources more effectively. By outsourcing the collection, normalization, and structuring of data to vendors, organizations can free their data scientists to focus on what they do best: interpreting data and applying it to solve complex problems. This not only maximizes the value data scientists bring to the table, but it also avoids bogging them down with the tedious aspects of data preparation.

How does RightFind XML help organizations fulfill the needs of their data science teams?

For many projects, data scientists seek “off the shelf” datasets, ones that may provide some flexibility, but are most valuable for the relevant, normalized, and structured data they provide in return for a relatively minimal investment of time and resources.

RightFind XML is a software solution we offer that gives organizations access to full-text article content, as well as text mining rights from over 50 publishers. It simplifies the extraction of scientific knowledge so data science teams can focus on completing their projects while minimizing time-consuming manual tasks and reducing legal risks.

Because it acts as a unified source for machine-readable articles, RightFind XML makes it easier for data science teams to integrate scientific literature into their knowledge extraction tools. It also helps to reduce an organization’s infringement risk, as CCC negotiates commercial text mining permissions across multiple publishers to create a consistent and uniform text mining license.

What are deep search solutions, and how can they help data science teams in their work?

Deep search solutions are designed to fulfill specific and targeted needs, as opposed to standard search tools that navigate through broad information. With deep search, we create highly specialized datasets to answer particular questions or meet unique requirements. For example, if a client needs to track unsanctioned import/export of pharmaceuticals across borders, a deep search solution can provide the necessary insights.

Another example could be when the data you need is time-bound or transient. A number of organizations use deep search approaches to track very specific pre-conference information automatically, like who’s speaking, on what topics, what’s in the abstracts, and so on. They can then filter this information quickly to get a better sense of where the market is going, and what others, especially competitive organizations, are likely to be researching. This gives them a sense of where they fit in the market, and what they need to do to take advantage of opportunities. It adds to their competitive intelligence activities.

How are deep search solutions evolving, and what should organizations consider when looking into them?

Deep search is becoming increasingly integral to how businesses handle complex dataset challenges. The key for organizations is to balance the need data science teams have for specialized solutions with the flexibility and scalability of general search platforms, which is something we at CCC continue to refine through our own offerings.

Businesses should consider not just the immediate benefits of deep search solutions in helping their data scientists complete their projects more efficiently, but also how these solutions can fit into their long-term data strategy and competitive intelligence workflows.

Author: Christine McCarty

Christine Wyman McCarty is Product Marketing Director for corporate solutions at CCC. Through over a decade of experience working with clients at R&D-intensive companies, she has gained an understanding of the challenges they face in finding, accessing, and deriving insight from published content. She draws on this expertise to shape innovative product offerings that solve market problems. Christine has held a variety of positions at CCC, including roles in software implementation and product management. Christine has a Master’s in Library and Information Science from Simmons University and practiced librarianship for several years before finding her passion for helping companies digitalize their knowledge workflows with software.