Structuring the Unstructured: From Startup to Making Sense of eBay’s Huge eCommerce Inventory
Ido Guy, Kira Radinsky
SIGIR 2017: 1351
Electronic commerce has continued to gain popularity in recent years. On eBay, one of the largest online marketplaces in the world, millions of new listings (items) are submitted by a variety of sellers every day. This yields a rich, diverse inventory characterized by a particularly long tail. In addition, many items in the inventory lack basic structured information, such as product identifiers, brand, category, and other properties, because sellers tend to enter only unstructured information, namely title and description. Such an inventory therefore requires a number of large-scale solutions to help organize the data and gain business insights. In 2016, eBay acquired SalesPredict to help structure its unstructured data. In this proposed presentation, we will share the story of a research startup from its inception until its acquisition and integration as eBay’s data science team. We will review the numerous research and engineering challenges a startup faces and the principal challenges the eBay data science organization deals with today. These include the identification of duplicate, similar, and related products; the extraction of name-value attributes from item titles and descriptions; the matching of items entered by sellers to catalog products; the ranking of item titles based on their likelihood to serve as “good” product titles; and the creation of “browse node” pages to address complex search queries from potential buyers. We will describe how the eBay data science team approaches these challenges and some of the solutions already launched to production. These solutions involve the use of large-scale machine learning, information retrieval, and natural language processing techniques, and should therefore be of interest to the SIGIR audience at large.