Data Science in Tokopedia

Analytics Seminar 30th June 2019 by Angus Kong

Angus Kong
Assistant Vice President, Data Science
Tokopedia

Angus Kong, the Assistant Vice President (AVP) of Data Science at Tokopedia, spent the afternoon of August 30 at SMU speaking about data science and its applications in the Indonesian e-commerce company. Prior to joining Tokopedia, Angus held roles at Microsoft, Google and Appier, working on well-known features such as Gmail SmartReply and language understanding for the Google Assistant. As AVP of Data Science, Angus oversees the company’s Artificial Intelligence and Machine Learning strategies, aiming to help data science teams succeed and to integrate data science into the company’s services and products.

By way of introduction, Tokopedia has just turned 10 and aspires to provide every service its users need in their daily lives, from payments to commerce and beyond. Its underlying vision is to democratise commerce: commerce is inextricably linked to human societies and endeavours, and advancing it with technology helps push society as a whole towards greater prosperity. For example, Tokopedia provides financial solutions that give people the opportunity to start their own small businesses.

For an e-commerce company, Indonesia represents vast untapped potential. It is the 4th most populous country in the world, with high mobile phone, bank and internet penetration rates but a strikingly low e-commerce penetration rate of only 5%. Hence, both the size of the population and the market fundamentals indicate a favourable outlook for Tokopedia.

For the company, data science comes down to four things: problems to solve, infrastructure with which to deliver a solution, data to support the problem-solving process, and people to work on the data to create the solution. The data science problems that Tokopedia deals with fall into three broad categories: functions and applications that the company uses, more fundamental problems that underlie all company functions, and more advanced topics that can help create greater business value.

Examples of the first category include optimising for search and advertisement relevance, creating good recommendations for the home page of the app, and fraud detection for the payment and registration stages. Tackling that last problem of fraud is always challenging, as people with malicious intent can be very creative in coming up with new ways to commit fraud and so new challenges constantly pop up for the fraud detection team. Here, Angus shared some interesting anecdotes of fraud cases, one of which had a buyer and seller colluding by raising the price on an item to illegally gain from a cashback programme.

More fundamental problems are those relevant to many business functions at once. Keyword identification and discovery are such problems, as are Natural Language Processing problems. The core functions behind the company’s entire e-commerce premise benefit from any improvement on them. There are also more advanced, overarching research objectives, such as transfer learning techniques that would help Tokopedia develop new AI models more efficiently, without investing too many resources in training and developing them. Another area of improvement is the correct representation of data and information about the company’s users, merchants and products.

The supporting infrastructure used to facilitate data storage and retrieval, however, is much more difficult to build and requires vast amounts of time and resources to develop. Hence, Tokopedia currently works with cloud platform providers such as Google Cloud, Alibaba Cloud, Microsoft and Nvidia to support the company’s data science operations. Angus painted a vivid picture of one of the company’s shopping holidays, when engineers gathered in a “war room” to monitor the company’s servers continuously and keep them online under the traffic coming in from consumers. Apart from Tokopedia engineers, groups of engineers from the four platform providers were present as well, huddled in their respective corners and working on their own computers.

Finally, with respect to the people doing data science for the company, Angus highlighted that every problem is unique and needs to be solved by a team. The right team consists of the right people, with the right mindset and the right skillset. Contrary to popular belief, it should not be a team of data scientists alone, but data scientists working with engineers, product managers and designers, with each individual pitching in to collectively find a path towards a solution. The DNA of the company is to focus on the end-user of the product, whether consumers or internal stakeholders, to have a growth mindset, and to make it happen and make it better. To Angus, this means being positive, constantly learning new things, applying that knowledge in an attempt to create a solution, and iterating on that solution. It’s perfectly alright to make mistakes, because it’s rarely the first iteration of a solution that really solves a problem; what matters is improving and never making the same mistake twice. Hence, make it happen, and make it better.

To expand on that, Angus touched on what data science is really like in the wild. Against a typical software development lifecycle of five stages (plan, design, build, test and support), it is not clear at first glance where or how data science fits in. In fact, the greatest difficulties lie at the planning stage, because data science problems involve so much uncertainty. It is often the case that your understanding of the problem develops with time, and you might find yourself solving a different problem from the one you initially thought you had. Hence, to paraphrase what Angus’ former colleague from Google said: in the planning stage, if you know when (you’re going to deliver the product), you cannot know what (you’re going to be able to deliver). If you know what, you cannot know when.

Other prominent challenges include testing data science solutions, because it’s basically impossible to enumerate all the cases and contexts in which a solution might be used. Data scientists and product managers might also prioritise the requirements of a solution differently. Finally, it’s often difficult to progress from one iteration to the next in data science problems. Take the case of programming a self-driving car: changing the car’s decision-making model from one iteration to the next would render all the data gathered in the previous iteration useless, because it would no longer be relevant to the new model. Hence, managing and planning such shifts can be extremely challenging.

Wrapping things up, Angus had some advice for current students in analytics: as with many technologies in the past, few people are trained in a newly emerging field at first, but that barrier to entry gradually lowers as tools and software emerge that make the work easier. Hence, it might be good for students to pick up other skills or mindsets along the way. One key aspect is the spirit of engineering, which in the most basic sense is being able to work together with a large group of people to develop a single product, as opposed to doing your own individual research and problem solving. In fact, many researchers become too tunnel-visioned on their own research areas and forget to look out for developments in the industry and relevant trends.

Last updated on 16 Sep 2019.