Project Synopsis on
DEVELOPING A CHATBOT USING SEQUENCE MODELLING
Submitted in partial fulfillment for the award of
BACHELOR OF TECHNOLOGY DEGREE
Session 2018-19
in
Information Technology
By
DEEPANSHU SHARMA (1503213041) AYAAN SAMAD (1503213036) KASHISH SINGHAL (1503213054)
Under the guidance of Ms. Deepali Dev & Dr. Kanika Gupta
ABES ENGINEERING COLLEGE, GHAZIABAD
AFFILIATED TO U.P. TECHNICAL UNIVERSITY, LUCKNOW
1|P ag e
DECLARATION
We hereby declare that the work being presented in this report entitled “Developing a Chatbot using Sequence Modelling” is an authentic record of our own work carried out under the supervision of Ms. Deepali Dev and Dr. Kanika Gupta.
The matter embodied in this report has not been submitted by us for the award of any other degree.
Dated :
Signature: Name: Department:
Dated :
Signature: Name: Department:
Dated :
Signature: Name: Department:
This is to certify that the above statement made by the candidate(s) is correct to the best of my knowledge.
Signature of HOD Prof. Amit Sinha IT Department
Signature of Supervisor Ms. Deepali Dev Assistant Professor IT Department
Signature of Supervisor Dr. Kanika Gupta Associate Professor IT Department Date............................
ACKNOWLEDGEMENT
We wish to extend our sincere gratitude to our project mentors, Ms. Deepali Dev and Dr. Kanika Gupta, Department of Information Technology, for their valuable guidance and encouragement, which have been extremely helpful. We are indebted to Dr. Amit Sinha (HOD, Department of Information Technology) for his valuable support.
Deepanshu Sharma 1503213041 Kashish Singhal 1503213054 Ayaan Samad 1503213036
ABSTRACT
Our project, developing a chatbot using sequence modelling, focuses on building a model that can automatically generate responses to the questions asked, using a number of different machine learning techniques. This document introduces the basic aspects of the proposed model. The proposed model may be used for generating orders in restaurants, or in call centres to deal with commonly faced problems. We are pursuing our objective in three phases: design, analysis and implementation. For the analysis phase, we have taken a large segment of data from social networking sites to generate outcomes for commonly discussed problems. The parameters are prioritized based on our interpretation of this data. We also plan to include algorithms that classify the questions asked according to whether we can generate an answer from our own data or need to use the internet. The UI is also being designed from an analysis point of view.
TABLE OF CONTENTS
Sr. No.   Content   Page No.
1   Introduction   7
2   Objective   8
3   Steps for model development   10
4   Methodology   11
4.1   Tools used   11
5   Work breakdown structure   12
5.1   Data collection   12
5.2   Processing and arranging data   13
5.3   Classification algorithms   14
6   Result and discussion   17
7   Conclusion   18
LIST OF FIGURES
Sr. No.   Name   Page No.
1.1   Process flow diagram   8
1.2   Workflow diagram   11
1.3   Snapshot of collected data   12
1.4   Snapshot of code to collect data from Facebook   13
1.5   Data entered by users with structure   14
1.6   K-nearest neighbors algorithm structure   15
1.7   Support vector machine structure   16
1.8   Test case 1 screenshot   17
1.9   Test case 2 screenshot   17
1) INTRODUCTION
A chatbot (also known as a talkbot, chatterbot, bot, IM bot, interactive agent, or Artificial Conversational Entity) is a computer program or an artificial intelligence which conducts a conversation via auditory or textual methods. Such programs are often designed to convincingly simulate how a human would behave as a conversational partner, thereby passing the Turing test. Chatbots are typically used in dialog systems for various practical purposes, including customer service and information acquisition. Some chatterbots use sophisticated natural language processing systems, but many simpler systems scan for keywords within the input, then pull the reply with the most matching keywords, or the most similar wording pattern, from a database.
The term "ChatterBot" was originally coined by Michael Mauldin in 1994 to describe these conversational programs. Today, most chatbots are accessed either via virtual assistants such as Google Assistant and Amazon Alexa, via messaging apps such as Facebook Messenger or WeChat, or via individual organizations' apps and websites. Chatbots can be classified into usage categories such as conversational commerce (e-commerce via chat), analytics, communication, customer support, design, developer tools, education, entertainment, finance, food, games, health, HR, marketing, news, personal, productivity, shopping, social, sports, travel and utilities.
Chatbots can be added to a buddy list or provide a single game player with an entity to interact with while awaiting other "live" players. If the bot is sophisticated enough to pass the Turing test, the person may not even know they are interacting with a computer program. As consumers continue to move away from traditional forms of communication, chat-based communication methods are expected to rise. Chatbot-based virtual assistants are increasingly used to handle simple tasks, freeing human agents to focus on higher-profile service or sales cases. This leads to cost savings, since human employees cost more, and it also allows companies to provide a level of customer service during hours when live agents are not available.
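The keyword-scanning approach that simpler chatbots use, described at the start of this section, can be sketched in a few lines of Python. This is an illustrative sketch only: the `REPLIES` table and the scoring rule (number of shared lowercase words) are hypothetical and are not taken from the project.

```python
import re

# Hypothetical reply database: each entry pairs a keyword set with a reply.
REPLIES = [
    ({"order", "pizza", "menu"}, "Sure, what would you like to order?"),
    ({"hours", "open", "close"}, "Our opening hours are listed on the website."),
    ({"refund", "money", "return"}, "I can help you with a refund."),
]

def keywords(text):
    """Lowercase the input and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def keyword_reply(text, default="Sorry, I did not understand that."):
    """Return the reply whose keyword set shares the most words with the input."""
    words = keywords(text)
    best_reply, best_score = default, 0
    for keys, reply in REPLIES:
        score = len(words & keys)  # number of matching keywords
        if score > best_score:
            best_reply, best_score = reply, score
    return best_reply
```

For example, `keyword_reply("Can I order a pizza?")` matches two keywords of the first entry, while an input that shares no keyword with any entry gets the default reply.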
2) OBJECTIVE
Our objective is to construct a chatterbot that responds on a person's behalf, so that people do not need to attend to common problems themselves. It will answer based on the data provided by the user, to avoid disturbance in many matters.
PROCESS FLOW DIAGRAM
Fig 1.1: Process flow diagram for the chatbot
An untrained instance of ChatterBot starts off with no knowledge of how to communicate. Each time a user enters a statement, the library saves the text that they entered and the text that the statement was in response to. As ChatterBot receives more input, the number of responses that it can reply with, and the accuracy of each response in relation to the input statement, increase. The program selects the closest matching response by searching for the known statement that most closely matches the input; it then chooses a response from the known responses to that statement. We can teach the chatbot by training it with examples of existing conversations, for example:

    from chatterbot import ChatBot
    from chatterbot.trainers import ListTrainer

    bot = ChatBot('Example Bot')
    # ChatterBot 1.x trains from a list through ListTrainer; older 0.x releases
    # used bot.set_trainer(ListTrainer) followed by bot.train([...]) instead.
    trainer = ListTrainer(bot)
    # Each statement is stored as a response to the statement before it.
    trainer.train([
        'How are you?',
        'I am good.',
        'That is good to hear.',
        'Thank you',
        'You are welcome.',
    ])
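The learn-then-match behaviour described above can be illustrated without the library. The class below is a hypothetical, simplified re-implementation, not ChatterBot's actual code: it stores each statement together with the statement it responded to, and uses the standard library's `difflib.SequenceMatcher` ratio as an assumed similarity measure.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two statements, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

class TinyBot:
    """Hypothetical, simplified model of the learn-then-match loop."""

    def __init__(self):
        self.pairs = []  # (statement, in_response_to) pairs; empty means no knowledge yet

    def train(self, conversation):
        """Store each line of a conversation as a response to the line before it."""
        for previous, current in zip(conversation, conversation[1:]):
            self.pairs.append((current, previous))

    def get_response(self, text):
        """Match the input to the closest known statement, then return a reply to it."""
        if not self.pairs:
            return None  # an untrained bot cannot answer
        known = list(dict.fromkeys(prompt for _, prompt in self.pairs))  # unique, in order
        closest = max(known, key=lambda prompt: similarity(prompt, text))
        replies = [statement for statement, prompt in self.pairs if prompt == closest]
        return replies[0]
```

After training on the conversation shown above, `get_response("how are you")` returns `"I am good."`, because "How are you?" is the closest known statement.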
3) STEPS FOR MODEL DEVELOPMENT
The process of creating a chatbot follows a pattern similar to the development of a web page or a mobile app. It can be divided into Design, Building, Analytics and Maintenance.
Design
The chatbot design is the process that defines the interaction between the user and the chatbot. The chatbot designer defines the chatbot's personality, the questions that will be asked of the users, and the overall interaction. It can be viewed as a subset of conversational design. To speed up this process, designers can use dedicated chatbot design tools that allow immediate preview, team collaboration and video export. An important part of chatbot design is also centered around user testing. User testing can be performed following the same principles that guide the user testing of graphical interfaces.
Building
The process of building a chatbot can be divided into two main tasks: understanding the user's intent and producing the correct answer. The first task involves understanding the user input. To properly understand user input in free-text form, a Natural Language Processing engine can be used. The second task may involve different approaches depending on the type of response that the chatbot will generate.
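The two building tasks, understanding the user's intent and producing the answer, can be kept separate as in the sketch below. The intent names, regular expressions and reply templates are hypothetical and only illustrate the restaurant-ordering use case mentioned in the abstract; a real system would use an NLP engine for the first task.

```python
import re

# Hypothetical intent patterns for a restaurant-ordering bot (checked in order).
INTENT_PATTERNS = {
    "order_food": re.compile(r"\b(order|want|get)\b", re.I),
    "greeting": re.compile(r"\b(hi|hello|hey)\b", re.I),
}
# Hypothetical entity pattern: an optional quantity followed by a menu item.
ITEM_PATTERN = re.compile(r"\b(?:(\d+)\s+)?(pizzas?|burgers?|coffees?)\b", re.I)

def understand(text):
    """Task 1: map free text to an intent name plus extracted (quantity, item) entities."""
    intent = "unknown"
    for name, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            intent = name
            break
    entities = [(int(qty or 1), item.lower().rstrip("s"))
                for qty, item in ITEM_PATTERN.findall(text)]
    return intent, entities

def answer(intent, entities):
    """Task 2: turn the intent and entities into a reply."""
    if intent == "order_food" and entities:
        items = ", ".join(f"{qty} x {item}" for qty, item in entities)
        return f"Added to your order: {items}."
    if intent == "order_food":
        return "What would you like to order?"
    if intent == "greeting":
        return "Hello! How can I help you?"
    return "Sorry, I did not understand that."
```

Separating the two steps means the understanding part can later be replaced by a trained classifier without changing how answers are produced.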
Analytics
The usage of the chatbot can be monitored in order to spot potential flaws or problems. It can also provide useful insights that can improve the final user experience.
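As a hypothetical example of the analytics step, the sketch below assumes each conversation turn is logged as a `(question, reply)` pair and that the bot uses one fixed fallback reply when it finds no answer. It reports how often the fallback was used and which unanswered questions occur most often.

```python
from collections import Counter

FALLBACK = "Sorry, I did not understand that."  # assumed fallback reply

def usage_report(log, top=3):
    """Summarise a log of (question, reply) pairs."""
    unanswered = [q.strip().lower() for q, reply in log if reply == FALLBACK]
    rate = len(unanswered) / len(log) if log else 0.0
    return {
        "turns": len(log),
        "fallback_rate": rate,  # share of turns the bot could not answer
        "top_unanswered": Counter(unanswered).most_common(top),
    }
```

Frequent entries in `top_unanswered` point to topics for which more training data should be collected.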
Maintenance
To keep chatbots up to speed with changing company products and services, traditional chatbot development platforms require ongoing maintenance. This can take the form of an ongoing service provider or, for larger enterprises, an in-house chatbot training team. To eliminate these costs, some startups are experimenting with artificial intelligence to develop self-learning chatbots, particularly in customer service applications.
APIs
There are many APIs available for building your own chatbot, such as the Wikipedia API, which helps us get data from Wikipedia.
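As an illustration of the Wikipedia API, the sketch below builds a MediaWiki Action API request for the plain-text introduction of an article (using the `extracts` query module that English Wikipedia provides) and parses the JSON reply. The network call itself is left out so the parsing can be shown offline; the helper names are our own.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

def summary_url(title):
    """Build a query URL for the plain-text introduction of one article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the text before the first section heading
        "explaintext": 1,   # plain text instead of HTML
        "redirects": 1,     # follow redirects to the canonical article
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)

def parse_summary(payload):
    """Return the extract from a query response, or None if the page is missing."""
    pages = json.loads(payload)["query"]["pages"]
    for page in pages.values():
        if "missing" not in page:
            return page.get("extract")
    return None
```

In use, the response body would come from something like `urllib.request.urlopen` on `summary_url(...)`; Wikimedia's API etiquette asks clients to send a descriptive User-Agent header.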
4) METHODOLOGY
Start
1. Obtaining and preparing data for training
2. Analyzing data for intents
3. Analyzing data to build the answer system
4. Designing the interface
5. Analyzing relevant entities in questions and answers
6. Testing the model
Stop
Fig 1.2: Workflow diagram
4.1) TOOLS USED
Python: for developing the algorithms and the chatbot backend
Wikipedia API: for searching data on the internet and loading it into our chatbot
ChatterBot: a Python library that makes it easy to generate automated responses to a user's input
PHP: for developing a social networking website to collect data
SQL: database
jQuery: used for the social networking site
HTML: for frontend development
5) WORK BREAKDOWN STRUCTURE
5.1) DATA COLLECTION
For analysis, we needed a large data set with information on different topics, which we could either create ourselves or take from social networking sites. Suppose we want to scrape the New York Times' Facebook page. We would send a request to https://graph.facebook.com/v2.4/nytimes?access_token=XXXXX and we would get:
Fig 1.3: Snapshot of collected data
In this way, we collected data from several Facebook pages to build up a good store of data for running our chatbot.
Fig 1.4: Code to scrape data from Facebook
5.2) PROCESSING AND PUTTING IT TOGETHER
We just have to process each post. If you are an avid Facebook user, you know that not all of these attributes are guaranteed to exist: status updates may not have text or links. Since we are making a spreadsheet with an enforced schema, we need to validate that a field exists before attempting to process it. With a full plan for scraping in place, we query each page of Facebook Page statuses (100 statuses maximum per page), process all statuses on that page, write the output to a CSV file, navigate to the next page, and repeat until no statuses are left.
DATA FORMAT
Fig 1.5: Data entered by users
5.3) APPLYING CLASSIFICATION TECHNIQUES
Algorithms used to retrieve data on the basis of the inputs:
K-Nearest Neighbors Algorithm
KNN is a type of supervised machine learning algorithm. It is extremely easy to implement in its most basic form, and yet it can perform quite complex classification tasks. It is a lazy learning algorithm since it doesn't have a specialized training phase. Rather, it uses all of the training data when classifying a new data point or instance.
KNN is a non-parametric learning algorithm, which means that it doesn't assume anything about the underlying data. This is an extremely useful feature, since most real-world data doesn't follow theoretical assumptions such as linear separability or a uniform distribution.
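A minimal KNN classifier for chat questions is sketched below under assumptions of our own: questions become bag-of-words count vectors, cosine similarity is the closeness measure, and the labelled examples are hypothetical intents. It shows the lazy-learning property described above: `fit` only stores the data, and all the work happens at prediction time.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words counts of the lowercase words in the text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, texts, labels):
        # Lazy learning: training just memorises the examples.
        self.examples = [(vectorize(t), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, text):
        query = vectorize(text)
        # Rank every stored example by similarity and keep the k closest.
        nearest = sorted(self.examples, key=lambda ex: cosine(query, ex[0]), reverse=True)[: self.k]
        # Majority vote among the k neighbours.
        return Counter(label for _, label in nearest).most_common(1)[0][0]
```

With a few labelled questions such as "hello" (greeting) and "can i order a burger" (order), a new question like "i would like to order a pizza" is assigned the label held by most of its three nearest neighbours.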
Fig 1.6: K-nearest neighbors algorithm
Support vector machines (SVMs)
SVMs are a set of supervised learning methods used for classification, regression and outlier detection. The advantages of support vector machines are:
- Effective in high-dimensional spaces.
- Still effective in cases where the number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
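The idea behind an SVM can be sketched without a library by training a linear SVM with the Pegasos stochastic sub-gradient method on the regularised hinge loss. This is a simplified illustration under our own assumptions: only a linear kernel is shown, the bias is handled by appending a constant feature (so it is regularised too), points are visited in a fixed cyclic order instead of at random, and the toy data are made up.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200):
    """Pegasos-style training of a linear SVM; labels must be +1 or -1."""
    data = [(list(x) + [1.0], y) for x, y in zip(points, labels)]  # constant feature acts as bias
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                         # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]        # shrink: gradient of the L2 term
            if margin < 1:                                # hinge loss active for this point
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, point):
    """Classify a point by the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, list(point) + [1.0]))
    return 1 if score >= 0 else -1
```

Only points that fall inside the margin change `w`, which mirrors the text's remark that the decision function depends on a subset of the training points.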
Fig 1.7: Support vector machine representation
6) RESULT AND DISCUSSION
When the bot is unable to find a matching statement in its data set, it returns the first line of the data. This problem can be seen in the two tests we performed, shown below. Even after adding a large amount of data, whenever the bot runs out of data we can search the internet or use the Wikipedia API to retrieve part of a Wikipedia article as the answer to the question asked.
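One way to avoid returning the first line of the data set is to require a minimum similarity before accepting a match, and otherwise hand the question to a fallback such as an internet or Wikipedia lookup. In the sketch below the knowledge base, the 0.6 cutoff and the `fallback` callable are assumptions; the fuzzy matching is done by the standard library's `difflib.get_close_matches`.

```python
from difflib import get_close_matches

# Hypothetical question -> answer pairs.
KNOWLEDGE = {
    "what are your opening hours": "We are open every day.",
    "how do i place an order": "Tell me the items you want and I will add them.",
}

def respond(question, fallback, cutoff=0.6):
    """Answer from KNOWLEDGE if a close enough question exists, else use the fallback."""
    key = question.lower().strip(" ?!.")
    matches = get_close_matches(key, KNOWLEDGE.keys(), n=1, cutoff=cutoff)
    if matches:
        return KNOWLEDGE[matches[0]]
    return fallback(question)  # e.g. a web search instead of the first line of data
```

Raising `cutoff` makes the bot more cautious, sending more questions to the fallback; lowering it accepts looser matches.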
Fig 1.8: Test case 1 screenshot
When there are many matches for the same word, we can use a regression algorithm to generate the output that has had the maximum number of hits in the past. We can also create a list of responses for a particular question to keep the chat interesting, rotating among the common correct answers. Example: when the user says hello, we can reply with hi, hello, good morning, etc., and we can also initiate further responses such as "How are you?" or "What can I help you with?"
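The two ideas above, preferring the candidate answer with the most past hits and rotating among several equivalent replies, can be sketched as follows. Note that this sketch simply counts hits rather than fitting a regression model; the hit counts and reply variants are hypothetical.

```python
import random
from collections import Counter

# Hypothetical record of how often each candidate answer was accepted in the past.
HITS = Counter({"Our menu is on the website.": 12, "We serve pizza and burgers.": 30})

GREETINGS = ["Hi!", "Hello!", "Good morning!"]
FOLLOW_UPS = ["How are you?", "What can I help you with?"]

def best_by_hits(candidates, hits=HITS):
    """Among several matching answers, return the one with the most past hits."""
    return max(candidates, key=lambda c: hits[c])  # unseen answers count as 0 hits

def greet(rng=random):
    """Vary the reply to 'hello' and follow it with an opening question."""
    return f"{rng.choice(GREETINGS)} {rng.choice(FOLLOW_UPS)}"
```

Passing a seeded `random.Random` to `greet` makes the choice reproducible in tests while keeping replies varied in normal use.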
Fig 1.9: Test case 2 screenshot
7) CONCLUSION
We have started collecting data on different fields and have built a basic sequence modelling chatbot on Linux. We have also started designing a small social networking site with limited features, to demonstrate real-time modification of data and improved accuracy in the generated results. We have developed a basic model and are applying different algorithms to it to find the best one. We have also started planning to create our own social media site in PHP, read data directly from it, and modify the data held previously. We plan to add speech recognition, using the Google API for speech-to-text conversion, to our project. We are also trying to use more APIs, such as Wolfram Alpha, to return search results faster and reduce complexity. Finally, we are trying to add a data classifier to the project so that, when a question is related to maths, we can solve it directly instead of wasting time searching the whole database.
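The planned maths classifier could start as a simple router, sketched below under our own assumptions: a question counts as "maths" if, after removing a leading "what is" and a trailing question mark, it contains only numbers, brackets and the operators + - * /. Such expressions are evaluated safely by walking Python's `ast` tree; everything else goes to a `lookup` callable that stands in for the database search.

```python
import ast
import operator
import re

# Operators the hypothetical maths branch may evaluate (no powers, to avoid huge numbers).
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}
_ARITHMETIC = re.compile(r"[\d\s.+\-*/()]+")

def _evaluate(node):
    """Walk the parsed expression, allowing only numbers and the operators above."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")

def route(question, lookup):
    """Answer simple arithmetic directly; send every other question to lookup."""
    candidate = re.sub(r"^\s*what\s+is\s+", "", question, flags=re.IGNORECASE)
    candidate = candidate.strip().rstrip("?").strip()
    if _ARITHMETIC.fullmatch(candidate) and re.search(r"\d", candidate):
        try:
            return str(_evaluate(ast.parse(candidate, mode="eval")))
        except (SyntaxError, ValueError, ZeroDivisionError):
            pass  # not valid arithmetic after all; fall through to the database
    return lookup(question)
```

For example, `route("What is 12 * (3 + 4)?", search)` answers `"84"` without touching the database, while a non-numeric question is passed to `search` unchanged.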