Week 5: MongoDB & Robo3T
Learning the Basics of Databases through Python
--
MongoDB is one of the most popular databases developers use to store their data. However, one of its downsides is that it does not come with a built-in GUI (Graphical User Interface) that lets the user visually inspect the data or the modifications being made to it.
Fortunately, there is a solution: Robo3T, a GUI client for MongoDB, lets you inspect the contents of the database. Before digging deeper into MongoDB and Robo3T, let's learn a bit about databases in general.
Databases
Broadly speaking, there are two types of databases: SQL (relational) and NoSQL. Refer to the image shown below.
An RDBMS (relational database management system, queried with SQL) stores data in a fixed structure, called a schema, that is rarely changed and inconvenient to modify. Imagine an Excel file with predetermined labels for its rows and columns: if you wanted to add an extra column for a new variable starting only at the 10th row of data, doing so would be awkward. Nevertheless, this rigid organization is a benefit when it comes to categorizing and analyzing data.
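To make the "fixed structure" idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a relational database (SQLite is an assumption for illustration; the post itself does not use it, and the users table and its columns are made up). Every row must follow the declared schema, and adding a new variable means altering the whole table, so rows inserted earlier simply get NULL for the new column.

```python
import sqlite3

# An in-memory SQLite database stands in for a generic RDBMS (illustration only).
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# The schema is declared up front: every row has exactly these columns.
cur.execute('CREATE TABLE users (name TEXT, age INTEGER)')
cur.executemany('INSERT INTO users VALUES (?, ?)', [('bobby', 21), ('john', 30)])

# Adding a new variable later requires changing the structure of the whole table.
cur.execute('ALTER TABLE users ADD COLUMN email TEXT')
cur.execute("INSERT INTO users VALUES ('jane', 25, 'jane@example.com')")

rows = cur.execute('SELECT name, age, email FROM users ORDER BY rowid').fetchall()
print(rows)
# [('bobby', 21, None), ('john', 30, None), ('jane', 25, 'jane@example.com')]
# Rows inserted before the ALTER TABLE get NULL (None in Python) for the new column.
conn.close()
```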
NoSQL databases (e.g. MongoDB), on the other hand, store data as documents that look like dictionaries (sets of key-value pairs). As a result, the documents no longer have to share the same format, which allows for freedom in how data is stored. As expected, however, conformity is lost.
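The trade-off can be illustrated with plain Python dictionaries, with no database involved. The list below is a hypothetical stand-in for a MongoDB collection: each dictionary plays the role of a document, and nothing forces them to share the same keys. The cost of that freedom shows up when reading the data, because code has to cope with fields that may be missing.

```python
# A hypothetical "collection": a list of dictionaries standing in for MongoDB documents.
users = [
    {'name': 'bobby', 'age': 21},
    {'name': 'john', 'age': 30, 'email': 'john@example.com'},  # an extra field
    {'name': 'jane', 'hobbies': ['tennis', 'chess']},          # no 'age', a list value
]

# Nothing stops documents from having different keys...
all_keys = sorted({key for doc in users for key in doc})
print(all_keys)  # ['age', 'email', 'hobbies', 'name']

# ...but readers must handle missing fields, e.g. with dict.get() and a default.
ages = [doc.get('age') for doc in users]
print(ages)  # [21, 30, None]

# A filter on a field only matches documents that actually contain it.
older_than_25 = [doc['name'] for doc in users if doc.get('age', 0) > 25]
print(older_than_25)  # ['john']
```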
Manipulating MongoDB Using PyMongo
The basic syntax for each operation is shown below:
from pymongo import MongoClient
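client = MongoClient('localhost', 27017)  # connect to the local MongoDB server (default port 27017)
db = client.dbsparta  # select the 'dbsparta' database (MongoDB creates it on the first insert)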
# insert / find / update / delete
# insert - example
doc = {'name':'bobby','age':21}
db.users.insert_one(doc)
# find one - example
user = db.users.find_one({'name':'bobby'})
# find many - example (find returns a cursor, so wrap it in list(); {'_id': False} excludes _id)
same_ages = list(db.users.find({'age':21},{'_id':False}))
# update - example
db.users.update_one({'name':'bobby'},{'$set':{'age':19}})
# delete - example
db.users.delete_one({'name':'bobby'})
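To see what each of these calls does to the data, here is a toy, in-memory model of a collection. It is hypothetical and only imitates a few behaviors of the real PyMongo methods: exact-match filters, excluding _id with a projection, and $set changing only the listed fields. The class name ToyCollection, the integer _id values (real MongoDB generates ObjectId values), and the simplified return values (the real methods return result objects) are all assumptions for illustration; PyMongo itself talks to a MongoDB server and supports far richer queries.

```python
import copy


class ToyCollection:
    """Hypothetical in-memory stand-in for a MongoDB collection (illustration only)."""

    def __init__(self):
        self._docs = []
        self._next_id = 1  # real MongoDB uses ObjectId values, not integers

    @staticmethod
    def _matches(doc, flt):
        # Exact-match filter: every key/value in the filter must equal the document's.
        return all(doc.get(k) == v for k, v in flt.items())

    @staticmethod
    def _project(doc, projection):
        # Only supports excluding fields, e.g. {'_id': False}.
        out = copy.deepcopy(doc)
        for key, include in (projection or {}).items():
            if not include:
                out.pop(key, None)
        return out

    def insert_one(self, doc):
        stored = dict(doc)
        stored['_id'] = self._next_id
        self._next_id += 1
        self._docs.append(stored)
        return stored['_id']

    def find_one(self, flt, projection=None):
        for doc in self._docs:
            if self._matches(doc, flt):
                return self._project(doc, projection)
        return None

    def find(self, flt, projection=None):
        return [self._project(d, projection) for d in self._docs if self._matches(d, flt)]

    def update_one(self, flt, update):
        # Only $set is modeled: it changes the listed fields and leaves the others alone.
        for doc in self._docs:
            if self._matches(doc, flt):
                doc.update(update.get('$set', {}))
                return 1
        return 0

    def delete_one(self, flt):
        # Removes only the first matching document.
        for i, doc in enumerate(self._docs):
            if self._matches(doc, flt):
                del self._docs[i]
                return 1
        return 0


users = ToyCollection()
users.insert_one({'name': 'bobby', 'age': 21})
users.insert_one({'name': 'john', 'age': 21})
print(users.find({'age': 21}, {'_id': False}))
# [{'name': 'bobby', 'age': 21}, {'name': 'john', 'age': 21}]
users.update_one({'name': 'bobby'}, {'$set': {'age': 19}})
print(users.find_one({'name': 'bobby'}, {'_id': False}))  # {'name': 'bobby', 'age': 19}
users.delete_one({'name': 'bobby'})
print(users.find_one({'name': 'bobby'}))  # None
```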
Application
Last time, we saw that data can be collected through Python as below:
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=pnt&date=20200303', headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')
movies = soup.select('#old_content > table > tbody > tr')

for movie in movies:
    a_tag = movie.select_one('td.title > div > a')
    if a_tag is not None:  # skip rows without a movie link (e.g. separator rows)
        rank = movie.select_one('td:nth-child(1) > img')['alt']
        title = a_tag.text
        star = movie.select_one('td.point').text
        print(rank, title, star)
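Because the scraper depends on the exact markup of the ranking page, which may have changed since this post was written, it can help to check the CSS selectors offline first. The sketch below runs the same selectors against a small, hand-written HTML snippet; the snippet is hypothetical and only imitates the table structure the selectors expect (the image filename, titles, and scores are placeholders, not real data). Note that tbody is written out explicitly, because html.parser does not insert it the way a browser does.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics the structure the selectors above expect.
html = """
<div id="old_content">
  <table>
    <tbody>
      <tr>
        <td><img src="na.gif" alt="01"></td>
        <td class="title"><div><a href="#">Movie A</a></div></td>
        <td class="point">9.38</td>
      </tr>
      <tr><td class="blank"></td></tr>
      <tr>
        <td><img src="na.gif" alt="02"></td>
        <td class="title"><div><a href="#">Movie B</a></div></td>
        <td class="point">9.35</td>
      </tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
results = []
for movie in soup.select('#old_content > table > tbody > tr'):
    a_tag = movie.select_one('td.title > div > a')
    if a_tag is not None:  # the separator row has no link and is skipped
        rank = movie.select_one('td:nth-child(1) > img')['alt']
        title = a_tag.text
        star = movie.select_one('td.point').text
        results.append((rank, title, star))

print(results)  # [('01', 'Movie A', '9.38'), ('02', 'Movie B', '9.35')]
```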
This time, in order to store the results in the database, we add a connection to MongoDB (through PyMongo) to the code above:
import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient  # import pymongo

client = MongoClient('localhost', 27017)  # MongoDB listens on port 27017 by default
db = client.dbsparta  # use the 'dbsparta' database (MongoDB creates it on the first insert)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=pnt&date=20200303', headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')
movies = soup.select('#old_content > table > tbody > tr')

for movie in movies:
    a_tag = movie.select_one('td.title > div > a')
    if a_tag is not None:  # skip rows without a movie link (e.g. separator rows)
        rank = movie.select_one('td:nth-child(1) > img')['alt']
        title = a_tag.text
        star = movie.select_one('td.point').text
        print(rank, title, star)
Next, instead of printing each row, we put the variables into a document (a dictionary) and insert it into the movies collection:
import requests
from bs4 import BeautifulSoup
from pymongo import MongoClient  # import pymongo

client = MongoClient('localhost', 27017)  # MongoDB listens on port 27017 by default
db = client.dbsparta  # use the 'dbsparta' database (MongoDB creates it on the first insert)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=pnt&date=20200303', headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')
movies = soup.select('#old_content > table > tbody > tr')

for movie in movies:
    a_tag = movie.select_one('td.title > div > a')
    if a_tag is not None:  # skip rows without a movie link (e.g. separator rows)
        rank = movie.select_one('td:nth-child(1) > img')['alt']
        title = a_tag.text
        star = movie.select_one('td.point').text
        # one document (dictionary) per movie
        doc = {
            'rank': rank,
            'title': title,
            'star': star
        }
        db.movies.insert_one(doc)  # store it in the 'movies' collection
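One thing worth noticing: rank, title and star are stored exactly as scraped, i.e. as strings. A numeric query such as {'star': {'$gt': 9}} will therefore not match them, because MongoDB's comparison operators only match values of the same BSON type. A minimal sketch of a helper that cleans the values before building the document is shown below; the function name make_movie_doc, and the assumptions that the rank's alt text is a number and that star is a decimal, are illustrative and not part of the original code.

```python
def make_movie_doc(rank, title, star):
    """Hypothetical helper: turn scraped strings into a typed MongoDB document."""
    return {
        'rank': int(rank),            # assumes the alt text is numeric, e.g. '01' -> 1
        'title': title.strip(),       # drop stray whitespace around the link text
        'star': float(star.strip()),  # e.g. '9.38' -> 9.38
    }


doc = make_movie_doc('01', ' Movie A ', '9.38')
print(doc)  # {'rank': 1, 'title': 'Movie A', 'star': 9.38}
```

Inside the loop, db.movies.insert_one(make_movie_doc(rank, title, star)) would then replace the inline dictionary, and numeric queries on star would behave as expected.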
Now, it’s done! Let’s look at the result:
Beautiful. Now, we can also manipulate the data points using the commands introduced earlier!
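For instance, the find/update/delete commands from the PyMongo section can be aimed at the movies collection. The sketch below wraps them in small functions that take a collection object as a parameter; the function names are hypothetical, while find_one, find, update_one and delete_one are the same PyMongo collection methods used earlier. Passing the collection in, rather than creating a MongoClient inside each function, keeps the functions easy to reuse and to try out without a running server.

```python
def get_movie(collection, title):
    """Return one movie document by title, without the _id field (or None)."""
    return collection.find_one({'title': title}, {'_id': False})


def movies_with_star(collection, star):
    """Return every movie whose 'star' field equals the given value."""
    return list(collection.find({'star': star}, {'_id': False}))


def update_star(collection, title, new_star):
    """Change only the 'star' field of the first movie with this title."""
    return collection.update_one({'title': title}, {'$set': {'star': new_star}})


def delete_movie(collection, title):
    """Delete the first movie with this title."""
    return collection.delete_one({'title': title})
```

With the connection from the scraper, get_movie(db.movies, 'some title') would return that movie's document, where 'some title' is a placeholder for a title that was actually stored.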
Next time, we will be looking at Flask to build our own API. Excited? Me too! Then, see you next week!
Fin.