Bible.com Site Scraper

Site scraper script that can store Bible versions from Bible.com in a .csv to be used in OpenLP

Motivation:

At the church I attend, I am in charge of media: recording, live streaming and displaying content in-house all fall to me. The church originated in Ghana, so the Twi language is quite prominent within the ministry. To display images, slideshows and Bible verses on the screens we use OpenLP, a very versatile worship presentation program. However, I ran into a problem: OpenLP had no Twi Bible versions available to display. I did notice that it lets you import your own Bibles, through OSIS or .csv. This sparked an idea, as I had noticed that bible.com, a site and app created by YouVersion, has 2,795 Bible versions in 1,863 languages. I realised there must be a way to scrape this site for all the verses in a chosen version and then store that information in a .csv file to be imported into OpenLP.

Visit GitHub Page

Skills

Python
Site Scraping
Problem Solving

Process

First I began by inspecting the bible.com site.

After going through the site for a while, I noticed that each Bible version has its own specific ID and each book has its own code (e.g. GEN for Genesis). This information appears in the URL of every chapter of every book. So if I could find the ID for the version I wanted, the code for each book of the Bible and the number of chapters in each book, I could iterate through every web page and scrape all the verses to make a complete Bible.
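The iteration idea can be sketched roughly as follows. The URL template here is an assumption made for illustration (the write-up only says that the version ID and book code appear in the URL), and the chapter counts in the sample input are made up.

```python
from typing import Iterable, Iterator, Tuple

# Assumed URL layout: version ID, then "<BOOK>.<chapter>".
# This template is illustrative and not confirmed by the write-up.
CHAPTER_URL = "https://www.bible.com/bible/{version_id}/{book}.{chapter}"


def chapter_urls(version_id: int, books: Iterable[Tuple[str, int]]) -> Iterator[str]:
    """Yield one URL per chapter, in book order and then chapter order."""
    for book_code, chapter_count in books:
        # Chapters are numbered from 1, so the range is shifted by one.
        for chapter in range(1, chapter_count + 1):
            yield CHAPTER_URL.format(
                version_id=version_id, book=book_code, chapter=chapter
            )


# Illustrative input only: two books with made-up chapter counts.
urls = list(chapter_urls(1, [("GEN", 2), ("EXO", 1)]))
```

Using a generator means the full list of pages never has to be built up front; each URL is produced only when the scraper is ready for it.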

The next step was to find where the IDs and book codes were stored. I scoured the source code of the site and found a JSON file that contains the IDs of every Bible version in a given language.

E.g. This is the JSON file for the English language https://www.bible.com/json/bible/versions/eng

However, there was not yet a way to find this ID file for a selected language. After more digging, I came across another JSON file that lists all 1,863 languages on bible.com and their corresponding tags (https://www.bible.com/json/bible/languages?filter=). The tag for a language in this file matches the tag at the end of the URL of the JSON file that holds the IDs of that language's Bible versions.

With this, I had a way to pick any Bible version available on the bible.com site.
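A minimal sketch of that lookup is shown below. The versions URL pattern comes from the English example above; the shape of the language records (a list of dictionaries with "name" and "tag" keys) and the sample entries are assumptions, since the real layout of the languages file is not described here.

```python
VERSIONS_URL = "https://www.bible.com/json/bible/versions/{tag}"


def tag_for(languages, name):
    """Return the tag of the language called `name`, or None if it is absent."""
    wanted = name.lower()
    for language in languages:
        if language["name"].lower() == wanted:
            return language["tag"]
    return None


def versions_url(tag):
    """Build the URL of the JSON file listing the versions for one language."""
    return VERSIONS_URL.format(tag=tag)


# Hypothetical sample records; the real file may use different field names.
languages = [
    {"name": "English", "tag": "eng"},
    {"name": "Twi", "tag": "twi"},
]

english_versions = versions_url(tag_for(languages, "english"))
```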

As there were so many languages, I knew I would need a GUI of some sort to make it easy for the user to select the language they want verses for. This is where Tkinter came in: the Tkinter package is used to create a list in which a user can search for and select a language from the JSON file.
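One way such a searchable list could be put together is sketched below. This is not the project's actual GUI code; the function names and the widget layout are hypothetical. The filtering is kept in a plain function so it can be used and checked without opening a window.

```python
def filter_languages(names, query):
    """Return the names that contain the query, ignoring case."""
    needle = query.strip().lower()
    return [name for name in names if needle in name.lower()]


def pick_language(names):
    """Show a searchable list and return the chosen name, or None if closed."""
    # Imported here so filter_languages can be used without a display.
    import tkinter as tk

    root = tk.Tk()
    root.title("Select a language")
    query = tk.StringVar()
    tk.Entry(root, textvariable=query).pack(fill="x")
    listbox = tk.Listbox(root)
    listbox.pack(fill="both", expand=True)
    chosen = []

    def refresh(*_):
        # Rebuild the visible list every time the search text changes.
        listbox.delete(0, tk.END)
        for name in filter_languages(names, query.get()):
            listbox.insert(tk.END, name)

    def confirm(_event=None):
        selection = listbox.curselection()
        if selection:
            chosen.append(listbox.get(selection[0]))
            root.destroy()

    query.trace_add("write", refresh)
    listbox.bind("<Double-Button-1>", confirm)
    listbox.bind("<Return>", confirm)
    refresh()
    root.mainloop()
    return chosen[0] if chosen else None
```

Calling pick_language(["English", "Twi"]) would open the window; typing in the entry box narrows the list, and a double-click or Return confirms the highlighted entry.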

After this, a list pops up in the command line (using the inquirer package) containing all the versions in the selected language. Once a version has been selected, its ID is stored.
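A rough sketch of the version prompt follows. The "title" and "id" field names and the sample records are assumptions about the versions data, not taken from the real JSON file, and the helper names are hypothetical. The prompt itself uses inquirer's List question, which accepts (label, value) pairs so the user sees the title while the script receives the ID.

```python
def version_choices(versions):
    """Turn version records into (label, value) pairs for a list prompt."""
    return [(version["title"], version["id"]) for version in versions]


def ask_version(versions):
    """Prompt in the terminal; return the chosen version ID, or None if cancelled."""
    # Third-party package, imported here so version_choices works without it.
    import inquirer

    question = inquirer.List(
        "version",
        message="Select a Bible version",
        choices=version_choices(versions),
    )
    answers = inquirer.prompt([question])
    return answers["version"] if answers else None


# Hypothetical sample records for illustration only.
sample_versions = [
    {"id": 1, "title": "King James Version"},
    {"id": 2, "title": "Sample Version"},
]
choices = version_choices(sample_versions)
```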

Next, the codes for each of the 66 books needed to be found, along with the number of chapters in each book. This is done in the book_chapter_verse.py file. This JSON file (https://www.bible.com/json/bible/books/1?filter=) contains all the books in the Bible and their codes, which lets the script iterate through each book’s JSON file to get the number of chapters in that book. This data is stored in an array.

Now I had a way to iterate through all the books and chapters of a given Bible version, and I just needed a way to scrape the site and store the verses in a .csv.

For this, I had to search the source code further and use BeautifulSoup4. I noticed that each verse is wrapped in a tag whose class attribute contains “verse v#”, so the script iterates through every verse within each chapter and finds the tag that has a “content” class.
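The extraction step can be illustrated with a small, self-contained sketch. To keep it runnable without extra packages it uses the standard library's html.parser instead of BeautifulSoup4, so it is not the project's actual code. It also assumes that the tags are span elements, that the “content” tag sits inside the “verse v#” tag and that # is the verse number; the sample markup is invented to match that assumption.

```python
from html.parser import HTMLParser


class VerseParser(HTMLParser):
    """Collect the text of "content" spans nested inside "verse v#" spans."""

    def __init__(self):
        super().__init__()
        self.verses = {}      # verse number -> verse text
        self._stack = []      # one entry per open <span>: "verse", "content" or None
        self._current = None  # number of the verse currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        kind = None
        if "verse" in classes:
            # The verse number is taken from a class such as "v12".
            numbers = [c[1:] for c in classes if c[0] == "v" and c[1:].isdigit()]
            if numbers:
                kind = "verse"
                self._current = int(numbers[0])
                self.verses.setdefault(self._current, "")
        elif "content" in classes and self._current is not None:
            kind = "content"
        self._stack.append(kind)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack and self._stack.pop() == "verse":
            self._current = None

    def handle_data(self, data):
        # Only keep text that sits inside a content span of an open verse.
        if self._current is not None and "content" in self._stack:
            self.verses[self._current] += data


# Invented sample markup matching the assumptions described above.
SAMPLE = (
    '<span class="verse v1"><span class="label">1</span>'
    '<span class="content">In the beginning</span></span>\n'
    '<span class="verse v2"><span class="label">2</span>'
    '<span class="content">And the earth</span></span>'
)

parser = VerseParser()
parser.feed(SAMPLE)
verses = parser.verses
```

Text in the “label” spans is skipped, so the verse numbers do not leak into the verse text.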

TL;DR

After inspecting the source code and the linked JSON files, I wrote a script that iterates through a chosen Bible version in a chosen language and scrapes the verses from every required page using BeautifulSoup4.

Project Outcome

Successful Script

The result is a script that pulls any Bible version from bible.com and stores all the verses in a .csv, which can be imported into OpenLP successfully.
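As a rough illustration of the storage step, the sketch below writes scraped verses with Python's csv module. The column order (book, chapter, verse, text) is an assumption about what OpenLP's CSV importer expects for its verse file and should be checked against the OpenLP documentation; the function name and sample rows are hypothetical.

```python
import csv
import io


def write_verses(rows, handle):
    """Write (book, chapter, verse, text) rows to an open text file handle."""
    writer = csv.writer(handle)
    for book, chapter, verse, text in rows:
        # Strip stray whitespace left over from scraping; the csv module
        # quotes any text that contains commas or quotation marks.
        writer.writerow([book, chapter, verse, text.strip()])


# An in-memory buffer stands in for a real file such as open("verses.csv", "w").
buffer = io.StringIO()
write_verses(
    [
        ("Genesis", 1, 1, " In the beginning, God created "),
        ("Genesis", 1, 2, "And the earth"),
    ],
    buffer,
)
lines = buffer.getvalue().splitlines()
```

When writing to a real file, opening it with newline="" and encoding="utf-8" avoids doubled line endings and keeps non-Latin characters intact.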

Improvements

Future improvements could include a more efficient way to search for the selected items in both lists, which would reduce the time the script takes to run.
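One possible direction, sketched under the assumption that the lists hold dictionary records (the field names below are hypothetical), is to index each list once by the value the user selects, so that each later lookup is a single dictionary access instead of a scan through the list.

```python
def build_index(items, key):
    """Map each record's `key` value to the record itself."""
    return {item[key]: item for item in items}


# Hypothetical sample records for illustration only.
versions = [
    {"id": 1, "title": "King James Version"},
    {"id": 2, "title": "Sample Version"},
]

by_title = build_index(versions, "title")
selected_id = by_title["Sample Version"]["id"]
```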
