Comparing the Data Sizes of Malware Repositories
Introduction
This report looks at the large difference in the amount of data stored in the malware archives of vx-underground and VirusTotal.
Main Body
There is a huge difference in how much data these two malware repositories have collected. For example, the research group vx-underground claims to have about 30 terabytes of malware source code. On the other hand, Bernardo Quintero, the founder of VirusTotal, stated that his repository contains approximately 31 petabytes of samples provided by users. These datasets are essential for cybersecurity firms and AI researchers because they help improve detection tools and analyze how cyber-attacks change over time. To help visualize these sizes, we can imagine the data stored on standard 1-terabyte hard drives, each about one inch thick. In this scenario, the vx-underground archive would need 30 drives, creating a stack 30 inches high. However, the VirusTotal dataset would require 31,744 drives (counting 1,024 terabytes per petabyte), reaching a total height of about 2,645 feet. Consequently, this stack would be almost as tall as the Burj Khalifa and more than twice as high as the Eiffel Tower.
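The drive-stack comparison can be checked with a short calculation. The sketch below is only an illustration under stated assumptions: it uses 1,024 terabytes per petabyte and a drive thickness of one inch, which are the values implied by the report's figures (30 drives giving 30 inches, 31 petabytes giving 31,744 drives). The tower heights are approximate public figures, and all function and constant names are made up for this example.

```python
import math

# Assumptions implied by the report's numbers, not stated outright:
TB_PER_PB = 1024            # binary convention: 1 PB = 1,024 TB
DRIVE_CAPACITY_TB = 1       # "standard 1-terabyte hard drives"
DRIVE_THICKNESS_IN = 1.0    # one inch per drive (30 drives -> 30 inches)
INCHES_PER_FOOT = 12

# Approximate landmark heights in feet (rounded public figures).
BURJ_KHALIFA_FT = 2717      # about 828 m
EIFFEL_TOWER_FT = 1083      # about 330 m


def drives_needed(size_tb: float) -> int:
    """Number of 1 TB drives needed to hold size_tb terabytes."""
    return math.ceil(size_tb / DRIVE_CAPACITY_TB)


def stack_height_ft(drive_count: int) -> float:
    """Height in feet of drive_count drives stacked flat."""
    return drive_count * DRIVE_THICKNESS_IN / INCHES_PER_FOOT


vx_drives = drives_needed(30)                # vx-underground: 30 TB
vt_drives = drives_needed(31 * TB_PER_PB)    # VirusTotal: 31 PB
vx_height_in = vx_drives * DRIVE_THICKNESS_IN
vt_height_ft = stack_height_ft(vt_drives)

print(f"vx-underground: {vx_drives} drives, {vx_height_in:.0f} inches")
print(f"VirusTotal: {vt_drives:,} drives, {vt_height_ft:,.0f} feet")
print(f"Share of Burj Khalifa height: {vt_height_ft / BURJ_KHALIFA_FT:.2f}")
print(f"Multiple of Eiffel Tower height: {vt_height_ft / EIFFEL_TOWER_FT:.2f}")
```

With a decimal convention (1,000 terabytes per petabyte) the VirusTotal stack would need 31,000 drives and would be slightly shorter, so the comparison with the two towers would still hold.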
Conclusion
The data shows a massive difference in scale, with VirusTotal holding a much larger volume of malware samples than vx-underground.
Learning
The Logic of Contrast: Moving Beyond 'But'
At an A2 level, we usually connect opposite ideas with a simple "but." However, to reach B2, you need Connectors of Contrast, together with connectors of result and precise comparisons. These allow you to manage complex information more professionally.
1. The 'Flip' Phrase: On the other hand
In the text, the author introduces vx-underground's data and then says: "On the other hand, Bernardo Quintero... stated..."
- When to use it: Use this when you are comparing two different facts or people. It signals to the reader: "I am finished with the first point; now look at the opposite point."
- B2 Tip: Always put a comma after this phrase.
2. The 'Result' Trigger: Consequently
Notice how the text moves from the height of the stack to the comparison with famous towers using "Consequently."
- The Shift: A2 students use "so." B2 students use Consequently or Therefore.
- Meaning: It means "as a result of the things I just mentioned." It creates a logical chain of evidence.
3. Precise Comparison: Almost as... as
Instead of saying "The stack is very tall," the author writes: "...this stack would be almost as tall as the Burj Khalifa."
- The Structure: almost as + adjective + as + object.
- Why it's B2: It shows you can compare two specific things using a scale, rather than just using basic adjectives like "big" or "huge."
Quick Upgrade Map
| A2 (Simple) | B2 (Advanced) |
|---|---|
| But | On the other hand / However |
| So | Consequently |
| Very tall | Almost as tall as [X] |