Web Crawling & Indexing Engineer

Mistral AI

Mid-level

🇫🇷 France

🇬🇧 United Kingdom

👶Paid parental leave

Web Developer

Software development

About Mistral

- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world.

- Our mission is to make AI ubiquitous and open.

- We are creative, low-ego, team-spirited, and have been passionate about AI for years.

- We hire people who thrive in competitive environments, because they find them more fun to work in.

- We hire passionate women and men from all over the world.

- Our teams are distributed across France, the UK, and the USA.

Role Summary

- We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team.

- The ideal candidate will have a strong background in web scraping, data extraction, and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources.

- The role is based in Paris or London.

Key Responsibilities

- Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.

- Use headless browsing techniques, such as driving Chrome through its DevTools interface, to automate and optimize data collection processes.

- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.

- Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.

- Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.

- Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.

- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.

Qualifications & profile

- Bachelor’s or master’s degree in computer science, information systems, or information technology.

- Strong understanding of web technologies, data structures, and algorithms.

- Knowledge of database management systems and data warehousing.

- Programming languages: proficiency in languages such as Python, Java, or C++ is essential.

- Mastery of web technologies: understanding of HTML, CSS, and JavaScript is crucial for navigating websites and scraping data from them.

- Knowledge of the HTTP and HTTPS protocols.

- A good understanding of data structures (such as queues, stacks, and hash maps) and algorithms is necessary.

- Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data.

- Understanding of distributed systems and technologies like Hadoop or Spark.

- Experience using web scraping libraries and frameworks like Scrapy, BeautifulSoup, Selenium, or MechanicalSoup.

- Understanding of how search engines work and how to optimize web crawling.

- Experience in machine learning to improve the efficiency and accuracy of web crawling.

- Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.

Benefits

- Daily lunch vouchers

- Contribution to a Gympass subscription

- Monthly contribution to a mobility pass

- Full health insurance for you and your family

- Generous parental leave policy

Mistral AI

European company that trains large generative models and provides them to industry.

Artificial Intelligence

Large Enterprise

Technology

mistral.ai

🏭technology, information and internet

🎂2023
