Spring Data + Solr Cloud + Zookeeper + MongoDB + Ubuntu Integration

Datetime: 2016-08-23 02:01:31    Topic: Solr, MongoDB

I have several years of experience with Solr 4.x, but Solr 5 and Solr 6 are quite different from the older versions. So I decided to refresh my knowledge by integrating SolrCloud 6.0.1, MongoDB 3.2.7, and Spring Data Solr, which is part of Spring Boot 1.4.0.M3.

My goal is to set up a clustered MongoDB to store the data and a SolrCloud cluster that indexes the MongoDB data in real time. I will use mongo-connector to replicate data from MongoDB to SolrCloud. Then I will develop a simple Spring Data Solr application that queries SolrCloud to search the MongoDB data.

Please feel free to send me feedback or ask me any questions about it. I hope this article is helpful for your next project.

Install JDK 1.8 If You Don't Have It on Ubuntu:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Install MongoDB 3.2.7:

  1. Follow these instructions to install MongoDB on Ubuntu:

https://docs.mongodb.com/v3.0/tutorial/install-mongodb-on-ubuntu/

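  1. Edit /etc/mongod.conf to set up a replica set. mongo-connector tails the MongoDB oplog, which only exists on replica set members, so a replica set is needed even with a single server. I installed only one MongoDB instance, but you can install multiple instances that share the same replica set name.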
# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1

# replica sets

replication:
  replSetName: rs0
  1. Restart MongoDB:
$ sudo service mongod restart
  1. Run mongo to connect to the mongod server, then initiate the replica set and check its status:
$ mongo

rs.initiate()
rs.status()

You will see something like this:

{
    "set" : "rs0",
    "date" : ISODate("2016-06-14T07:41:49.307Z"),
    "myState" : 1,
    "term" : NumberLong(1),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
    {
        "_id" : 0,
        "name" : "ubuntu:27017",
        "health" : 1,
        "state" : 1,
        "stateStr" : "PRIMARY",
        "uptime" : 400763,
        "optime" : {
            "ts" : Timestamp(1465716890, 2227),
            "t" : NumberLong(1)
        },
        "optimeDate" : ISODate("2016-06-12T07:34:50Z"),
        "electionTime" : Timestamp(1465694725, 2),
        "electionDate" : ISODate("2016-06-12T01:25:25Z"),
        "configVersion" : 1,
        "self" : true
    }
    ],
    "ok" : 1
}
  1. Download the sample MongoDB JSON data from here:

https://raw.githubusercontent.com/mongodb/docs-assets/primer-dataset/primer-dataset.json

  1. Import sample data into MongoDB:
$ mongoimport --db test --collection restaurant --drop --file ~/primer-dataset.json
  1. Run mongo, then run the following to verify that the sample data was imported correctly:
$ mongo

use test
db.restaurant.findOne()
  1. A document from the sample data looks like this:
{
    "_id" : ObjectId("575d1095ae7da76b8fb71b1a"),
    "address" : {
        "building" : "1007",
        "coord" : [
        -73.856077,
        40.848447
        ],
        "street" : "Morris Park Ave",
        "zipcode" : "10462"
    },
    "borough" : "Bronx",
    "cuisine" : "Bakery",
    "grades" : [
    {
        "date" : ISODate("2014-03-03T00:00:00Z"),
        "grade" : "A",
        "score" : 2
    },
    {
        "date" : ISODate("2013-09-11T00:00:00Z"),
        "grade" : "A",
        "score" : 6
    },
    {
        "date" : ISODate("2013-01-24T00:00:00Z"),
        "grade" : "A",
        "score" : 10
    },
    {
        "date" : ISODate("2011-11-23T00:00:00Z"),
        "grade" : "A",
        "score" : 9
    },
    {
        "date" : ISODate("2011-03-10T00:00:00Z"),
        "grade" : "B",
        "score" : 14
    }
    ],
    "name" : "Morris Park Bake Shop",
    "restaurant_id" : "30075445"
}

Install Zookeeper 3.4.6:

  1. Follow the "Setting Up a Single ZooKeeper" instructions here (you can install multiple ZooKeeper instances if you want):

https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble

  1. Start Zookeeper (use stop or restart instead of start to stop or restart it):
$ /opt/zookeeper-3.4.6/bin/zkServer.sh start
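Before configuring Solr, you may want to confirm that ZooKeeper is answering. ZooKeeper 3.4.x replies imok to the four-letter command ruok (from a shell: echo ruok | nc localhost 2181). The hypothetical ZkHealthCheck class below is a minimal sketch that sends that command over a plain socket; newer ZooKeeper releases may only answer four-letter commands listed in 4lw.commands.whitelist, so treat it as a check for the 3.4.6 setup above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: asks a ZooKeeper server "ruok" and expects "imok".
class ZkHealthCheck {

    static boolean isOk(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // don't block forever on read
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // ZooKeeper writes its reply and then closes the connection,
            // so read until end of stream.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[16];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            String answer = new String(reply.toByteArray(), StandardCharsets.US_ASCII).trim();
            return "imok".equals(answer);
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Same host and port that ZK_HOST points Solr at.
        boolean ok = isOk("localhost", 2181, 2000);
        System.out.println(ok ? "ZooKeeper is up" : "ZooKeeper is not responding");
    }
}
```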

Add Solr Configuration Files to Zookeeper

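  1. Download my sample code from: https://github.com/lcbdl/resaurant.git
  2. It's a good idea to track changes to your configuration files in a version control system, so that every time you update the files, you commit them to version control and upload them to ZooKeeper. For example, once Solr is installed (see below), you can upload the configuration with the zkcli.sh script that ships with Solr, replacing <conf-dir> with the directory that contains solrconfig.xml and schema.xml:
$ /opt/solr-6.0.1/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confname restaurant -confdir <conf-dir>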
  3. You need the following in solrconfig.xml:
<schemaFactory class="ClassicIndexSchemaFactory"/>

Solr 6 uses a managed schema (the managed-schema file) by default, but to index MongoDB we need a schema.xml, so this setting forces Solr to use the classic schema.xml.

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

mongo-connector uses this handler to read the schema from Solr.

  1. schema.xml:
......
   <!--
      MongoDB uses _id as the primary key name.
   -->
   <field name="_id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

   <!-- metadata used by mongo-connector -->
   <field name="_ts" type="long" indexed="true" stored="true" required="true" multiValued="false" />
   <field name="ns" type="string" indexed="true" stored="true" required="true" multiValued="false" />

   <!-- fields for the document -->
   <field name="restaurant_id" type="string" indexed="true" stored="true"/>
   <field name="name" type="text_general" indexed="true" stored="true"/>
   <field name="borough" type="string" indexed="true" stored="true"/>
   <field name="cuisine" type="string" indexed="true" stored="true"/>

   <field name="address.building" type="string" indexed="true" stored="true"/>
   <dynamicField name="address.coord.*" type="float" indexed="false" stored="false"/>
   <field name="address.street" type="string" indexed="true" stored="true"/>
   <field name="address.zipcode" type="string" indexed="true" stored="true"/>

   <dynamicField name="*.date" type="date" indexed="false" stored="false" />
   <dynamicField name="*.grade" type="string" indexed="false" stored="false" />
   <dynamicField name="*.score" type="int" indexed="false" stored="false" />

   <field name="grades.dates" type="date" indexed="true" stored="true" multiValued="true" />
   <field name="grades.grades" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="grades.scores" type="int" indexed="true" stored="true" multiValued="true" />

   .......
   <!-- 
      Specify what field is the unique key in solr
   -->
   <uniqueKey>_id</uniqueKey>

   <copyField source="name" dest="text"/>
   <copyField source="borough" dest="text"/>
   <copyField source="cuisine" dest="text"/>
   <copyField source="address.building" dest="text"/>
   <copyField source="address.street" dest="text"/>
   <copyField source="address.zipcode" dest="text"/>
   <copyField source="address.coord.*" dest="text"/>

   <copyField source="*.date" dest="grades.dates" />
   <copyField source="*.grade" dest="grades.grades" />
   <copyField source="*.score" dest="grades.scores" />

   ......

Mapping arrays of embedded JSON documents in Solr is really challenging. For the grades array shown below, I used:

<dynamicField name="*.date" type="date" indexed="false" stored="false" />

and:

<field name="grades.dates" type="date" indexed="true" stored="true" multiValued="true" />

then:

 <copyField source="*.date" dest="grades.dates" />

Solr only supports dynamic field names with a leading or trailing asterisk; it doesn't support a pattern like grades.*.date.

"grades" : [
{
    "date" : ISODate("2014-03-03T00:00:00Z"),
    "grade" : "A",
    "score" : 2
},
{
    "date" : ISODate("2013-09-11T00:00:00Z"),
    "grade" : "A",
    "score" : 6
}]
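To see why the *.date pattern works, it helps to look at the field names Solr actually receives. mongo-connector's Solr doc manager flattens nested documents into dotted keys and uses the array index as a path segment, so the grades array arrives as grades.0.date, grades.0.grade, grades.1.date, and so on, and the coord array as address.coord.0 and address.coord.1. The Java sketch below mimics that flattening and then applies a leading-asterisk pattern the way the dynamicField and copyField pair does. The names FlattenSketch, flatten, matches and copyMatching are hypothetical; this illustrates the behavior and is not mongo-connector's or Solr's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of how a nested MongoDB document ends up as flat Solr fields.
class FlattenSketch {

    // Flatten nested maps and lists into dotted keys; list elements use their index.
    static void flatten(String path, Object value, Map<String, Object> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String key = path.isEmpty() ? String.valueOf(e.getKey()) : path + "." + e.getKey();
                flatten(key, e.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> items = (List<?>) value;
            for (int i = 0; i < items.size(); i++) {
                flatten(path + "." + i, items.get(i), out);
            }
        } else {
            out.put(path, value);
        }
    }

    // Solr dynamic field names allow a single leading or trailing '*'.
    static boolean matches(String pattern, String field) {
        if (pattern.startsWith("*")) {
            return field.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return field.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(field);
    }

    // Like <copyField source="pattern" dest="...">: gather every matching value.
    static List<Object> copyMatching(String pattern, Map<String, Object> flat) {
        List<Object> values = new ArrayList<>();
        for (Map.Entry<String, Object> e : flat.entrySet()) {
            if (matches(pattern, e.getKey())) {
                values.add(e.getValue());
            }
        }
        return values;
    }

    static Map<String, Object> grade(String date, String grade, int score) {
        Map<String, Object> g = new LinkedHashMap<>();
        g.put("date", date);
        g.put("grade", grade);
        g.put("score", score);
        return g;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Morris Park Bake Shop");
        doc.put("grades", Arrays.asList(
                grade("2014-03-03T00:00:00Z", "A", 2),
                grade("2013-09-11T00:00:00Z", "A", 6)));

        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", doc, flat);
        System.out.println(flat.keySet());
        // [name, grades.0.date, grades.0.grade, grades.0.score, grades.1.date, grades.1.grade, grades.1.score]
        System.out.println(copyMatching("*.date", flat)); // values that would land in grades.dates
        // [2014-03-03T00:00:00Z, 2013-09-11T00:00:00Z]
    }
}
```

One consequence of the leading-asterisk pattern: any field ending in .date, not only the ones under grades, is copied into grades.dates, so the mapping stays unambiguous only while the document has a single array of sub-documents with a date field.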

Install Solr Cloud

  1. Download the solr-6.0.1.tgz file from:

http://www.us.apache.org/dist/lucene/solr/6.0.1/solr-6.0.1.tgz

$ wget http://www.us.apache.org/dist/lucene/solr/6.0.1/solr-6.0.1.tgz
  1. Extract the installation script from the tgz file:
$ tar xzf solr-6.0.1.tgz solr-6.0.1/bin/install_solr_service.sh --strip-components=2
  1. Install 2 Solr instances:
$ sudo ./install_solr_service.sh solr-6.0.1.tgz -i /opt -d /var/solr1 -u <username> -s solr1 -p 8983
$ sudo ./install_solr_service.sh solr-6.0.1.tgz -i /opt -d /var/solr2 -u <username> -s solr2 -p 8984
  1. Update /opt/solr-6.0.1/bin/solr.in.sh:
$ vi /opt/solr-6.0.1/bin/solr.in.sh

Add the following line to it:

ZK_HOST="localhost:2181"

  1. Start Solr instances:
$ sudo service solr1 start

$ sudo service solr2 start
  1. Check the status of solr1 and solr2 and make sure the output looks like this:
$ sudo service solr1 status
Found 1 Solr nodes:

Solr process 4922 running on port 8983
{
    "solr_home":"/var/solr1/data",
    "version":"6.0.1 c7510a0fdd93329ec04c853c8557f4a3f2309eaf - sarowe - 2016-05-23 19:40:37",
    "startTime":"2016-06-12T03:34:56.375Z",
    "uptime":"0 days, 21 hours, 37 minutes, 50 seconds",
    "memory":"214.7 MB (%43.8) of 490.7 MB",
    "cloud":{
        "ZooKeeper":"localhost:2181",
        "liveNodes":"2",
        "collections":"0"
    }
}
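  1. Create the collection:

Open your browser and go to this URL (replace 192.168.1.151 with your server's address): http://192.168.1.151:8983/solr/admin/collections?action=CREATE&name=restaurant&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=restaurant

The collection.configName parameter names the configuration set that was uploaded to ZooKeeper.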
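The CREATE request asks for numShards=2 and replicationFactor=2, which is 2 × 2 = 4 shard replicas spread over the two Solr nodes, so each node has to host two of them; that is why maxShardsPerNode=2 is needed. The hypothetical CollectionsApiUrl helper below is a sketch that computes that minimum and assembles the same URL with URL-encoded values. It only builds the string; it does not send the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a Collections API CREATE URL (it does not send it).
class CollectionsApiUrl {

    // Smallest maxShardsPerNode that fits numShards * replicationFactor cores on liveNodes nodes.
    static int minMaxShardsPerNode(int numShards, int replicationFactor, int liveNodes) {
        int cores = numShards * replicationFactor;
        return (cores + liveNodes - 1) / liveNodes; // ceiling division
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String createUrl(String solrBase, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return solrBase + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        int numShards = 2, replicationFactor = 2, liveNodes = 2;
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", "restaurant");
        params.put("numShards", String.valueOf(numShards));
        params.put("replicationFactor", String.valueOf(replicationFactor));
        params.put("maxShardsPerNode",
                String.valueOf(minMaxShardsPerNode(numShards, replicationFactor, liveNodes)));
        params.put("collection.configName", "restaurant"); // config set name in ZooKeeper
        System.out.println(createUrl("http://192.168.1.151:8983/solr", params));
    }
}
```

With three nodes the same four replicas would still need maxShardsPerNode=2, since four cores cannot fit on three nodes at one core each.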

Install mongo-connector and Connect Solr and MongoDB

  1. Install pip if you don't already have it on Ubuntu:
$ sudo apt-get update

$ sudo apt-get install python-pip
  1. Install mongo-connector:
$ pip install mongo-connector
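  1. Run mongo-connector (you can create an Ubuntu service so that it starts at system startup).

mongo-connector first copies all existing data from MongoDB to Solr, then keeps Solr updated whenever the data in MongoDB changes. In the command below, -m is the MongoDB address, -t is the Solr collection URL, -d selects the Solr doc manager, and --auto-commit-interval=0 makes mongo-connector commit to Solr after every write.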

$ mongo-connector --auto-commit-interval=0 -m localhost:27017 -t http://localhost:8983/solr/restaurant -d solr_doc_manager
  1. Check that the data is in Solr by opening this URL; you should see a result like this:

http://192.168.1.151:8983/solr/restaurant/select?indent=on&q=*:*&wt=json

{
    "responseHeader":{
        "zkConnected":true,
        "status":0,
        "QTime":19,
        "params":{
            "q":"*:*",
            "indent":"on",
            "wt":"json"}},
        "response":{"numFound":25359,"start":0,"maxScore":1.0,"docs":[
        {
            "grades.grades":["A",
            "A",
            "A",
            "B"],
            "restaurant_id":"40396126",
            "grades.scores":[7,
            21,
            7,
            2],
            "borough":"Manhattan",
            "address.street":"West   57 Street",
            "cuisine":"American ",
            "grades.dates":["2013-07-12T00:00:00Z",
            "2012-07-17T00:00:00Z",
            "2012-03-07T00:00:00Z",
            "2014-08-01T00:00:00Z"],
            "address.building":"205",
            "_ts":6295206107744831667,
            "address.zipcode":"10019",
            "ns":"test.restaurant",
            "name":"Europa Cafe",
            "_id":"575d1095ae7da76b8fb71f02",
            "_version_":1536925504605519872},
    ......
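The _ts field that mongo-connector adds is the MongoDB oplog timestamp packed into one 64-bit number: the seconds in the high 32 bits and the increment in the low 32 bits. Decoding the _ts of the document above gives 1465716890 seconds and increment 2227, which matches the optime Timestamp(1465716890, 2227) and optimeDate 2016-06-12T07:34:50Z that rs.status() reported earlier. The hypothetical OplogTs helper below is a sketch of that packing scheme, not code taken from mongo-connector.

```java
import java.time.Instant;

// Hypothetical helper: split mongo-connector's _ts long into its BSON timestamp parts.
class OplogTs {

    static long seconds(long ts) {
        return ts >>> 32;          // high 32 bits: seconds since the Unix epoch
    }

    static long increment(long ts) {
        return ts & 0xFFFFFFFFL;   // low 32 bits: ordinal of the operation within that second
    }

    static long pack(long seconds, long increment) {
        return (seconds << 32) | increment;
    }

    public static void main(String[] args) {
        long ts = 6295206107744831667L; // _ts of the "Europa Cafe" document above
        System.out.println(seconds(ts) + " " + increment(ts));   // 1465716890 2227
        System.out.println(Instant.ofEpochSecond(seconds(ts)));  // 2016-06-12T07:34:50Z
    }
}
```

This is handy when you want to know which oplog entry a Solr document was last synced from.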

Create a Spring Data Solr Application to Run a Query Against Solr Cloud

  1. Download my sample code to your workspace folder:
$ git clone https://github.com/lcbdl/resaurant.git
  1. Import maven project into your Java IDE (I used Spring STS)
  2. Right-click RestaurantRepositoryTest.java and choose Run As -> JUnit Test.

You will see a good test result if everything is fine.

Key Components in the Code

You can find my full sample code here: https://github.com/lcbdl/resaurant

  1. Model class Restaurant.java:
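import org.apache.solr.client.solrj.beans.Field;
import org.springframework.data.annotation.Id;
import org.springframework.data.solr.core.mapping.SolrDocument;

@SolrDocument(solrCoreName="restaurant")  // Solr collection name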
public class Restaurant {

    @Field("_id")                         // Specify field name in solr
    @Id                                   // This is required
    private String id;

    @Field("name")
    private String name;

    @Field("restaurant_id")
    private String restaurantId;

    @Field("borough")
    private String borough;

      ......
  1. Repository interface RestaurantRepository.java:
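import java.util.List;
import org.springframework.data.solr.repository.SolrCrudRepository;

public interface RestaurantRepository extends SolrCrudRepository<Restaurant, String> {

    // Derived query: Spring Data Solr builds a query on the name field from the method name
    List<Restaurant> findByName(String name);

}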
  1. Spring Boot Main Application Class:
@SpringBootApplication
public class RestaurantApplication extends SpringBootServletInitializer {

    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder application) {
        return application.sources(RestaurantApplication.class);
    }

    public static void main(String[] args) {
        SpringApplication.run(RestaurantApplication.class, args);
    }
}
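  1. SpringSolrConfig.java:
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.solr.core.SolrTemplate;
import org.springframework.data.solr.repository.config.EnableSolrRepositories;

@Configuration
// multicoreSupport routes each repository to the collection named in @SolrDocument(solrCoreName=...)
@EnableSolrRepositories(basePackages = { "ca.knc.restaurant.repository.solr" }, multicoreSupport = true)
public class SpringSolrConfig {

    @Value("${spring.data.solr.zk-host}")
    private String zkHost;

    @Bean
    public SolrClient solrClient() {
        // CloudSolrClient reads the cluster state from ZooKeeper to find the live Solr nodes
        return new CloudSolrClient(zkHost);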
    }

    @Bean
    public SolrTemplate solrTemplate(SolrClient solrClient) throws Exception {
        return new SolrTemplate(solrClient);
    }

}
  1. application.properties:
# SOLR (SolrProperties)
spring.data.solr.zk-host=192.168.1.151:2181
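  1. RestaurantRepositoryTest.java:
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;

import java.util.List;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@SpringBootTest(classes = RestaurantApplication.class)
public class RestaurantRepositoryTest {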

    @Autowired
    private RestaurantRepository restaurantRepository;

    @Test
    public void findByNameTest() {
        List<Restaurant> restaurants = restaurantRepository.findByName("Morris Park Bake Shop");
        assertNotNull(restaurants);
        assertTrue(restaurants.size() > 0);
        for (Restaurant r : restaurants) {
            System.out.println(r.toString());
        }
    }

}



